LinuxCommandLibrary

dvc-freeze

Freeze DVC stages so that dvc repro does not re-execute them

TLDR

Freeze one or more specified stages

$ dvc freeze [stage_name1 stage_name2 ...]

SYNOPSIS

dvc freeze path_or_name [path_or_name ...]

PARAMETERS

path_or_name
    The path to a .dvc file representing the stage, or the name of a stage defined in the dvc.yaml file. At least one is required; several may be given.

DESCRIPTION

The dvc freeze command is part of Data Version Control (DVC), a tool for versioning data, machine learning models, and pipelines. It marks one or more stages as "frozen" by setting frozen: true in the stage definition (in dvc.yaml, or in the .dvc file for stages defined there). A frozen stage is treated as unchanged by dvc status and dvc repro: reproduction does not re-execute it or regenerate its outputs, even if its dependencies (the deps field of the stage definition, such as a data file or a script) have changed, and even if --force is used. Downstream stages keep using the outputs the frozen stage has already produced. This is useful for pinning an expensive or externally sourced step near the top of a pipeline while iterating on later stages. Import stages, created by dvc import and dvc import-url, are frozen by default; dvc update refreshes their data, and dvc unfreeze makes dvc repro check their source again. A stage stays frozen until dvc unfreeze is run on it.

CAVEATS

Freezing a stage prevents dvc repro from re-executing it, so its outputs can silently fall out of date relative to its dependencies. To pick up changed dependencies, first unfreeze the stage with dvc unfreeze, then run dvc repro, and re-freeze it if desired. Frozen import stages are the exception: dvc update refreshes them without unfreezing. The change that dvc freeze makes to dvc.yaml (or to the .dvc file) must be committed to Git for the frozen state to persist in the repository's history.
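
The unfreeze, re-run, re-freeze cycle can be wrapped in a small shell helper. This is a minimal sketch, not part of DVC: the function name refreeze_after_repro is hypothetical, and it assumes dvc repro is the command used to re-execute the stage. Only the dvc unfreeze, dvc repro, and dvc freeze subcommands are real.

```shell
# Hypothetical helper (not shipped with DVC): temporarily unfreeze a stage,
# re-run it, then freeze it again.
refreeze_after_repro() {
    stage="$1"
    if [ -z "$stage" ]; then
        echo "usage: refreeze_after_repro <stage>" >&2
        return 2
    fi

    # Allow DVC to execute the stage again.
    dvc unfreeze "$stage" || return 1

    # Re-execute the stage (and any changed upstream stages it depends on).
    dvc repro "$stage"
    status=$?

    # Re-freeze even if repro failed, so the stage is not left unfrozen by accident.
    dvc freeze "$stage" || return 1

    return "$status"
}
```

Calling refreeze_after_repro with a stage name runs the three commands in order. Both the unfreeze and the freeze edit the stage definition, so review the resulting diff before committing it to Git.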

IMPACT ON DVC.YAML AND DVC.LOCK

When you run dvc freeze, it adds or sets the frozen: true property for the specified stage in your dvc.yaml file (or in the .dvc file, for stages defined there). This flag is an instruction to DVC itself: dvc status and dvc repro treat the stage as unchanged and skip it. This is distinct from the dvc.lock file, which stores the checksums of each stage's dependencies and outputs. dvc freeze does not modify dvc.lock; it only changes whether DVC compares the stage against that entry. When dvc repro encounters a frozen stage, it does not run the stage's command, and the entry in dvc.lock stays as it was, regardless of changes to the stage's dependencies.
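
To make the flag concrete, the sketch below writes a sample dvc.yaml and searches it for the property. The stage names (prepare, train), script names, and data paths are hypothetical, and the exact key order and indentation that DVC writes may differ; only the stage-level frozen: true key is taken from the text above.

```shell
# Write a sample dvc.yaml into a scratch directory.
# Stage names, commands, and paths are hypothetical.
workdir="$(mktemp -d)"
cat > "$workdir/dvc.yaml" <<'EOF'
stages:
  prepare:
    cmd: python prepare.py
    deps:
      - data/raw.csv
    outs:
      - data/prepared.csv
    frozen: true
  train:
    cmd: python train.py
    deps:
      - data/prepared.csv
    outs:
      - model.pkl
EOF

# Count the stage definitions that carry the flag set by dvc freeze.
frozen_count="$(grep -c 'frozen: true' "$workdir/dvc.yaml")"
echo "frozen stages: $frozen_count"
```

In a real project the same grep against the repository's dvc.yaml gives a quick check of which stages are currently frozen before running dvc repro.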

HISTORY

DVC was created in 2017 to bring Git-like version control to data and machine learning pipelines. Stages and their dependencies have been core to DVC from the start, and freezing gives users fine-grained control over which parts of a pipeline are reproduced. The functionality was originally exposed as dvc lock and was renamed to dvc freeze in DVC 1.0, when stage definitions moved from individual .dvc files to the centralized dvc.yaml and the dvc.lock file was introduced.

SEE ALSO

dvc unfreeze(1), dvc update(1), dvc repro(1), dvc run(1), dvc.yaml(5), dvc.lock(5)
