dvc-checkout

Restore data tracked by DVC

TLDR

Check out the latest version of all target files and directories
$ dvc checkout

Check out the latest version of a specified target
$ dvc checkout [target]

Check out a specific version of a target from a different Git commit/tag/branch
$ git checkout [commit_hash|tag|branch] [target].dvc && dvc checkout [target]

SYNOPSIS

dvc checkout [-h] [-q | -v] [--summary] [-d] [-R] [-f] [--relink] [--allow-missing] [targets [targets ...]]

PARAMETERS

-h, --help
    Show help message and exit.

-q, --quiet
    Suppress non-error messages.

-v, --verbose
    Enable verbose status messages.

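--summary
    Show a short summary of the changes instead of listing each one.

-d, --with-deps
    With targets, also check out the dependencies of the given stages or .dvc files.

-R, --recursive
    Search each target directory and its subdirectories for stages or .dvc files to check out.

-f, --force
    Do not prompt before removing or overwriting workspace files; changes not saved in the cache are lost.

--relink
    Recreate the links or copies from the cache to the workspace.

--allow-missing
    Ignore errors for files or directories that are missing from the cache.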

targets
    Optional tracked files or directories, .dvc files, or stage names to limit the checkout to (default: everything tracked by DVC).

DESCRIPTION

dvc checkout is a key command in Data Version Control (DVC), an open-source tool for versioning data, ML models, metrics, and experiments alongside Git-tracked code. It updates the DVC-tracked files and directories in the workspace to match the versions recorded in the current .dvc and dvc.lock files, linking or copying them from the local cache to their target paths.

It handles the large data artifacts that are kept out of Git and represented there only by lightweight pointer files. Run it after git checkout or git pull, once those pointer files have changed, to bring the data in line with them. By default it updates everything DVC tracks; pass files, directories, .dvc files, or stage names as targets for a selective checkout.

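Unlike git checkout, it does not touch Git-tracked files; it only updates DVC-managed content. If a workspace file would be removed or overwritten and its current contents are not in the cache, DVC prompts for confirmation; -f skips the prompt, and those uncached changes are lost. Used together with Git, it keeps data and code versions consistent across clones of a project.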
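
The split between the two commands can be sketched as a small shell function. This is an illustration only: switch_revision is a hypothetical helper name, not part of DVC, and it assumes the data for the requested revision is already in the local cache.

```shell
# Hypothetical helper (not part of DVC): move code and data to one Git revision.
switch_revision() {
    rev=$1
    # Git updates the tracked pointer files (.dvc, dvc.lock) ...
    git checkout "$rev" || return
    # ... then DVC syncs the data files to match those pointers.
    dvc checkout
}
```

It would be called as switch_revision followed by a commit hash, tag, or branch name. Running dvc install once in a repository sets up Git hooks that perform the second step automatically after each git checkout.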

It also works with pipelines: stage outputs recorded in dvc.lock are restored from the cache like any other tracked data. Use -v for more detail when debugging cache mismatches.

CAVEATS

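Does not download anything from remote storage: if the needed data is missing from the local cache, run dvc fetch first, or use dvc pull, which fetches and then checks out. Files that DVC does not track are left untouched.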
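
One way to handle a cold cache in a script is to try the local checkout first and download only when it fails. The function below is a minimal sketch under that assumption; checkout_or_fetch is a hypothetical name, and it treats any failed checkout as a reason to fetch, which may be broader than needed.

```shell
# Hypothetical helper (not part of DVC): restore targets from the local
# cache, downloading them first only if the plain checkout fails.
checkout_or_fetch() {
    if dvc checkout "$@"; then
        return 0
    fi
    echo "checkout failed; fetching from remote storage" >&2
    # dvc fetch fills the local cache; the second checkout then uses it.
    dvc fetch "$@" && dvc checkout "$@"
}
```

With no arguments it covers everything DVC tracks; with arguments such as data/ it is limited to those targets. Running dvc pull directly has the same end result when a download is known to be needed.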

EXAMPLES

Restore all tracked data
$ dvc checkout

Restore specific targets
$ dvc checkout model.pkl data/

Force overwriting of modified files
$ dvc checkout -f

EXIT CODES

0: success
non-zero: failure (e.g., a target missing from the cache, or bad arguments); the exact values can differ between DVC versions.
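
In scripts and CI jobs, the exit status is usually all that needs checking. A minimal sketch, using a hypothetical wrapper name (checked_checkout) and making no assumption about which non-zero value DVC returns:

```shell
# Hypothetical wrapper (not part of DVC): run a checkout, report a failure
# on stderr, and pass the original exit status back to the caller.
checked_checkout() {
    dvc checkout "$@" || {
        rc=$?
        echo "dvc checkout failed with exit status $rc" >&2
        return "$rc"
    }
}
```

A caller can then write checked_checkout data/ || exit 1 to stop a job as soon as the data cannot be restored.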

HISTORY

Introduced in DVC v0.6 (2017) by Iterative.ai; evolved with pipeline support in v0.74+, cache v2 in v2.0 (2020). Widely used in ML reproducibility.

SEE ALSO

dvc pull(1), dvc add(1), git checkout(1)
