LinuxCommandLibrary

dvc-fetch

Download data or models tracked by DVC

TLDR

Fetch the latest changes from the default remote storage (if set)
$ dvc fetch

Fetch changes from a specific remote storage
$ dvc fetch [[-r|--remote]] [remote_name]

Fetch the latest changes for specific targets
$ dvc fetch [target/s]

Fetch changes for all branches and tags
$ dvc fetch [[-a|--all-branches]] [[-T|--all-tags]]

Fetch changes for all commits
$ dvc fetch [[-A|--all-commits]]

SYNOPSIS

dvc fetch [TARGETS] [-r NAME] [-a] [-T] [-A] [-j N] [-f] [-q] [-v]

PARAMETERS

--help (-h)
    Show help message and exit.


--quiet (-q)
    Suppress non-essential output.


--verbose (-v)
    Display more detailed output.


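--remote NAME (-r)
    Remote storage to fetch from, instead of the default remote.


--all-branches (-a)
    Fetch data referenced in all Git branches.


--all-tags (-T)
    Fetch data referenced in all Git tags.


--all-commits (-A)
    Fetch data referenced in all Git commits, not just the current workspace.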


--jobs JOBS (-j)
    Number of parallel jobs (default: CPU count).


--force (-f)
    Overwrite local cache files.


--run-cache
    Fetch run cache (experimental).


--with-stats
    Print JSON stats to stdout.


TARGETS
    Pipeline stages, .dvc files, directories, or outputs to fetch (optional; if omitted, all data tracked in the current workspace is fetched).


DESCRIPTION

The dvc fetch command downloads data from the configured DVC remote storage into the local DVC cache directory (.dvc/cache). It updates the cache without modifying workspace files or .dvc metadata files. This is useful for prefetching large datasets separately from checkout operations, enabling efficient CI/CD pipelines or selective data pulls.

Specify targets (pipeline stages, .dvc files, directories, or outputs) to fetch only the corresponding data. Without targets, it fetches all data referenced by the current workspace; add -a, -T, or -A to extend this to all branches, tags, or commits. It verifies file integrity using hashes and can fetch from any configured remote via -r.
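As an illustration of combining targets with a named remote, here is a minimal sketch of a wrapper that fetches only selected targets. The helper name fetch_targets and the remote name storage are hypothetical; only the -r and -j options documented on this page are passed to dvc.

```shell
#!/bin/sh
# Sketch: fetch only the listed targets from one named remote into .dvc/cache.
# fetch_targets is a hypothetical helper; -r and -j are the options documented
# on this page. The workspace and .dvc files are not modified.
fetch_targets() {
  if [ "$#" -lt 3 ]; then
    echo "usage: fetch_targets REMOTE JOBS TARGET..." >&2
    return 2
  fi
  local remote="$1" jobs="$2"
  shift 2
  # Remaining arguments are targets: stages, .dvc files, directories, outputs.
  dvc fetch -r "$remote" -j "$jobs" "$@"
}
```

For example, fetch_targets storage 8 model.dvc train would run dvc fetch -r storage -j 8 model.dvc train, assuming a remote called storage was added with dvc remote add.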

Unlike dvc pull, which has the same effect as dvc fetch followed by dvc checkout, it skips workspace updates, making it faster for cache-only syncs. This makes it well suited to shared environments where data is versioned but not always checked out.
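The relationship between fetch and pull can be sketched as two separate phases, so that the download can run earlier (for example in a CI cache-warming step) than the workspace update. The helper names prefetch, materialize, and pull_in_two_steps are hypothetical, and the arguments are assumed to be targets only, since remote options such as -r belong to fetch and not to checkout.

```shell
#!/bin/sh
# Sketch: the two phases that "dvc pull TARGET..." roughly combines.
# Helper names are hypothetical; arguments are treated as targets only.
prefetch() {
  # Phase 1: remote storage -> .dvc/cache; workspace and .dvc files untouched.
  dvc fetch "$@"
}

materialize() {
  # Phase 2: .dvc/cache -> workspace (links or copies files into place).
  dvc checkout "$@"
}

pull_in_two_steps() {
  # Only update the workspace if the fetch succeeded.
  prefetch "$@" && materialize "$@"
}
```

Keeping the phases apart lets a pipeline retry or cache the network-bound fetch independently of the local checkout.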

CAVEATS

Requires an initialized DVC repository and a configured remote (dvc remote add). Does not update the workspace or .dvc files; use dvc checkout or dvc pull for that. Fails if the local cache is ahead or inconsistent, unless -f is given. Large datasets may require significant disk space for the cache.
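The first two requirements can be checked up front. The following is a minimal sketch, assuming it is run from the repository root (where the .dvc directory lives) and that dvc remote list prints one configured remote per line; dvc_fetch_preflight is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify the prerequisites for "dvc fetch" before running it.
# Assumes the current directory is the repository root;
# dvc_fetch_preflight is a hypothetical helper name.
dvc_fetch_preflight() {
  if [ ! -d .dvc ]; then
    echo "not a DVC repository (run: dvc init)" >&2
    return 1
  fi
  # "dvc remote list" prints configured remotes; empty output means none.
  if [ -z "$(dvc remote list)" ]; then
    echo "no remote configured (run: dvc remote add -d NAME URL)" >&2
    return 1
  fi
  return 0
}
```

Typical use would be dvc_fetch_preflight && dvc fetch, so that a missing setup produces a clear message instead of a failed download.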

TARGETS EXAMPLES

dvc fetch model.dvc fetches the data tracked by model.dvc.
dvc fetch -A fetches data for all Git commits.
dvc fetch train fetches the outputs of the pipeline stage train.

CACHE LOCATION

Data is stored in .dvc/cache, addressed by MD5 hashes for deduplication and integrity verification.
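To make the content addressing concrete, here is a minimal sketch that prints where a single file's content would be stored. It assumes the DVC 3.x layout .dvc/cache/files/md5/XX/YYYY... (the first two hex characters of the hash name a subdirectory; DVC 2.x used .dvc/cache/XX/YYYY... instead), does not handle directories, which DVC caches differently, and requires GNU coreutils md5sum. dvc_cache_path is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compute the cache path for one file's content.
# Assumes the DVC 3.x layout .dvc/cache/files/md5/XX/YYYY...;
# dvc_cache_path is a hypothetical name. Requires GNU md5sum.
dvc_cache_path() {
  local file="$1" hash prefix rest
  [ -f "$file" ] || { echo "not a regular file: $file" >&2; return 1; }
  # Hash via stdin so the file name never appears in md5sum's output.
  hash=$(md5sum < "$file" | cut -d' ' -f1)
  prefix=$(printf '%s' "$hash" | cut -c1-2)   # first two hex characters
  rest=$(printf '%s' "$hash" | cut -c3-)      # remaining 30 hex characters
  printf '.dvc/cache/files/md5/%s/%s\n' "$prefix" "$rest"
}
```

Because the path depends only on the content, two tracked files with identical bytes map to the same cache entry, which is what makes deduplication work.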

HISTORY

Introduced in DVC v0.21 (2018) by Iterative.ai to separate cache fetching from checkout. It later gained multi-remote support (v1.0, 2020) and parallel jobs (v1.10, 2020), and is now integral to reproducible ML pipelines.

SEE ALSO

dvc pull(1), dvc push(1), dvc checkout(1), dvc remote(1)
