LinuxCommandLibrary

dvc-add

Track files or directories with DVC

TLDR

Add a single target file to the index

$ dvc add [path/to/file]

Add a target directory to the index
$ dvc add [path/to/directory]

Recursively add all the files in a given target directory
$ dvc add --recursive [path/to/directory]

Add a target file with a custom .dvc filename
$ dvc add --file [custom_name.dvc] [path/to/file]

SYNOPSIS

dvc add [options] targets...

PARAMETERS

-f, --force
    Overwrite existing .dvc files if present

-R, --recursive
    Recursively add all files in directories

--dry
    Print actions without modifying files

--external
    Add external data (not in workspace)

--glob
    Treat targets as glob patterns

--file FILENAME
    Specify the name of the generated .dvc file

-q, --quiet
    Suppress non-error messages

-v, --verbose
    Enable verbose output

-h, --help
    Show help message and exit

DESCRIPTION

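dvc add places data files or directories under DVC (Data Version Control) tracking so they can be versioned alongside code in Git. It computes a checksum for each target, stores the data in DVC's cache (by default .dvc/cache), and generates a lightweight .dvc metadata file that records details such as the MD5 hash, size, and path. The command also lists the target in a .gitignore file so that Git does not track the data itself. The .dvc file must then be added to Git, either manually with git add or automatically when the core.autostage config option is enabled.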
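
To make the cache step concrete, here is a minimal sketch of how a content-addressed path can be derived from a file's MD5. It is not DVC's own code: the helper name cache_path_for is hypothetical, GNU md5sum is assumed to be installed, and the .dvc/cache/files/md5/ layout is assumed from recent DVC releases (older releases placed the two-character directories directly under .dvc/cache).

```shell
#!/bin/sh
# Sketch only: mimics how a content-addressed cache path is built from an MD5.
# cache_path_for is a hypothetical helper, not part of DVC.
cache_path_for() {
    # md5sum prints "<hash>  -" when reading stdin; keep only the hash.
    hash=$(md5sum < "$1" | cut -d' ' -f1)
    # The first two hex characters become a directory, the rest the file name.
    prefix=$(printf '%s' "$hash" | cut -c1-2)
    rest=$(printf '%s' "$hash" | cut -c3-)
    printf '.dvc/cache/files/md5/%s/%s\n' "$prefix" "$rest"
}
```

Calling cache_path_for data/model.pkl prints the path under which a cache following this layout would store that file's contents. Because the path depends only on the contents, identical files share one cache entry.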

The command supports reproducible data pipelines by keeping large datasets out of the Git repository. After adding, use dvc push to upload the cached data to remote storage (e.g., S3, GCS). Collaborators then run dvc pull to download the data from remote storage into their own cache and workspace.
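
The round trip described above can be sketched as two shell functions. This is an illustration rather than an official workflow: the names publish_data, fetch_data, run, and DRY_RUN are hypothetical, and a default DVC remote is assumed to be configured already. The git add step includes the .gitignore that dvc add writes next to the target.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative, not part of DVC.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Author side: track a file, record its metadata in Git, upload the data.
publish_data() {
    target=$1
    dir=$(dirname "$target")
    run dvc add "$target" &&
    run git add "$target.dvc" "$dir/.gitignore" &&
    run git commit -m "Track $target with DVC" &&
    run dvc push
}

# Collaborator side: fetch the .dvc files via Git, then the data via DVC.
fetch_data() {
    run git pull &&
    run dvc pull
}
```

Setting DRY_RUN=1 before calling publish_data data/model.pkl prints the four commands without running them, which is a cheap way to check the sequence before touching a real repository.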

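Example: dvc add data/model.pkl creates data/model.pkl.dvc and stores the file's contents in the cache; the file stays available in the workspace as a link or a copy, depending on the cache.type setting. For directories: dvc add dataset/. Symlinks are resolved to actual files. Well suited to ML models, datasets, and outputs that exceed Git's practical size limits.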

CAVEATS

Does not track symlinks (resolves them); data is moved into the cache and linked or copied back into the workspace, depending on cache.type; targets outside the workspace require --external; requires an initialized DVC repo.

OUTPUT FILES

Generates <target>.dvc (git-tracked) and stores data in .dvc/cache (ignored by git).
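
As a quick illustration of the naming convention, the sketch below prints the files that dvc add is expected to create or update for a given target. The helper name expected_outputs is hypothetical and it only manipulates path strings; it does not call DVC. It assumes that the .gitignore entry is written in the target's parent directory.

```shell
#!/bin/sh
# Sketch only: expected_outputs is a hypothetical helper, not a DVC command.
# It prints the files `dvc add <target>` is expected to create or update.
expected_outputs() {
    target=${1%/}                                     # drop a trailing slash, e.g. dataset/
    printf '%s.dvc\n' "$target"                       # metadata file, committed to Git
    printf '%s/.gitignore\n' "$(dirname "$target")"   # keeps the data out of Git
}
```

For example, expected_outputs data/model.pkl lists the metadata file and the .gitignore that belong in the next Git commit.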

POST-ADD STEPS

Run git add on <target>.dvc and the updated .gitignore, commit the change, then run dvc push to complete versioning.

HISTORY

Introduced in DVC v0.1 (2018) by Iterative.ai; evolved for ML reproducibility, with caching optimizations in v2+.

SEE ALSO

dvc(1), dvc-push(1), dvc-pull(1), git-add(1)
