dvc-add
Track files or directories with DVC
TLDR
Add a single target file to the index: dvc add path/to/file
Add a target directory to the index: dvc add path/to/directory
Recursively add all the files in a given target directory: dvc add --recursive path/to/directory
Add a target file with a custom .dvc filename
SYNOPSIS
dvc add [options] targets...
PARAMETERS
-f, --force
Overwrite existing .dvc files if present
-R, --recursive
Recursively add all files in directories
--dry
Print actions without modifying files
--external
Add external data (not in workspace)
--glob
Treat targets as glob patterns
--file FILENAME
Specify the name of the generated .dvc file (single target only)
-q, --quiet
Suppress non-error messages
-v, --verbose
Enable verbose output
-h, --help
Show help message and exit
DESCRIPTION
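dvc add adds data files or directories to a DVC (Data Version Control) project so they can be versioned alongside code in Git. For each target it computes an MD5 checksum, moves the data into DVC's cache (typically .dvc/cache), links or copies it back into the workspace according to the cache.type setting, adds the target to a .gitignore file, and writes a lightweight .dvc metadata file recording the hash, size, and path. The .dvc file is not staged in Git automatically unless core.autostage is enabled; otherwise DVC prints the git add command to run.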
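As a rough illustration of the checksum and metadata step, the Python sketch below hashes a file's raw bytes with MD5 and renders .dvc-style text with the md5, size, and path fields described above. The helper names file_md5 and render_dvc_file are hypothetical, and the exact .dvc layout and hashing details vary between DVC versions (newer releases add fields), so treat the output as a model rather than a byte-for-byte match.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """MD5 of the file's raw bytes, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def render_dvc_file(path):
    """Render minimal .dvc-style YAML for one file (assumed field layout)."""
    md5 = file_md5(path)
    size = os.path.getsize(path)
    # The path is recorded relative to the .dvc file, which sits next to the data.
    name = os.path.basename(path)
    return f"outs:\n- md5: {md5}\n  size: {size}\n  path: {name}\n"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.pkl")
    with open(target, "wb") as f:
        f.write(b"hello\n")
    print(render_dvc_file(target))  # an outs: entry with md5, size: 6, path: model.pkl
```

Reading in fixed-size chunks keeps memory flat for large models and datasets, which is the case dvc add is meant for.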
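Keeping large datasets out of the Git repository, while Git versions their small .dvc files, supports reproducible data pipelines. After adding, run dvc push to upload the cached data to a configured remote (e.g., S3, GCS). Collaborators then run dvc pull, which downloads the data from the remote into their cache and restores it in the workspace.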
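To make the push/pull round trip concrete, here is a Python model in which both the local cache and the remote are plain directories of content-addressed objects. This is an illustration, not DVC's implementation: push, pull, transfer, and object_path are hypothetical names, and the checkout step that dvc pull also performs (restoring files into the workspace) is left out.

```python
import hashlib
import os
import shutil
import tempfile


def object_path(root, md5):
    """Content-addressed location: the first two hex chars name a subdirectory."""
    return os.path.join(root, md5[:2], md5[2:])


def transfer(src_root, dst_root, hashes):
    """Copy objects that dst_root lacks; return the hashes actually copied."""
    copied = []
    for md5 in hashes:
        dst = object_path(dst_root, md5)
        if os.path.exists(dst):
            continue  # same hash implies same content, so nothing to send
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copyfile(object_path(src_root, md5), dst)
        copied.append(md5)
    return copied


def push(cache, remote, hashes):
    return transfer(cache, remote, hashes)  # like `dvc push`: cache -> remote


def pull(remote, cache, hashes):
    return transfer(remote, cache, hashes)  # like `dvc pull`, minus checkout


with tempfile.TemporaryDirectory() as tmp:
    cache, remote, other = (os.path.join(tmp, d) for d in ("cache", "remote", "other"))
    data = b"weights"
    md5 = hashlib.md5(data).hexdigest()
    os.makedirs(os.path.dirname(object_path(cache, md5)))
    with open(object_path(cache, md5), "wb") as f:
        f.write(data)
    print(push(cache, remote, [md5]))  # uploads the object once
    print(push(cache, remote, [md5]))  # [] because the remote already has it
    print(pull(remote, other, [md5]))  # a collaborator's cache receives it
```

Because objects are named by their hash, an existence check is enough to skip data the other side already holds.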
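Example: dvc add data/model.pkl creates data/model.pkl.dvc and moves the file's content into the cache, leaving a linked or copied version at data/model.pkl. For a directory, dvc add dataset/ creates a single dataset.dvc that tracks the whole directory. Symlinks are resolved to actual files. Well suited to ML models, datasets, and other outputs too large to keep in Git.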
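For a directory target, DVC records one hash for the whole tree, backed by a .dir manifest in the cache that lists each file's relative path and MD5. The Python sketch below models that idea; dir_hash is a hypothetical helper, and the JSON serialization it hashes is an assumption, so its values are not expected to match the hashes DVC writes.

```python
import hashlib
import json
import os
import tempfile


def dir_hash(root):
    """Return (hash + ".dir", manifest) for a directory tree.

    The manifest lists {"md5", "relpath"} per file, sorted by relpath; hashing
    its JSON form gives one value that changes if any file is added, removed,
    renamed, or edited. The serialization format is an assumption.
    """
    entries = []
    for cur, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(cur, name)
            with open(full, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append({"md5": digest, "relpath": rel})
    entries.sort(key=lambda e: e["relpath"])
    manifest_json = json.dumps(entries, sort_keys=True).encode()
    return hashlib.md5(manifest_json).hexdigest() + ".dir", entries


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "images"))
    for rel, text in (("labels.csv", "id,label\n"), ("images/1.txt", "x")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write(text)
    digest, manifest = dir_hash(tmp)
    print(digest)                            # 32 hex chars followed by ".dir"
    print([e["relpath"] for e in manifest])  # ['images/1.txt', 'labels.csv']
```

Sorting the manifest makes the hash independent of filesystem walk order, so identical trees on different machines hash the same.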
CAVEATS
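Does not track symlinks (resolves them); the workspace copy is kept, linked or copied back from the cache according to cache.type (--external is for data outside the workspace, not for keeping a copy); with hardlink or symlink cache types, run dvc unprotect before editing a tracked file in place; requires an initialized DVC repo (dvc init).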
OUTPUT FILES
Generates <target>.dvc (tracked by Git), adds the target to the .gitignore in its directory, and stores the data in .dvc/cache (ignored by Git).
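Besides the .dvc file and the cache object, DVC also adds the target to a .gitignore in the target's directory. The sketch below spells out all of these locations for one target. output_files is a hypothetical helper; it assumes the DVC 2.x-style cache layout .dvc/cache/<first two hex chars>/<remaining chars> (DVC 3.x nests new objects under .dvc/cache/files/md5/ instead) and the /<name> entry format DVC typically writes to .gitignore.

```python
import os


def output_files(target, md5, cache_dir=os.path.join(".dvc", "cache")):
    """Locations `dvc add <target>` is expected to write (illustrative only)."""
    target = target.rstrip("/")
    parent, name = os.path.split(target)
    return {
        "dvc_file": target + ".dvc",                      # commit to Git
        "gitignore": os.path.join(parent, ".gitignore"),  # commit to Git
        "gitignore_entry": "/" + name,                    # keeps the data out of Git
        "cache_object": os.path.join(cache_dir, md5[:2], md5[2:]),  # not in Git
    }


for key, value in output_files("data/model.pkl", "b1946ac92492d2347c6235b4d2611184").items():
    print(f"{key}: {value}")
```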
POST-ADD STEPS
Run git add <target>.dvc together with the updated .gitignore, commit, then run dvc push to upload the data and complete versioning.
HISTORY
Introduced in DVC v0.1 (2018) by Iterative.ai; evolved for ML reproducibility, with caching optimizations in v2+.


