dvc-add

Track files or directories with DVC

TLDR

Add a single target file to the index

$ dvc add [path/to/file]

Add a target directory to the index

$ dvc add [path/to/directory]

Recursively add all the files in a given target directory

$ dvc add --recursive [path/to/directory]

Add a target file with a custom .dvc filename

$ dvc add --file [custom_name.dvc] [path/to/file]

PARAMETERS

-f, --force
    Overwrite existing .dvc files if present

-R, --recursive
    Recursively add all files in directories

--dry
    Print actions without modifying files

--external
    Add external data (not in workspace)

--glob
    Treat targets as glob patterns

--file FILENAME
    Specify the name of the generated .dvc file

-q, --quiet
    Suppress non-error messages

-v, --verbose
    Enable verbose output

-h, --help
    Show help message and exit

DESCRIPTION

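dvc add places data files or directories under DVC (Data Version Control) so they can be versioned alongside code in Git. It computes a checksum for each target, stores the data in DVC's internal cache (typically .dvc/cache), links it back into the workspace, and generates a lightweight .dvc metadata file. The .dvc file records the MD5 hash, size, and path of the data. This small file, not the data itself, is what gets committed to Git. DVC stages it automatically only when the core.autostage config option is enabled; otherwise, run git add yourself.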
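
The metadata described above can be approximated with standard tools. The following is a minimal sketch, not DVC's implementation: sketch_dvc_metadata is a hypothetical helper that prints a YAML fragment shaped like the outs entry of a .dvc file. It assumes GNU coreutils (md5sum, wc), and real .dvc files may carry additional fields depending on the DVC version.

```shell
#!/bin/sh
# Illustration only: mimic the hash, size, and path fields recorded for a
# tracked file. sketch_dvc_metadata is a hypothetical helper, not a DVC command.

sketch_dvc_metadata() {
    file=$1
    md5=$(md5sum "$file" | cut -d' ' -f1)    # content hash of the file
    size=$(wc -c < "$file" | tr -d ' ')      # size in bytes
    # The path is stored relative to the .dvc file, which sits next to the target.
    printf 'outs:\n- md5: %s\n  size: %s\n  path: %s\n' \
        "$md5" "$size" "$(basename "$file")"
}

# Hypothetical usage with a throwaway file:
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
sketch_dvc_metadata "$tmp"
rm -f "$tmp"
```

Because the hash depends only on the file contents, two identical files yield the same md5 value, which is what lets a content-addressed cache keep a single copy of them.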

Keeping large datasets out of the Git repository while versioning their .dvc files helps make data pipelines reproducible. After adding, use dvc push to upload the cached data to remote storage (e.g., S3, GCS). To get the data, collaborators run dvc pull, which downloads it from the remote into their own cache and workspace.

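Example: dvc add data/model.pkl creates data/model.pkl.dvc, moves the file into the cache, and links it back into the workspace. For directories, dvc add dataset/ creates a single dataset.dvc that covers the whole directory. Symlinks are resolved to actual files. Ideal for ML models, datasets, and outputs too large to keep in Git.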
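
The add, commit, and push sequence described above can be sketched as a small script. This is an illustrative outline under stated assumptions: it expects to run inside a Git repository where dvc init has been run and a default DVC remote is configured. The run and track_and_share helpers, the DRY_RUN variable, and the data/model.pkl path are all hypothetical names introduced for the example.

```shell
#!/bin/sh
# Sketch of the add -> commit -> push cycle for a single tracked file.
# Assumes an initialized DVC project with a default remote; the helper
# names and the DRY_RUN variable are illustrative, not part of DVC.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

track_and_share() {
    target=$1
    run dvc add "$target" &&
    # dvc add writes the .dvc file and a .gitignore entry next to the target.
    run git add "$target.dvc" "$(dirname "$target")/.gitignore" &&
    run git commit -m "Track $target with DVC" &&
    run dvc push
}

# Preview the commands without touching the repository:
DRY_RUN=1
track_and_share data/model.pkl
```

Unsetting DRY_RUN executes the commands for real. A collaborator who then clones the Git repository would run dvc pull to fetch the data.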
