dvc
Manage machine learning experiments and data
TLDR
Initialize a new DVC project
Configure a remote storage location
Add one or more data files or directories to tracking
Show project status
Upload tracked files to remote storage
Download tracked files from remote storage
Display help
Display version
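The tasks listed above map onto commands roughly as follows. This is a sketch rather than a canonical recipe: the remote name myremote, the file data.csv, and the temporary directories are placeholders, and the remote is a plain local directory so that nothing depends on cloud credentials. The DVC commands run only when both dvc and git are on the PATH.

```shell
# Placeholders (not from the original page): myremote, data.csv, the temp dirs.
demo_dir="$(mktemp -d)"
remote_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

if command -v dvc >/dev/null 2>&1 && command -v git >/dev/null 2>&1; then
    git init -q                                # DVC projects usually live in a Git repo
    dvc init                                   # initialize a new DVC project
    dvc remote add -d myremote "$remote_dir"   # configure a default remote (a local dir here)
    printf 'id,value\n1,10\n' > data.csv
    dvc add data.csv                           # add a data file to tracking
    dvc status                                 # show project status
    dvc push                                   # upload tracked files to remote storage
    dvc pull                                   # download tracked files from remote storage
    dvc --help                                 # display help
    dvc --version                              # display version
else
    echo "dvc or git not found; skipping the demo commands"
fi
```

For real projects the remote would typically be a URL such as an S3 or GCS bucket instead of a local directory.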
SYNOPSIS
dvc [-h] [-q | -v] [-V] [--cd <DIR>] <SUBCOMMAND> [<ARGS>]
PARAMETERS
-h, --help
Show help message and exit
-V, --version
Show program's version and exit
--cd <DIR>
Change to directory DIR before command execution
-q, --quiet
Be quiet: suppress normal output (cannot be combined with -v)
-v, --verbose
Be verbose: print detailed tracing information; takes no LEVEL argument, but can be repeated (-vv) for more detail
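A short sketch of the global options in use. The directory held in proj is a throwaway placeholder, and dvc init --no-scm is used only so the demo does not need Git. The verbosity flags are placed after the subcommand here, which is the placement commonly seen in DVC usage; treat the exact output as version-dependent.

```shell
# Placeholder: proj is a throwaway directory created just for this demo.
proj="$(mktemp -d)"

if command -v dvc >/dev/null 2>&1; then
    dvc --version                       # -V / --version: print the version and exit
    (cd "$proj" && dvc init --no-scm)   # standalone DVC project, no Git needed here
    dvc --cd "$proj" status             # --cd: run the subcommand as if started in $proj
    dvc --cd "$proj" status -q          # -q: suppress normal output
    dvc --cd "$proj" status -v          # -v: print detailed tracing information
else
    echo "dvc not found; skipping the demo commands"
fi
```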
DESCRIPTION
DVC (Data Version Control) is an open-source command-line tool designed for data scientists and ML engineers to version data, models, and experiments like Git versions code.
It addresses key challenges in ML workflows: tracking large datasets and models without bloating Git repositories, building reproducible pipelines from dependency graphs, and managing experiments efficiently. DVC stores small pointer files in Git and keeps the actual data in a local cache and in remote storage (S3, GCS, Azure, SSH, etc.).
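The pointer-plus-cache idea can be illustrated without DVC at all. The following is a conceptual sketch only: it mimics the idea of a content-addressed cache and a small pointer file, not DVC's actual on-disk layout or file format. It assumes GNU coreutils (md5sum), and the names toy-cache and data.csv.ptr are made up for the illustration.

```shell
# Conceptual sketch: NOT DVC's real implementation or file format.
# Assumes md5sum is available; toy-cache and .ptr are invented names.
work="$(mktemp -d)"
cd "$work" || exit 1

printf 'id,label\n1,cat\n2,dog\n' > data.csv

# 1. Hash the file contents.
hash="$(md5sum data.csv | cut -d' ' -f1)"

# 2. Copy the contents into a cache keyed by the hash
#    (first two characters as a subdirectory, the rest as the file name).
prefix="$(printf '%s' "$hash" | cut -c1-2)"
rest="$(printf '%s' "$hash" | cut -c3-)"
mkdir -p "toy-cache/$prefix"
cp data.csv "toy-cache/$prefix/$rest"

# 3. Write a small pointer file; this is the part that would go into Git.
printf 'md5: %s\npath: data.csv\n' "$hash" > data.csv.ptr

# 4. The data file can be deleted and later restored from the cache via the pointer.
rm data.csv
wanted="$(sed -n 's/^md5: //p' data.csv.ptr)"
cp "toy-cache/$(printf '%s' "$wanted" | cut -c1-2)/$(printf '%s' "$wanted" | cut -c3-)" data.csv
```

Only the pointer file changes when the data changes, which is why the Git history stays small regardless of dataset size.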
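Core workflow: dvc init creates the .dvc directory; dvc add data.csv hashes the file, stores it in the cache, and writes a data.csv.dvc pointer file; dvc push uploads cached data to the remote; dvc stage add -n train -d script.py -d data.csv -o model.pkl -m metrics.json python script.py defines a pipeline stage with its dependencies and outputs (train is a placeholder stage name; dvc stage add replaces dvc run, which was deprecated in DVC 2.0 and removed in 3.0); dvc repro re-runs only the stages whose dependencies have changed and reuses cached results for the rest.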
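A minimal pipeline sketch along these lines, using dvc stage add, the command for defining stages in current DVC releases. Everything named here is illustrative: the stage name train, the stand-in script train.sh, and the output model.txt are placeholders, and the "training" step merely counts lines. The DVC commands run only when dvc and git are installed.

```shell
# Placeholders: stage name "train", train.sh, data.csv and model.txt are illustrative.
pipe_dir="$(mktemp -d)"
cd "$pipe_dir" || exit 1

printf 'x\n1\n2\n3\n' > data.csv
# Stand-in "training" script: writes the line count of data.csv as the "model".
printf 'wc -l < data.csv > model.txt\n' > train.sh

if command -v dvc >/dev/null 2>&1 && command -v git >/dev/null 2>&1; then
    git init -q
    dvc init
    dvc add data.csv     # track the input data and hash it into the cache
    # Define a stage: dependencies (-d), output (-o), then the command to run.
    dvc stage add -n train -d train.sh -d data.csv -o model.txt sh train.sh
    dvc repro            # first run: executes the stage and caches model.txt
    dvc repro            # nothing changed: the stage is skipped
    # Changing a dependency invalidates the stage, so the next repro runs it again.
    printf 'wc -l < data.csv | tr -d " " > model.txt\n' > train.sh
    dvc repro
else
    echo "dvc or git not found; skipping the demo commands"
fi
```

The stage definition is written to dvc.yaml and the recorded hashes to dvc.lock; both are meant to be committed to Git.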
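Integrates with Git: the small DVC files (.dvc files, dvc.yaml, dvc.lock) are committed to Git while the data itself stays outside the repository. Supports viewing metrics and plots (dvc metrics show, dvc plots show), running and comparing experiments (dvc exp), and comparing parameters (dvc params diff). DVC is free and open source (Apache 2.0 license) and is not tied to a particular storage vendor.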
CAVEATS
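Normally used inside a Git repository (dvc init --no-scm initializes a project without Git, at the cost of Git-dependent features such as experiments). Sharing or backing up data requires a configured remote (dvc remote add). Not part of a standard Linux installation; it must be installed separately via pip install dvc or a package manager.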
COMMON SUBCOMMANDS
dvc init: Initialize DVC repo.
dvc add FILE: Track data/model.
dvc push: Upload cache to remote.
dvc pull: Download from remote.
dvc repro: Reproduce pipelines.
INSTALLATION
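pip install dvc or conda install -c conda-forge dvc. Remote storage support is installed as extras, for example pip install "dvc[s3]" (the quotes keep shells such as zsh from interpreting the brackets).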
HISTORY
Developed by Iterative (iterative.ai); first released in 2017. The current 3.x series integrates with DVC Studio. Widely adopted in MLOps communities.
SEE ALSO
git(1)


