dvc-diff
Show changes between DVC tracked data versions
TLDR
Compare DVC tracked files in a Git commit, tag, or branch against the current workspace
Compare the changes in DVC tracked files from one Git commit to another
Compare DVC tracked files, along with their latest hash
Compare DVC tracked files, displaying the output as JSON
Compare DVC tracked files, displaying the output as Markdown
SYNOPSIS
dvc diff [REV] [TO_REV] [OPTIONS]
PARAMETERS
-h, --help
Show help and exit
-v, --verbose
Show more detailed information about command progress
-q, --quiet
Suppress output
--cd <DIR>
Change to DIR before executing (a global dvc option, given before the subcommand: dvc --cd <DIR> diff)
--targets <PATHS>
Limit the diff to specific DVC-tracked files or directories (accepts one or more paths)
--json
Output results in JSON format
--show-hash
Show file and directory hashes alongside their paths
--md
Output results as a Markdown table (older releases used --show-md)
--hide-missing
Do not list entries whose data is missing from the cache (the "not in cache" group)
DESCRIPTION
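The dvc diff command lists DVC-tracked files and directories that were added, modified, deleted, or renamed between two Git revisions, or between a revision and the workspace. It compares the content hashes recorded in DVC metadata rather than the file contents, so it does not need to read large datasets and does not show line-by-line changes. To compare metric, parameter, or plot values, use dvc metrics diff, dvc params diff, or dvc plots diff instead.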
With no arguments, it compares the last commit (HEAD) against the current workspace. Given one revision, it compares that revision against the workspace. Given two, as in dvc diff main feature-branch, it compares the first against the second.
Output lists paths grouped by status (added, modified, deleted, renamed, and not in cache), followed by a summary of counts. Hashes are shown only with --show-hash. Use --json for programmatic parsing or --md (--show-md in older releases) for a Markdown table. Use --targets to narrow the comparison to specific paths.
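As a rough illustration of parsing the --json output, the Python sketch below flattens a diff into lists of paths per status. The payload shape (top-level status keys holding entries with a "path" field, where renamed entries carry old and new paths) is an assumption about what dvc diff --json prints, and the sample paths and hashes are invented. Check the output of your DVC version before relying on it.

```python
import json

# Illustrative payload in the shape `dvc diff --json` is ASSUMED to produce.
# All paths and hashes below are made up for this example.
SAMPLE = """
{
  "added": [{"path": "data/new.csv", "hash": "aaa111"}],
  "deleted": [],
  "modified": [
    {"path": "models/model.pkl", "hash": {"old": "bbb222", "new": "ccc333"}}
  ],
  "renamed": [{"path": {"old": "data/a.csv", "new": "data/b.csv"}}]
}
"""


def entry_path(entry):
    """Return a printable path for one diff entry (hypothetical helper)."""
    path = entry["path"]
    if isinstance(path, dict):
        # Assumed shape for renamed entries: {"old": ..., "new": ...}
        return f'{path["old"]} -> {path["new"]}'
    return path


def paths_by_status(diff_json):
    """Map each status key to the list of paths reported under it."""
    text = diff_json.strip()
    diff = json.loads(text) if text else {}
    return {
        status: [entry_path(entry) for entry in entries]
        for status, entries in diff.items()
    }


if __name__ == "__main__":
    for status, paths in paths_by_status(SAMPLE).items():
        for path in paths:
            print(f"{status}: {path}")
```

To use it on a real repository, the same function can be fed the captured standard output of dvc diff --json instead of SAMPLE.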
Useful for reviewing dataset changes, model updates, or differences between experiments in repositories that use Git and DVC together. It can also run in CI/CD pipelines for automated checks.
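For the CI/CD use case, one possible check is to fail a job when files under a protected directory were modified or deleted. The Python sketch below assumes dvc is on PATH and that --json prints an object keyed by status with a "path" field per entry. The helper names and the data/raw/ prefix used in the note after the code are hypothetical.

```python
import json
import subprocess


def build_command(a_rev="HEAD", b_rev=None):
    """Build the argument list for `dvc diff --json [a_rev] [b_rev]`."""
    cmd = ["dvc", "diff", "--json", a_rev]
    if b_rev is not None:
        cmd.append(b_rev)
    return cmd


def run_dvc_diff(a_rev="HEAD", b_rev=None):
    """Run dvc diff and return the parsed JSON (assumes dvc is installed)."""
    proc = subprocess.run(
        build_command(a_rev, b_rev),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if dvc exits non-zero
    )
    # Treat empty output as "no changes".
    return json.loads(proc.stdout.strip() or "{}")


def protected_changes(diff, protected_prefix):
    """Return (status, path) pairs for modified or deleted protected files."""
    hits = []
    for status in ("modified", "deleted"):
        for entry in diff.get(status, []):
            path = entry["path"]
            if isinstance(path, str) and path.startswith(protected_prefix):
                hits.append((status, path))
    return hits
```

In a CI script, a call such as protected_changes(run_dvc_diff("main"), "data/raw/") could decide whether to exit with a non-zero status. Keeping the check separate from the subprocess call makes it easy to test without a DVC repository.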
CAVEATS
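Must be run inside a DVC repository (created with dvc init) that is also a Git repository, since revisions are Git commits, tags, or branches. It compares hashes only, so it reports which files changed, not what changed inside them. Listing individual files inside a tracked directory may require that directory's metadata to be in the local cache; run dvc fetch or dvc pull first if entries appear under "not in cache". Files tracked only by Git are not shown; use git diff for those.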
EXAMPLES
dvc diff HEAD main
dvc diff --json --show-hash
dvc diff v1.0 --md --targets models/
OUTPUT FORMAT
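Default output groups paths under status headings (Added, Modified, Deleted, Renamed, Not in cache) and ends with a summary of counts. Markdown output is a table with Status and Path columns, plus a Hash column when --show-hash is given.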
HISTORY
Introduced in DVC v0.20 (2018) by Iterative (iterative.ai). Evolved alongside support for multi-stage pipelines and external outputs in the v1.x releases (2020), and has remained part of the CLI through DVC v2.0 (2021) and later.


