LinuxCommandLibrary

dvc-diff

Show changes between DVC tracked data versions

TLDR

Compare DVC-tracked files at a Git commit, tag, or branch against the current workspace

$ dvc diff [commit_hash/tag/branch]

Compare DVC-tracked files between two Git revisions
$ dvc diff [revision1] [revision2]

Compare DVC-tracked files and show the hash of each changed file
$ dvc diff --show-hash [commit]

Compare DVC-tracked files, printing the output as JSON
$ dvc diff --json --show-hash [commit]

Compare DVC-tracked files, printing the output as a Markdown table
$ dvc diff --md --show-hash [commit]

SYNOPSIS

dvc diff [REV] [TO_REV] [OPTIONS]

PARAMETERS

-h, --help
    Show help and exit

-v, --verbose
    Show more detailed information while the command runs

-q, --quiet
    Suppress output

--targets <PATH>...
    Limit the diff to these DVC-tracked files or directories (accepts one or more paths)

--json
    Output results in JSON format (older DVC releases spell this --show-json)

--md
    Output results as a Markdown table (older DVC releases spell this --show-md)

--show-hash
    Print the hash of each file or directory along with its path

--hide-missing
    Do not list data that is missing from both the cache and the workspace

DESCRIPTION

The dvc diff command shows which DVC-tracked files and directories were added, modified, or deleted between two Git revisions, or between a revision and the current workspace. It compares the content hashes that DVC records for tracked data rather than the file contents themselves, so it reports which files changed, not what changed inside them.

With no arguments, it compares the last commit (HEAD) with the current workspace. Pass one revision to compare that revision with the workspace, or two revisions, as in dvc diff main feature-branch, to compare them with each other.

Output lists paths grouped by status (such as added, modified, and deleted), followed by a summary count. Add --show-hash to include hashes, --json for programmatic parsing, or --md for a Markdown table. Use --targets foo/bar to narrow the results to specific tracked paths.
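
As a minimal sketch of combining these output options, the hypothetical helper below writes a Markdown report of the changes under one tracked path. The function name dvc_diff_report, its arguments, and the default file name dvc-diff.md are illustrative assumptions and not part of DVC; only the dvc diff flags themselves come from the tool.

```shell
# Hypothetical helper (not part of DVC): write a Markdown report of changes
# between a Git revision and the workspace, limited to one tracked path.
# Usage: dvc_diff_report <revision> <target-path> [output-file]
dvc_diff_report() {
    rev="$1"
    target="$2"
    out="${3:-dvc-diff.md}"   # assumed default report name

    # The revision goes first so that --targets, which accepts several
    # paths, does not take it as another target.
    dvc diff "$rev" --md --show-hash --targets "$target" > "$out"
}
```

For example, dvc_diff_report v1.0 models/ report.md would leave the table in report.md, ready to paste into a pull request description.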

Useful for reviewing dataset and model changes before committing, or for comparing experiment branches in a repository that uses both Git and DVC. The --json output can be consumed by scripts, for example in CI checks.
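
One way such a CI check might look is sketched below. The function name dvc_data_unchanged and its return codes are hypothetical, and the sketch assumes that dvc diff --json prints {} (or nothing) when no tracked data changed; verify that against the DVC version in use before relying on it.

```shell
# Hypothetical CI helper (not part of DVC): succeed only when no DVC-tracked
# data changed between the given revision (default: HEAD) and the workspace.
# Assumption: `dvc diff --json` prints `{}` or nothing when nothing changed.
dvc_data_unchanged() {
    base="${1:-HEAD}"

    # Return 2 if dvc itself fails (for example, outside a DVC project).
    result="$(dvc diff --json "$base")" || return 2

    case "$result" in
        ''|'{}')
            return 0 ;;                      # no changes reported
        *)
            printf '%s\n' "$result" >&2      # show what changed
            return 1 ;;
    esac
}
```

A pipeline step could then call dvc_data_unchanged origin/main and fail the job on a non-zero status.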

CAVEATS

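Requires a DVC project inside a Git repository (dvc init), since revisions are resolved through Git. Only hashes are compared, so content-level differences within a file are not shown. Data missing from both the cache and the workspace is listed separately unless --hide-missing is given. Files tracked only by Git are not reported; use git diff for those.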

EXAMPLES

dvc diff HEAD main
dvc diff --json --show-hash HEAD~1
dvc diff v1.0 --md --targets models/

OUTPUT FORMAT

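Default output groups paths under status headings (Added, Modified, Deleted) and ends with a summary line. With --md, output is a table with Status and Path columns, plus a Hash column when --show-hash is given.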

HISTORY

dvc diff is part of DVC (Data Version Control), an open-source tool developed by Iterative. The command predates DVC 1.0 (2020) and has remained available through the 2.x and 3.x releases; over time the --show-json and --show-md flags were shortened to --json and --md.

SEE ALSO

dvc status(1), dvc metrics diff(1), git diff(1), dvc checkout(1)
