LinuxCommandLibrary

dvc-diff

Show changes between DVC tracked data versions

TLDR

Compare DVC-tracked files in a Git commit, tag, or branch against the current workspace
$ dvc diff [commit_hash/tag/branch]

Compare DVC-tracked files between two Git revisions
$ dvc diff [revision1] [revision2]

Compare DVC-tracked files and show the hash of each listed file
$ dvc diff --show-hash [commit]

Compare DVC-tracked files and print the result as JSON (older DVC releases spell the flag --show-json)
$ dvc diff --json --show-hash [commit]

Compare DVC-tracked files and print the result as a Markdown table (older DVC releases spell the flag --show-md)
$ dvc diff --md --show-hash [commit]
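The JSON form is the easiest one to consume from a script. The following is a minimal Python sketch, not part of DVC itself: `run_dvc_diff` and `count_changes` are hypothetical helper names, the flag is assumed to be spelled `--json` (older releases used `--show-json`), and the sample payload only illustrates the assumed shape of the output, with made-up paths and hashes.

```python
import json
import subprocess


def run_dvc_diff(a_rev="HEAD", b_rev=None):
    """Run `dvc diff --json --show-hash` and return the parsed payload.

    Needs `dvc` on PATH and must be called inside a DVC project.
    """
    cmd = ["dvc", "diff", "--json", "--show-hash", a_rev]
    if b_rev is not None:
        cmd.append(b_rev)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # Fall back to an empty object if nothing was printed.
    return json.loads(result.stdout.strip() or "{}")


def count_changes(payload):
    """Map each change type in the payload to its number of entries."""
    return {kind: len(entries) for kind, entries in payload.items()}


# Illustrative payload (assumed shape, made-up paths and hashes).
SAMPLE = json.loads("""
{
  "added": [{"path": "data/new.csv", "hash": "aaa111"}],
  "deleted": [],
  "modified": [{"path": "model.pkl", "hash": {"old": "bbb222", "new": "ccc333"}}],
  "renamed": [],
  "not in cache": []
}
""")

if __name__ == "__main__":
    print(count_changes(SAMPLE))
```

Inside a real project, `count_changes(run_dvc_diff("HEAD~1"))` would give the same kind of summary for the changes since the previous commit.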

SYNOPSIS

dvc diff [options] [a_rev] [b_rev]

PARAMETERS

--targets <paths>
    Limit the comparison to these DVC-tracked files or directories.

--show-hash
    Display the hash value of each listed file.

--json
    Print the diff as JSON (older DVC releases spell this --show-json).

--md
    Print the diff as a Markdown table (older DVC releases spell this --show-md).

--hide-missing
    Hide entries that DVC reports as "not in cache".

-q, --quiet
    Suppress any output.

-h, --help
    Show help message and exit.

-v, --verbose
    Increase verbosity level.

a_rev
    Base (old) revision to compare against: a commit, tag, or branch. Defaults to HEAD.

b_rev
    Head (new) revision to compare: a commit, tag, or branch. Defaults to the current workspace.
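To show how the options and the two positional revisions combine, here is a small Python sketch that assembles an argument list for `subprocess.run`. The helper name `build_dvc_diff_command` and its keyword arguments are hypothetical; the flag names `--show-hash`, `--json`, `--md`, `--hide-missing`, and `--targets`, and the positional names `a_rev` and `b_rev`, follow DVC's own help text. Placing `--targets` last is a design assumption: the option accepts several paths, so anything after it could be read as another target.

```python
def build_dvc_diff_command(a_rev=None, b_rev=None, targets=(),
                           show_hash=False, output_format=None,
                           hide_missing=False):
    """Return an argv list for `dvc diff` (hypothetical helper)."""
    if b_rev is not None and a_rev is None:
        raise ValueError("b_rev can only be given together with a_rev")
    if output_format not in (None, "json", "md"):
        raise ValueError("output_format must be None, 'json', or 'md'")

    cmd = ["dvc", "diff"]
    # Revisions go first: base (old) revision, then head (new) revision.
    cmd += [rev for rev in (a_rev, b_rev) if rev is not None]
    if show_hash:
        cmd.append("--show-hash")
    if output_format is not None:
        cmd.append(f"--{output_format}")
    if hide_missing:
        cmd.append("--hide-missing")
    # --targets takes a variable number of paths, so it is kept last.
    if targets:
        cmd += ["--targets", *targets]
    return cmd


if __name__ == "__main__":
    print(build_dvc_diff_command("HEAD~1", "HEAD",
                                 targets=["data/raw"],
                                 show_hash=True,
                                 output_format="json"))
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)` from inside a DVC project.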

DESCRIPTION

dvc diff compares DVC-tracked data between two Git revisions (commits, tags, or branches), or between one revision and the current workspace. With no arguments it compares HEAD against the workspace. Unlike git diff, it does not show line-by-line content changes; it lists the paths of tracked files and directories that were added, modified, deleted, or renamed, optionally with their hashes. This covers data tracked through .dvc files as well as outputs declared in dvc.yaml. The listing is useful for reviewing how a dataset or model changed between versions and for checking what a commit touched before reproducing a pipeline.

CAVEATS

Must be run inside an initialized DVC project, and revisions are resolved through Git, so the project needs Git history to compare against. Results are only as complete as the tracking metadata: files that are not recorded in .dvc files or as dvc.yaml outputs do not appear in the diff.

EXAMPLE USAGE

To compare the workspace with the latest commit (HEAD): dvc diff
To compare two specific commits: dvc diff commit1 commit2
To see only added files, read the "added" list from the JSON output, since dvc diff has no filter by change type: dvc diff --json
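As far as DVC's documented options go, dvc diff has no change-type filter, so selecting only added (or only deleted, modified, renamed) paths is done on the JSON output. The Python sketch below shows one way to do that; `paths_for` is a hypothetical helper, and the payload is an illustrative example of the assumed JSON shape rather than real output.

```python
def paths_for(payload, change_type):
    """Return the paths listed under one change type, e.g. "added"."""
    paths = []
    # A missing key (for example an empty diff) yields an empty list.
    for entry in payload.get(change_type, []):
        path = entry["path"]
        # Assumed: renamed entries hold {"old": ..., "new": ...} instead of a string.
        if isinstance(path, dict):
            path = path["new"]
        paths.append(path)
    return paths


# Illustrative payload (assumed shape, made-up paths).
EXAMPLE = {
    "added": [{"path": "data/new.csv"}, {"path": "data/extra/"}],
    "deleted": [{"path": "data/old.csv"}],
    "modified": [{"path": "model.pkl"}],
    "renamed": [{"path": {"old": "a.txt", "new": "b.txt"}}],
}

if __name__ == "__main__":
    for added in paths_for(EXAMPLE, "added"):
        print(added)
```

In practice the payload would come from parsing the standard output of `dvc diff --json` with `json.loads`.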

SEE ALSO

git-diff(1), dvc-status(1), dvc-dag(1)
