LinuxCommandLibrary

dvc-dag

Visualize the DVC pipeline as a graph

TLDR

Visualize the entire pipeline

$ dvc dag

Visualize the pipeline stages up to a specified target stage
$ dvc dag [target]

Export the pipeline in Graphviz DOT format
$ dvc dag --dot > [path/to/pipeline.dot]

SYNOPSIS

dvc dag [-h] [-q | -v] [--dot] [--mermaid] [--md] [--full] [-o] [target]

PARAMETERS

-h, --help
    Show the command's help message and exit.

-o, --outs
    Show a graph of dependency and output files instead of stages. This flag takes no argument; to save the graph to a file, redirect standard output.

--dot
    Output the dependency graph in Graphviz DOT format. This format can be processed by external tools like Graphviz to render visual graphs (e.g., as SVG or PNG images).

--mermaid
    Output the dependency graph in Mermaid JS diagram syntax. This format is suitable for embedding diagrams in Markdown files or web pages that support Mermaid rendering.

target
    Optional positional argument naming a stage or an output file. When given, only the part of the pipeline that leads up to that target is shown; without it, all stages in the workspace are shown.

--full
    Show the full graph that the target belongs to, including downstream stages, instead of only the target and its ancestors.

DESCRIPTION

The dvc dag command visualizes the dependency graph of a Data Version Control (DVC) project. It shows how the stages defined in dvc.yaml files connect the data files, scripts, and models of a pipeline, which helps when debugging a pipeline or checking which stages a change will affect. By default the graph is drawn as text in the terminal. With --dot or --mermaid the graph is written in Graphviz DOT or Mermaid syntax instead, so it can be rendered by external tools or embedded in documentation.
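Because the DOT output is plain text, it can also be inspected with standard tools rather than rendered. The following is a minimal sketch, not part of DVC: dag_edges is a hypothetical helper name, and it assumes each edge appears on its own line in the form "a" -> "b";, which may differ between DVC versions.

```shell
# dag_edges: hypothetical helper that reads DOT text on stdin and prints
# one "source target" pair per edge.
# Assumption: each edge sits on its own line as  "a" -> "b";
dag_edges() {
    sed -n 's/^[[:space:]]*"\([^"]*\)"[[:space:]]*->[[:space:]]*"\([^"]*\)".*$/\1 \2/p'
}

# Example use inside a DVC project:
#   dvc dag --dot | dag_edges
```

Lines that do not describe an edge (the graph header, node declarations, the closing brace) are skipped, so the result is easy to pass on to grep, sort, or awk.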

CAVEATS

To visually render the Graphviz DOT output (e.g., into an image), you typically need to install an external tool like Graphviz and use its dot command. For example:
dvc dag --dot | dot -Tsvg -o pipeline.svg
The graph reflects the stages as they are defined in your dvc.yaml files. It does not show whether those stages have been run or are up to date; use dvc status for that.
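Since rendering depends on two separate tools, a small wrapper can check for both before doing any work. This is a sketch under assumptions: render_dag is a hypothetical helper name and pipeline.svg is an arbitrary default output path.

```shell
# render_dag: hypothetical helper that renders the DVC pipeline to SVG.
# Usage: render_dag [output.svg]   (default path is an assumption)
render_dag() {
    out=${1:-pipeline.svg}

    # Fail early with a clear message if either tool is missing.
    for tool in dvc dot; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "render_dag: '$tool' not found in PATH" >&2
            return 1
        fi
    done

    # Write the DOT text to a temporary file first, so that a failure of
    # `dvc dag` is not hidden behind a pipe into `dot`.
    tmp=$(mktemp) || return 1
    if dvc dag --dot > "$tmp" && dot -Tsvg -o "$out" "$tmp"; then
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling render_dag docs/pipeline.svg from the project root would write the image to that path, provided the docs directory already exists.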

VISUALIZATION EXAMPLE WITH GRAPHVIZ

After generating a DOT file using dvc dag --dot, you can render it into a visual image. For instance, to create an SVG image and open it:
dvc dag --dot > pipeline.dot
dot -Tsvg pipeline.dot -o pipeline.svg
xdg-open pipeline.svg (on Linux)
Similarly, for Mermaid output, you can paste the generated text into a Mermaid-compatible viewer or Markdown file.
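For the Markdown case, copying by hand can be avoided. The sketch below assumes the installed DVC version supports the --md flag, which prints the Mermaid diagram already wrapped in a Markdown code block; append_dag_md and the "Pipeline" heading are hypothetical choices made for illustration.

```shell
# append_dag_md: hypothetical helper that appends the pipeline diagram
# to a Markdown file (default README.md).
# Assumption: the installed DVC supports `dvc dag --md`.
append_dag_md() {
    target_file=${1:-README.md}

    # Capture the diagram first so nothing is written if dvc fails.
    diagram=$(dvc dag --md) || return 1

    printf '\n## Pipeline\n\n%s\n' "$diagram" >> "$target_file"
}
```

Running the helper repeatedly appends a new copy of the section each time, so it suits a one-off update better than an automated job.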

HISTORY

DVC (Data Version Control) was open-sourced in 2017 by Iterative.ai, aiming to bring Git-like versioning to machine learning projects and large datasets. Pipeline visualization was originally provided by the dvc pipeline show command. The dvc dag command replaced it in DVC 1.0, released in 2020, when pipelines moved to dvc.yaml files. Later releases added further output formats, such as Mermaid.

SEE ALSO

dvc repro(1), dvc status(1), dot(1)
