dvc-gc
Clean unused DVC cache files
TLDR
SYNOPSIS
dvc gc [options]
DESCRIPTION
dvc gc removes unused files from the DVC cache, freeing disk space. At least one scope option (-w, -a, -T, -A, --all-experiments, --rev, or --date) must be specified to define which data to keep; -n only adjusts --rev and does not define a scope by itself.
The cache accumulates files from every tracked version of the data. Garbage collection identifies and removes files that are not referenced by any of the commits, branches, or tags in the specified scope.
The cloud option (-c) extends cleaning to remote storage, removing files there that are not needed by the specified scope.
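Scope options can be combined in a single invocation. The following is a minimal sketch: the function names are hypothetical wrappers (not part of DVC), and only the flags passed to dvc gc come from this page.

```shell
# Hypothetical wrappers around `dvc gc`; only the flags come from DVC itself.

# Keep only the data referenced by the current workspace.
gc_workspace() {
    dvc gc --workspace "$@"
}

# Keep the data referenced by the workspace, all branch tips, and all tags.
gc_shared() {
    dvc gc --workspace --all-branches --all-tags "$@"
}

# Same scope as gc_shared, but also clean the remote whose name is given as $1.
gc_shared_cloud() {
    gc_remote_name=$1
    shift
    dvc gc --workspace --all-branches --all-tags --cloud --remote "$gc_remote_name" "$@"
}
```

Extra arguments are forwarded unchanged, so `gc_shared --dry` previews the same scope, and `gc_shared_cloud myremote` extends it to a remote (`myremote` is a placeholder name).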
PARAMETERS
-w, --workspace
Keep files used in the current workspace.
-a, --all-branches
Keep files used in all Git branch tips.
-T, --all-tags
Keep files used in all Git tags.
-A, --all-commits
Keep files used in all Git commits.
--all-experiments
Keep files used in all experiments.
-c, --cloud
Also garbage collect in remote storage in addition to the local cache.
-r NAME, --remote NAME
Target a specific remote for garbage collection.
-n NUM, --num NUM
Keep data from the last NUM commits, counting back from the commit given by --rev (default: 1).
--rev COMMIT
Keep data files from the specified Git commit.
--date YYYY-MM-DD
Keep cached data from commits after the specified date.
--not-in-remote
Keep data not present in remote storage.
-f, --force
Skip confirmation prompts.
-j NUM, --jobs NUM
Number of concurrent jobs for cloud operations.
--dry
Preview what would be deleted without deleting anything.
-p PATHS, --projects PATHS
Include the specified projects when sharing a cache directory.
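The history-based options can be sketched the same way. The function names below are hypothetical, and the revision, count, and date are values supplied by the caller. The sketch assumes --num is always given together with --rev, so that the count has a starting commit.

```shell
# Hypothetical wrappers; REV, NUM, and DATE are supplied by the caller.

# Keep data from the last NUM commits, counting back from REV.
# usage: gc_keep_recent REV NUM [more dvc gc options]
gc_keep_recent() {
    gc_rev=$1
    gc_num=$2
    shift 2
    dvc gc --rev "$gc_rev" --num "$gc_num" "$@"
}

# Keep data from commits made after DATE (YYYY-MM-DD).
# usage: gc_keep_since DATE [more dvc gc options]
gc_keep_since() {
    gc_date=$1
    shift
    dvc gc --date "$gc_date" "$@"
}
```

For instance, `gc_keep_recent HEAD 3 --dry` would preview a run that keeps data for three commits counting back from HEAD.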
CAVEATS
The operation is irreversible: removed cache files must be re-downloaded or re-computed. On shared projects, consider keeping data for all branches (-a) so that collaborators' work stays available. Garbage collecting remote storage (-c) can delete data that other users still need. A scope option is required; running without one produces an error.
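Because deletion cannot be undone, one cautious pattern is to preview before deleting. This is a sketch only: gc_with_preview is a hypothetical helper, not a DVC command, and it relies on nothing beyond the --dry option listed under PARAMETERS.

```shell
# Hypothetical helper: show what would be removed, then run the real gc.
# usage: gc_with_preview SCOPE_OPTION...
gc_with_preview() {
    if [ "$#" -eq 0 ]; then
        echo "usage: gc_with_preview SCOPE_OPTION..." >&2
        return 2
    fi
    # First pass: --dry only reports what would be deleted.
    dvc gc --dry "$@" || return
    # Second pass: no --force, so DVC still asks for confirmation.
    dvc gc "$@"
}
```

Calling `gc_with_preview -w -a` prints the preview first; declining DVC's confirmation prompt in the second pass leaves the cache untouched.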
HISTORY
dvc gc implements garbage collection for DVC caches, similar to git gc but for versioned data files, enabling storage management in ML projects.
