dvc-destroy
Remove all DVC files and internal directories from a project
TLDR
Destroy the current project
dvc destroy
Force destroy the current project (skip the confirmation prompt)
dvc destroy --force
SYNOPSIS
dvc destroy [-h | --help] [-q | --quiet | -v | --verbose] [-f | --force]
PARAMETERS
-f, --force
(Optional) Does not prompt for confirmation before removing the DVC files and directories, allowing non-interactive execution (for example in scripts or CI). Use with caution: unless the cache is configured in an external location, it is deleted together with .dvc/, and data that exists only there cannot be recovered.
-q, --quiet
(Optional) Does not write anything to standard output.
-v, --verbose
(Optional) Displays detailed tracing information.
-h, --help
(Optional) Displays a help message for the command and exits.
DESCRIPTION
dvc destroy removes DVC from the current project. It deletes all DVC metafiles (.dvc files, dvc.yaml and dvc.lock) together with the internal .dvc/ directory. Because the local cache is located in .dvc/cache by default, the cache is removed as well, unless it has been configured to live in an external location. The data in the workspace is kept: if tracked files were linked from the cache (for example as symlinks or hardlinks), DVC first replaces the links with copies, so the data stays intact after the project is destroyed. The command takes no path arguments and always acts on the whole project, which makes it useful when a repository should stop using DVC altogether and keep its data as plain files. dvc destroy does not commit anything to Git: the deletion of the DVC files still has to be staged and committed like any other change.
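The following shell script is a minimal sketch of this behavior in a throwaway repository. It assumes git and dvc are installed and on PATH (it exits early otherwise); data.csv is a hypothetical sample file, and exact console output may differ between DVC versions.

```shell
#!/bin/sh
# Sketch: create a scratch DVC project, track one file, then destroy the project.
# Assumes git and dvc are on PATH; skips the demo otherwise.
set -e
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
    echo "git and dvc are required; skipping demo"
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"

git init -q
dvc init -q                               # creates .dvc/ (default cache location: .dvc/cache)
printf 'id,value\n1,42\n' > data.csv      # hypothetical sample data
dvc add -q data.csv                       # writes data.csv.dvc and stores the data in the cache

dvc destroy -f                            # no prompt: removes data.csv.dvc and .dvc/

# data.csv is still present as a plain file; .dvc/ and data.csv.dvc are gone.
ls -A
```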
CAVEATS
- Irreversibility: Unless the cache is external, dvc destroy deletes it together with .dvc/. Any data version that exists only in the cache (for example an older version that is not in the workspace and was never uploaded with dvc push) is lost and cannot be recovered from the workspace.
- Working Tree vs. Cache: The current versions of tracked files remain in the working tree as plain files; only the DVC metadata and the cache are removed.
- Git Integration: The command does not create a Git commit. The deleted .dvc files, dvc.yaml, dvc.lock and the Git-tracked parts of .dvc/ must be staged (for example with git add -u or git rm) and committed. Review .gitignore as well, since it may still contain entries that DVC added for tracked data.
- Scope: dvc destroy always operates on the entire project and accepts no paths. To stop tracking individual files or stages, use dvc remove instead.
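One way to guard against the irreversibility caveat is to push the cached data to a remote before destroying the project. The sketch below assumes git and dvc are on PATH, uses a temporary local directory as a stand-in for real remote storage, and names the remote "backup"; both the name and the sample file are hypothetical.

```shell
#!/bin/sh
# Sketch: push cached data to a remote before destroying the project,
# so the data survives the deletion of .dvc/cache.
# Assumes git and dvc are on PATH; a local directory stands in for real remote storage.
set -e
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
    echo "git and dvc are required; skipping demo"
    exit 0
fi

workdir=$(mktemp -d)
remote_dir=$(mktemp -d)                   # hypothetical backup location
cd "$workdir"
git init -q
dvc init -q
printf 'id,value\n1,42\n' > data.csv      # hypothetical sample data
dvc add -q data.csv

dvc remote add -d backup "$remote_dir"    # "backup" is an arbitrary remote name
dvc push -q                               # upload cached data to the remote

dvc destroy -f                            # deletes .dvc/ (and its cache) and data.csv.dvc

# The pushed copy is still in the remote directory.
find "$remote_dir" -type f
```

Restoring the pushed data later still requires the .dvc file that records its hash, for example recovered from Git history.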
DIFFERENCE FROM DVC REMOVE
Both commands stop DVC from tracking data, but at different scopes. dvc remove acts on specific targets: it deletes the given .dvc files or removes the given stages from dvc.yaml, and removes the corresponding entries from .gitignore; the data itself stays in the workspace unless --outs is given, and the rest of the project, including .dvc/ and the cache, is left untouched. dvc destroy acts on the whole project: it deletes every DVC metafile together with the .dvc/ directory (and with it the default local cache), leaving the workspace data as plain, untracked files. In short, use dvc remove to untrack individual items and dvc destroy to stop using DVC in the repository altogether.
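The contrast can be seen in a short shell sketch, assuming git and dvc are on PATH; a.csv and b.csv are hypothetical sample files. dvc remove untracks one file while the project keeps working, and dvc destroy then removes what is left.

```shell
#!/bin/sh
# Sketch: contrast dvc remove (one target) with dvc destroy (whole project).
# Assumes git and dvc are on PATH; skips the demo otherwise.
set -e
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
    echo "git and dvc are required; skipping demo"
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"
git init -q
dvc init -q
printf 'a\n' > a.csv                      # hypothetical sample files
printf 'b\n' > b.csv
dvc add -q a.csv b.csv                    # one .dvc file per target

dvc remove a.csv.dvc                      # untracks a.csv only; a.csv stays in the workspace
ls -A                                     # b.csv.dvc and .dvc/ are still present

dvc destroy -f                            # now removes b.csv.dvc and .dvc/ as well
ls -A                                     # only the data files (plus Git files) remain
```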
TWO-STEP UNTRACKING PROCESS
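To remove DVC from a repository completely, a common workflow involves two steps: first, run dvc destroy (adding --force in scripts) to delete the DVC metafiles, the .dvc/ directory and the local cache while keeping the workspace data; then stage the deletions, for example with git add -u or with git rm on the individual files, followed by a git commit. Before the first step, make sure that any data you may need later has been uploaded with dvc push or backed up, because the deleted cache cannot be restored. The commit removes the DVC files from the current tree only; earlier commits in the Git history still contain them. To untrack a single file instead of the whole project, use dvc remove on its .dvc file, commit the resulting changes, and optionally clean up unused cache entries with dvc gc.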
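The two steps can be sketched as a shell script. It assumes git and dvc are on PATH, data.csv is a hypothetical sample file, and the commit identity passed with git -c is a placeholder so the script also works where no Git user is configured.

```shell
#!/bin/sh
# Sketch: remove DVC from a repository and record the removal in Git.
# Assumes git and dvc are on PATH; skips the demo otherwise.
set -e
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
    echo "git and dvc are required; skipping demo"
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"
git init -q
dvc init -q
printf 'id,value\n1,42\n' > data.csv      # hypothetical sample data
dvc add -q data.csv
git add -A                                # data.csv itself is ignored via .gitignore
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Track data.csv with DVC"

# Step 1: delete DVC metafiles, .dvc/ and the local cache; keep workspace data.
dvc destroy -f

# Step 2: stage the deletions of files Git already tracks, then commit them.
git add -u
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Remove DVC from the project"

git status --short                        # review any leftovers before further commits
```

git add -u stages only changes to files Git already tracks, so large data files that are no longer ignored are not added by accident.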
HISTORY
The dvc destroy command is part of DVC (Data Version Control), an open-source project whose development began around 2017. DVC aims to bring Git-like version control to data and models, addressing the challenges of managing large datasets in machine learning and data science projects. dvc destroy complements the commands that put data under DVC control by providing a single step to take a repository out of DVC: it removes the DVC metafiles, the .dvc/ directory and the local cache, reverts the workspace to plain files, and leaves the existing Git history intact.
SEE ALSO
dvc add(1), dvc remove(1), dvc gc(1), git rm(1)