dolt-gc

Garbage collect dolt repository data

TLDR

Clean up unreferenced data from the repository

$ dolt gc

Initiate a faster but less thorough garbage collection process

$ dolt gc [-s|--shallow]

SYNOPSIS

dolt gc [-s|--shallow] [--auto] [--prune=mode] [--expire=time] [--dry-run]

PARAMETERS

-s, --shallow
    Performs a faster but less thorough garbage collection pass.

--auto
    Runs garbage collection in an 'automatic' mode. This mode is typically less aggressive and designed for periodic, background execution. It might skip some pruning that a manual run would perform, making it suitable for automated maintenance tasks.

--prune=mode
    Determines how aggressively unreachable objects are pruned. Possible modes:
    now: Prunes all unreachable objects immediately.
    expire: (Default) Prunes objects that are older than the time specified by --expire.
    never: Does not prune any unreachable objects; only cleans up loose objects and leaves old history in place.

--expire=time
    Specifies the minimum age for unreachable objects to be considered for pruning when --prune=expire is used. Objects older than this time will be removed. Time can be specified in formats like '2.weeks', '3.months', '5.days', '1.hour'. The default is often 2 weeks.

--dry-run
    Performs a simulated run of the garbage collection without actually deleting any objects. It shows what would be removed, allowing users to verify the impact before committing to the operation. This is highly recommended for understanding the effects of different pruning strategies.

DESCRIPTION

The dolt gc command, short for 'garbage collection', is used to clean up unreferenced objects within a Dolt repository.

Dolt, being a version-controlled database, keeps a complete history of all data changes. Over time, as branches are merged, rebased, or deleted, and as new commits are made, some data objects (like old versions of tables or rows) may become 'unreachable' – meaning they are no longer part of any current branch, tag, or accessible via the reflog (after a certain expiry period). These unreachable objects can accumulate and consume significant disk space.

dolt gc identifies these unreachable objects and removes them from the repository, freeing disk space and making the repository more efficient. It is analogous to git gc in Git. The command is generally safe because it removes only unreferenced data, but understanding its parameters is important for controlling how aggressively it prunes.
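To see how much space a collection actually frees, the size of the repository's .dolt directory can be compared before and after a run. The following is a minimal sketch, not part of Dolt itself: the helper names dir_kb and gc_report and the DOLT override variable are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; neither is provided by Dolt.

# dir_kb DIR: print the disk usage of DIR in kilobytes.
dir_kb() {
  du -sk "$1" | awk '{ print $1 }'
}

# gc_report REPO_DIR: run `dolt gc` inside REPO_DIR and print the size of
# REPO_DIR/.dolt before and after, plus the difference.
# The DOLT variable (an assumption of this sketch) lets the binary be
# swapped out; it defaults to `dolt`.
gc_report() {
  repo=$1
  before=$(dir_kb "$repo/.dolt")
  (cd "$repo" && "${DOLT:-dolt}" gc) || return 1
  after=$(dir_kb "$repo/.dolt")
  echo "before=${before}K after=${after}K reclaimed=$((before - after))K"
}
```

Calling gc_report /path/to/repo prints a single summary line. A negative "reclaimed" value is possible if other writes land in the repository while the collection runs.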

CAVEATS

While dolt gc is designed to be safe and only remove truly unreachable data, users should be aware:
- If you rely on the reflog to recover recent, unpushed work (e.g., deleted branches), running dolt gc --prune=now may permanently delete that data once it is past the reflog's expiry period or no longer reachable from any reference.
- For very large repositories, running dolt gc can be a resource-intensive operation, consuming CPU and I/O. It's advisable to run it during periods of low activity if performance is critical.
- Ensure all important data is pushed to a remote if you intend to prune aggressively on local repositories that might not contain all remote branches or tags.
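
Two of the precautions above (push first, and limit the load on a busy machine) can be combined in a small wrapper. This is a sketch under stated assumptions: push_then_gc and the DOLT override variable are hypothetical names, the current branch is assumed to have an upstream configured so that a bare dolt push succeeds, and nice only lowers CPU priority, not I/O priority.

```shell
#!/bin/sh
# Hypothetical wrapper for illustration; not part of Dolt.

# push_then_gc REPO_DIR: push the current branch, then garbage collect at
# the lowest CPU scheduling priority. Stops without collecting if the
# push fails, so unpushed work is never pruned by this helper.
push_then_gc() {
  repo=$1
  (
    cd "$repo" || exit 1
    # Push first so committed work also exists on the remote.
    "${DOLT:-dolt}" push || exit 1
    # nice -n 19 asks the scheduler to favor every other process.
    nice -n 19 "${DOLT:-dolt}" gc
  )
}
```

Running push_then_gc /path/to/repo from a scheduled job during a quiet period keeps both precautions in one place.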

HOW IT WORKS

dolt gc operates by identifying all data objects that are currently referenced by any branch, tag, active reflog entry, or commit. Any data object not reachable through these references is considered 'unreachable'. Depending on the --prune mode and --expire time, these unreachable objects are then safely removed from the repository, compacting its storage.
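
The reachability idea described above can be illustrated with a toy mark-and-sweep over a text file. This is only a sketch of the general technique, not Dolt's implementation or storage format: the function name find_unreachable and the graph file layout (one object per line, followed by the objects it references) are assumptions made for the example.

```shell
#!/bin/sh
# Toy mark-and-sweep; the graph format here is hypothetical.
# Each line of GRAPH_FILE reads: object [referenced-object ...]

# find_unreachable GRAPH_FILE ROOT...
# Prints, sorted one per line, every object listed in GRAPH_FILE that
# cannot be reached by following references from the given roots.
find_unreachable() {
  graph_file=$1
  shift
  awk -v roots="$*" '
    # Load phase: remember the references held by each object.
    NF > 0 {
      kids[$1] = ""
      for (i = 2; i <= NF; i++) kids[$1] = kids[$1] " " $i
    }
    END {
      # Mark phase: depth-first walk starting from the roots.
      n = split(roots, stack, " ")
      while (n > 0) {
        cur = stack[n]; n--
        if (cur in marked) continue
        marked[cur] = 1
        m = split(kids[cur], c, " ")
        for (i = 1; i <= m; i++) stack[++n] = c[i]
      }
      # Sweep phase: anything never marked is unreachable.
      for (obj in kids) if (!(obj in marked)) print obj
    }
  ' "$graph_file" | sort
}
```

In this model the roots play the role of branches, tags, and reflog entries. An object referenced only by another unreachable object is itself reported as unreachable, which is why deleting a branch can free a whole chain of old data.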

AUTOMATIC VS. MANUAL EXECUTION

The --auto flag makes dolt gc suitable for automatic background maintenance, performing less aggressive pruning. Manual runs without --auto (or with specific --prune flags) are typically more thorough and are useful when significant disk space needs to be reclaimed after extensive history rewriting or branch deletion.

HISTORY

Dolt is a relatively modern, Git-like version-controlled database. The gc (garbage collection) command is a fundamental feature inherited from version control systems like Git, crucial for maintaining repository efficiency and managing disk space. It has been an integral part of Dolt since its early development, reflecting the need to manage object storage and historical data effectively within a database context. Its design closely mirrors git gc, adapting the principles of object reachability and pruning to the Dolt data model.

SEE ALSO

git gc(1), dolt reflog(1), dolt commit(1), dolt branch(1), dolt tag(1), dolt pull(1), dolt push(1)
