LinuxCommandLibrary

gc

Clean up unnecessary files and optimize the local Git repository

TLDR

Count nodes and edges in a Graphviz DOT file (these TLDR examples are for Graphviz's gc, not git gc)

$ gc [path/to/file.dot]

Count only [n]odes
$ gc -n [path/to/file.dot]

Count only [e]dges
$ gc -e [path/to/file.dot]

Count [c]onnected components
$ gc -c [path/to/file.dot]

Display help
$ gc -?

SYNOPSIS

git gc [options]

PARAMETERS

--aggressive
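    Optimizes the repository more aggressively by recomputing deltas from scratch with a larger delta window and depth (gc.aggressiveWindow, default 250; gc.aggressiveDepth, default 50). This can produce a noticeably smaller pack but takes much longer than a normal gc. Its effects are mostly persistent, so it rarely needs to be repeated.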

--auto
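    Checks whether any housekeeping is needed and exits without doing any work if it is not. Work is considered due when there are more loose objects than gc.auto (default 6700) or more packs than gc.autoPackLimit (default 50). This is the mode Git uses when it triggers gc automatically after other commands.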

--prune=date
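    Prunes (deletes) unreachable objects older than the given date. The default is 2 weeks ago, which can be changed with the gc.pruneExpire configuration variable. --prune=now prunes unreachable objects regardless of their age, which increases the risk of corruption if another Git process is writing to the repository at the same time.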

--quiet
    Suppresses all progress messages and reports during the garbage collection process, only showing errors.

--force
    Forces gc to run even if another git gc instance may already be running on this repository. (A manual git gc without --auto always runs; the gc.auto threshold only applies to --auto.)

--no-prune
    Disables pruning of unreachable objects. Useful if you want to repack objects but retain all unreachable objects.
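One way to see how the default grace period, --no-prune and --prune=now differ is to create an object that nothing refers to and run gc with each option. This is a minimal sketch, assuming a POSIX shell, git on PATH, and a throwaway repository created with mktemp; the variable names (repo, blob, kept_*) and the Example identity are illustrative placeholders, not part of Git.

```shell
#!/bin/sh
# Sketch: how git gc's pruning options treat an unreachable object.
# Assumption: runs only against a throwaway repository in a temp directory.
repo=$(mktemp -d)
git -C "$repo" init -q
echo "tracked" > "$repo/tracked.txt"
git -C "$repo" add tracked.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
  commit -q -m "initial commit"

# Write a blob straight into the object database. No commit, tree, index
# entry or reflog refers to it, so it is unreachable from the start.
blob=$(printf 'orphan\n' | git -C "$repo" hash-object -w --stdin)

# Default: the blob is younger than the 2-week grace period, so it is kept.
git -C "$repo" gc --quiet
git -C "$repo" cat-file -e "$blob" 2>/dev/null && kept_default=yes || kept_default=no

# --no-prune: unreachable objects are kept regardless of age.
git -C "$repo" gc --quiet --no-prune
git -C "$repo" cat-file -e "$blob" 2>/dev/null && kept_no_prune=yes || kept_no_prune=no

# --prune=now: unreachable objects are deleted regardless of age.
# Only safe when no other Git process is using the repository.
git -C "$repo" gc --quiet --prune=now
git -C "$repo" cat-file -e "$blob" 2>/dev/null && kept_prune_now=yes || kept_prune_now=no

echo "default: $kept_default, --no-prune: $kept_no_prune, --prune=now: $kept_prune_now"
rm -rf "$repo"
```

The blob survives the first two runs and disappears only with --prune=now, which is why the two-week default is the safer choice on repositories that other processes may be touching.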

DESCRIPTION

The gc command in Git (invoked as git gc) runs a number of housekeeping tasks in the current repository: it compresses objects into packfiles, removes unreachable objects, packs refs, and prunes old reflog entries, rerere metadata and stale worktrees. The result is a repository that uses less disk space and performs better.

As you commit, amend, rebase, or delete branches, most local operations write each new object as an individual "loose" file, and some objects become unreachable: no branch, tag, reflog entry, or index entry refers to them any more (for example, the original commit replaced by git commit --amend, once its reflog entries have expired). git gc packs the reachable loose objects into packfiles and deletes unreachable objects older than the prune expiry (two weeks by default). Packfiles store many objects in one file, with similar objects delta-compressed against each other, so the repository becomes smaller and later operations are faster.
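The effect described above can be observed with git count-objects -v, which reports loose objects as "count" and packed objects as "in-pack". This is a minimal sketch, assuming a POSIX shell with awk and git on PATH; repo, loose_before, loose_after and packed_after are illustrative variable names, and the Example identity is a placeholder.

```shell
#!/bin/sh
# Sketch: watch git gc turn loose objects into a packfile.
# Assumption: runs only against a throwaway repository in a temp directory.
repo=$(mktemp -d)
git -C "$repo" init -q

# Each commit writes a new blob, tree and commit object as loose files.
for i in 1 2 3; do
  echo "version $i" > "$repo/file.txt"
  git -C "$repo" add file.txt
  git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "commit $i"
done

# In `git count-objects -v`, "count" is the number of loose objects and
# "in-pack" is the number of objects stored in packfiles.
loose_before=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
git -C "$repo" gc --quiet
loose_after=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git -C "$repo" count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $loose_before"
echo "loose after gc: $loose_after, packed after gc: $packed_after"
rm -rf "$repo"
```

Every object in this repository is reachable, so all of them end up in the pack and the loose count drops to zero.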

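Several Git commands, such as git commit and git merge, trigger git gc --auto after operations that may create many loose objects, so routine cleanup usually happens without any user action. git gc can also be invoked manually for a thorough cleanup or specific optimization needs, and the Git documentation encourages running it regularly in each repository.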

CAVEATS

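Running git gc --aggressive on a very large repository can use a lot of CPU time and memory and take a long time to finish. Ordinary git gc is designed to be safe while other Git processes are running, because by default it only prunes unreachable objects that are more than two weeks old. Shortening that grace period, especially with --prune=now, while another process is writing to the repository (for example a push, fetch, or commit in progress) can delete an object that the other process has written but not yet referenced; depending on timing, the other process then fails or the repository ends up corrupted.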
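For the resource side of this caveat, the pack.threads and pack.windowMemory configuration variables can cap how many delta-search threads git uses and how much window memory each thread may take. This is a minimal sketch, assuming a POSIX shell with awk and git on PATH; the limits (2 threads, 256m) are arbitrary example values to tune for your machine, and repo, gc_status, loose_after and fsck_ok are illustrative names. It does not address the concurrency risk: run it when nothing else is writing to the repository.

```shell
#!/bin/sh
# Sketch: an aggressive gc with capped CPU and memory on a throwaway repo.
# Assumption: 2 threads and 256m per thread are example values only.
repo=$(mktemp -d)
git -C "$repo" init -q
for i in 1 2 3 4 5; do
  echo "line $i" >> "$repo/notes.txt"
  git -C "$repo" add notes.txt
  git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "edit $i"
done

# -c sets configuration for this one command only:
#   pack.threads       limits the number of delta-search threads (CPU)
#   pack.windowMemory  caps the delta-window memory used by each thread
git -C "$repo" -c pack.threads=2 -c pack.windowMemory=256m gc --aggressive --quiet
gc_status=$?

# Sanity checks: nothing left loose, and the object database is consistent.
loose_after=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
git -C "$repo" fsck --no-progress >/dev/null 2>&1 && fsck_ok=yes || fsck_ok=no

echo "gc exit status: $gc_status, loose objects: $loose_after, fsck ok: $fsck_ok"
rm -rf "$repo"
```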

AUTOMATIC GARBAGE COLLECTION

Several Git commands run git gc --auto automatically to keep the repository in good shape without explicit user intervention. The check is controlled by configuration variables such as gc.auto (the approximate number of loose objects that triggers a run, default 6700; setting it to 0 disables automatic gc entirely) and gc.autoPackLimit (the number of packs that triggers consolidating them into one, default 50).
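The threshold behavior can be checked directly: in a small repository, git gc --auto finds nothing to do, gc.auto=0 turns the check off, and a plain git gc always runs. This is a minimal sketch, assuming a POSIX shell with awk and git on PATH and no unusual gc.* settings in your global config; count_loose and the other names are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Sketch: when does `git gc --auto` actually do anything?
# Assumption: throwaway repo; global config does not override gc.auto.
repo=$(mktemp -d)
git -C "$repo" init -q
for i in 1 2 3; do
  echo "version $i" > "$repo/file.txt"
  git -C "$repo" add file.txt
  git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "commit $i"
done

# Hypothetical helper: number of loose objects in the repository.
count_loose() {
  git -C "$repo" count-objects -v | awk '/^count:/ {print $2}'
}
loose_before=$(count_loose)

# A handful of loose objects is far below the default gc.auto threshold
# (6700), so --auto decides no housekeeping is needed and exits.
git -C "$repo" gc --auto --quiet
loose_after_auto=$(count_loose)

# gc.auto=0 disables the automatic check (set here for one command only).
git -C "$repo" -c gc.auto=0 gc --auto --quiet
loose_after_disabled=$(count_loose)

# Without --auto, gc runs unconditionally and packs everything reachable.
git -C "$repo" gc --quiet
loose_after_manual=$(count_loose)

echo "$loose_before $loose_after_auto $loose_after_disabled $loose_after_manual"
rm -rf "$repo"
```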

LOOSE OBJECTS VS. PACKFILES

Git stores objects (blobs, trees, commits, and annotated tags) either as loose objects (one zlib-compressed file per object under .git/objects) or inside packfiles (a .pack file holding many objects, often stored as deltas against similar objects, plus an .idx index for fast lookup). Loose objects are cheap to create, but packfiles are far more efficient for storage and network transfer. git gc's main task is to move loose objects into packfiles and consolidate existing packfiles, improving overall repository efficiency.
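To see both storage forms on disk, the sketch below locates a blob as a loose file before gc and inside a packfile afterwards, using git verify-pack -v to list the pack's contents. It is a minimal sketch, assuming a POSIX shell with cut, grep and git on PATH and a non-bare repository whose object directory is .git/objects; repo, blob, loose_path and the other variables are illustrative names.

```shell
#!/bin/sh
# Sketch: where one object lives before and after git gc.
# Assumption: throwaway non-bare repo, objects under "$repo/.git/objects".
repo=$(mktemp -d)
git -C "$repo" init -q
echo "hello" > "$repo/hello.txt"
git -C "$repo" add hello.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
  commit -q -m "initial commit"

# Object ID of the blob holding hello.txt's contents.
blob=$(git -C "$repo" rev-parse HEAD:hello.txt)
# A loose object is stored at .git/objects/<first 2 hex chars>/<the rest>.
dir=$(printf '%s' "$blob" | cut -c1-2)
rest=$(printf '%s' "$blob" | cut -c3-)
loose_path="$repo/.git/objects/$dir/$rest"

[ -f "$loose_path" ] && loose_before=yes || loose_before=no
git -C "$repo" gc --quiet
[ -f "$loose_path" ] && loose_after=yes || loose_after=no

# After gc the object is inside a packfile: a .pack (data) plus an .idx (index).
pack_files=$(ls "$repo"/.git/objects/pack/*.pack 2>/dev/null | wc -l | tr -d ' ')
# verify-pack -v lists every object in the pack, one per line, ID first.
git -C "$repo" verify-pack -v "$repo"/.git/objects/pack/*.idx \
  | grep -q "^$blob " && in_pack=yes || in_pack=no

echo "loose before: $loose_before, loose after: $loose_after"
echo "pack files: $pack_files, blob found in pack: $in_pack"
rm -rf "$repo"
```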

HISTORY

Garbage collection has been an integral part of Git since its early days. Because Git never modifies an object once it is written, everyday commits, history rewriting, and branch deletion steadily leave behind loose and unreachable objects, so periodic repacking and pruning is needed to keep repositories compact and fast. The git gc command evolved to manage this lifecycle of Git objects.

SEE ALSO
