LinuxCommandLibrary

git-sizer

Analyze Git repository size and structure

TLDR

Report only statistics that have a level of concern greater than 0

$ git-sizer

Report all statistics
$ git-sizer -v

See additional options
$ git-sizer -h

SYNOPSIS

git-sizer [options] [root...]

PARAMETERS

-v, --verbose
    Report all statistics, whether concerning or not; equivalent to --threshold=0.

--threshold=n
    Report only statistics whose level of concern is at least n (default: 1).

--critical
    Report only critical statistics; equivalent to --threshold=30.

--names=[none|hash|full]
    Choose how the names of large objects are displayed: omitted, as object hashes, or as full names.

-j, --json
    Output in JSON format.

--json-version=[1|2]
    Select the version of the JSON output format.

--[no-]progress
    Report (or do not report) progress to stderr.

--branches, --tags, --remotes
    Restrict processing to the given classes of references; the options can be combined.

--include=prefix, --exclude=prefix
    Process, or skip, references whose names start with prefix.

--show-refs
    Show which references are included in and excluded from the scan.

-h, --help
    Show help.

--version
    Show version.

DESCRIPTION

git-sizer computes size metrics for a Git repository and flags those likely to cause problems or inconvenience. It traverses the objects reachable from the repository's references and reports the number and total size of commits, trees, blobs, and annotated tags, along with the number of references.

Beyond the overall totals, it reports the largest individual objects (for example, the biggest blob, the tree with the most entries, and the commit with the most parents), the maximum history depth, and the size of the largest checkout: number of files and directories, maximum path depth, and maximum path length. Each statistic is rated with a level of concern, which helps diagnose bloat from large binaries, accumulated history, or unwieldy directory structures.

Run it from inside a repository, bare or not. By default only statistics with a nonzero level of concern are shown; verbose mode reports all of them, and JSON output enables automation. It is aimed at maintainers of large repositories who are weighing actions such as rewriting history with git filter-repo, using shallow clones, or splitting a monorepo.
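The JSON output mentioned above lends itself to scripting. The following is a minimal sketch, assuming a POSIX shell; the helper name save_sizer_report and the default file name git-sizer-report.json are hypothetical, and the script deliberately makes no assumption about the keys inside the JSON document.

```shell
#!/bin/sh
# save_sizer_report (hypothetical helper): write git-sizer's JSON report
# for the repository in DIR to OUTFILE.
#   usage: save_sizer_report [DIR] [OUTFILE]
save_sizer_report() {
    dir=${1:-.}
    outfile=${2:-git-sizer-report.json}

    # git-sizer must be installed and on PATH.
    if ! command -v git-sizer >/dev/null 2>&1; then
        echo "git-sizer not found on PATH" >&2
        return 127
    fi

    # Run in a subshell so the caller's working directory is unchanged.
    # The redirection is resolved in the caller's directory, not in DIR.
    (cd "$dir" && git-sizer --json) >"$outfile"
}
```

Calling save_sizer_report /path/to/repo report.json leaves the report in report.json; the function's exit status is that of git-sizer, so a failed scan can be detected by the caller.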

CAVEATS

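Can be slow on very large repositories, since it traverses the full reachable history; requires the git command-line client and read access to the repository's object database; must be run from within a Git repository or worktree; does not modify the repository.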
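The repository and read-access requirements can be checked before a long scan. This is a rough sketch that relies only on standard git commands (git -C and git rev-parse --git-path, available in Git 2.5 and later); the helper name can_size_repo is hypothetical.

```shell
#!/bin/sh
# can_size_repo (hypothetical helper): succeed only if DIR is inside a Git
# repository whose object directory is readable by the current user.
#   usage: can_size_repo [DIR]
can_size_repo() {
    dir=${1:-.}

    # rev-parse fails outside a repository; --git-path resolves the object
    # directory for ordinary repositories, bare repositories, and worktrees.
    objects=$(git -C "$dir" rev-parse --git-path objects 2>/dev/null) || return 1

    # A relative result is relative to DIR, not to the caller's directory.
    case $objects in
        /*) ;;
        *) objects=$dir/$objects ;;
    esac

    # The directory must be both readable and searchable.
    [ -r "$objects" ] && [ -x "$objects" ]
}
```

A script might then run can_size_repo "$repo" && (cd "$repo" && git-sizer -v), skipping directories that would only produce an error.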

INSTALLATION

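$ go install github.com/github/git-sizer@latest
Requires a Go toolchain (the @latest form of go install needs Go 1.16 or later); builds a single binary. The git command-line client is still needed at run time.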
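After go install, the binary is placed in Go's bin directory ($GOBIN if set, otherwise the bin directory under $GOPATH), which is not necessarily on PATH. A small sketch for locating that directory follows; the helper name go_bin_dir is hypothetical, and it assumes GOPATH holds a single directory.

```shell
#!/bin/sh
# go_bin_dir (hypothetical helper): print the directory where `go install`
# places binaries: $GOBIN if set, otherwise $GOPATH/bin.
# Assumes GOPATH contains a single directory.
go_bin_dir() {
    gobin=$(go env GOBIN)
    if [ -n "$gobin" ]; then
        printf '%s\n' "$gobin"
    else
        printf '%s/bin\n' "$(go env GOPATH)"
    fi
}
```

With this in place, PATH=$PATH:$(go_bin_dir) followed by git-sizer --version confirms that the freshly built binary is reachable.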

EXAMPLE OUTPUT

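Running git-sizer inside a repository prints a summary table with the columns Name, Value, and Level of concern, grouped into sections such as overall repository size, biggest objects, history structure, and biggest checkouts.
A large repository might report, for example, a total blob size of 9.2 GiB across 2.3 M blobs.
With -v, statistics that raise no concern are listed as well.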

HISTORY

Developed at GitHub by Michael Haggerty and released as open source in 2018 to help analyze very large repositories. It is written in Go, distributed under the MIT license, and maintained in the github/git-sizer repository.
