bfg
Remove large files from Git history
TLDR
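Remove a file with sensitive data but leave the latest commit untouched: java -jar bfg.jar --delete-files file_with_sensitive_data my-repo.git
Remove all text mentioned in the specified file wherever it can be found in the repository's history: java -jar bfg.jar --replace-text path/to/file.txt my-repo.git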
SYNOPSIS
java -jar bfg.jar [options] <repository-path>
Examples:
Remove all blobs bigger than 100 megabytes: java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git
Remove every file named secret.txt from history, protecting the latest commit: java -jar bfg.jar --delete-files secret.txt my-repo.git
Remove every folder named passwords from history, protecting the latest commit: java -jar bfg.jar --delete-folders passwords my-repo.git
Remove big blobs even from the latest commit (use with caution!): java -jar bfg.jar --strip-blobs-bigger-than 100M --no-blob-protection my-repo.git
PARAMETERS
-b, --strip-blobs-bigger-than <size>
Removes all blobs larger than the specified size (e.g., 128K, 1M, 100M). Blobs in the latest commit are protected unless --no-blob-protection is given.
-D, --delete-files <glob>
Deletes files with the specified names from history, e.g. `secret.txt` or `*.{txt,log}`. The glob is matched against the file name, not the path within the repository.
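-rt, --replace-text <expressions-file>
Replaces matching text in file contents throughout history. The file lists one expression per line; each is a literal by default, `regex:` and `glob:` prefixes are supported, and `==>` sets a replacement other than the default `***REMOVED***`.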
--no-blob-protection
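By default, BFG protects the blobs in your latest commit (HEAD) and leaves that commit untouched. This option disables that protection. It is not recommended: clean up the latest commit yourself before running BFG.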
--delete-folders <glob>
Deletes folders with the specified names (e.g., `.svn`, `*-tmp`). As with --delete-files, the match is on the folder name, not the path within the repository.
-B, --strip-biggest-blobs <num>
Removes the <num> largest blobs from history.
-p, --protect-blobs-from <refs>
Protects blobs that appear in the most recent versions of the specified refs (default: HEAD).
--private
Treats the rewrite as removing private data, for example by omitting old commit IDs from commit messages.
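The TLDR entry about removing text corresponds to BFG's `--replace-text` option, which reads match expressions from a file, one per line. The sketch below writes such a file; the name `passwords.txt`, the three secrets, and `my-repo.git` are placeholders, and the comments describe how each line is interpreted (literal, literal with a custom replacement, and `regex:`).

```shell
#!/usr/bin/env bash
# Write an example expressions file for BFG's --replace-text option.
# passwords.txt and the secrets in it are placeholders; one expression per line.
cat > passwords.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
regex:password=\w+==>password=
EOF
# Line 1: literal text, replaced with the default ***REMOVED***.
# Line 2: literal text, replaced with examplePass.
# Line 3: regular expression that keeps "password=" but drops the value.

# The rewrite itself needs Java and the jar (my-repo.git is a placeholder):
#   java -jar bfg.jar --replace-text passwords.txt my-repo.git
echo "wrote $(wc -l < passwords.txt | tr -d ' ') expressions to passwords.txt"
```

Keeping the secrets in a separate file, rather than on the command line, also keeps them out of your shell history.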
DESCRIPTION
The BFG Repo-Cleaner, often simply referred to as bfg, is a powerful and fast tool for cleaning up Git repository history. Unlike the traditional git-filter-branch, BFG is designed to be significantly quicker and simpler to use, especially for large repositories.
Its primary purpose is to remove large files, sensitive data such as passwords or keys, and other unwanted blobs from every commit in a repository's history, which necessarily rewrites that history. It operates directly on the Git object database, making it very efficient.
BFG is not a native Linux command but a Java application distributed as an executable JAR file. It is well suited to keeping repositories lean, particularly after someone has accidentally committed large files or sensitive information that must be purged from the entire history. Because the history is rewritten, running BFG is only the first step: you must then expire the reflog, garbage-collect, push the cleaned history, and have all collaborators re-clone the repository.
CAVEATS
Java Required: The BFG Repo-Cleaner is not a standard Linux command-line utility; it needs a Java runtime installed on your system.
Destructive Operation: BFG permanently rewrites Git history. It is highly recommended to create a complete backup of your repository before running BFG.
Bare Repository: BFG must be run on a bare Git repository (e.g., `my-repo.git`), not a working copy (e.g., `my-repo/.git`).
Post-BFG Steps: After running BFG, critical `git` commands (`git reflog expire --expire=now --all` and `git gc --prune=now`) are necessary to truly remove the old, unwanted data.
Collaboration Impact: After cleaning, all collaborators must delete their old local clones and re-clone the repository from scratch.
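The caveats above translate into a fixed order of steps. The following is a minimal sketch of that order as a shell function, not an official script: `bfg_cleanup` is a hypothetical helper, `BFG_JAR` and `JAVA` are hypothetical environment variables used only here (defaulting to `bfg.jar` and `java`), and 100M is just an example threshold. It deliberately stops before pushing so you can inspect the result first.

```shell
#!/usr/bin/env bash
# End-to-end sketch of a BFG clean-up run (bfg_cleanup is a hypothetical helper).
# Assumptions: BFG_JAR and JAVA are hypothetical environment variables used only
# by this sketch; 100M is just an example threshold.
bfg_cleanup() {
  local url="$1"
  local threshold="${2:-100M}"
  local jar="${BFG_JAR:-bfg.jar}"
  local java="${JAVA:-java}"
  local mirror
  mirror="$(basename "$url")"   # e.g. my-repo.git for https://host/my-repo.git

  # 1. Work on a fresh bare mirror rather than on a working copy.
  git clone --mirror "$url" "$mirror" || return 1
  # 2. Keep an untouched backup: the rewrite cannot be undone.
  cp -R "$mirror" "$mirror.bak" || return 1
  # 3. Rewrite history; blobs in the latest commit stay protected by default.
  "$java" -jar "$jar" --strip-blobs-bigger-than "$threshold" "$mirror" || return 1
  # 4. Expire the reflog and prune objects that are no longer reachable.
  git -C "$mirror" reflog expire --expire=now --all || return 1
  git -C "$mirror" gc --prune=now || return 1
  # 5. Pushing is left to the caller; collaborators must re-clone afterwards.
  echo "cleaned $mirror; inspect it, then run: git -C $mirror push"
}

# Usage (needs Java, the BFG jar and access to the placeholder URL):
#   BFG_JAR=~/bfg.jar bfg_cleanup https://example.com/my-repo.git 50M
```

Every step returns early on failure, so a failed clone or BFG run never reaches the garbage-collection step.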
POST-BFG CLEANUP
After BFG completes, the old, unwanted objects are still on disk and can still be reached through the Git reflog. To remove them completely and reclaim disk space, change into the cleaned bare repository and run:
git reflog expire --expire=now --all
followed by:
git gc --prune=now
This step is required for permanent data removal and a smaller repository. Afterwards, push the rewritten history back to the remote with git push.
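To confirm that the two commands above actually reclaimed space, you can compare Git's own size report before and after. A minimal sketch, assuming the argument is the bare repository BFG ran against; `post_bfg_cleanup` is a hypothetical helper name, not part of BFG or Git.

```shell
#!/usr/bin/env bash
# Expire the reflog and garbage-collect a repository that BFG has rewritten,
# printing the packed size (size-pack) before and after.
# post_bfg_cleanup is a hypothetical helper; $1 is the bare repository path.
post_bfg_cleanup() {
  local repo="$1"
  echo "before: $(git -C "$repo" count-objects -vH | grep '^size-pack')"
  # Drop reflog entries that still point at the pre-rewrite commits.
  git -C "$repo" reflog expire --expire=now --all
  # Delete objects that are no longer reachable from any ref.
  git -C "$repo" gc --prune=now
  echo "after:  $(git -C "$repo" count-objects -vH | grep '^size-pack')"
}

# Usage (placeholder path):
#   post_bfg_cleanup my-repo.git
```

If the "after" figure is not noticeably smaller, some ref (for example a tag or a stash) may still reference the old commits.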
BARE REPOSITORY REQUIREMENT
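The BFG must be run against a bare repository. A bare repository is essentially the `.git` directory without the working tree. You can create one directly from the remote with:
git clone --mirror your-repo-url.git
or by copying the `.git` directory of an existing clone and marking the copy as bare:
cp -R your-working-repo/.git your-bare-repo.git
git --git-dir=your-bare-repo.git config --bool core.bare true
A mirror clone is usually preferable, because it contains every ref on the remote, not only the branches present in your local clone.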
This ensures BFG operates on the raw Git database without interference from local files or branches that might be checked out.
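The copy route above can be wrapped in a small helper that also checks the result. This is a sketch only: `make_bare_copy` is a hypothetical name, both paths are placeholders, and the check relies on `git rev-parse --is-bare-repository`, which prints `true` for a bare repository.

```shell
#!/usr/bin/env bash
# Turn a copy of a working clone's .git directory into a bare repository.
# make_bare_copy is a hypothetical helper; both paths are placeholders.
make_bare_copy() {
  local work="$1" bare="$2"
  cp -R "$work/.git" "$bare"
  # A copied .git still records core.bare=false; flip it so Git treats the
  # copy as a bare repository with no working tree.
  git --git-dir="$bare" config --bool core.bare true
  # Prints "true" when the result is bare.
  git --git-dir="$bare" rev-parse --is-bare-repository
}

# Usage:
#   make_bare_copy your-working-repo your-bare-repo.git
```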
HISTORY
The BFG Repo-Cleaner was created by Roberto Tyley, with its first public releases in the early 2010s. It was developed as a faster, simpler alternative to Git's built-in `git filter-branch` command, which is slow and complex to use on large repositories. BFG became popular because of its speed and its simpler command-line interface, which made purging sensitive data or large files from Git history far more accessible.
SEE ALSO
git-filter-branch(1), git(1), git-reflog(1), git-gc(1), java(1)