
bfg

Remove large files from Git history

TLDR

Remove a file with sensitive data but leave the latest commit untouched

$ bfg --delete-files [file_with_sensitive_data]

Replace every occurrence of the text listed in the specified file, wherever it appears in the repository's history
$ bfg --replace-text [path/to/file.txt]
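
The file passed to --replace-text lists one expression per line. The sketch below writes a small example file. The file name, the secret values, and my-repo.git are placeholders. The line syntax shown (a bare literal is replaced with ***REMOVED***, `==>` introduces a custom replacement, `regex:` marks a regular expression) is an assumption about the BFG's expression format and should be checked against the installed version.

```shell
# Hypothetical expressions file: one expression per line, no comments inside.
cat > replacements.txt <<'EOF'
hunter2
s3cr3t-token==>REDACTED
regex:api_key=\w+==>api_key=
EOF

# Run against a local clone (placeholder name); skipped if bfg or the clone is missing.
if command -v bfg >/dev/null 2>&1 && [ -d my-repo.git ]; then
  bfg --replace-text replacements.txt my-repo.git
fi
```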

SYNOPSIS

bfg [options] [repository]

PARAMETERS

--delete-files <filename/glob-expression>
    Delete all files matching the given filename or glob expression from the repository history. The match is made against the file name only, not its path.

--delete-folders <foldername>
    Delete all folders (and their contents) matching the given folder name from the repository history.

--strip-blobs-bigger-than <size>
    Strip blobs (file contents) larger than the specified size (e.g., '10M' for 10 megabytes) from the repository history. The commits themselves are kept and rewritten without the oversized files.
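
Before choosing a size threshold, it can help to see which blobs are actually large. The function below is a sketch built only on standard Git plumbing; it is not part of the BFG, and the name largest_blobs is hypothetical.

```shell
# List the largest blobs reachable from any ref, biggest first.
# Output format: "<size-in-bytes> blob <path>"
# Usage: largest_blobs [repo_dir] [count]
largest_blobs() {
  repo="${1:-.}"
  count="${2:-10}"
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
    awk '$2 == "blob"' |
    sort -rn |
    head -n "$count"
}
```

For example, `largest_blobs my-repo.git 20` prints the twenty largest blobs, which gives a basis for the value passed to --strip-blobs-bigger-than.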

--strip-blobs-with-ids <blob-ids-file>
    Strip blobs whose Git object IDs are listed in the given file.

--convert-to-git-lfs <filename/glob-expression>
    Convert files matching the given filename or glob expression to Git LFS.

--no-blob-protection
    Disable blob protection (use with caution!). By default, the BFG does not rewrite or delete any blob that is referenced by the current (HEAD) commit. With this option, such blobs are removed as well.
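
A usually safer alternative to --no-blob-protection is to remove the unwanted file from the current commit first, so that the default protection no longer covers it. The following is a minimal sketch using plain Git; drop_from_head is a hypothetical helper name, and it assumes a normal (non-bare) working clone with a configured committer identity.

```shell
# Delete a tracked file and commit the removal, so HEAD no longer references it.
# Usage: drop_from_head <working-clone> <path-relative-to-clone>
drop_from_head() {
  git -C "$1" rm --quiet -- "$2" &&
    git -C "$1" commit --quiet -m "Remove $2"
}
```

Once that commit is pushed, the BFG can be run with its default protection and will still strip the file from all earlier commits.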

--help
    Display help information.

--version
    Display the BFG version.

DESCRIPTION

The BFG Repo-Cleaner is a simpler, faster alternative to `git filter-branch` for cleaning up Git repositories. It's specifically designed for removing large or problematic files from a repository's history, such as accidentally committed passwords, large binaries, or sensitive data.

The BFG rewrites the repository's history: each commit that contained an unwanted file is replaced by a new commit, with a new ID, that omits it, so the old commits are no longer reachable from any branch or tag. This can substantially reduce repository size, especially for very large repositories. The BFG is generally faster than `git filter-branch` because it is built for this one task. The unwanted data keeps occupying disk space until it is garbage-collected, so follow the rewrite with `git reflog expire --expire=now --all && git gc --prune=now --aggressive` in the cleaned repository. Older clones still hold the removed objects and need to be re-cloned or cleaned in the same way.

Important: As BFG rewrites git history, it is crucial to communicate the rewrite to all users of the repository and ensure they understand how to update their local clones safely to avoid conflicts and data loss.

CAVEATS

The BFG rewrites Git history, which can be disruptive. Ensure all team members are aware of the process and know how to update their local repositories. Back up your repository before running the BFG. By default the BFG leaves the contents of the latest (HEAD) commit untouched, so remove unwanted files from the current commit before running it. Disk space is reclaimed only after the reflog has been expired and `git gc` has run on the rewritten repository.

USAGE EXAMPLES

Removing a specific file:
bfg --delete-files id_rsa my-repo.git

Removing all files larger than 10MB:
bfg --strip-blobs-bigger-than 10M my-repo.git

Converting all *.psd files to Git LFS:
bfg --convert-to-git-lfs '*.psd' my-repo.git

POST-CLEANUP STEPS

After running the BFG, perform the following steps:
1. Force-push the cleaned repository to the remote: git push --force --all, followed by git push --force --tags
2. Expire the reflog and run garbage collection in the cleaned repository to delete the now-unreachable objects: git reflog expire --expire=now --all && git gc --prune=now --aggressive
3. Instruct all team members to discard their old history, either by re-cloning or by resetting each local branch to the rewritten remote: git fetch --all; git reset --hard origin/master (or the appropriate branch). Merging or pushing from an old clone would reintroduce the removed data.
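
The local part of this workflow can be sketched as a single shell function. This is an illustration under stated assumptions, not an official script: bfg_clean_mirror is a hypothetical name, the URL and directory in the usage comment are placeholders, and it assumes a `bfg` wrapper command is on the PATH. It works on a fresh mirror clone and deliberately pushes nothing, so the result can be inspected first.

```shell
# Clone a mirror, run the BFG on it, then expire reflogs and repack.
# Usage: bfg_clean_mirror <remote-url> <work-dir> <bfg options...>
bfg_clean_mirror() {
  url="$1"
  dir="$2"
  shift 2
  git clone --mirror "$url" "$dir" || return 1
  bfg "$@" "$dir" || return 1
  # Drop reflog entries that still reference the old commits, then repack.
  git -C "$dir" reflog expire --expire=now --all || return 1
  git -C "$dir" gc --prune=now --aggressive
}

# Example with placeholder values; push only after checking the result:
# bfg_clean_mirror git@example.com:team/my-repo.git my-repo.git --delete-files id_rsa
# git -C my-repo.git push
```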

HISTORY

The BFG Repo-Cleaner was written in Scala by Roberto Tyley as a faster and simpler alternative to `git filter-branch` for the common cases of removing large files and sensitive data from history. It deliberately covers only those cleanup tasks rather than the general-purpose history rewriting that `git filter-branch` offers.

SEE ALSO

git-filter-branch(1), git-gc(1), git-lfs(1)
