git-filter-branch
Rewrite repository history
SYNOPSIS
git filter-branch [options] [rev-list options...]
PARAMETERS
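--env-filter
Modifies the environment in which each commit is re-created, for example to change the author or committer name, email, or date (GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_EMAIL, and so on).
--tree-filter
Rewrites the tree and its contents. The command runs in a working directory with each commit's tree checked out; new files are added and missing files removed automatically.
--index-filter
Rewrites the index. Similar to --tree-filter, but does not check out the tree, which makes it much faster. Commonly used with git rm --cached --ignore-unmatch.
--parent-filter
Rewrites the parent list of each commit. The command receives the parent string on stdin and prints the new one on stdout.
--msg-filter
Rewrites commit messages. The command receives the original message on stdin and prints the new message on stdout.
--commit-filter
Replaces the call to git commit-tree that creates each commit, which allows commits to be skipped or created differently.
--tag-name-filter
Rewrites tag names. The command receives the original tag name on stdin and prints the new name on stdout; use cat to keep names unchanged while updating tags to point at rewritten commits.
--subdirectory-filter
Rewrites the history to make the specified subdirectory the new project root, keeping only the commits that touch it.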
--prune-empty
Removes commits that become empty after the filter process (e.g., if all changes are filtered out).
-f, --force
Forces git-filter-branch to run even if a refs/original/ namespace already exists from a previous run.
--original
Specifies an alternative namespace to store the original references. By default, it's refs/original/.
[rev-list options...]
Standard Git revision list options (e.g., --all) that select which commits to rewrite.
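As an illustration of how a filter and the rev-list options fit together, the following sketch extracts a subdirectory in a throwaway repository. The directory names (lib, docs), file names, and identity are hypothetical and chosen only for the example.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Build a scratch repository so nothing real is rewritten.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

mkdir lib docs
echo "code" > lib/util.txt
echo "guide" > docs/guide.txt
git add .
git commit -q -m "Add lib and docs"
echo "more" >> docs/guide.txt
git commit -q -am "Expand docs"

# Keep only the history of lib/ and make it the project root.
# Everything after "--" is passed on as rev-list options.
git filter-branch --prune-empty --subdirectory-filter lib -- --all

git ls-tree -r --name-only HEAD   # util.txt now sits at the repository root
```

Only the commit that touched lib/ survives here, because the second commit changed nothing under that directory.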
DESCRIPTION
git-filter-branch is a powerful command-line tool for rewriting Git repository history. It iterates through all or a specified subset of commits, allowing you to modify them based on various criteria. Common use cases include removing sensitive data or large files from history, changing author information for specific commits, altering commit messages, or extracting a subdirectory into a new, standalone repository.
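One of the use cases above, changing author information, might look like the following minimal sketch. It runs in a throwaway repository, and both email addresses are hypothetical placeholders.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Scratch repository with one commit made under the old address.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "old@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Replace the old address wherever it appears as author or committer.
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
    GIT_AUTHOR_EMAIL="new@example.com"
fi
if [ "$GIT_COMMITTER_EMAIL" = "old@example.com" ]; then
    GIT_COMMITTER_EMAIL="new@example.com"
fi
' -- --all

# Show author and committer email for each rewritten commit.
git log --format='%ae %ce'
```

Testing the old value before assigning keeps commits by other people untouched.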
While highly flexible, git-filter-branch re-creates every commit it modifies, so each changed commit and all of its descendants receive new commit IDs. The rewritten history is therefore incompatible with the original, and the command should be used with extreme caution, especially on shared repositories. Its power lies in its ability to apply scriptable filters to every commit in the specified range. The Git project itself now recommends alternatives such as git filter-repo, which is usually much faster.
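The change in commit IDs can be observed directly. This sketch, run in a throwaway repository with a hypothetical "WIP: " message prefix, rewrites a commit message and compares the ID before and after.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "parser" > parser.txt
git add parser.txt
git commit -q -m "WIP: add parser"

branch_ref=$(git symbolic-ref HEAD)   # full name of the current branch ref
old_id=$(git rev-parse HEAD)

# Strip the "WIP: " prefix from every commit message.
git filter-branch --msg-filter 'sed "s/^WIP: //"' -- --all

new_id=$(git rev-parse HEAD)
backup_id=$(git rev-parse "refs/original/$branch_ref")

echo "before: $old_id"
echo "after:  $new_id"
echo "backup: $backup_id"   # the original commit, kept under refs/original/
```

Only the message changed, yet the commit ID differs, and the original ID stays reachable through the backup ref.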
CAVEATS
History rewriting is a destructive operation that changes commit IDs. NEVER use git-filter-branch on a repository that is publicly shared or that others have cloned, unless you can coordinate with all collaborators to ensure they re-clone or rebase their work on the new history. Failure to do so will lead to significant repository divergence and merge conflicts. The command can be very slow for large repositories and complex filters. It saves the original refs under refs/original/; these keep the old objects reachable, so the repository does not shrink until they are deleted manually. Always back up your repository before running this command.
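One possible way to take the backup recommended above is a mirror clone, sketched here against a throwaway repository. The paths and the uppercase message filter are hypothetical stand-ins for a real rewrite.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "data" > data.txt
git add data.txt
git commit -q -m "initial commit"

# Back up every ref and object into a separate bare repository.
backup_parent=$(mktemp -d)
backup="$backup_parent/backup.git"
git clone -q --mirror "$repo" "$backup"

before=$(git rev-parse HEAD)

# Stand-in rewrite: uppercase every commit message.
git filter-branch --msg-filter 'tr a-z A-Z' -- --all

echo "rewritten HEAD: $(git rev-parse HEAD)"
echo "backup HEAD:    $(git -C "$backup" rev-parse HEAD)"   # still the original ID
```

Because the backup lives outside the working repository, it is unaffected by the rewrite and by any later cleanup.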
PERFORMANCE CONSIDERATIONS
git-filter-branch can be extremely slow, particularly when using --tree-filter, as it checks out every commit's contents. For operations that only modify the index or metadata, --index-filter or --env-filter are significantly faster. For large repositories or complex filtering tasks, consider using git filter-repo, which is designed for better performance.
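The difference between the two filter styles can be sketched by removing the same file both ways. The example below uses two copies of a throwaway repository and a hypothetical file named big.bin; on a repository this small the speed difference is not measurable, so the sketch only shows that both forms produce the same result.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Source repository containing an unwanted file.
src=$(mktemp -d)
cd "$src"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'large payload\n' > big.bin
echo "notes" > notes.txt
git add .
git commit -q -m "Add notes and big.bin"
echo "more" >> notes.txt
git commit -q -am "Update notes"

# Two independent copies, one per filter style.
work=$(mktemp -d)
cp -R "$src" "$work/tree"
cp -R "$src" "$work/index"

# Slow form: every commit is checked out before the command runs.
cd "$work/tree"
git filter-branch --tree-filter 'rm -f big.bin' -- --all

# Faster form: only the index is edited, no checkout is needed.
cd "$work/index"
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch big.bin' -- --all

tree_result=$(git -C "$work/tree" rev-parse 'HEAD^{tree}')
index_result=$(git -C "$work/index" rev-parse 'HEAD^{tree}')
echo "tree-filter result:  $tree_result"
echo "index-filter result: $index_result"
```

The --ignore-unmatch flag keeps the index filter from failing on commits that do not contain the file.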
POST-REWRITING CLEANUP
After successfully running git-filter-branch and pushing the rewritten history (which usually requires a git push --force-with-lease or git push --force), it's crucial to clean up your local repository. This involves:
1. Deleting the backup refs: git update-ref -d refs/original/refs/heads/master (and for other branches/tags)
2. Running garbage collection: git reflog expire --expire=now --all && git gc --prune=now
Once the backup refs and reflog entries are gone, nothing references the old objects, so garbage collection can remove them and reduce the repository size.
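The cleanup steps above can be sketched end to end in a throwaway repository. The file secrets.txt and its content are hypothetical; the loop over refs/original/ is one way to cover every backup ref instead of naming each branch by hand.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "token=example" > secrets.txt
echo "code" > app.txt
git add .
git commit -q -m "Add app and secrets"
old_id=$(git rev-parse HEAD)

# Remove the file from every commit.
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch secrets.txt' -- --all

# 1. Delete every backup ref that filter-branch created.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

# 2. Expire all reflog entries, then prune unreachable objects.
git reflog expire --expire=now --all
git gc --quiet --prune=now

# The original commit should no longer exist in the object database.
if git cat-file -e "$old_id" 2>/dev/null; then
    echo "old commit still present"
else
    echo "old commit pruned"
fi
```

Other clones and any remote still hold the old objects until they are cleaned up or re-cloned as well.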
HISTORY
git-filter-branch was one of the earliest and most versatile tools available in Git for advanced history manipulation. It emerged from the need to address common repository issues like removing inadvertently committed large files or sensitive data, which cannot be undone by simple reverts. For many years, it was the go-to utility for complex history rewrites. However, its performance (especially on large repositories) and sometimes intricate scripting requirements led to the development of faster and more user-friendly alternatives like git filter-repo, which is written in Python and often preferred today. Despite this, git-filter-branch remains a core Git command, still functional for specific, nuanced use cases or when external tools are not an option.
SEE ALSO
git-filter-repo(1), git-rebase(1), git-gc(1), git-push(1)