
git-filter-branch

Rewrite repository history

SYNOPSIS

git filter-branch [options] [rev-list options...]

PARAMETERS

--env-filter <command>
    The command is evaluated for each commit and can modify the environment variables that affect the new commit (e.g., GIT_AUTHOR_NAME, GIT_COMMITTER_EMAIL).
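As an illustration of --env-filter, the following sketch rewrites an e-mail address across a branch. The throwaway repository, the identity "Example Dev", and both addresses are placeholders invented for this example. FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git is recent enough to print a warning and pause before running.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with two commits under a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "old@example.com"
echo one > a.txt
git add a.txt
git commit -q -m "first"
echo two >> a.txt
git commit -q -am "second"

# Replace the old address in both the author and committer fields
# of every commit reachable from HEAD.
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
    GIT_AUTHOR_EMAIL="new@example.com"
fi
if [ "$GIT_COMMITTER_EMAIL" = "old@example.com" ]; then
    GIT_COMMITTER_EMAIL="new@example.com"
fi
' HEAD

# Distinct author addresses left on the rewritten branch.
authors=$(git log --format='%ae' HEAD | sort -u)
echo "$authors"
```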

--tree-filter <command>
    The command is run after the tree for the commit has been checked out. It can modify the files in the working directory before the new commit is created. Useful for adding, removing, or modifying files directly.

--index-filter <command>
    The command is run after the index has been populated for the commit. It modifies the index without checking out any files, which generally makes it faster than --tree-filter. Useful for removing files from history (e.g., git rm --cached --ignore-unmatch filename).
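A minimal sketch of the file-removal use case with --index-filter follows. The repository and the file names keep.txt and secret.txt are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git would otherwise warn and pause.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository in which secret.txt was committed by mistake.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "keep me" > keep.txt
echo "hunter2" > secret.txt
git add keep.txt secret.txt
git commit -q -m "add files"
echo "more" >> keep.txt
git commit -q -am "update keep.txt"

# Drop secret.txt from the index of every commit reachable from HEAD.
# --ignore-unmatch keeps git rm from failing on commits that lack the file.
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch -q secret.txt' HEAD

# Files in the rewritten tip, and commits on the branch that still touch secret.txt.
files=$(git ls-tree -r --name-only HEAD)
touching=$(git log --oneline HEAD -- secret.txt)
echo "files at HEAD: $files"
```

The old commits stay reachable through refs/original/ until those backup refs are deleted, so the removed file is not yet gone from the object database at this point.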

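--parent-filter <command>
    The command is evaluated for each commit to rewrite its parent list: it receives the current parent list on standard input and writes the new one to standard output. Use with extreme care, as it can easily corrupt history.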

--msg-filter <command>
    The command is run for each commit message. It receives the original message on standard input and writes the new message to standard output (e.g., using sed to replace text).
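For --msg-filter, a sed one-liner is often enough. The sketch below fixes a typo in a commit message. The repository and the message text are invented for the example, and FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git would otherwise warn and pause.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with a typo in the commit message.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo one > a.txt
git add a.txt
git commit -q -m "Fix teh parser"

# sed reads each original message on stdin and prints the corrected one.
git filter-branch --msg-filter 'sed "s/teh/the/g"' HEAD

subject=$(git log -1 --format=%s HEAD)
echo "$subject"
```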

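--commit-filter <command>
    The command is run in place of git commit-tree for each commit. It receives the same arguments as git commit-tree (the tree ID followed by -p <parent> for each parent) and the commit message on standard input, and must print the new commit ID. The original commit ID is available in the GIT_COMMIT environment variable. It allows full control over commit object creation (e.g., skipping commits).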
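A sketch of skipping commits with --commit-filter, using the skip_commit helper function that git filter-branch makes available to this filter. The repository, the "Bot" author, and the file names are hypothetical. FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git would otherwise warn and pause.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository: three commits, the middle one authored by "Bot".
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo a > a.txt
git add a.txt
git commit -q -m "first"
echo b > b.txt
git add b.txt
git commit -q -m "bot change" --author="Bot <bot@example.com>"
echo c > c.txt
git add c.txt
git commit -q -m "third"

# Skip commits authored by Bot; create every other commit as usual.
git filter-branch --commit-filter '
if [ "$GIT_AUTHOR_NAME" = "Bot" ]; then
    skip_commit "$@"
else
    git commit-tree "$@"
fi' HEAD

count=$(git rev-list --count HEAD)
authors=$(git log --format=%an HEAD | sort -u)
echo "$count commits, authors: $authors"
```

Skipping a commit removes the commit object, not its changes: later commits keep their full trees, so b.txt is still present at HEAD after the rewrite.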

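--tag-name-filter <command>
    The command is evaluated for each tag that points to a rewritten object. It receives the original tag name on standard input and writes the new tag name to standard output. Use cat as the command to keep tag names unchanged while moving the tags to the rewritten commits.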
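The sketch below shows the common --tag-name-filter cat idiom, which keeps tag names as they are while moving the tags onto the rewritten commits. The repository, the tag v1.0, and the commit message are invented for the example. FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git would otherwise warn and pause.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with one tagged commit whose message has a typo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo one > a.txt
git add a.txt
git commit -q -m "Relase 1.0"
git tag v1.0
old=$(git rev-parse v1.0)

# Rewrite the message on all refs; "cat" passes each tag name through unchanged,
# so v1.0 is updated to point at the rewritten commit.
git filter-branch --msg-filter 'sed "s/Relase/Release/"' \
    --tag-name-filter cat -- --all

new=$(git rev-parse v1.0)
tip=$(git rev-parse HEAD)
echo "v1.0 moved from $old to $new"
```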

--subdirectory-filter <directory>
    Rewrites the history to make the specified directory the new project root, keeping only the commits that touch it. Effectively extracts a subdirectory into its own repository.
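A minimal sketch of --subdirectory-filter, assuming a hypothetical repository with a lib/ directory next to a top-level file. FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git would otherwise warn and pause.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository: one commit at the top level, one inside lib/.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo top > top.txt
git add top.txt
git commit -q -m "top-level file"
mkdir lib
echo util > lib/util.txt
git add lib/util.txt
git commit -q -m "add lib"

# Make lib/ the project root of the current branch.
git filter-branch --subdirectory-filter lib HEAD

# Files and commit count on the rewritten branch.
files=$(git ls-tree -r --name-only HEAD)
count=$(git rev-list --count HEAD)
echo "$count commit(s), files: $files"
```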

--prune-empty
    Removes commits that become empty after the filter process (e.g., if all changes are filtered out).

-f, --force
    Forces git-filter-branch to run even if a refs/original/ namespace already exists from a previous run.

--original <namespace>
    Specifies an alternative namespace in which to store the original references. The default is refs/original/.

[rev-list options...]
    Standard Git revision list options (e.g., --all) that specify which commits to process. If omitted, HEAD is used, so only the current branch is rewritten.

DESCRIPTION

git-filter-branch is a powerful command-line tool for rewriting Git repository history. It iterates through all or a specified subset of commits, allowing you to modify them based on various criteria. Common use cases include removing sensitive data or large files from history, changing author information for specific commits, altering commit messages, or extracting a subdirectory into a new, standalone repository.

While highly flexible, git-filter-branch rewrites every commit it changes together with all of that commit's descendants, so each of them receives a new commit ID. This makes it a destructive operation that should be used with extreme caution, especially on shared repositories. Its power lies in its ability to apply scriptable filters to every commit in the specified range. For many modern use cases, the newer and usually faster git filter-repo is recommended as an alternative.

CAVEATS

History rewriting is a destructive operation that changes commit IDs. NEVER use git-filter-branch on a repository that is publicly shared or that others have cloned, unless you can coordinate with all collaborators to ensure they re-clone or rebase their work on the new history. Failure to do so will lead to significant repository divergence and merge conflicts. The command can be very slow for large repositories and complex filters. It stores a backup of original refs in refs/original/, which can consume significant disk space until manually deleted. Always back up your repository before running this command.

PERFORMANCE CONSIDERATIONS

git-filter-branch can be extremely slow, particularly when using --tree-filter, as it checks out every commit's contents. For operations that only modify the index or metadata, --index-filter or --env-filter are significantly faster. For large repositories or complex filtering tasks, consider using git filter-repo, which is designed for better performance.

POST-REWRITING CLEANUP

After successfully running git-filter-branch and pushing the rewritten history (which usually requires a git push --force-with-lease or git push --force), it's crucial to clean up your local repository. This involves:
1. Deleting the backup refs: git update-ref -d refs/original/refs/heads/master (and for other branches/tags)
2. Running garbage collection: git reflog expire --expire=now --all && git gc --prune=now
These steps remove the old, unreferenced objects and significantly reduce repository size.
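The cleanup steps above can be sketched end to end as follows. This version deletes every backup ref under refs/original/ in one pass instead of naming a single branch. The repository and file names are hypothetical, the push step is left out, and FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that the installed Git would otherwise warn and pause.

```shell
set -e
# Assumption: newer Git versions honor this variable and skip the warning and delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository, rewritten once so that backup refs exist.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "keep me" > keep.txt
echo "hunter2" > secret.txt
git add keep.txt secret.txt
git commit -q -m "add files"
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch -q secret.txt' HEAD

# 1. Delete all backup refs left under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
    xargs -n 1 git update-ref -d

# 2. Expire the reflogs, then prune the now-unreferenced objects.
git reflog expire --expire=now --all
git gc -q --prune=now

# Any backup refs that survived the cleanup (expected to be none).
leftover=$(git for-each-ref refs/original/)
echo "backup refs left: ${leftover:-none}"
```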

HISTORY

git-filter-branch was one of the earliest and most versatile tools available in Git for advanced history manipulation. It emerged from the need to address common repository issues like removing inadvertently committed large files or sensitive data, which cannot be undone by simple reverts. For many years, it was the go-to utility for complex history rewrites. However, its performance (especially on large repositories) and sometimes intricate scripting requirements led to the development of faster and more user-friendly alternatives like git filter-repo, which is written in Python and often preferred today. Despite this, git-filter-branch remains a core Git command, still functional for specific, nuanced use cases or when external tools are not an option.

SEE ALSO

git-filter-repo(1), git-rebase(1), git-gc(1), git-push(1)
