git-filter-repo
Rewrite repository history to filter content
TLDR
Replace a sensitive string in all files
Extract a single folder, keeping history
Remove a single folder, keeping history
Move everything from sub-folder one level up
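The four tasks above map onto invocations roughly like the ones sketched below. The helper names, the expressions file argument, and the folder paths are placeholders invented for illustration; each helper only builds an argument list, which could then be handed to subprocess.run from inside a fresh clone.

```python
import posixpath


def replace_text(expressions_file):
    # The expressions file lists one replacement rule per line.
    return ["git", "filter-repo", "--replace-text", expressions_file]


def extract_folder(folder):
    # Keep only this folder (and its history); everything else is dropped.
    return ["git", "filter-repo", "--path", folder]


def remove_folder(folder):
    # --invert-paths flips the selection: drop this folder, keep the rest.
    return ["git", "filter-repo", "--path", folder, "--invert-paths"]


def move_folder_up(folder):
    # --path-rename takes OLD:NEW. NEW is the parent directory of the
    # folder, or empty (the repository root) for a top-level folder.
    old = folder.rstrip("/")
    parent = posixpath.dirname(old)
    new = parent + "/" if parent else ""
    return ["git", "filter-repo", "--path-rename", f"{old}/:{new}"]
```

For example, move_folder_up("docs") yields a rename of docs/ to the repository root.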
SYNOPSIS
git-filter-repo [options]
PARAMETERS
--dry-run
Do not change the repository. The history is still exported and filtered, and both the original and the filtered streams are saved so they can be compared.
--force
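Bypass the safety check that refuses to rewrite anything that does not look like a fresh clone. The rewrite is irreversible and no backup is made, so use this only when a separate copy of the repository exists.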
--path
Keep only the given file or directory path in the rewritten history. May be given multiple times to keep several paths.
--invert-paths
Invert the meaning of --path, effectively excluding the specified paths.
--strip-blobs-bigger-than
Remove blobs (files) larger than the specified size from history.
--replace-text
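Replace text in file contents across the repository's history. The argument is a file of expressions, one per line, in the form old==>new; a line without ==> is replaced with ***REMOVED***.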
--commit-callback
Run the given Python code, supplied as a function body, on each commit object being processed. The commit is available as commit.
--blob-callback
Run the given Python code, supplied as a function body, on each blob (file content) object. The blob is available as blob, with its contents in blob.data.
--subdirectory-filter
Extract a subdirectory into the root of the new repository, discarding other content.
--to-subdirectory-filter
Move all contents of the current repository into a new subdirectory.
--mailmap
Apply a .mailmap file to clean up and normalize author/committer identities.
--prune-empty
Control whether commits that become empty after filtering are dropped. Takes always, auto (the default), or never; use never to keep such commits.
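As an illustration of the --replace-text option described above, the sketch below formats an expressions file. The helper name format_expressions and the sample strings are made up for this example; the old==>new line format and the ***REMOVED*** default are the parts taken from the tool's behaviour.

```python
def format_expressions(replacements):
    """Render a dict of {old: new} pairs as --replace-text expressions.

    A value of None writes the bare string, which git filter-repo
    replaces with its default marker, ***REMOVED***.
    """
    lines = []
    for old, new in replacements.items():
        lines.append(old if new is None else f"{old}==>{new}")
    return "\n".join(lines) + "\n"


# Hypothetical secrets and hostnames, for illustration only.
text = format_expressions({
    "hunter2": None,
    "old.example.com": "new.example.com",
})
```

The resulting text would be saved to a file, and that file's path passed as the argument to --replace-text.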
DESCRIPTION
git-filter-repo is a sophisticated tool for rewriting Git repository history, designed as a modern, safer, and faster alternative to the older git filter-branch command. It allows users to perform complex history manipulation tasks such as removing sensitive files or data from all commits, excising large binary files accidentally committed, splitting or combining repositories, and modifying commit metadata like author or committer information.
Written in Python, it prioritizes safety by default: it refuses to rewrite a repository that does not look like a fresh clone unless --force is given, which reduces the risk of accidental data loss. It does not create backups itself. Its flexibility comes from a filtering engine that can apply transformations to commits, blobs, file paths, tags, and references.
CAVEATS
- Destructive Operation: Rewrites history, changing commit SHAs. Always back up your repository before use.
- Requires Git 2.22+: Depends on newer Git features for optimal performance and safety.
- Python Dependency: The tool is a Python script, so a Python 3 interpreter is required to run it.
- Shared Repositories: Rewriting history that has already been shared is disruptive. Every collaborator must re-clone or reset onto the new history, and anyone who pushes or merges the old history brings the removed content back.
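One simple way to follow the back-up advice above is to copy the whole repository directory, .git included, before rewriting. The sketch below is only one possible approach; the function name backup_repo and the timestamped naming scheme are assumptions made for illustration, not part of git-filter-repo.

```python
import shutil
import time
from pathlib import Path


def backup_repo(repo_dir, backup_root=None):
    """Copy a repository (working tree and .git) to a timestamped sibling."""
    repo = Path(repo_dir).resolve()
    if not (repo / ".git").exists():
        raise ValueError(f"{repo} does not look like a Git repository")
    # Default to the parent directory so the copy never lands inside the repo.
    root = Path(backup_root).resolve() if backup_root else repo.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = root / f"{repo.name}-backup-{stamp}"
    shutil.copytree(repo, target, symlinks=True)
    return target
```

Calling backup_repo("path/to/repo") returns the path of the copy, which can be deleted once the rewritten history has been verified.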
SAFETY FEATURES
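git-filter-repo's main safety mechanism is its fresh-clone check: by default it refuses to run on a repository that does not look like a fresh clone, so that an untouched copy of the history still exists elsewhere, and it requires --force to override this. It does not create a backup of the original repository. The .git/filter-repo/ directory holds metadata about the rewrite, such as a commit-map file relating old commit IDs to new ones, not a copy of the old history. After a rewrite it also removes the origin remote, which helps prevent accidentally pushing rewritten history over the original.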
CALLBACKS
A flexible aspect of git-filter-repo is its callback system. Users supply a snippet of Python, treated as a function body, that is run for each object of a given kind: for example --blob-callback, --commit-callback, --tag-callback, --filename-callback, --message-callback, --name-callback, --email-callback, and --refname-callback. This allows specific, programmable history transformations that the standard command-line options do not cover.
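A callback body is ordinary Python, so it can be tried out against stand-in objects before being run on real history. The sketch below assumes the documented attributes blob.data and commit.message (both bytes); the helper try_callback, the stand-in objects, and the sample strings are invented for illustration and are not part of git-filter-repo.

```python
from types import SimpleNamespace

# Callback bodies as they would be passed on the command line.
BLOB_CALLBACK = 'blob.data = blob.data.replace(b"hunter2", b"***REMOVED***")'
COMMIT_CALLBACK = 'commit.message = commit.message.replace(b"WIP: ", b"")'

# Argument list for a real run (not executed here).
argv = ["git", "filter-repo", "--blob-callback", BLOB_CALLBACK]


def try_callback(body, **objects):
    """Run a callback body against stand-in objects, without touching Git."""
    exec(body, {}, objects)


# Stand-ins that mimic only the attributes the bodies above use.
fake_blob = SimpleNamespace(data=b"password=hunter2\n")
fake_commit = SimpleNamespace(message=b"WIP: add parser\n")

try_callback(BLOB_CALLBACK, blob=fake_blob)
try_callback(COMMIT_CALLBACK, commit=fake_commit)
```

Checking a body this way only catches simple mistakes such as mixing str and bytes; the real objects carry more attributes than these stand-ins.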
HISTORY
git-filter-repo was created by Elijah Newren (newren), a Git contributor, as a more efficient successor to the older, often problematic git filter-branch. Recognizing the shortcomings of git filter-branch (slow, complex, prone to errors), he developed the new tool to build on modern Git features and a Python-based architecture. First publicly released around 2019, it gained prominence for its speed, its built-in safety checks (such as the fresh-clone check), and its flexible callback system. The git filter-branch documentation itself now recommends git-filter-repo for history rewriting.
SEE ALSO
git(1), git-filter-branch(1), git-rebase(1), git-reflog(1), git-gc(1)