git-filter-repo

Rewrite repository history to filter content

TLDR

Replace a sensitive string in all files
$ git filter-repo --replace-text <(echo '[find]==>[replacement]')

Extract a single folder, keeping history
$ git filter-repo --path [path/to/folder]

Remove a single folder, keeping history
$ git filter-repo --path [path/to/folder] --invert-paths

Move the contents of a sub-folder to the repository root
$ git filter-repo --path-rename [path/to/folder/:]

SYNOPSIS

git-filter-repo [options]

PARAMETERS

--dry-run
    Do not modify the repository. Run the export and filter steps and save both the original and the filtered stream for comparison.

--force
    Skip the fresh-clone safety check and rewrite history even if doing so could cause data loss. The rewrite is irreversible.

--path
    Keep only the given file or directory path in the filtered history. The option can be repeated to keep several paths.

--invert-paths
    Invert the meaning of --path, effectively excluding the specified paths.

--strip-blobs-bigger-than
    Remove blobs (files) larger than the specified size from history.

--replace-text
    Replace text in file contents throughout the repository's history, using the rules listed in the given expressions file.

--commit-callback
    Run the given Python code body on each commit object being processed.

--blob-callback
    Run the given Python code body on each blob (file content) object.

--subdirectory-filter
    Extract a subdirectory into the root of the new repository, discarding other content.

--to-subdirectory-filter
    Move all contents of the current repository into a new subdirectory.

--mailmap
    Apply a .mailmap file to clean up and normalize author/committer identities.

--prune-empty {always,auto,never}
    Control whether commits that become empty after filtering are dropped. Use never to keep them.
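
The --replace-text option reads its rules from a file, one rule per line, rather than from the command line itself. The following is a minimal sketch of such an expressions file. The file name expressions.txt and both secrets are placeholders chosen for illustration, not values from any real repository.

```shell
# Build a hypothetical expressions file; both secrets are placeholders.
# "literal:" matches the text exactly, "regex:" matches a Python regular expression.
# The text after "==>" is the replacement.
cat > expressions.txt <<'EOF'
literal:hunter2==>***REMOVED***
regex:api_key=[A-Za-z0-9]+==>api_key=REDACTED
EOF

# Show the rules that would be handed to --replace-text.
cat expressions.txt
```

Inside a fresh clone, the file would then be passed as: git filter-repo --replace-text expressions.txt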

DESCRIPTION

git-filter-repo is a sophisticated tool for rewriting Git repository history, designed as a modern, safer, and faster alternative to the older git filter-branch command. It allows users to perform complex history manipulation tasks such as removing sensitive files or data from all commits, excising large binary files accidentally committed, splitting or combining repositories, and modifying commit metadata like author or committer information.

Written in Python, it prioritizes safety by default: it refuses to rewrite history unless it is run in a fresh clone (or with --force), so the original repository remains available as a fallback. This reduces the risk of accidental data loss during the rewriting process. Its flexibility comes from a filtering engine that can apply transformations to commits, blobs, tags, and references.

CAVEATS

  • Destructive Operation: Rewrites history, changing commit SHAs. Always back up your repository before use.
  • Requires Git 2.22+: Depends on newer Git features for optimal performance and safety.
  • Python Dependency: As it's written in Python, a compatible Python interpreter is required to run the command.
  • Shared Repositories: Not suitable for public repositories where history has already been shared. All collaborators will need to re-clone or force update their repositories.
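
The caveats above suggest a workflow of rewriting in a throwaway clone and then force-pushing the result. The sketch below is one way to write that down, not an official procedure. The function name publish_rewrite, the directory name rewrite-work, and the arguments are all hypothetical. It assumes git-filter-repo is installed and that you are allowed to force-push to the remote.

```shell
# publish_rewrite REPO_URL FOLDER  (hypothetical helper)
# Removes FOLDER from all history in a fresh clone, then force-pushes.
# The body runs in a subshell so the caller's working directory is unchanged.
publish_rewrite() (
    repo_url=$1
    folder=$2
    git clone "$repo_url" rewrite-work &&          # fresh clone; the original stays untouched
    cd rewrite-work &&
    git filter-repo --path "$folder" --invert-paths &&
    git remote add origin "$repo_url" &&           # filter-repo removes the origin remote
    git push --force --all origin &&
    git push --force --tags origin
)
```

A call might look like: publish_rewrite https://example.com/repo.git path/to/folder. After the push, every collaborator still needs to re-clone or reset onto the rewritten history.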

SAFETY FEATURES

git-filter-repo does not create a backup of the repository. Instead, by default it refuses to rewrite history unless it is run in a fresh clone, so that the original repository still holds the unmodified history if something goes wrong. Overriding this check requires the explicit --force option. Metadata about a rewrite, such as the mapping from old to new commit IDs, is written to the .git/filter-repo/ directory.
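
Beyond the default checks, a rewrite can be inspected before it is applied. The sketch below wraps two real options, --analyze and --dry-run, in helper functions. The function names and the size argument are hypothetical, and the output locations named in the comments reflect the tool's documented behavior as far as I know.

```shell
# report_sizes: write reports about the repository (for example, the largest
# blobs and paths) under .git/filter-repo/analysis/ without rewriting anything.
report_sizes() {
    git filter-repo --analyze
}

# preview_strip SIZE: run the filter without changing the repository.
# The original and filtered export streams are saved under .git/filter-repo/
# so they can be compared before running the real rewrite.
preview_strip() {
    git filter-repo --dry-run --strip-blobs-bigger-than "$1"
}
```

For example, preview_strip 10M would preview removing every blob larger than 10 megabytes.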

CALLBACKS

A powerful and flexible aspect of git-filter-repo is its callback system. Users can supply Python code bodies on the command line that are executed for objects such as blobs, commits, tags, and resets, and for individual fields such as filenames, commit messages, names, emails, and reference names. This allows specific, programmable history transformations that the standard command-line options do not cover.
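
As an illustration of a callback, the sketch below rewrites an email domain in every author, committer, and tagger field. The callback body is Python, and filter-repo passes the value as a bytes object named email. The two domains and the names EMAIL_CALLBACK and rewrite_emails are placeholders invented for this example.

```shell
# The callback body is Python; emails arrive as bytes, so use b"..." literals.
# Both domains are placeholders.
EMAIL_CALLBACK='return email.replace(b"@old.example.com", b"@new.example.com")'

# rewrite_emails: hypothetical wrapper; run it inside a fresh clone.
rewrite_emails() {
    git filter-repo --email-callback "$EMAIL_CALLBACK"
}
```

Keeping the Python body in a single-quoted shell variable avoids having to escape the double quotes inside it.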

HISTORY

git-filter-repo was created by Elijah Newren (newren), a Git contributor, as a modern and more efficient successor to git filter-branch, which is slow, complex, and error-prone. It builds on modern Git features and is written in Python. First publicly released around 2019, it gained prominence for its speed, its built-in safety checks (such as refusing to run outside a fresh clone unless --force is given), and its flexible callback system. The Git project's own filter-branch documentation now recommends it for history rewriting.

SEE ALSO
