git-obliterate

Permanently remove sensitive data from Git history

TLDR

Erase the existence of specific files

$ git obliterate [file_1 file_2 ...]

Erase the existence of specific files between two commits

$ git obliterate [file_1 file_2 ...] -- [commit_hash_1]..[commit_hash_2]

SYNOPSIS

git obliterate [options] <pathspec>... [-- <revision-range>]
git obliterate [options] --regexp <pattern>...

PARAMETERS

--verbose
    Show verbose output during the obliteration process.

--dry-run
    Perform a trial run without making any actual changes to the repository.

--force
    Bypass confirmation prompts before modifying the repository history.

--regexp <pattern>
    Interpret the provided pathspecs as regular expressions for matching files/directories.

--invert-regexp
    Invert the match for regular expressions, obliterating everything except what matches the pattern.

--index
    Include items that currently exist only in the Git index (staging area).

--add-all
    Add all untracked files to the index before processing, effectively including them in the history scan.

--pack-all
    After rewriting history, pack all Git objects into a single pack file to optimize repository size.

--prune
    Immediately prune unreferenced Git objects from the repository after rewriting history.

--no-gc
    Do not run git gc (garbage collection) automatically at the end of the process.

--git-dir <path>
    Specify the path to the .git directory if it's not in the current working directory.

--work-tree <path>
    Specify the path to the working tree if it's not the current working directory.

<pathspec>...
    One or more paths to files or directories to be obliterated from the repository history.

DESCRIPTION

git-obliterate removes specified files from a Git repository's entire history, including past commits and tags. It is not part of core Git: it ships with the git-extras collection as a shell script that wraps git filter-branch, so it rewrites every commit in the selected range rather than editing objects in place. Files are matched by path, so the same content committed under a different name is not removed unless that path is listed as well. Typical uses are erasing leaked credentials or large files that were committed by mistake. Because every rewritten commit gets a new hash, run it on a fresh clone or a backed-up repository, and expect all collaborators to re-clone afterwards.

CAVEATS

git-obliterate is an extremely destructive tool that irreversibly rewrites a repository's history. It must be used with utmost caution, preferably on a fresh clone or a backup of the repository. All collaborators must discard their existing local repositories and re-clone after the obliteration process. It is a non-standard script and requires separate installation. It does not affect data that might be cached by remote hosting services (e.g., GitHub pull request caches).
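
A history rewrite does not by itself delete the old objects from the local object database: they usually stay reachable through the reflog until it is expired and the repository is garbage-collected (see git-reflog(1) and git-gc(1)). The following is a minimal sketch of that cleanup using only standard Git commands. The repository, the file names, and the deleted branch that stands in for a rewrite are all hypothetical, and the script does not invoke git-obliterate itself.

```shell
#!/bin/sh
# Sketch: show that a "removed" blob survives until the reflog is expired
# and unreachable objects are pruned. All names here are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example"
git config user.email "example@example.invalid"

echo "keep" > README
git add README
git commit -q -m "Initial commit"

# Commit a secret on a side branch and remember its blob ID.
git checkout -q -b leak
echo "password=hunter2" > secrets.txt
git add secrets.txt
git commit -q -m "Add secrets by mistake"
blob=$(git rev-parse HEAD:secrets.txt)

# Stand-in for a history rewrite: drop the only ref that reaches the secret.
git checkout -q -
git branch -q -D leak

# The blob is still stored locally, kept alive by the HEAD reflog.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Expire every reflog entry, then prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "blob before cleanup: $before, after cleanup: $after"
```

This only cleans the local repository. Copies held by other clones, forks, or a hosting service are unaffected and have to be dealt with separately.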

INSTALLATION

git-obliterate is not bundled with Git. It is installed as part of git-extras, which is available through most package managers or from the git-extras GitHub repository. Once the git-obliterate script is in a directory on your PATH, Git picks it up as the git obliterate subcommand.

USAGE BEST PRACTICES

Always make a full backup or work on a fresh clone of the repository before running git-obliterate. Use the --dry-run option extensively to preview changes. Communicate clearly with all repository collaborators, as they will need to re-clone the repository from scratch after the history rewrite.
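
The preparation described above can be sketched with standard Git commands. This is an illustration under assumptions, not a prescribed workflow: the repository names (origin-repo, backup.git, scratch) and the file secrets.txt are hypothetical, and the script stops before any history rewrite is run.

```shell
#!/bin/sh
# Sketch: make a backup and a scratch clone before rewriting history.
# All paths and file names here are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a throwaway repository that contains a "leaked" file.
git init -q origin-repo
cd origin-repo
git config user.name "Example"
git config user.email "example@example.invalid"
echo "password=hunter2" > secrets.txt
git add secrets.txt
git commit -q -m "Add secrets by mistake"
cd ..

# 1. Keep an untouched mirror (all refs included) as the backup.
git clone -q --mirror origin-repo backup.git

# 2. Do the rewrite in a fresh clone, never in the only copy.
git clone -q origin-repo scratch

# 3. Count the commits that touch the path, so the scope of the
#    rewrite is known before anything is changed.
cd scratch
touched=$(git log --all --oneline -- secrets.txt | wc -l | tr -d ' ')
echo "commits touching secrets.txt: $touched"
```

Running the same git log command again after the rewrite, in the rewritten clone, is a simple way to check that no remaining commit references the path.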

HISTORY

git-obliterate is maintained as part of git-extras, a community collection of Git utilities, rather than as a standalone project. It was written as a convenience wrapper around git filter-branch for the common case of removing files from every commit, such as leaked credentials or large, unwanted files.

SEE ALSO

git-filter-branch(1), git-gc(1), git-reflog(1), BFG Repo-Cleaner (external tool)
