LinuxCommandLibrary

git-fast-import

Import data into a Git repository efficiently

SYNOPSIS

frontend | git fast-import [options]

PARAMETERS

--depth=<n>
    Maximum delta depth used when deltifying blobs and trees; the default is 50.

--export-marks=<file>
    When the import completes, write the internal marks table to <file>, one ":<mark> <object-id>" line per mark. Useful for resuming or chaining imports.

--import-marks=<file>
    Before processing any input, load the marks in <file> (as written by --export-marks), so that a later stream can refer to objects from a previous import by their marks.

--force
    Force updating existing branches even when the new tip does not contain the old one, which can cause commits on those branches to be lost.

--done
    Terminate with an error if the stream does not end with a done command. Useful for detecting a frontend that died before finishing the stream.

--stats
    Print basic statistics about the objects created, the packfiles they were stored in, and the memory used during the run.

--max-pack-size=<n>
    Maximum size of each output packfile; fast-import starts a new pack when the limit is reached. The default is unlimited.

--big-file-threshold=<n>
    Maximum size in bytes of a blob that fast-import will try to delta-compress (default 512m); larger blobs are stored without deltas. Lowering it can help on systems with limited memory.

--cat-blob-fd=<fd>
    Write responses to get-mark, cat-blob, and ls queries to file descriptor <fd> instead of standard output, so a frontend can read them back while the import is running.

--date-format=<fmt>
    Format of the dates the frontend supplies in author, committer, and tagger lines: raw (the default), raw-permissive, rfc2822, or now.

--active-branches=<n>
    Maximum number of branches to keep active (with their trees in memory) at once; the default is 5.

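--quiet
    Disable the output shown by --stats, so fast-import is normally silent on success. Messages requested by the stream itself, such as progress commands, are still shown.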
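As a sketch of how --export-marks and --import-marks fit together, the script below imports one commit and saves the marks, then lets a second run attach a new commit to mark :2 from the first run. It assumes git is on PATH; the temporary paths, identity, and file contents are made up for illustration.

```shell
set -e
repo=$(mktemp -d)   # target repository (illustrative temp path)
work=$(mktemp -d)   # holds the marks file between runs
git init -q "$repo"

# Run 1: the blob gets mark :1, the commit that uses it gets mark :2.
git -C "$repo" fast-import --quiet --export-marks="$work/marks" <<'EOF'
blob
mark :1
data 6
hello

commit refs/heads/main
mark :2
committer Jane Doe <jane@example.com> 1700000000 +0000
data 6
first
M 100644 :1 hello.txt

EOF

# Run 2: reload the marks so "from :2" resolves to the commit from run 1,
# and write the updated table (now including :3) back to the same file.
git -C "$repo" fast-import --quiet \
    --import-marks="$work/marks" --export-marks="$work/marks" <<'EOF'
commit refs/heads/main
mark :3
committer Jane Doe <jane@example.com> 1700000100 +0000
data 7
second
from :2
M 100644 inline hello.txt
data 6
world

EOF

cat "$work/marks"   # one ":<mark> <object-id>" line per mark
```

Absolute paths are used for the marks file so that it does not matter which directory fast-import runs in.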

DESCRIPTION

git fast-import is a Git backend designed for efficiently importing large amounts of history, usually from other version control systems, into a Git repository. It reads a stream of commands in the fast-import format from standard input; the stream describes the repository's history in terms of file content (blobs), commits, annotated tags, and branch updates. The format is simple to generate and to parse, and fast-import writes the resulting objects directly into packfiles, bypassing the index, the working tree, and the per-object overhead of creating commits one by one with commands such as git add and git commit.

It is not a converter by itself; rather, it is the backend for converters (such as cvs2git, hg-fast-export, git-p4, or custom scripts) that translate a foreign VCS's history into the fast-import stream format. This makes git fast-import the usual tool for migrating large, complex repositories from systems like CVS, Subversion, Mercurial, or Perforce into Git while preserving the full commit history, tags, and branches. Because the stream is processed incrementally, the history never has to fit in memory at once; fast-import mainly keeps its object and marks tables and a limited number of active branches in memory.
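To make the frontend/backend split concrete, here is a minimal sketch of a frontend written as a shell function. The function name emit_stream, the branch, the identity, and the file contents are all hypothetical; what fast-import actually requires is the stream syntax, including an exact byte count after each data command.

```shell
set -e

# Hypothetical frontend: prints a one-commit fast-import stream on stdout.
emit_stream() {
    msg='Initial import'
    content='Hello, fast-import!'
    printf 'commit refs/heads/main\n'
    printf 'committer Jane Doe <jane@example.com> 1700000000 +0000\n'
    # "data <n>" must give the exact byte count; ${#var} counts characters,
    # which equals bytes only for ASCII text like this.
    printf 'data %d\n%s\n' "${#msg}" "$msg"
    printf 'M 100644 inline README\n'
    printf 'data %d\n%s\n' "${#content}" "$content"
    printf '\n'   # blank line ends the commit
}

repo=$(mktemp -d)
git init -q "$repo"
emit_stream | git -C "$repo" fast-import --quiet
git -C "$repo" log --oneline main
```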

CAVEATS

git fast-import only understands its own stream format. It does not read other VCS repositories itself; a frontend such as cvs2git, hg-fast-export, git-p4, or a custom script has to generate the fast-import stream.

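While highly optimized, importing extremely large repositories (gigabytes of data, millions of commits) can still be memory and CPU intensive. Memory use grows with the number of objects and marks and with the number of active branches, and the repack recommended after the import is often the most CPU-intensive step.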
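The memory- and size-related options can be combined on one command line. The sketch below only shows the shape of such an invocation: the tiny inline stream stands in for a real converter's output, and the numeric values are illustrative assumptions rather than tuned recommendations.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"

# --active-branches=2       keep fewer branch trees in memory
# --big-file-threshold=16m  store blobs above 16 MiB without delta search
# --max-pack-size=1g        split output into packs of at most 1 GiB
# --depth=10                shorter delta chains than the default 50
# --stats                   report object counts and memory use on stderr
git -C "$repo" fast-import \
    --active-branches=2 \
    --big-file-threshold=16m \
    --max-pack-size=1g \
    --depth=10 \
    --stats <<'EOF'
commit refs/heads/main
committer Jane Doe <jane@example.com> 1700000000 +0000
data 6
small
M 100644 inline f.txt
data 4
abc

EOF
```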

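Malformed input stops the import at the offending command: fast-import exits with a non-zero status and writes a crash report (fast_import_crash_<pid>) into the repository's .git directory, containing a snapshot of its internal state and the most recent commands it read. Because the error surfaces only when fast-import reaches the bad command, a bug in the frontend may be reported long after the faulty data was produced.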
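To see what a failure looks like, the sketch below feeds fast-import a deliberately malformed stream, then prints its exit status and the start of the crash report left in the repository's .git directory. It assumes git is on PATH; the command name not-a-command is an invented, invalid example.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# "not-a-command" is not part of the stream format, so fast-import aborts.
status=0
printf 'not-a-command\n' | git -C "$repo" fast-import --quiet || status=$?
echo "fast-import exit status: $status"

# Print the beginning of each crash report found in .git.
for report in "$repo"/.git/fast_import_crash_*; do
    echo "crash report: $report"
    head -n 20 "$report"
done
```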

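fast-import can also import into a repository that already has history, but it will not move an existing branch to a tip that does not contain the branch's current commit unless --force is given. To extend an existing branch, the stream should start from it explicitly, for example with from refs/heads/main^0 (the ^0 is required because a branch cannot be started from itself).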
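The sketch below shows how an import behaves against a branch that already exists: continuing it with from refs/heads/main^0, and replacing it with an unrelated root commit, which fast-import refuses until --force is given. The helper commit_stream, the identity, and the file name are hypothetical.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: one commit on refs/heads/main whose message and
# file.txt content are both "$1"; "$2", if given, is an extra "from" line.
commit_stream() {
    printf 'commit refs/heads/main\n'
    printf 'committer Jane Doe <jane@example.com> 1700000000 +0000\n'
    printf 'data %d\n%s\n' "${#1}" "$1"
    if [ -n "${2:-}" ]; then
        printf '%s\n' "$2"
    fi
    printf 'M 100644 inline file.txt\n'
    printf 'data %d\n%s\n\n' "${#1}" "$1"
}

# Initial import: main gets one root commit.
commit_stream first | git -C "$repo" fast-import --quiet

# Incremental import: start explicitly from the existing tip.
commit_stream second 'from refs/heads/main^0' | git -C "$repo" fast-import --quiet

# No "from": the new commit is a root, so moving main would drop history.
# fast-import warns and leaves main where it was.
commit_stream third | git -C "$repo" fast-import --quiet || echo 'main not updated'
before_force=$(git -C "$repo" rev-list --count main)

# With --force the same update is accepted and main is rewritten.
commit_stream third | git -C "$repo" fast-import --quiet --force
after_force=$(git -C "$repo" rev-list --count main)
echo "commits on main: $before_force before --force, $after_force after"
```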

THE FAST-IMPORT PROTOCOL

git fast-import reads a text-based command stream from standard input. The main commands are blob, commit, tag, reset, checkpoint, progress, and done, plus feature and option for negotiating capabilities and get-mark, cat-blob, and ls for querying objects during the import. Blobs, commits, and annotated tags are described directly; trees are never written by the frontend but are built by fast-import from the file changes (M, D, C, R, deleteall) listed in each commit. Each command has a fixed syntax for content, metadata (author, committer, dates), parents, and target references, and every payload is introduced by a data command that gives its exact length in bytes (or a delimiter), so arbitrary binary content can be embedded. Objects can be given marks (:1, :2, ...) so that later commands can refer to them without knowing their object IDs. The simplicity of this format, together with writing objects straight into a packfile without an index or working tree, is key to git fast-import's performance.
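A hand-written stream exercising several of these commands may help. It is a sketch: the identity, timestamps (raw Unix seconds plus a time-zone offset, the default raw date format), file names, and tag name are invented, and each count after a data command is the exact byte length of its payload, trailing newline included.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"

git -C "$repo" fast-import --quiet <<'EOF'
progress starting import
blob
mark :1
data 3
v1

commit refs/heads/main
mark :2
author Jane Doe <jane@example.com> 1700000000 +0000
committer Jane Doe <jane@example.com> 1700000000 +0000
data 8
Add app
M 100644 :1 app.txt

commit refs/heads/main
mark :3
committer Jane Doe <jane@example.com> 1700000060 +0000
data 11
Update app
M 100644 inline app.txt
data 3
v2

tag v1.0
from :2
tagger Jane Doe <jane@example.com> 1700000120 +0000
data 12
Release 1.0

reset refs/heads/stable
from :2

progress done
EOF

git -C "$repo" log --oneline --decorate --all
```

The second commit needs no from line because main already has a tip earlier in the same stream; reset creates stable pointing at the first commit; and the two progress lines are echoed to standard output even with --quiet.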

TYPICAL WORKFLOW

A common workflow for using git fast-import involves three main steps:
1. Extraction and Transformation: Use a custom script or a specialized tool (e.g., svn2git, hg-fast-export) to extract the history from the source VCS (e.g., SVN, Mercurial) and transform it into the git fast-import stream format. This often involves handling author mappings, branch/tag naming conventions, and large file strategies.
2. Import: Feed the generated fast-import stream into git fast-import in an empty or newly initialized Git repository, either by piping the converter's output directly or by redirecting a saved file, for example: git fast-import < fast-import-stream.txt.
3. Post-Import Optimization: After the import is complete, run git repack -a -d (or git gc) to rewrite the packfiles fast-import produced, which are compressed but usually not optimally delta-compressed. Adding -f with larger --window and --depth values trades more CPU time for a smaller repository.
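The three steps can be rehearsed end to end without a foreign VCS by letting git fast-export play the role of the converter. This is a sketch assuming git is on PATH; the temporary paths, identity, and file name are illustrative, and a real migration would substitute the output of a tool such as hg-fast-export in step 1.

```shell
set -e
src=$(mktemp -d)
dst=$(mktemp -d)
stream=$(mktemp)

# A small source history (identity passed inline so no config is needed).
git init -q "$src"
echo 'hello' > "$src/greeting.txt"
git -C "$src" add greeting.txt
git -C "$src" -c user.name='Jane Doe' -c user.email=jane@example.com \
    commit -q -m 'Add greeting'

# Step 1: produce a fast-import stream (here from Git itself).
git -C "$src" fast-export --all > "$stream"

# Step 2: import it into an empty repository.
git init -q "$dst"
git -C "$dst" fast-import --quiet < "$stream"

# Step 3: repack everything; -f recomputes deltas instead of reusing them.
git -C "$dst" repack -a -d -f -q

branch=$(git -C "$src" symbolic-ref --short HEAD)
git -C "$dst" log --oneline "refs/heads/$branch"
```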

HISTORY

git fast-import was written by Shawn O. Pearce and added to Git in 2007, to address the need for efficient migration of existing version control histories from other systems. Before it existed, importing large projects was a slow and often impractical process. It was designed around a simple, stream-oriented format optimized for speed and large datasets, which lets it handle histories that far exceed available memory. It has since become the common backend for large-scale migrations from systems like CVS, Subversion, Mercurial, and Perforce into Git. The stream format has remained backward compatible over the years while gaining new commands and options.

SEE ALSO
