bup
Create efficient, deduplicated backups
TLDR
Initialize a backup repository in a given local directory
Prepare a given directory before taking a backup
Backup a directory to the repository specifying its name
Show the backup snapshots currently stored in the repository
Restore a specific backup snapshot to a target directory
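The five tasks above map onto bup subcommands roughly as follows. This is a minimal walk-through rather than an authoritative recipe: it assumes bup is installed, and the scratch directory, the sample file, and the backup name `demo` are placeholders invented for illustration.

```shell
# Scratch locations (hypothetical; substitute your own paths).
work=$(mktemp -d)
export BUP_DIR="$work/repo"          # same effect as passing -d "$work/repo"
mkdir -p "$work/data"
echo "hello" > "$work/data/file.txt"

bup init                             # initialize the repository in $BUP_DIR
bup index "$work/data"               # prepare the directory before a backup
bup save -n demo "$work/data"        # back up the directory under the name "demo"
bup ls demo                          # show the snapshots stored for "demo"
bup restore -C "$work/restored" "/demo/latest$work/data"   # restore the latest snapshot
```

Setting `BUP_DIR` once keeps the individual commands short; the same repository can be selected per command with `bup -d`.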
SYNOPSIS
bup subcommand [options] [arguments]
PARAMETERS
-D, --debug
Enable debugging output for detailed operation tracing.
-h, --help
Show help message for a specific subcommand or the main command.
-v, --verbose
Increase verbosity of output, showing more information about progress.
-V, --version
Display the bup program version information.
-d, --bup-dir=BUP_DIR
Specify the bup repository directory. If omitted, bup uses the BUP_DIR environment variable, falling back to ~/.bup.
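-r, --remote=HOST:PATH
Connect to a bup server on a remote host via SSH for remote operations. This option is accepted by individual subcommands such as init and save rather than by bup itself.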
--remote-path=
Specify the path to the bup executable on the remote host when using --remote.
subcommand
A specific operation to perform (e.g., 'save', 'restore', 'fsck'). Each subcommand has its own set of options and arguments.
DESCRIPTION
bup is a highly efficient and robust backup system for Linux and other Unix-like operating systems. Unlike traditional backup tools, bup leverages the same underlying data structures as Git, specifically Git's packfile format. This unique approach enables bup to perform global deduplication, meaning that if the same data block appears anywhere across different files, different versions of the same file, or even different machines (when backing up to a central bup server), it is stored only once.
This significantly reduces storage requirements and network bandwidth. bup excels at handling very large datasets, often achieving speeds comparable to rsync for incremental backups while providing full historical versions. It supports local backups, remote backups via SSH, and offers various commands for saving, restoring, and managing backup sets. Its primary design goals include speed, efficiency, and data integrity.
CAVEATS
bup's Git-based nature means it can have a steeper learning curve than traditional backup tools. While highly efficient, initial full backups of very large datasets can be memory intensive. It does not provide built-in encryption of the backup repository at rest; external encryption solutions (e.g., gpg) must be used if data privacy is a concern for the stored backups.
HOW BUP ACHIEVES DEDUPLICATION
bup achieves global deduplication by splitting file contents into variable-size chunks, with boundaries chosen by a rolling checksum over the data (similar to the one rsync uses) rather than at fixed offsets. Because the boundaries depend on content, inserting or deleting bytes in a large file changes only the nearby chunks. Each chunk is hashed with SHA-1, as Git does for its objects, and the hash serves as the chunk's identifier. If a chunk with the same hash already exists in the repository (from any file or any version), no new copy is stored; a reference to the existing chunk is saved instead. This mirrors how Git stores objects and makes storage very efficient when backing up multiple versions of files or datasets with significant redundancy.
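The store-once-by-hash idea can be sketched in a few lines of shell. This is a toy illustration, not bup's implementation: the function name `store_chunks` is hypothetical, it assumes GNU coreutils (`split`, `sha1sum`), and it cuts at fixed 4 KiB offsets for brevity, whereas bup chooses chunk boundaries from the content.

```shell
#!/bin/sh
# Toy content-addressed chunk store (illustration only, not bup's code).
# Assumption: fixed 4 KiB chunks keep the sketch short.

# store_chunks FILE STORE_DIR
# Saves each distinct chunk of FILE once and prints the ordered chunk IDs.
store_chunks() {
    file=$1
    store=$2
    tmp=$(mktemp -d)
    mkdir -p "$store"
    split -b 4096 "$file" "$tmp/chunk."
    for chunk in "$tmp"/chunk.*; do
        [ -e "$chunk" ] || continue          # empty input produces no chunks
        hash=$(sha1sum "$chunk" | cut -d ' ' -f 1)
        # Copy the chunk only if no object with this hash exists yet.
        [ -e "$store/$hash" ] || cp "$chunk" "$store/$hash"
        echo "$hash"                         # the file's "recipe" of chunk IDs
    done
    rm -rf "$tmp"
}

# Demo: two files whose four chunks are all identical.
demo=$(mktemp -d)
head -c 8192 /dev/zero > "$demo/a"           # two identical 4 KiB chunks
cp "$demo/a" "$demo/b"
store_chunks "$demo/a" "$demo/store" > "$demo/a.recipe"
store_chunks "$demo/b" "$demo/store" > "$demo/b.recipe"
ls "$demo/store" | wc -l                     # number of distinct chunks kept
```

Each recipe lists two chunk IDs, yet the store holds a single object, because every chunk hashes to the same value. Restoring a file would mean concatenating the stored chunks in recipe order.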
HISTORY
bup was created by Avery Pennarun, with initial development starting around 2010. Its core innovation was applying Git's efficient storage model (specifically, content-addressable storage and packfiles) to general-purpose backups. This allowed for features like global deduplication and easy management of historical versions, which were previously difficult or inefficient to achieve. It gained popularity in the Linux community for its ability to handle large data sets and provide robust, incremental backups. Development has been community-driven since its inception.