rdiff-backup
Create versioned incremental backups
TLDR
Back up path/to/source to path/to/backup
List the incremental backups in a repository (local or remote path)
Restore from most recent backup
Restore backed up files as they were 3 days ago
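The four tasks above can be sketched as the following commands. This assumes the classic option-style command line that the rest of this page documents (newer releases also offer an action-style syntax), and path/to/restore is a hypothetical target directory that does not appear in the original text. The RUN variable is only an illustration aid: it defaults to echo so that the script prints the commands rather than running them.

```shell
#!/bin/sh
# RUN defaults to "echo", so this script only prints the commands.
# Invoke it with RUN= (empty) to execute them for real.
RUN=${RUN-echo}

# Back up path/to/source to path/to/backup
$RUN rdiff-backup path/to/source path/to/backup

# List the incremental backups in the repository
$RUN rdiff-backup --list-increments path/to/backup

# Restore from the most recent backup (path/to/restore is hypothetical)
$RUN rdiff-backup --restore-as-of now path/to/backup path/to/restore

# Restore backed up files as they were 3 days ago
$RUN rdiff-backup --restore-as-of 3D path/to/backup path/to/restore
```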
SYNOPSIS
rdiff-backup [options] source_directory destination_directory
rdiff-backup [options] --restore source_directory destination_directory
PARAMETERS
--restore
Restores files from a backup repository to a specified destination. Requires a source (the backup repository) and a destination path.
-r, --restore-as-of
Restores files as they were at a specific point in time. The <time> argument can be an absolute date or date-time (e.g., '2023-10-27' or '2023-10-27T10:00:00+02:00'), an interval counted back from now (e.g., '3D' for 3 days ago, '1W' for 1 week ago), a number of backup sessions back (e.g., '2B'), or 'now' for the most recent backup.
--remove-older-than
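Removes all incremental backup data (old file versions) from the repository that are older than the specified <time>. The current mirror is never removed. By default only one backup session is removed per run; combine with --force to remove several sessions at once. This helps manage disk space.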
--force
Authorizes changes that are more drastic than rdiff-backup allows by default and that would otherwise make it abort, such as overwriting an existing destination path or removing several backup sessions at once with --remove-older-than.
--exclude
Excludes files or directories matching the given glob pattern from the backup. This option can be specified multiple times for different patterns.
--include
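Includes files or directories matching the given glob pattern. Selection options are evaluated in the order given and the first match wins, so an --include must appear before the broader --exclude it is meant to override. This option can also be specified multiple times.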
--exclude-device-files
Excludes device files (block and character special files) from the backup.
--exclude-special-files
Excludes special files (device files, fifos, sockets, and symbolic links) from the backup.
--preserve-numerical-ids
Preserves numerical User IDs (UIDs) and Group IDs (GIDs) instead of attempting to map them to usernames and group names on the destination system. Useful for maintaining permissions across systems with different user/group databases.
--print-statistics
Prints detailed statistics about the backup or restore operation, including number of files, total size, transfer rates, and changed files.
--server
Runs rdiff-backup in server mode. This is typically invoked automatically by a remote rdiff-backup client over SSH and not meant for direct user interaction.
--remote-schema
Specifies the command template used to start rdiff-backup on a remote host; the host name is substituted for %s. The default is 'ssh -C %s rdiff-backup --server'. For example, 'ssh -p 2222 %s rdiff-backup --server' connects on a different port.
-v, --verbosity
Sets the verbosity level of output as a number from 0 to 9 (default 3). Higher numbers produce more detailed information, e.g., -v5.
--no-compression
Disables compression for the stored diff files in the backup repository. This can reduce CPU usage at the cost of increased disk space.
--no-hard-links
Treats hard links as separate files, rather than preserving their hard link relationship in the backup. This can increase storage but simplify restoration in some scenarios.
--tempdir
Specifies an alternative temporary directory for rdiff-backup to use during its operations. By default, it uses the system's temporary directory.
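As a hedged illustration of how several of these options combine, the sketch below shows a selective backup followed by pruning. SRC, REPO, and the cache subdirectories are hypothetical names chosen for the example, and RUN defaults to echo so that nothing is executed unless the script is invoked with RUN= (empty).

```shell
#!/bin/sh
# RUN defaults to "echo" (print only); invoke with RUN= to execute.
RUN=${RUN-echo}

# Hypothetical locations used only for this example.
SRC=/path/to/source
REPO=/path/to/backup

# Selection conditions are checked in order and the first match wins, so the
# --include for the one subdirectory to keep comes before the broader --exclude.
$RUN rdiff-backup --print-statistics \
    --include "$SRC/cache/keep" \
    --exclude "$SRC/cache" \
    --exclude-special-files \
    "$SRC" "$REPO"

# Prune increments older than four weeks. --force is required when more than
# one backup session would be removed in a single run.
$RUN rdiff-backup --remove-older-than 4W --force "$REPO"
```

Reversing the --include and --exclude lines would exclude the whole cache directory, including the subdirectory that was meant to be kept.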
DESCRIPTION
rdiff-backup is a backup utility for Linux and other Unix-like systems built around a "reverse incremental" approach. In a traditional incremental scheme the initial backup is full and subsequent backups store only changes, so the latest state has to be rebuilt from the full backup plus every later increment. rdiff-backup instead keeps the most recent backup as a full, directly accessible snapshot and stores older file versions as reverse diffs relative to it. Restoring the latest version of a file is therefore fast, because no diffs need to be applied.
It uses the librsync library, which implements the rsync delta algorithm, to compute and store only the differences between file versions, saving disk space for files that change little over time. rdiff-backup supports both local and remote backups, using SSH for network transfers, which makes it suitable for backing up data to remote servers or network-attached storage. It preserves file permissions, ownership, symlinks, hard links, and other metadata. Users can restore specific files or entire directories from any point in time for which backup data exists.
CAVEATS
Disk Space Usage: While rdiff-backup uses diffs, the repository can still grow significantly over time, especially if many files change frequently or if --remove-older-than is not used periodically to prune old increments.
Performance: Backups over slow network connections can be sluggish, particularly during the initial full backup. Computing deltas with librsync also consumes CPU resources on both client and server.
Repository Integrity: Directly modifying or deleting files within the rdiff-backup-data directory of the repository can corrupt the backup chain and lead to irretrievable data loss. Always use rdiff-backup commands for managing the repository.
File Permissions and Attributes: While rdiff-backup aims to preserve all file attributes (permissions, ownership, ACLs, etc.), differences in filesystem capabilities or user/group IDs between source and destination systems can sometimes lead to discrepancies.
Requires librsync: rdiff-backup relies on the librsync library for its core differencing functionality, which must be installed on both the client and server for remote operations.
HOW IT WORKS
rdiff-backup operates on a "reverse incremental" principle. On the first run, it creates a complete, full backup of the source directory at the destination. For subsequent backups, it compares the current state of the source with the latest full backup already at the destination and uses librsync to calculate only the differences. These differences (reverse diffs) are stored in the rdiff-backup-data directory inside the destination. The destination's main directory is then updated to become a new, full snapshot of the source's latest state. This means the most recent backup is always a readily accessible full copy, while older versions are reconstructed by applying the stored diffs.
RESTORATION PROCESS
Restoring the latest version of a file or an entire directory is fast and straightforward because the most recent backup at the destination is always a full, uncompressed copy. To restore an older version, rdiff-backup starts from that copy and applies the stored reverse diffs from newest to oldest until it reaches the requested point in time. The cost of a restore therefore grows with the age of the version requested, whereas in a traditional incremental system it is the latest state that requires applying the most diffs.
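The reverse-diff idea can be illustrated with a toy example. This is only a sketch of the concept and not how rdiff-backup is implemented: it substitutes the standard diff and patch utilities for librsync, handles a single text file, and uses made-up directory and file names.

```shell
#!/bin/sh
# Toy model of reverse incremental backup using diff/patch instead of librsync.
set -eu

work=$(mktemp -d)
mkdir "$work/source" "$work/mirror" "$work/increments"

# Session 1: the first backup is simply a full copy of the source.
printf 'alpha\n' > "$work/source/notes.txt"
cp "$work/source/notes.txt" "$work/mirror/notes.txt"

# The source changes before session 2.
printf 'alpha\nbeta\n' > "$work/source/notes.txt"

# Session 2: record a reverse diff (new version -> old version), then refresh
# the mirror. diff exits with status 1 when files differ, which is expected.
diff -u "$work/source/notes.txt" "$work/mirror/notes.txt" \
    > "$work/increments/notes.txt.diff" || [ "$?" -eq 1 ]
cp "$work/source/notes.txt" "$work/mirror/notes.txt"

# Restoring the latest version is a plain copy of the mirror.
cp "$work/mirror/notes.txt" "$work/restored_latest.txt"

# Restoring the older version starts from the mirror and applies the reverse diff.
cp "$work/mirror/notes.txt" "$work/restored_old.txt"
patch -s "$work/restored_old.txt" < "$work/increments/notes.txt.diff"

cat "$work/restored_old.txt"
```

After the script runs, restored_latest.txt matches the current source and restored_old.txt holds the file as it was at the first session. With more sessions, each additional step back in time would require applying one more reverse diff.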
REMOTE OPERATION
rdiff-backup can back up to or from a remote host over SSH. When a remote path is specified (e.g., user@host::/path/to/destination), rdiff-backup opens an SSH connection and starts an rdiff-backup --server process on the remote machine. The differential data is then exchanged over that connection, so no network shares need to be configured; the remote side only needs SSH access and a compatible rdiff-backup installation.
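A few remote invocations might look like the sketch below. It assumes the classic option-style command line documented on this page; user@host and all paths are placeholders, and port 2222 is an arbitrary example. RUN defaults to echo so the commands are printed rather than run.

```shell
#!/bin/sh
# RUN defaults to "echo" (print only); invoke with RUN= to execute.
RUN=${RUN-echo}

# Push: back up a local directory to a repository on a remote host.
# Note the double colon between the host and the path.
$RUN rdiff-backup /path/to/source user@host::/path/to/destination

# Pull: back up a directory on a remote host to a local repository.
$RUN rdiff-backup user@host::/path/to/source /path/to/backup

# Use a non-default SSH port. %s is replaced by the host part (user@host).
$RUN rdiff-backup \
    --remote-schema 'ssh -p 2222 %s rdiff-backup --server' \
    /path/to/source user@host::/path/to/destination
```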
REPOSITORY STRUCTURE
The destination directory for an rdiff-backup repository contains two main parts: the actual, latest full backup of your data, and a directory named rdiff-backup-data. The rdiff-backup-data directory stores all the historical reverse diffs, metadata about each backup session, file listings, and other internal information necessary to reconstruct older versions of files. Never manually modify or delete anything within the rdiff-backup-data directory, as this can easily corrupt the entire backup history. Old increments should always be managed with rdiff-backup commands such as --remove-older-than.
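The repository can be inspected without touching rdiff-backup-data by hand. The sketch below assumes the classic option-style command line and a hypothetical repository at /path/to/backup; RUN defaults to echo so that the commands are only printed.

```shell
#!/bin/sh
# RUN defaults to "echo" (print only); invoke with RUN= to execute.
RUN=${RUN-echo}

# Hypothetical repository location.
REPO=/path/to/backup

# Look at the layout: the mirrored files plus the rdiff-backup-data directory.
$RUN ls "$REPO" "$REPO/rdiff-backup-data"

# Show how much space each increment occupies.
$RUN rdiff-backup --list-increment-sizes "$REPO"

# Check the stored file hashes of the current mirror against the metadata.
$RUN rdiff-backup --verify "$REPO"
```

All three commands are read-only, which keeps them safe to run between backup sessions.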
HISTORY
rdiff-backup was created by Ben Escoto and first released around 2002. It was designed to provide efficient incremental backups with an emphasis on easy restoration of the latest state, a common pain point with traditional incremental backup systems. Its innovation lay in the "reverse incremental" model, where the most recent backup is a full snapshot, and older versions are derived from it using diffs. This design quickly gained popularity for its practicality and robustness, especially combined with its support for remote backups over SSH. Over the years, it has seen continuous development and maintenance, adapting to newer Python versions and operating system environments, maintaining its position as a reliable tool in the Linux backup ecosystem.


