duplicity
Back up and restore directories incrementally
TLDR
Back up a directory via FTPS to a remote machine, encrypting it with a password
Back up a directory to Amazon S3, doing a full backup every month
Delete versions older than 1 year from a backup stored on a WebDAV share
List the available backups
List the files in a backup stored on a remote machine, via SSH
Restore a subdirectory from a GnuPG-encrypted local backup to a given location
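The tasks listed above can be sketched as small shell wrappers. This is an illustration under stated assumptions, not a canonical invocation: the function names are hypothetical, every path, host, and bucket is supplied by the caller, and secrets are expected in the environment variables FTP_PASSWORD, PASSPHRASE, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY as appropriate. Newer duplicity releases may name --file-to-restore as --path-to-restore.

```shell
# Hypothetical wrappers; all arguments are caller-supplied placeholders.

# Back up via FTPS. Expects FTP_PASSWORD (login) and PASSPHRASE (encryption)
# to be exported in the environment.
backup_ftps() {  # usage: backup_ftps SOURCE_DIR USER@HOST/REMOTE_DIR
    duplicity "$1" "ftps://$2"
}

# Back up to Amazon S3, forcing a full backup when the last one is over a month old.
# Expects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment.
backup_s3_monthly() {  # usage: backup_s3_monthly SOURCE_DIR BUCKET[/PREFIX]
    duplicity --full-if-older-than 1M "$1" "s3://$2"
}

# Delete backup sets older than one year from a WebDAV share.
# Without --force, duplicity only lists what it would delete.
prune_webdav() {  # usage: prune_webdav USER@HOST/REMOTE_DIR
    duplicity remove-older-than 1Y --force "webdav://$1"
}

# List the available backups (the backup chains and their sets).
list_backups() {  # usage: list_backups TARGET_URL
    duplicity collection-status "$1"
}

# List the files in a backup stored on a remote machine, via SSH.
list_files_ssh() {  # usage: list_files_ssh USER@HOST/REMOTE_DIR
    duplicity list-current-files "sftp://$1"
}

# Restore a subdirectory from a local backup to a given location.
# Expects PASSPHRASE in the environment if the GnuPG key is protected.
restore_subdir() {  # usage: restore_subdir RELATIVE_PATH ABSOLUTE_BACKUP_DIR DEST_DIR
    duplicity restore --file-to-restore "$1" "file://$2" "$3"
}
```

For example, `list_backups file:///var/backups/docs` would show the chains stored in a local backup directory.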
SYNOPSIS
duplicity [options] source_directory target_url
duplicity [options] target_url restore_directory
duplicity [options] <command> target_url
Common commands:
full: Perform a full backup.
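incremental: Perform an incremental backup. When no command is given, duplicity performs an incremental backup if a previous backup exists on the target and a full backup otherwise.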
restore: Restore files from a backup.
collection-status: Show status of a backup chain.
cleanup: Remove redundant or incomplete backup sets. Without --force, the files are only listed, not deleted.
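As a rough sketch of how these commands fit together in one run, the hypothetical helper below creates a backup set, shows the resulting chain, and then lists any leftover files. The function name and its arguments are assumptions made for illustration.

```shell
# Hypothetical helper: SOURCE_DIR and TARGET_URL are caller-supplied.
backup_cycle() {  # usage: backup_cycle full|incremental SOURCE_DIR TARGET_URL
    mode=$1
    src=$2
    url=$3
    case "$mode" in
        full|incremental) ;;
        *) echo "mode must be 'full' or 'incremental'" >&2; return 2 ;;
    esac
    # 1. Create the new backup set.
    # 2. Show the resulting backup chain.
    # 3. List redundant or incomplete files (add --force to delete them).
    duplicity "$mode" "$src" "$url" &&
        duplicity collection-status "$url" &&
        duplicity cleanup "$url"
}
```

Chaining with `&&` stops the sequence as soon as one step fails, so a failed backup is not followed by a cleanup.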
PARAMETERS
--encrypt-key
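Encrypt the backup with the public key of the given GPG key ID. If omitted, duplicity uses symmetric encryption with a passphrase (read from the PASSPHRASE environment variable or prompted for).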
--sign-key
Specify the GPG key ID for signing the backup. Ensures data integrity and authenticity.
--exclude
Exclude files or directories matching the specified path pattern from the backup.
--include
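Include files or directories matching the specified path pattern. File selection options are evaluated in order and the first match wins, so an --include must appear before the broader --exclude it is meant to override.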
--full-if-older-than
Force a full backup if the last full backup is older than the specified time (e.g., '1M' for 1 month).
--force
Proceed even if data loss might result. Required for commands such as 'cleanup' and 'remove-older-than' to actually delete files rather than only list them.
--no-encryption
Disable GPG encryption for the backup. Use with caution, especially for remote storage.
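AWS_ACCESS_KEY_ID (environment variable)
Supply the AWS Access Key ID for S3 backend authentication. duplicity reads S3 credentials from the environment rather than from a command-line option.
AWS_SECRET_ACCESS_KEY (environment variable)
Supply the AWS Secret Access Key for S3 backend authentication.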
--ssh-options
Pass specific options directly to the underlying SSH command for SCP/SFTP backends.
--gpg-options
Pass specific options directly to the GnuPG command.
--verbosity
Set the verbosity level of output messages, either by name ('error', 'warning', 'notice', 'info', 'debug') or as a number from 0 to 9.
--dry-run
Simulate the backup or restore operation without making any actual changes.
--file-to-restore
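During restore, specify a particular file or directory within the backup to restore, given as a path relative to the backed-up root. Newer duplicity releases name this option --path-to-restore.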
--time
During restore, restore the state of files as they were at a specific time, given either as a date (e.g., '2023-01-01') or as an interval counted back from now (e.g., '1D' for one day ago).
remove-older-than
A command rather than an option (written without leading dashes): delete backup sets older than the specified time from the target. Without --force, the matching sets are only listed.
list-current-files
A command rather than an option (written without leading dashes): list the files contained in the most recent backup on the target, or in the backup as of the time given with --time.
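Two of the options above interact in ways that are easier to see in a command line. The sketch below is illustrative only: the function names and the `projects` and `documents` subdirectories are hypothetical, and --dry-run is included so the first command changes nothing on the target.

```shell
# Back up only two subdirectories of SOURCE_DIR. Selection options are
# evaluated in order and the first match wins, so the --include options come
# first and the trailing --exclude '**' drops everything else.
selective_backup() {  # usage: selective_backup SOURCE_DIR TARGET_URL
    duplicity --dry-run \
        --include "$1/projects" \
        --include "$1/documents" \
        --exclude '**' \
        "$1" "$2"
}

# Restore one path as it was three days ago ("3D" is an interval counted
# back from now).
restore_as_of() {  # usage: restore_as_of RELATIVE_PATH TARGET_URL DEST_DIR
    duplicity restore --time 3D --file-to-restore "$1" "$2" "$3"
}
```

Removing --dry-run from `selective_backup` would perform the actual backup once the selection looks right.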
DESCRIPTION
duplicity is a free software backup utility that creates encrypted, digitally signed, incremental archives of files and directories. It uses the rsync algorithm (through librsync) to store only the changes between backups, which reduces bandwidth and storage requirements. Backups can be stored locally or on a variety of remote backends, including SSH/SCP, FTP, Amazon S3, Google Drive, and WebDAV. Encryption and signing are handled by GnuPG (GPG), which protects the confidentiality and integrity of the data.
The first backup is a full backup, and subsequent backups are incremental, storing only the differences relative to the previous backup. This chain allows efficient storage and restoration to any point in time covered by the chain, which makes duplicity well suited to off-site disaster recovery.
CAVEATS
Using duplicity effectively requires careful management of GnuPG keys; loss of the encryption key or passphrase means permanent loss of the backed-up data. Ensure that gpg-agent is running and that the keys are accessible.
For incremental backups, all segments in a chain (from the last full backup) are necessary for a complete restore. Incomplete or corrupted segments can compromise the entire chain.
It can require significant temporary disk space on the local machine for caching and processing, particularly during full backups or restores of large data sets.
Configuring backend URLs and authentication can be complex and requires an understanding of the target storage system.
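Two precautions follow from the caveats above: keep a copy of the secret key outside the machine being backed up, and check periodically that the backup still matches the source. A minimal sketch, assuming `gpg` and `duplicity` are on the PATH; the function names, key ID, and paths are hypothetical.

```shell
# Export the secret key to a file readable only by the owner. Store the file
# somewhere other than the machine being backed up.
export_backup_key() {  # usage: export_backup_key KEY_ID OUTPUT_FILE
    ( umask 077 && gpg --armor --export-secret-keys "$1" > "$2" )
}

# Compare the backup on the target against the local source directory.
verify_backup() {  # usage: verify_backup TARGET_URL SOURCE_DIR
    duplicity verify "$1" "$2"
}
```

The subshell keeps the restrictive umask from leaking into the rest of the session.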
BACKEND URL FORMATS
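The target_url specifies where backups are stored. A password embedded in a URL can end up in shell history and process listings; supplying it through the FTP_PASSWORD environment variable avoids this. Common formats include: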
file:///path/to/local/dir: Local filesystem.
scp://user@host[:port]/path/to/dir: Secure Copy Protocol (SSH).
sftp://user@host[:port]/path/to/dir: SSH File Transfer Protocol.
ftp://user:password@host[:port]/path/to/dir: File Transfer Protocol.
ftps://user:password@host[:port]/path/to/dir: FTP over SSL/TLS.
s3://bucket_name[/path]: Amazon S3 cloud storage.
webdav://user:password@host[:port]/path/to/dir: Web Distributed Authoring and Versioning (use webdavs:// for WebDAV over HTTPS).
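The user@host[:port]/path pattern shared by most of these formats can be assembled with a small helper. This is a sketch only: `target_url` is a hypothetical function, not part of duplicity, and it deliberately leaves the password out of the URL.

```shell
# Build SCHEME://USER@HOST[:PORT]/PATH. PORT may be an empty string.
target_url() {  # usage: target_url SCHEME USER HOST PORT PATH
    scheme=$1
    user=$2
    host=$3
    port=$4
    path=$5
    # ${port:+:$port} expands to ":PORT" only when PORT is non-empty.
    printf '%s://%s@%s%s/%s\n' "$scheme" "$user" "$host" "${port:+:$port}" "$path"
}
```

A call such as `target_url sftp alice backup.example.org 2222 backups/home` prints `sftp://alice@backup.example.org:2222/backups/home`, which can then be passed to duplicity as the target_url.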
BACKUP CHAIN CONCEPT
duplicity builds a 'backup chain'. The first backup is a full backup. Subsequent backups are incremental, storing only the changes relative to the previous backup in the chain. To restore to a point in time, duplicity starts from the most recent full backup at or before that time and applies each later incremental in order up to the requested time. This chaining is efficient in terms of storage, but every set from that full backup onward must be intact for the restore to succeed.
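The dependency described above can be modelled in a few lines of shell. This is a simplified illustration, not duplicity's own logic: `sets_needed` is a hypothetical function that takes the 1-based index of the set to restore, followed by the type of each set in chronological order, and prints the indices of the sets a restore would depend on.

```shell
sets_needed() {  # usage: sets_needed TARGET_INDEX TYPE...   (TYPE is "full" or "inc")
    target=$1
    shift
    if [ "$target" -lt 1 ] || [ "$target" -gt "$#" ]; then
        echo "target index out of range" >&2
        return 1
    fi
    # Find the most recent full backup at or before the target set.
    i=0
    start=0
    for kind in "$@"; do
        i=$((i + 1))
        if [ "$i" -le "$target" ] && [ "$kind" = full ]; then
            start=$i
        fi
    done
    if [ "$start" -eq 0 ]; then
        echo "no full backup at or before set $target" >&2
        return 1
    fi
    # Every set from that full backup up to the target is required.
    needed=""
    n=$start
    while [ "$n" -le "$target" ]; do
        needed="$needed${needed:+ }$n"
        n=$((n + 1))
    done
    printf '%s\n' "$needed"
}
```

For the sequence `full inc inc full inc inc`, `sets_needed 3 full inc inc full inc inc` prints `1 2 3`, while `sets_needed 5 full inc inc full inc inc` prints `4 5`: a newer full backup shortens the chain a restore depends on.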
HISTORY
duplicity was originally written by Ben Escoto, with its first public release appearing around 2001-2002; Kenneth Loafman has maintained it for most of its later history. Written in Python, it gained traction as a tool for off-site backups thanks to its combination of incremental backups, encryption, and digital signing. Its design builds on established components, namely the rsync algorithm (via librsync) and GnuPG, for its core functionality. It has since been continuously maintained by a community of developers and extended with new storage backends and features.