pg_basebackup
Create PostgreSQL cluster base backup
TLDR
Take a base backup from a remote PostgreSQL server
Take a backup with progress shown
Create a compressed backup (gzip) in tar format
Create an incremental backup using a previous manifest file
Write a recovery configuration for setting up a standby
Relocate a tablespace during backup
Limit transfer rate to reduce server load
Stream WAL while taking the backup
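The task list above does not show the commands themselves. The following sketch gives one plausible invocation per task. The host name, role name, and all paths are placeholders, and `run` and `pgb` are hypothetical helpers that only print each command unless RUN=1 is set. The incremental example assumes PostgreSQL 17 or later.

```shell
#!/bin/sh
# Placeholder connection settings and paths (assumptions, not defaults).
PGB_HOST="${PGB_HOST:-db.example.com}"
PGB_USER="${PGB_USER:-replicator}"
DEST="${DEST:-/var/backups/pg/base}"
PREV="${PREV:-/var/backups/pg/previous}"

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical helper: pg_basebackup with the connection options filled in.
pgb() {
    run pg_basebackup -h "$PGB_HOST" -U "$PGB_USER" "$@"
}

# Take a base backup from a remote server.
pgb -D "$DEST"

# Show progress while the backup runs.
pgb -D "$DEST" -P

# Write a gzip-compressed backup in tar format.
pgb -D "$DEST" -F t -z

# Take an incremental backup against an earlier backup's manifest (PostgreSQL 17+).
pgb -D "$DEST" --incremental="$PREV/backup_manifest"

# Write the configuration needed to start the backup as a standby.
pgb -D "$DEST" -R

# Relocate a tablespace (plain format only; both directories are placeholders).
pgb -D "$DEST" -T /srv/pg/ts_old=/srv/pg/ts_new

# Limit the transfer rate to 10 MB/s.
pgb -D "$DEST" --max-rate=10M

# Stream WAL while the backup is taken.
pgb -D "$DEST" -X stream
```

Running the script as is prints the eight commands for review; setting RUN=1 executes them, in which case each command needs its own empty target directory.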
SYNOPSIS
pg_basebackup -D directory [-F {p|t}] [-X {n|f|s}] [-z|-Z level] [OPTION]...
PARAMETERS
-D, --pgdata=directory
Sets the target directory for the backup. This is a mandatory option.
-F, --format=format
Sets the output format. p for plain (default), t for tar. With tar, all files are written into tar archives.
-X, --wal-method=method
Controls whether the WAL files required by the backup are included. none excludes WAL. fetch collects the WAL files at the end of the backup. stream (the default since PostgreSQL 10) streams WAL over a second connection while the backup is running.
-z, --gzip
Enables gzip compression for tar files. Requires -F t.
-Z, --compress=level
Enables gzip compression with a specified compression level (0-9). Requires -F t.
-S, --slot=slotname
Uses the named replication slot when streaming WAL (only meaningful with -X stream). The slot makes the primary retain WAL files until the backup has received them.
-l, --label=label
Sets the label for the backup (e.g., 'my_daily_backup'). This is written to the backup_label file.
-P, --progress
Enables progress reporting during the backup process, showing the amount of data transferred.
-v, --verbose
Enables verbose output, displaying more details about the operation.
-h, --host=hostname
Specifies the host name of the machine on which the server is running. Defaults to a local Unix domain socket if available, otherwise localhost.
-p, --port=port
Specifies the TCP port or local Unix domain socket file extension on which the server is listening for connections.
-U, --username=username
User name to connect as. Requires replication privileges.
-T, --tablespace-mapping=olddir=newdir
Relocates a tablespace from olddir to newdir in the backup. Only applies to plain format. Can be specified multiple times.
-c, --checkpoint=method
Sets the checkpoint method. spread (default) spreads the checkpoint's disk I/O over a longer time, fast requests an immediate checkpoint so the backup starts with minimal delay.
-N, --no-sync
Returns without waiting for the backup files to be flushed to disk. This is faster, but an operating system crash shortly afterwards can leave the backup corrupt. Intended for testing rather than production use.
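-r, --max-rate=rate
Limits the data transfer rate for the backup. The value is in kilobytes per second (a k or M suffix is accepted) and must lie between 32 kB/s and 1048576 kB/s. Useful to keep `pg_basebackup` from consuming too much network bandwidth or disk I/O.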
--no-verify-checksums
Disables verification of data checksums. If checksums are enabled on the server, they are verified by default during the backup, which helps detect data corruption.
-n, --no-clean
Keeps the partially written target directory when pg_basebackup exits with an error instead of removing it. Useful when debugging connectivity or permission problems. pg_basebackup has no --dry-run option.
DESCRIPTION
pg_basebackup takes a consistent base backup of a running PostgreSQL database cluster. It makes a binary copy of the cluster's data directory and tablespaces, including every file needed for a full recovery. It is the standard starting point for Point-in-Time Recovery (PITR) and for creating standby servers for replication. It connects to the target server over the replication protocol, so the server must be running, and receives the data directory contents, optionally fetching or streaming the Write-Ahead Log (WAL) files as well. Backups can be written in plain format, which mirrors the server's data directory structure, or in tar format, where files are packed into tar archives that can optionally be compressed. Compared with manually copying files at the file system level, it handles the steps needed to keep the copy consistent while the server continues to accept writes.
CAVEATS
pg_basebackup requires a running PostgreSQL server and a connecting role with the REPLICATION attribute (or a superuser), along with a matching replication entry in pg_hba.conf. It copies what is inside the data directory and tablespaces, so configuration or log files kept outside PGDATA (for example, a postgresql.conf stored under /etc) are not part of the backup and must be saved separately. The destination needs enough free disk space for the whole cluster, plus the WAL if it is included. The `--write-recovery-conf` (-R) option is still supported: up to PostgreSQL 11 it writes a recovery.conf file, while in PostgreSQL 12 and later, where recovery.conf no longer exists, it creates a standby.signal file and appends the connection settings to postgresql.auto.conf.
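As a rough check of what the standby-related files look like on PostgreSQL 12 and later (a standby.signal file plus a primary_conninfo entry in postgresql.auto.conf, as written by -R), the sketch below inspects a plain-format backup directory. The function name `has_standby_config` is hypothetical and the check is deliberately minimal.

```shell
#!/bin/sh
# Succeeds if a plain-format backup directory contains the standby settings
# that -R writes on PostgreSQL 12 or later.
has_standby_config() {
    dir="$1"
    # standby.signal marks the directory as a standby; primary_conninfo
    # tells the standby how to reach the primary.
    [ -f "$dir/standby.signal" ] &&
        grep -q '^primary_conninfo' "$dir/postgresql.auto.conf" 2>/dev/null
}
```

For example, `has_standby_config /var/backups/pg/base && echo "standby settings present"` (the path is a placeholder).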
USAGE IN POINT-IN-TIME RECOVERY (PITR)
pg_basebackup forms the foundation of a PITR strategy. A base backup provides a consistent starting point, which is then combined with a continuous archive of Write-Ahead Log (WAL) files. To restore to a specific point in time, the base backup is restored first, and the archived WAL files are then replayed up to the desired recovery target. With -X stream or -X fetch, the WAL segments generated while the backup was running are stored with it, so the backup can be brought to a consistent state on its own. Replaying to any later point still requires the WAL archive.
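The restore step described above can be sketched as follows for PostgreSQL 12 or later. The function name `prepare_pitr` is hypothetical, the archive directory and target time are supplied by the caller, and the `cp`-based restore_command assumes that archived WAL files sit in a plain directory reachable from the server.

```shell
#!/bin/sh
# Prepare a restored data directory for point-in-time recovery (PostgreSQL 12+).
# Arguments: data directory, WAL archive directory, recovery target timestamp.
prepare_pitr() {
    datadir="$1"; archive_dir="$2"; target_time="$3"
    # Append the recovery settings; %f and %p are expanded by the server.
    cat >> "$datadir/postgresql.conf" <<EOF
restore_command = 'cp $archive_dir/%f "%p"'
recovery_target_time = '$target_time'
EOF
    # An empty recovery.signal file makes the server start in targeted recovery mode.
    : > "$datadir/recovery.signal"
}
```

A call might look like `prepare_pitr /var/lib/postgresql/data /mnt/wal_archive '2024-01-01 12:00:00'` (all three values are placeholders), after which the server is started normally and replays WAL up to the target.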
REPLICATION SLOTS FOR RELIABLE BACKUPS
When taking a base backup, especially for creating a new standby server, it is recommended to use the --slot=slotname option together with -X stream. A replication slot ensures that the primary server does not remove WAL files that are still needed by the backup or by the standby that will be started from it. This prevents the primary from recycling WAL segments before they have been received, which would otherwise cause the backup or the replica to fail. The slot can be created on the primary before starting the backup, or by pg_basebackup itself with -C (--create-slot, PostgreSQL 11 and later). A permanent slot keeps retaining WAL until it is consumed or dropped, so drop slots that are no longer used to avoid filling the primary's disk.
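A minimal sketch of both ways to pair a backup with a replication slot. The host, role, slot name, and target directory are placeholders, and `run` is a hypothetical helper that only prints each command unless RUN=1 is set (quoting is not preserved in the printed form).

```shell
#!/bin/sh
# Placeholder settings (assumptions, not defaults).
PRIMARY="${PRIMARY:-primary.example.com}"
REPL_USER="${REPL_USER:-replicator}"
SLOT="${SLOT:-basebackup_slot}"
DEST="${DEST:-/var/backups/pg/base}"

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Option 1: create the slot on the primary with SQL, then reference it.
run psql -h "$PRIMARY" -U postgres -c "SELECT pg_create_physical_replication_slot('$SLOT');"
run pg_basebackup -h "$PRIMARY" -U "$REPL_USER" -D "$DEST" -X stream -S "$SLOT"

# Option 2 (PostgreSQL 11+): let pg_basebackup create the slot itself with -C.
run pg_basebackup -h "$PRIMARY" -U "$REPL_USER" -D "$DEST" -X stream -C -S "$SLOT"

# Drop the slot once nothing depends on it, so the primary stops retaining WAL.
run psql -h "$PRIMARY" -U postgres -c "SELECT pg_drop_replication_slot('$SLOT');"
```

The two options are alternatives: use one or the other for a given slot name, since creating a slot that already exists fails.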
HISTORY
The pg_basebackup utility was introduced with PostgreSQL 9.1 in 2011. Before its introduction, users typically performed base backups by manually copying the data directory (e.g., using rsync or tar) and ensuring WAL archiving was correctly set up. pg_basebackup streamlined this complex process by providing an integrated, online method for creating consistent file-system-level backups, significantly improving the ease and reliability of setting up replication and Point-in-Time Recovery. Its development focused on making PostgreSQL disaster recovery and standby server provisioning more robust and user-friendly.
SEE ALSO
pg_dump(1), pg_restore(1), pg_rewind(1), pg_receivewal(1), pg_ctl(1)


