LinuxCommandLibrary

pg_basebackup

Create PostgreSQL cluster base backup

TLDR

Take a base backup from a remote PostgreSQL server

$ pg_basebackup [[-h|--host]] [host] [[-D|--pgdata]] [path/to/backup_dir]

Take a backup with progress shown
$ pg_basebackup [[-h|--host]] [host] [[-D|--pgdata]] [path/to/backup_dir] [[-P|--progress]]

Create a compressed backup (gzip) in tar format
$ pg_basebackup [[-D|--pgdata]] [path/to/backup_dir] [[-F|--format]] [[t|tar]] [[-z|--gzip]]

Create an incremental backup using a previous manifest file (PostgreSQL 17 and later)
$ pg_basebackup [[-D|--pgdata]] [path/to/backup_dir] [[-i|--incremental]] [path/to/old_manifest]

Write a recovery configuration for setting up a standby
$ pg_basebackup [[-D|--pgdata]] [path/to/backup_dir] [[-R|--write-recovery-conf]]

Relocate a tablespace during backup
$ pg_basebackup [[-D|--pgdata]] [path/to/backup_dir] [[-T|--tablespace-mapping]] [path/to/old_tablespace]=[path/to/new_tablespace]

Limit transfer rate to reduce server load
$ pg_basebackup [[-D|--pgdata]] [path/to/backup_dir] [[-r|--max-rate]] [100M]

Stream WAL while taking the backup
$ pg_basebackup [[-D|--pgdata]] [path/to/backup_dir] [[-X|--wal-method]] stream

SYNOPSIS


pg_basebackup -D directory [-F {p|t}] [-X {n|f|s}] [-z|-Z level] [OPTION]...

PARAMETERS

-D, --pgdata=directory
    Sets the target directory for the backup. This is a mandatory option.

-F, --format=format
    Sets the output format. p for plain (default), t for tar. With tar, all files are written into tar archives.

-X, --wal-method=method
    Selects how the WAL required by the backup is included. none (n) excludes WAL. fetch (f) collects the WAL files at the end of the backup. stream (s, the default since PostgreSQL 10) streams WAL over a second connection while the backup runs.

-z, --gzip
    Enables gzip compression for tar files. Requires -F t.

-Z, --compress=level
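    Enables gzip compression at the given level (0-9, where 0 means no compression and 9 is the strongest). Client-side compression requires -F t. PostgreSQL 15 and later also accept a method name such as gzip, lz4, or zstd in place of a bare level.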

-S, --slot=slotname
    Uses the named replication slot for WAL streaming, so the primary retains WAL until the backup has received it. Only valid with -X stream. The slot must already exist unless -C (--create-slot) is also given.

-l, --label=label
    Sets the label for the backup (e.g., 'my_daily_backup'). This is written to the backup_label file.

-P, --progress
    Enables progress reporting during the backup process, showing the amount of data transferred.

-v, --verbose
    Enables verbose output, displaying more details about the operation.

-h, --host=hostname
    Specifies the host name of the machine on which the server is running. Defaults to a local Unix domain socket if available, otherwise localhost.

-p, --port=port
    Specifies the TCP port or local Unix domain socket file extension on which the server is listening for connections.

-U, --username=username
    User name to connect as. The role needs the REPLICATION attribute (or superuser), and pg_hba.conf must allow it to open a replication connection.

-T, --tablespace-mapping=olddir=newdir
    Relocates a tablespace from olddir to newdir in the backup. Can be specified multiple times.

-c, --checkpoint={fast|spread}
    Sets the checkpoint mode. spread (default) spreads checkpoint I/O over time to limit load on the server. fast requests an immediate checkpoint so the backup starts sooner.

--no-sync
    Disables syncing of files to disk. This can be faster but risks data corruption if the system crashes during backup. Use with extreme caution.

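-r, --max-rate=rate
    Limits the rate at which data is transferred from the server. The value is in kilobytes per second unless a k or M suffix is given; valid values range from 32 kB/s to 1024 MB/s. Useful to keep pg_basebackup from consuming too much network bandwidth or disk I/O.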

--no-verify-checksums
    Disables checksum verification. By default, data checksums are verified during the backup if they are enabled on the server, which helps detect data corruption.

--dry-run
    Not a pg_basebackup option. Related tools such as pg_rewind and pg_combinebackup provide -n, --dry-run; to test connectivity and permissions for pg_basebackup, take a backup into a scratch directory instead.
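
As a rough sketch of how several of the options above combine, the function below prints (rather than runs) a command for a gzip-compressed tar backup with streamed WAL, a fast checkpoint, and a throttled transfer rate. The function name and the host, user, directory, and label values are hypothetical placeholders, and the 50M rate is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: prints a pg_basebackup command line instead of running it.
# All argument values are hypothetical; values containing spaces would need quoting.
build_backup_cmd() {
    host=$1 user=$2 target=$3 label=$4
    # -F t -z    gzip-compressed tar archives
    # -X stream  stream WAL while the data files are copied
    # -c fast    request an immediate checkpoint
    # -r 50M     cap the transfer rate at 50 MB/s
    # -l, -P     backup label and progress reporting
    echo "pg_basebackup -h $host -U $user -D $target -F t -z -X stream -c fast -r 50M -l $label -P"
}

# Print the command for review before running it by hand.
build_backup_cmd db1.example.com replicator /backups/base nightly
```

Printing the command first makes it easy to review the flags before pointing them at a production server.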

DESCRIPTION


pg_basebackup takes a consistent base backup of a running PostgreSQL database cluster. It makes a binary copy of the cluster's data directory over a replication connection and includes every file needed for a full recovery. It is the standard tool for setting up Point-in-Time Recovery (PITR) and for creating standby servers for replication. The server must be running; pg_basebackup connects to it and receives the data directory contents, optionally fetching or streaming the Write-Ahead Log (WAL) needed to make the copy consistent. Backups are written either in plain format, which mirrors the server's data directory layout, or in tar format, where the files are packed into tar archives (compressed only if -z or -Z is given). Compared with copying the file system by hand, it handles entering and leaving backup mode and collecting the required WAL automatically.
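
The difference between the two output formats is visible in the files they leave behind. The sketch below guesses the format of an existing backup directory, assuming that a plain backup contains PG_VERSION at its top level and that a tar backup stores the main data directory as base.tar (or base.tar.gz and similar when compressed). The function name backup_format is hypothetical.

```shell
#!/bin/sh
# Sketch: guess whether a directory holds a plain- or tar-format base backup.
backup_format() {
    dir=$1
    if [ -f "$dir/PG_VERSION" ]; then
        echo plain      # plain format mirrors the data directory layout
    elif ls "$dir"/base.tar* >/dev/null 2>&1; then
        echo tar        # tar format packs the main data directory into base.tar[.gz]
    else
        echo unknown
    fi
}

# Usage (path is a placeholder):
#   backup_format /path/to/backup_dir
```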

CAVEATS


pg_basebackup requires a running PostgreSQL server and a connecting role with replication privileges (or superuser). It does not back up configuration files stored outside the main PGDATA directory, such as a postgresql.conf that is symlinked elsewhere, and server logs kept outside the data directory are not included either. Make sure the destination has enough disk space for the whole cluster. The --write-recovery-conf (-R) option is still supported, but its output changed: before PostgreSQL 12 it wrote a recovery.conf file, while in PostgreSQL 12 and later it creates a standby.signal file and appends the connection settings to postgresql.auto.conf.
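
To check what -R left behind in a plain-format backup taken from PostgreSQL 12 or later, something like the following sketch can be used. It assumes the standby configuration consists of a standby.signal file plus a primary_conninfo line in postgresql.auto.conf; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if a plain-format backup directory carries PostgreSQL 12+
# standby settings (standby.signal plus primary_conninfo in postgresql.auto.conf).
standby_config_present() {
    dir=$1
    [ -f "$dir/standby.signal" ] &&
        grep -q '^primary_conninfo' "$dir/postgresql.auto.conf" 2>/dev/null
}

# Usage (path is a placeholder):
#   standby_config_present /path/to/backup_dir && echo "ready to start as standby"
```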

USAGE IN POINT-IN-TIME RECOVERY (PITR)

pg_basebackup forms the foundation of a robust PITR strategy. A base backup provides a consistent starting point, which is then combined with a continuous stream of archived Write-Ahead Log (WAL) files. To restore to a specific point in time, the base backup is first restored, and then the archived WAL files are replayed up to the desired recovery target. Using pg_basebackup -X stream or -X fetch ensures that all WAL segments necessary to make the base backup consistent are either included or streamed, making the backup self-contained and ready for recovery.
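
The restore side of this can be sketched as follows for PostgreSQL 12 and later, assuming the base backup has already been copied into the new data directory and the server is stopped. The function name, the archive directory, and the target timestamp are hypothetical; the cp-based restore_command follows the pattern used in the PostgreSQL documentation and only suits archives on a locally mounted path.

```shell
#!/bin/sh
# Sketch: prepare a restored data directory for point-in-time recovery.
# Arguments: data directory, WAL archive directory, recovery target time.
configure_pitr() {
    datadir=$1 archive=$2 target_time=$3
    # Tell the server how to fetch archived WAL and where to stop replaying.
    cat >> "$datadir/postgresql.conf" <<EOF
restore_command = 'cp $archive/%f "%p"'
recovery_target_time = '$target_time'
EOF
    # recovery.signal makes the server enter targeted recovery on next start.
    touch "$datadir/recovery.signal"
}

# Usage (all values are placeholders):
#   configure_pitr /var/lib/postgresql/data /mnt/wal_archive '2024-01-01 12:00:00'
```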

REPLICATION SLOTS FOR RELIABLE BACKUPS

When taking a base backup with -X stream, especially to seed a new standby server, consider using the --slot=slotname option. A replication slot stops the primary from recycling WAL segments that the backup, or the standby started from it, has not yet received, which avoids backup failures caused by missing WAL. Create the slot on the primary before starting the backup, or pass -C (--create-slot, PostgreSQL 11 and later) to have pg_basebackup create it. A slot that is no longer consumed keeps retaining WAL and can fill the primary's disk, so drop slots that will not be reused.
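
A minimal sketch of the slot workflow, again printing commands rather than running them. It assumes PostgreSQL 11 or later so that -C (--create-slot) can create the slot as part of the backup; the function names, slot name, and target directory are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
slot_backup_cmd() {
    slot=$1 target=$2
    # -C creates the slot, -S names it; slots are only used with -X stream.
    echo "pg_basebackup -D $target -X stream -C -S $slot"
}

# SQL to remove the slot once it is no longer needed, so it stops retaining WAL.
drop_slot_sql() {
    echo "SELECT pg_drop_replication_slot('$1');"
}

slot_backup_cmd standby1_slot /path/to/backup_dir
drop_slot_sql standby1_slot
```

If the backup seeds a standby, the slot is normally kept and reused by that standby instead of being dropped.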

HISTORY


The pg_basebackup utility was introduced with PostgreSQL 9.1 in 2011. Before its introduction, users typically performed base backups by manually copying the data directory (e.g., using rsync or tar) and ensuring WAL archiving was correctly set up. pg_basebackup streamlined this complex process by providing an integrated, online method for creating consistent file-system-level backups, significantly improving the ease and reliability of setting up replication and Point-in-Time Recovery. Its development focused on making PostgreSQL disaster recovery and standby server provisioning more robust and user-friendly.
