LinuxCommandLibrary

pg_checksums

Manage PostgreSQL data checksums

TLDR

Check data checksums in a database cluster

$ pg_checksums [[-D|--pgdata]] [path/to/data]
copy

Enable data checksums in a database cluster
$ pg_checksums [[-e|--enable]] [[-D|--pgdata]] [path/to/data]
copy

Disable data checksums in a database cluster
$ pg_checksums [[-d|--disable]] [[-D|--pgdata]] [path/to/data]
copy

Check data checksums with verbose output
$ pg_checksums [[-v|--verbose]] [[-D|--pgdata]] [path/to/data]
copy

Check data checksums showing progress
$ pg_checksums [[-P|--progress]] [[-D|--pgdata]] [path/to/data]
copy

Display help
$ pg_checksums [[-?|--help]]
copy

SYNOPSIS

pg_checksums {-D DATADIR | --pgdata=DATADIR} {-e | --enable} [-f | --force] [-v | --verbose]
pg_checksums {-D DATADIR | --pgdata=DATADIR} {-d | --disable} [-f | --force] [-v | --verbose]
pg_checksums {-D DATADIR | --pgdata=DATADIR} {-c | --check} [-v | --verbose]
pg_checksums {-V | --version}
pg_checksums {-? | --help}

PARAMETERS

--pgdata=DATADIR, -D DATADIR
    Required. Specifies the path to the PostgreSQL data directory. This must be provided for all operations.

--enable, -e
    Enables data checksums for the cluster. This operation requires the cluster to be shut down before execution.

--disable, -d
    Disables data checksums for the cluster. This operation also requires the cluster to be shut down.

--check, -c
    Verifies the integrity of existing data checksums throughout the cluster. This operation can be performed while the cluster is running, but may cause I/O load.

--force, -f
    Forces the `enable` or `disable` operation without prompting for user confirmation. Use with caution.

--verbose, -v
    Provides more detailed output during the command execution, showing progress or specific actions.

--version, -V
    Displays the `pg_checksums` version information and exits.

--help, -?
    Shows help about `pg_checksums` command line arguments and exits.

DESCRIPTION

`pg_checksums` is a utility for managing and verifying data checksums within a PostgreSQL database cluster. Data checksums provide an additional layer of data integrity, helping to detect corruption caused by underlying storage system issues. This command allows users to enable, disable, or check the validity of checksums for an entire PostgreSQL data directory.

When enabling or disabling checksums, the utility rewrites all data files in the cluster, which can be a time-consuming operation depending on the size of the cluster. It is crucial that the PostgreSQL server is not running when enabling or disabling checksums. Verifying checksums, however, can be performed while the server is online, though it may incur significant I/O load. Checksums are stored directly within the data pages, and once enabled, they are automatically maintained by the PostgreSQL server for all future data writes.

CAVEATS

  • Cluster Downtime Required: Enabling or disabling checksums (`-e` or `-d`) requires the PostgreSQL server to be completely shut down before running `pg_checksums`. Running these operations on a live cluster will result in errors and potential data corruption.
  • Significant I/O and Time: Both enabling and disabling checksums involve rewriting all data files in the cluster. This is an I/O-intensive and potentially time-consuming operation, especially for large databases. Ensure adequate disk space and time are available.
  • Disk Space: The process of enabling/disabling checksums may temporarily require additional disk space as files are rewritten.
  • Not a Full Backup Replacement: While checksums help detect corruption, they do not prevent it or recover from it. They are not a substitute for a robust backup strategy.
  • Performance Impact: While typically minimal, enabling checksums can introduce a slight overhead due to the additional calculation and storage, though modern systems often mitigate this effectively.

EXIT STATUS

`pg_checksums` exits with status 0 if successful, and a non-zero status if an error occurs, providing an indication of the operation's outcome.

PRIVILEGES

The user running `pg_checksums` must have appropriate read and write permissions on the data directory (and its contents) when enabling or disabling checksums, typically the same user that runs the PostgreSQL server process.

CONFIGURATION FILE

The status of checksums for a cluster is recorded in the `postgresql.conf` file (and internally in the control file), where `data_checksums` parameter indicates if they are enabled.

HISTORY

Data checksums were first introduced in PostgreSQL 9.3 as an option during the `initdb` process, allowing users to enable them for new clusters from the start. However, changing the checksum status of an existing cluster (i.e., enabling or disabling them after `initdb`) was not straightforward and typically required a dump/restore operation or using `pg_upgrade` to a new cluster. The `pg_checksums` utility was introduced in PostgreSQL 12 to provide a dedicated, simplified, and in-place method for managing data checksums on existing clusters, significantly improving the flexibility for administrators to enforce data integrity policies.

SEE ALSO

initdb(1): Used to create a new PostgreSQL data cluster, where checksums can be optionally enabled during initialization., pg_basebackup(1): Used to take base backups of a PostgreSQL cluster; backups will naturally include checksum information if enabled on the source., pg_upgrade(1): Used to upgrade a PostgreSQL cluster to a newer major version; often involves creating a new cluster which provides an opportunity to change checksum status.

Copied to clipboard