LinuxCommandLibrary

pg_upgrade

Upgrade PostgreSQL database cluster

TLDR

Check clusters before upgrading

$ pg_upgrade [[-c|--check]] [[-b|--old-bindir]] [path/to/old_bin] [[-B|--new-bindir]] [path/to/new_bin] [[-d|--old-datadir]] [path/to/old_data] [[-D|--new-datadir]] [path/to/new_data]
copy

Perform the actual upgrade
$ pg_upgrade [[-b|--old-bindir]] [path/to/old_bin] [[-B|--new-bindir]] [path/to/new_bin] [[-d|--old-datadir]] [path/to/old_data] [[-D|--new-datadir]] [path/to/new_data]
copy

Use multiple parallel jobs during the upgrade
$ pg_upgrade [[-j|--jobs]] [njobs] [[-b|--old-bindir]] [path/to/old_bin] [[-B|--new-bindir]] [path/to/new_bin] [[-d|--old-datadir]] [path/to/old_data] [[-D|--new-datadir]] [path/to/new_data]
copy

Specify the old and new PostgreSQL ports
$ pg_upgrade [[-p|--old-port]] [port] [[-P|--new-port]] [port] [[-b|--old-bindir]] [path/to/old_bin] [[-B|--new-bindir]] [path/to/new_bin] [[-d|--old-datadir]] [path/to/old_data] [[-D|--new-datadir]] [path/to/new_data]
copy

Use hard links instead of copying files to the new cluster
$ pg_upgrade [[-k|--link]] [[-b|--old-bindir]] [path/to/old_bin] [[-B|--new-bindir]] [path/to/new_bin] [[-d|--old-datadir]] [path/to/old_data] [[-D|--new-datadir]] [path/to/new_data]
copy

Display help
$ pg_upgrade [[-?|--help]]
copy

SYNOPSIS

pg_upgrade [option...]

PARAMETERS

-b old-bindir
    Specifies the absolute path to the old PostgreSQL cluster's bin directory (e.g., /usr/local/pg9.6/bin).

-B new-bindir
    Specifies the absolute path to the new PostgreSQL cluster's bin directory (e.g., /usr/local/pg14/bin).

-d old-datadir
    Specifies the absolute path to the data directory of the old PostgreSQL cluster.

-D new-datadir
    Specifies the absolute path to the data directory of the new PostgreSQL cluster. This directory must be empty, except possibly for pg_wal if initdb -X was used.

-j num-jobs
    Runs tasks in parallel using the specified number of concurrent jobs. This can significantly speed up file copy or link operations.

-k, --link, --hard-link
    Uses hard links instead of copying files. This is the default, fastest method, but requires old and new clusters to be on the same filesystem.

-m, --copy, --clone
    Copies files instead of using hard links. Slower than linking but allows relocating data directories to a different filesystem.

-r, --retain
    Retains the old cluster's data directory after a successful upgrade when using hard links (--link mode). Useful for backup or potential rollback.

-U username
    Specifies the database superuser name to use for connecting to both old and new clusters. Defaults to the operating system user.

-v, --verbose
    Enables verbose output, providing more detailed information during the upgrade process.

--check
    Performs all compatibility checks but does not modify any files or perform the actual upgrade. Highly recommended for a dry run before a real upgrade.

--progress
    Displays a progress bar during the file copy/link phase.

--old-options options
    Additional command-line options passed directly to the old postgres server when it starts (e.g., -c shared_buffers=128MB).

--new-options options
    Additional command-line options passed directly to the new postgres server when it starts.

--old-port port
    Specifies the port number the old cluster should listen on during the upgrade process. Defaults to 50432.

--new-port port
    Specifies the port number the new cluster should listen on during the upgrade process. Defaults to 50433.

--discard-old-cluster
    Removes the old cluster's data directory after a successful upgrade. Use with extreme caution and only after verifying the new cluster's integrity.

--locale locale
    Sets the locale for the new cluster, overriding the initdb default. Must match the old cluster's locale for compatibility.

DESCRIPTION

pg_upgrade is a powerful utility designed to facilitate in-place upgrades of PostgreSQL database clusters to a new major version. It efficiently transfers data files from an older PostgreSQL instance to a newly initialized, empty data directory of a newer version. Unlike the traditional pg_dumpall method, pg_upgrade avoids dumping and reloading all data, significantly reducing downtime and disk space requirements.

It achieves this by either creating hard links to existing data files (the default and fastest method) or by copying them. This process preserves the original data and configuration files, making it a robust and preferred method for many production environments. pg_upgrade performs extensive compatibility checks before initiating the upgrade, ensuring a smoother transition and providing guidance on any necessary pre- or post-upgrade steps.

CAVEATS

  • Environment Consistency: Both old and new PostgreSQL bin directories must be accessible and correctly specify the major versions being upgraded. The pg_upgrade binary itself must come from the new PostgreSQL version.
  • New Data Directory: The new cluster's data directory specified by -D must be empty, except for a potential pg_wal subdirectory if initdb was run with -X.
  • Custom C Extensions: Custom C functions, types, or aggregates defined in shared libraries (e.g., from CREATE EXTENSION modules) may need to be recompiled against the new PostgreSQL version and potentially re-created in the database.
  • Locale and Encoding: The locale (specifically LC_CTYPE and LC_COLLATE) and database encoding must match between the old and new clusters for a smooth upgrade.
  • contrib Modules: Some contrib modules might require reinstallation or specific upgrade steps. Always check documentation for particular extensions.
  • Post-Upgrade ANALYZE: It is critical to run ANALYZE on all databases after a successful pg_upgrade to update optimizer statistics for the new cluster. A script (analyze_new_cluster.sh) is generated for this purpose.
  • Cluster Shutdown: Both old and new clusters must be completely shut down before running pg_upgrade. The utility will attempt to start them briefly during checks.
  • Superuser Access: The operating system user running pg_upgrade must be able to connect as a superuser to both the old and new clusters without requiring a password (e.g., via peer authentication or .pgpass).

UPGRADE MODES

pg_upgrade supports two primary methods for transferring data:
Hard Links (--link): This is the default and fastest method. It creates hard links from the new cluster's data directory to the old cluster's data files. This means no actual data copying occurs, making it very quick, but requires the old and new data directories to reside on the same filesystem. The old cluster directories are retained by default unless --discard-old-cluster is specified.
File Copy (--copy): This method copies all data files from the old cluster to the new one. While slower and requiring more disk space, it allows the new cluster to reside on a different filesystem or partition than the old one. This mode is useful for migrating data to new storage or hardware.

POST-UPGRADE SCRIPTS

After a successful pg_upgrade operation, two or three scripts are generated in the new cluster's data directory:
analyze_new_cluster.sh: This script contains VACUUM ANALYZE commands for all databases in the new cluster. It is crucial to run this script to update optimizer statistics, ensuring optimal query performance.
delete_old_cluster.sh: This script helps remove the old cluster's data directory and related files, freeing up disk space. It should only be run after you have thoroughly tested the new cluster and are confident the upgrade was successful and you no longer need the old data.
relink_old_cluster.sh: (Only for --link mode) This script is generated if you used hard links and chose to retain the old cluster. It allows you to revert the hard links, making the old cluster runnable again if needed (e.g., for troubleshooting or if the upgrade failed after data modification in the new cluster).

HISTORY

The pg_upgrade utility was introduced in PostgreSQL 8.4 as a robust and efficient alternative to the traditional pg_dumpall method for major version upgrades. Prior to its existence, administrators often faced extended downtime and significant disk space requirements when migrating to a new PostgreSQL release. pg_upgrade dramatically reduced these burdens by leveraging file system links or copies, allowing for in-place upgrades with minimal data re-processing. Its development addressed a long-standing need for a faster, more reliable upgrade path in production environments, making major version migrations considerably less daunting.

SEE ALSO

initdb(1), pg_ctl(1), pg_dump(1), pg_dumpall(1)

Copied to clipboard