LinuxCommandLibrary

pigz

Compress files using parallel gzip

TLDR

Compress a file with default options

$ pigz [path/to/file]

Compress a file using the best compression method
$ pigz [[-9|--best]] [path/to/file]

Compress a file using no compression and 4 processors
$ pigz -0 [[-p|--processes]] [4] [path/to/file]

Compress a directory using tar
$ tar cf - [path/to/directory] | pigz > [path/to/file.tar.gz]

Decompress a file
$ pigz [[-d|--decompress]] [archive.gz]

Show compression statistics (compressed size, original size, ratio) for a compressed file
$ pigz [[-l|--list]] [archive.tar.gz]

SYNOPSIS

pigz [options] [files...]

PARAMETERS

-d, --decompress, --uncompress
    Decompress. This is the default action if pigz is invoked as unpigz.

-z, --zlib
    Compress to zlib (.zz) format instead of gzip format. Compression itself is the default action and needs no option.

-c, --stdout, --to-stdout
    Write output to standard output and leave the input files unchanged.

-k, --keep
    Keep (don't delete) input files during compression or decompression.

-f, --force
    Force overwrite of existing output files. Also allows compressing symbolic links.

-p NUM, --processes NUM
    Allow up to NUM compression threads. Default is the number of online processors, or 8 if that number cannot be determined. NUM must be at least 1.

-r, --recursive
    Recurse into directories. If input is a directory, pigz will compress all files within it.

-l, --list
    List the contents of a compressed file. This option implies -d.

-t, --test
    Test compressed file integrity. This option implies -d.

-q, --quiet
    Suppress all warnings.

-v, --verbose
    Enable verbose output. Displays information about compression ratios, processing files, etc.

-N, --name
    When compressing, retain the original name and timestamp in the compressed file header.

-n, --no-name
    Do not store or restore the original file name in or from the header. Use -T (--no-time) to leave out the modification time as well.

-0 to -9
    Compression level. -0 stores the data without compression, -1 is the fastest (lowest compression ratio), and -9 is the slowest (smallest file size). Default is -6.

--fast
    Equivalent to -1.

--best
    Equivalent to -9.

-b SIZE, --blocksize SIZE
    Set the compression block size to SIZE kilobytes (e.g., -b 256 for 256K blocks). Default is 128K. Larger blocks can improve the compression ratio slightly but use more memory per thread.

-R, --rsyncable
    Produce an rsync-friendly archive. This adds a performance penalty and results in a slightly larger compressed file, but allows rsync to efficiently update parts of the compressed file.

-S SUFFIX, --suffix SUFFIX
    Use SUFFIX instead of .gz as the compressed file name suffix.

-h, --help
    Display help message and exit.

-V, --version
    Display version information and exit.
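
Several of the options above combine naturally. The following is a minimal sketch, assuming pigz, seq, and cksum are on PATH; the sample file name and the choice of 4 threads are illustrative only, and the block does nothing when pigz is not installed.

```shell
# Sketch: keep the input, compress at -9 with 4 threads, then verify.
# Assumes pigz, seq, and cksum are installed; file names are hypothetical.
verify=skipped
if command -v pigz >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    seq 1 100000 > "$workdir/data.txt"

    pigz -k -9 -p 4 "$workdir/data.txt"      # writes data.txt.gz, keeps data.txt

    verify=failed
    if pigz -t "$workdir/data.txt.gz"; then  # integrity test, no output written
        # -d -c decompresses to stdout without touching the .gz file
        restored=$(pigz -d -c "$workdir/data.txt.gz" | cksum)
        original=$(cksum < "$workdir/data.txt")
        if [ "$restored" = "$original" ]; then
            verify=ok
        fi
    fi
    rm -rf "$workdir"
fi
echo "verify: $verify"
```

Using -k together with -t and -d -c means the original file is never at risk while the compressed copy is being checked.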

DESCRIPTION

pigz is a parallel implementation of gzip that compresses files using multiple processors and threads. While gzip is limited to a single CPU core, pigz spreads compression across the available cores, which can significantly speed up compression of large files or archives.

It is designed to be a drop-in replacement for gzip and produces the standard gzip (.gz) format, so files compressed by pigz can be decompressed by gzip and vice versa. pigz achieves parallelism by splitting the input into blocks (128K by default), compressing the blocks concurrently, and writing the results in order as a single gzip stream. This makes it useful wherever compression speed is a bottleneck, such as data centers, cloud computing, and big data applications.
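
The format compatibility described above can be checked with a quick round trip. This is a minimal sketch, assuming pigz, gzip, seq, and cksum are on PATH; the temporary directory and sample file are made up for illustration, and the block does nothing when pigz is not installed.

```shell
# Sketch: compress with pigz, decompress with plain gzip, compare checksums.
# Assumes pigz, gzip, seq, and cksum are installed; all paths are temporary.
roundtrip=skipped
if command -v pigz >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    seq 1 200000 > "$workdir/sample.txt"     # hypothetical sample data
    before=$(cksum < "$workdir/sample.txt")

    pigz "$workdir/sample.txt"               # replaces it with sample.txt.gz
    gzip -d "$workdir/sample.txt.gz"         # plain gzip restores sample.txt
    after=$(cksum < "$workdir/sample.txt")

    if [ "$before" = "$after" ]; then
        roundtrip=ok
    else
        roundtrip=mismatch
    fi
    rm -rf "$workdir"
fi
echo "round trip: $roundtrip"
```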

CAVEATS

  • pigz benefits most from multi-core processors and large files. For very small files, the overhead of parallelism might negate performance gains or even make it slightly slower than gzip.
  • The speedup is also limited by disk I/O performance. If your disk cannot keep up with the processing speed, pigz may become I/O bound.
  • The --rsyncable option is useful for incremental backups with rsync, but it incurs a performance overhead and can result in slightly larger compressed files.

ENVIRONMENT VARIABLE

The PIGZ environment variable can be used to specify default options for pigz. For example, PIGZ='-p 8' would make pigz always use 8 threads by default.
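
As a rough illustration, the variable can be set for a single invocation. The sketch below assumes pigz and seq are on PATH and uses a hypothetical file name; -k is added to the defaults only so that the effect is visible (the input file survives), and the block does nothing when pigz is not installed.

```shell
# Sketch: pass default options through the PIGZ environment variable.
# Assumes pigz and seq are installed; notes.txt is a made-up sample file.
kept=skipped
if command -v pigz >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    seq 1 1000 > "$workdir/notes.txt"

    # Equivalent to running: pigz -k -p 2 notes.txt
    PIGZ='-k -p 2' pigz "$workdir/notes.txt"

    # With -k in effect, both the input and the .gz output should exist.
    if [ -f "$workdir/notes.txt" ] && [ -f "$workdir/notes.txt.gz" ]; then
        kept=yes
    else
        kept=no
    fi
    rm -rf "$workdir"
fi
echo "input kept: $kept"
```

To make the defaults persistent, the same assignment can be exported from a shell startup file.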

PARALLEL DECOMPRESSION

Despite its name, pigz cannot parallelize decompression itself: a deflate stream must be decoded sequentially, regardless of which tool produced it. pigz therefore decompresses in a single thread, but it starts up to three additional threads for reading, writing, and check calculation, which can make decompression somewhat faster than gzip in some circumstances.
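
Whatever the thread count, decompressing to standard output lets pigz feed another tool such as tar. The following is a minimal sketch, assuming pigz, tar, and cmp are on PATH; the directory layout is hypothetical, and the block does nothing when pigz is not installed.

```shell
# Sketch: pack a directory with tar + pigz, then unpack through pigz -dc.
# Assumes pigz, tar, and cmp are installed; all paths are temporary.
extract=skipped
if command -v pigz >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    mkdir "$workdir/src" "$workdir/out"
    printf 'hello\n' > "$workdir/src/a.txt"

    # Compress: tar writes the archive to stdout, pigz compresses the stream.
    tar -C "$workdir" -cf - src | pigz > "$workdir/src.tar.gz"

    # Decompress: pigz writes the tar stream to stdout, tar extracts it.
    pigz -dc "$workdir/src.tar.gz" | tar -C "$workdir/out" -xf -

    if cmp -s "$workdir/src/a.txt" "$workdir/out/src/a.txt"; then
        extract=ok
    else
        extract=mismatch
    fi
    rm -rf "$workdir"
fi
echo "extract: $extract"
```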

HISTORY

pigz was created by Mark Adler, a prominent figure in data compression (also a co-author of the zlib library and the gzip specification). It was first released around 2007-2008 to address the increasing demand for faster compression on multi-core systems, as the original gzip was inherently single-threaded.


Its development focused on making it a direct, high-performance replacement for gzip that keeps file format compatibility while reducing compression times through parallelism. It was quickly adopted in environments that handle large datasets and where compression speed was a bottleneck.

SEE ALSO

gzip(1), gunzip(1), zcat(1), tar(1), xz(1), bzip2(1)
