pigz
Compress files using parallel gzip
TLDR
Compress a file with default options
Compress a file using the best compression method
Compress a file using no compression and 4 processors
Compress a directory using tar
Decompress a file
List the contents of an archive
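The entries above can be sketched as shell commands. The temp-directory layout and file names here are demo assumptions, and the sketch skips itself if pigz is not installed:

```shell
#!/bin/sh
# Sketch of the invocations listed above, run against a throwaway file.
command -v pigz >/dev/null 2>&1 || exit 0   # skip if pigz is absent

tmp=$(mktemp -d)
printf 'hello pigz\n' > "$tmp/file"

pigz -k "$tmp/file"                           # compress with default options, keep original
pigz -9 -c "$tmp/file" > "$tmp/best.gz"       # best compression, to stdout
pigz -0 -p 4 -c "$tmp/file" > "$tmp/store.gz" # no compression, 4 processors
tar -cf - -C "$tmp" file | pigz > "$tmp/dir.tar.gz"  # tar a tree through pigz
pigz -l "$tmp/file.gz"                        # list the contents of an archive

orig=$(cat "$tmp/file")
round=$(pigz -dc "$tmp/best.gz")              # decompress to stdout
rm -rf "$tmp"
```

Note that plain pigz -d file.gz replaces file.gz with file in place; the -dc form above decompresses to stdout so the archive is kept.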
SYNOPSIS
pigz [options] [files...]
PARAMETERS
-d, --decompress, --uncompress
Decompress. This is the default action if pigz is invoked as unpigz.
-z, --compress
Compress. This is the default action if no operation option is specified.
-c, --stdout, --to-stdout
Write output to standard output and keep the original files unchanged.
-k, --keep
Keep (don't delete) input files during compression or decompression.
-f, --force
Force overwrite of existing output files. Also allows compressing symbolic links.
-p NUM, --processes NUM
Use NUM processors (threads) for compression. Default is the number of online processors.
-r, --recursive
Recurse into directories. If input is a directory, pigz will compress all files within it.
-l, --list
List the contents of a compressed file. This option implies -d.
-t, --test
Test compressed file integrity. This option implies -d.
-q, --quiet
Suppress all warnings.
-v, --verbose
Enable verbose output. Displays information about compression ratios, processing files, etc.
-N, --name
When compressing, retain the original name and timestamp in the compressed file header.
-n, --no-name
When compressing, do not save the original name and timestamp.
-0 to -9
Compression level: -1 is fastest (lowest compression ratio) and -9 is best (slowest, smallest output); -0 stores the input without compression. Default is -6.
--fast
Equivalent to -1.
--best
Equivalent to -9.
-b NUM, --blocksize NUM
Set the compression block size to NUM KiB (default 128). Larger blocks can improve the compression ratio but use more memory.
-R, --rsyncable
Produce an rsync-friendly archive. This adds a performance penalty and results in a slightly larger compressed file, but allows rsync to efficiently update parts of the compressed file.
-S, --suffix SUFFIX
Use SUFFIX instead of .gz as the compressed file name suffix.
-h, --help
Display help message and exit.
-V, --version
Display version information and exit.
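As a quick illustration of the level, thread, and blocksize options, the sketch below compresses a highly compressible 1 MiB input (the zero-filled file is just a demo assumption) at -1 and at -9, and prints the resulting sizes:

```shell
#!/bin/sh
# Compare --fast vs --best output sizes, showing -p and -b in passing.
command -v pigz >/dev/null 2>&1 || exit 0   # skip if pigz is absent

tmp=$(mktemp -d)
head -c 1048576 /dev/zero > "$tmp/data"     # 1 MiB of zeros

pigz -1 -c "$tmp/data" > "$tmp/fast.gz"             # --fast
pigz -9 -p 2 -b 64 -c "$tmp/data" > "$tmp/best.gz"  # --best, 2 threads, 64 KiB blocks

fast=$(wc -c < "$tmp/fast.gz")
best=$(wc -c < "$tmp/best.gz")
echo "fast: $fast bytes, best: $best bytes"
rm -rf "$tmp"
```

On repetitive input like this, both levels shrink the data dramatically; the interesting knobs in practice are -p (throughput) and -b (memory vs. ratio trade-off).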
DESCRIPTION
pigz is a parallel implementation of gzip that compresses and decompresses files using multiple processors and threads. While gzip is limited to a single CPU core, pigz leverages all available cores to significantly speed up compression and decompression operations, especially on large files or archives.
It is designed as a drop-in replacement for gzip, producing a compatible gzip (.gz) file format, so files compressed by pigz can be decompressed by gzip and vice versa. pigz achieves parallelism by splitting the input data into independent blocks, compressing each block concurrently, and then concatenating the compressed blocks. This makes it a valuable tool in environments where high-performance data processing is critical, such as data centers, cloud computing, and big-data applications.
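The drop-in compatibility can be checked directly. This sketch assumes both gzip and pigz are on PATH and skips otherwise:

```shell
#!/bin/sh
# Round-trip in both directions: gzip decompresses pigz output,
# and pigz decompresses gzip output.
command -v pigz >/dev/null 2>&1 || exit 0
command -v gzip >/dev/null 2>&1 || exit 0

tmp=$(mktemp -d)
printf 'interop test\n' > "$tmp/a"

pigz -c "$tmp/a" > "$tmp/p.gz"
from_pigz=$(gzip -dc "$tmp/p.gz")   # gzip reads the pigz-produced .gz
gzip -c "$tmp/a" > "$tmp/g.gz"
from_gzip=$(pigz -dc "$tmp/g.gz")   # pigz reads the gzip-produced .gz

rm -rf "$tmp"
```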
CAVEATS
- pigz benefits most from multi-core processors and large files. For very small files, the overhead of parallelism might negate performance gains or even make it slightly slower than gzip.
- The speedup is also limited by disk I/O performance. If your disk cannot keep up with the processing speed, pigz may become I/O bound.
- The --rsyncable option is useful for incremental backups with rsync, but it incurs a performance overhead and can result in slightly larger compressed files.
ENVIRONMENT VARIABLE
The PIGZ environment variable can be used to specify default options for pigz. For example, PIGZ='-p 8' would make pigz always use 8 threads by default.
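If your pigz build honors the PIGZ variable as described, defaults can be set once per session. The round-trip below works either way, since -p changes only the thread count, not the output format:

```shell
#!/bin/sh
# Set default options via the PIGZ environment variable (as described above);
# the compressed output is an ordinary .gz in any case.
command -v pigz >/dev/null 2>&1 || exit 0   # skip if pigz is absent

tmp=$(mktemp -d)
printf 'env var demo\n' > "$tmp/f"

PIGZ='-p 2' pigz -c "$tmp/f" > "$tmp/f.gz"
back=$(pigz -dc "$tmp/f.gz")
rm -rf "$tmp"
```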
PARALLEL DECOMPRESSION
While pigz uses multiple threads for compression by splitting data into blocks, decompression cannot be parallelized the same way, because each deflate block depends on the data before it. pigz therefore decompresses on a single thread, using up to three additional threads for reading, writing, and check-value calculation, which still gives a modest speedup over gzip.
HISTORY
pigz was created by Mark Adler, a prominent figure in data compression (also a co-author of the zlib library and the gzip specification). It was first released around 2007-2008 to address the increasing demand for faster compression on multi-core systems, as the original gzip was inherently single-threaded.
Its development focused on making it a direct, high-performance replacement for gzip, maintaining file format compatibility while significantly reducing processing times by leveraging parallelism. This made it quickly adopted in environments handling large datasets where compression speed was a bottleneck.