bgzip
Block compression with random access support for genomics data
TLDR
Compress file
SYNOPSIS
bgzip [options] [file]
DESCRIPTION
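bgzip is a block compression tool that writes BGZF files: a series of small, independently compressed gzip blocks concatenated into one file. The result is still a valid gzip file, but because each block can be decompressed on its own, a reader can jump to a specific region instead of decompressing from the start. For bgzip itself, this random access relies on a .gzi index.
The tool is part of htslib and is commonly used to compress genomics data files (VCF, SAM, BED) so that they can be indexed and queried by region.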
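The block idea can be illustrated with plain gzip, whose format allows several compressed members to be concatenated in one file. The sketch below uses only gzip, not bgzip, and the file name demo.gz is a placeholder. Real BGZF blocks additionally record their own size in a gzip extra field, which is not shown here.

```shell
# Two independently compressed gzip members, concatenated into one file.
tmpdir=$(mktemp -d)
printf 'first block\n'  | gzip -c >  "$tmpdir/demo.gz"
printf 'second block\n' | gzip -c >> "$tmpdir/demo.gz"

# A standard gzip reader decompresses the members back to back.
joined=$(gzip -dc "$tmpdir/demo.gz")
printf '%s\n' "$joined"
rm -r "$tmpdir"
```

Because each member stands alone, a reader that knows where a member starts can decompress from that point, which is the property a .gzi index exploits.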
PARAMETERS
-d, --decompress
Decompress file
-c, --stdout
Write to standard output
-@, --threads n
Number of threads
-r, --reindex
Rebuild .gzi index
-b, --offset n
Start decompressing at this virtual file offset (0-based uncompressed offset); output goes to standard output
-s, --size n
Number of uncompressed bytes to extract (with -b)
-l, --compress-level n
Compression level (0-9)
FEATURES
- Block-based compression
- Random access support
- gzip-compatible format
- Multi-threaded compression
- Index generation (.gzi files)
- Streaming support
WORKFLOW
# Compress (the original file is replaced)
bgzip variants.vcf
# Creates: variants.vcf.gz
# Decompress
bgzip -d variants.vcf.gz
# Compress with 4 threads
bgzip -@ 4 large.vcf
# Random access: build the .gzi index, then extract 500 bytes
# starting at uncompressed offset 1000
bgzip -r file.vcf.gz
bgzip -b 1000 -s 500 file.vcf.gz
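To make the offset semantics concrete, the following sketch compares a -b/-s extraction with the same byte range cut from the uncompressed data using tail and head. It assumes bgzip is on PATH (the bgzip steps are skipped otherwise) and that -b takes a 0-based uncompressed offset. The file name sample.txt is a placeholder.

```shell
tmpdir=$(mktemp -d)
# Sample text, comfortably longer than 1500 bytes
seq 1 2000 > "$tmpdir/sample.txt"

# Expected: 500 bytes starting at 0-based offset 1000 of the uncompressed data
expected=$(tail -c +1001 "$tmpdir/sample.txt" | head -c 500)

if command -v bgzip >/dev/null 2>&1; then
    bgzip -c "$tmpdir/sample.txt" > "$tmpdir/sample.txt.gz"
    bgzip -r "$tmpdir/sample.txt.gz"      # writes sample.txt.gz.gzi
    actual=$(bgzip -b 1000 -s 500 "$tmpdir/sample.txt.gz")
    [ "$actual" = "$expected" ] && echo "ranges match"
else
    echo "bgzip not found; skipping" >&2
fi
rm -r "$tmpdir"
```

The offsets refer to positions in the uncompressed data, so the same numbers can be used with tail and head on the original file.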
USE WITH TABIX
bgzip file.vcf
tabix -p vcf file.vcf.gz
# Now tools can query regions
tabix file.vcf.gz chr1:1000-2000
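tabix expects position-sorted input, so unsorted files are typically sorted before compression. A minimal sketch, assuming a small VCF whose header lines start with # and whose first two columns are chromosome and position. The file names and the sort_vcf helper are hypothetical, and the bgzip and tabix steps run only if both tools are installed.

```shell
# Hypothetical helper: print header lines first, then records sorted
# by chromosome (column 1) and numeric position (column 2).
sort_vcf() {
    grep '^#' "$1"
    grep -v '^#' "$1" | sort -t "$(printf '\t')" -k1,1 -k2,2n
}

tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n' > "$tmpdir/unsorted.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmpdir/unsorted.vcf"
printf 'chr1\t2000\t.\tA\tG\t.\t.\t.\n' >> "$tmpdir/unsorted.vcf"
printf 'chr1\t1500\t.\tC\tT\t.\t.\t.\n' >> "$tmpdir/unsorted.vcf"

sorted=$(sort_vcf "$tmpdir/unsorted.vcf")
printf '%s\n' "$sorted" > "$tmpdir/sorted.vcf"

if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip "$tmpdir/sorted.vcf"                    # creates sorted.vcf.gz
    tabix -p vcf "$tmpdir/sorted.vcf.gz"          # creates sorted.vcf.gz.tbi
    tabix "$tmpdir/sorted.vcf.gz" chr1:1000-2000  # should list both records
fi
rm -r "$tmpdir"
```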
CAVEATS
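Output is typically slightly larger than a plain gzip file of the same data, because each block is compressed independently. Random access with -b requires a .gzi index. Standard gzip tools can decompress bgzip output but are unaware of the block structure, so recompressing with plain gzip loses random access. Primarily useful for genomics applications.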
HISTORY
bgzip was developed as part of SAMtools/htslib around 2009 to enable efficient random access to compressed genomics data files.
