tabix
Genomic position file indexer
SYNOPSIS
tabix [options] file [region...]
DESCRIPTION
tabix is a generic indexer for TAB-delimited genome position files. It creates an index that enables fast retrieval of data lines overlapping specified genomic regions.
Input files must be position-sorted and compressed with bgzip. The index file (.tbi or .csi) enables random access to compressed data without decompressing the entire file.
Common applications include indexing VCF variant files, BED annotation files, and GFF/GTF gene annotation files. The tool is essential for working with large genomic datasets in bioinformatics pipelines.
Region queries use 1-based inclusive coordinates in the format chr:start-end.
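The region convention above can be illustrated with a minimal Python sketch. The helper names parse_region and overlaps are hypothetical and are not part of tabix or HTSlib; the code only models the chr:start-end format and the 1-based inclusive overlap test, not the index lookup itself.

```python
def parse_region(region):
    """Parse 'chr:start-end' into (chrom, start, end), 1-based inclusive.

    Hypothetical helper for illustration; only the chr:start-end form
    described in this page is handled.
    """
    # Split on the last ':' so sequence names containing ':' still work.
    chrom, sep, span = region.rpartition(":")
    if not sep or not chrom:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval in {region!r}")
    return chrom, start, end


def overlaps(region, chrom, start, end):
    """Return True if the feature [start, end] on chrom overlaps the region.

    Both intervals are 1-based and inclusive, so sharing a single
    base at either boundary counts as an overlap.
    """
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start <= r_end and end >= r_start


if __name__ == "__main__":
    query = parse_region("chr1:1000-2000")
    print(query)
    print(overlaps(query, "chr1", 2000, 2500))  # shares base 2000
    print(overlaps(query, "chr1", 2001, 2500))  # starts one base too late
```

Because both ends are inclusive, a feature that starts exactly at the end coordinate of the query is reported.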
PARAMETERS
-p, --preset format
Input format preset: gff, bed, sam, vcf.
-s, --sequence col
Column of sequence name (default: 1).
-b, --begin col
Column of start position (default: 4).
-e, --end col
Column of end position (default: 5).
-S, --skip-lines n
Skip first n lines.
-c, --comment char
Skip lines starting with character (default: #).
-0, --zero-based
Positions are 0-based half-open.
-C, --csi
Create CSI index instead of TBI.
-f, --force
Overwrite existing index.
-h, --print-header
Print header lines with output.
-H, --only-header
Print only header/meta lines.
-l, --list-chroms
List sequence names stored in the index file.
-r, --reheader file
Replace the header with the content of file.
-R, --regions file
Query regions from BED or TAB-delimited file.
-T, --targets file
Similar to -R but reads input sequentially.
-m, --min-shift INT
Set minimal interval size for CSI indices to 2^INT (default: 14).
-D
Do not download index file before opening (remote files only).
--separate-regions
Insert region name before each group in output.
--cache INT
Set BGZF block cache size in megabytes (default: 10).
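As a rough illustration of how the column options and -0 interact, the following Python sketch extracts an interval from one TAB-delimited line. The function line_interval is hypothetical and only mirrors the meaning of -s, -b, -e and -0 as described above; it is not how tabix is implemented.

```python
def line_interval(line, seq_col=1, begin_col=4, end_col=5, zero_based=False):
    """Return (chrom, start, end) in 1-based inclusive coordinates.

    Hypothetical helper. Column numbers are 1-based, and the defaults
    match the documented defaults of -s, -b and -e. Setting
    zero_based=True models -0 (0-based half-open positions).
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[seq_col - 1]
    start = int(fields[begin_col - 1])
    end = int(fields[end_col - 1])
    if zero_based:
        # 0-based half-open [start, end) covers the same bases as
        # 1-based inclusive [start + 1, end].
        start += 1
    return chrom, start, end


if __name__ == "__main__":
    # GFF-style line: sequence in column 1, start in 4, end in 5.
    gff_line = "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=g1"
    print(line_interval(gff_line))

    # BED-style line: sequence, start, end in columns 1-3, 0-based half-open.
    bed_line = "chr1\t999\t2000"
    print(line_interval(bed_line, begin_col=2, end_col=3, zero_based=True))
```

Both example lines describe the same bases, so both calls yield the same 1-based inclusive interval.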
CAVEATS
Input must be compressed with bgzip; plain gzip files cannot be indexed. The TBI index format supports sequences up to 512 Mbp (2^29 bp); use a CSI index (-C) for longer ones. The -p preset should not be combined with the manual column options (-s, -b, -e, -c, -0). The index stores the column settings, so retrieval does not need a format specification.
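The bgzip requirement can be checked before indexing. The sketch below assumes the BGZF block layout from the SAM/BAM specification: a gzip header with the FEXTRA flag set and an extra subfield tagged "BC" of length 2. The function name is_bgzf is hypothetical; the code inspects header bytes only and does not validate the rest of the file.

```python
import gzip
import struct


def is_bgzf(header):
    """Return True if the given bytes start with a BGZF block header.

    Hypothetical helper: checks the gzip magic, the deflate method,
    the FEXTRA flag, and the presence of a 'BC' extra subfield.
    """
    if len(header) < 12:
        return False
    if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8:
        return False  # not a gzip/deflate member
    if not header[3] & 0x04:
        return False  # FEXTRA not set, so plain gzip
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields: SI1, SI2, then a 2-byte little-endian length.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False


# The 28-byte empty BGZF block that bgzip writes as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "42430200" "1b00" "0300" "00000000" "00000000"
)

if __name__ == "__main__":
    print(is_bgzf(BGZF_EOF))
    print(is_bgzf(gzip.compress(b"chr1\t1\t2\n")))
```

For a real file, passing the first few dozen bytes (for example, the result of reading 64 bytes in binary mode) is enough for this check.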
HISTORY
tabix was developed by Heng Li and published in the journal Bioinformatics in 2011. It is now part of the HTSlib project maintained by the samtools/htslib team. The tool has become a standard component in genomics workflows for efficient data access.
