tabix
Genomic position file indexer
TLDR
Index a bgzip-compressed VCF file: tabix -p vcf file.vcf.gz
SYNOPSIS
tabix [options] file [region...]
DESCRIPTION
tabix is a generic indexer for TAB-delimited genome position files. It creates an index that enables fast retrieval of data lines overlapping specified genomic regions.
Input files must be position-sorted and compressed with bgzip. The index file (.tbi or .csi) enables random access to compressed data without decompressing the entire file.
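A file compressed with plain gzip can look identical to a bgzip file by name, so it can help to check the header before indexing. The following is a minimal sketch, assuming the standard BGZF layout (a gzip member whose extra field carries a "BC" subfield of length 2). The function name is_bgzf is hypothetical and is not part of tabix or HTSlib.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header.

    Hypothetical helper: checks the gzip magic bytes, the FEXTRA flag,
    and the presence of a 'BC' extra subfield of length 2.
    """
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        # Bytes 0-2: gzip magic (0x1f 0x8b) and deflate method (8).
        if (header[0], header[1], header[2]) != (0x1F, 0x8B, 8):
            return False
        # Byte 3: flags; bit 2 (0x04) means an extra field is present.
        if not header[3] & 0x04:
            return False
        # Bytes 10-11: little-endian length of the extra field.
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)

    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == 0x42 and si2 == 0x43 and slen == 2:
            return True
        pos += 4 + slen
    return False
```

A file that fails this check would need to be decompressed and recompressed with bgzip before tabix can index it.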
Common applications include indexing VCF variant files, BED annotation files, and GFF/GTF gene annotation files. The tool is essential for working with large genomic datasets in bioinformatics pipelines.
Region queries use 1-based inclusive coordinates in the format chr:start-end.
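The relationship between a 1-based inclusive region string and the 0-based half-open convention used by BED can be sketched as follows. This is an illustration only: parse_region and to_zero_based_half_open are hypothetical helpers, and the parser handles just the chr:start-end form described above.

```python
def parse_region(region):
    """Split 'chr:start-end' into (chrom, start, end), 1-based inclusive."""
    # Split on the last colon so the coordinates are always the final part.
    chrom, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    if not chrom or not start_text or not end_text:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    return chrom, start, end


def to_zero_based_half_open(start, end):
    """Convert a 1-based inclusive interval to 0-based half-open."""
    # Only the start shifts; the end value is numerically unchanged.
    return start - 1, end


chrom, start, end = parse_region("chr1:1000-2000")
bed_start, bed_end = to_zero_based_half_open(start, end)
# Both conventions describe the same number of bases.
assert end - start + 1 == bed_end - bed_start
```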
PARAMETERS
-p, --preset format
Input format preset: gff, bed, sam, vcf.
-s, --sequence col
Column of sequence name (default: 1).
-b, --begin col
Column of start position (default: 4).
-e, --end col
Column of end position (default: 5).
-S, --skip-lines n
Skip first n lines.
-c char
Skip lines starting with character (default: #).
-0, --zero-based
Positions are 0-based half-open.
-C, --csi
Create CSI index instead of TBI.
-f, --force
Overwrite existing index.
-h, --print-header
Print header lines with output.
-H
Print only header lines.
-l
List chromosome names in index.
-R file
Query regions from BED or TAB file.
--separate-regions
Print region name before each group.
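As an illustration of how these options combine, the sketch below assembles tabix argument lists for indexing and for querying. The helper names index_command and query_command and the file name variants.vcf.gz are hypothetical; only flags listed above are used, and the helper refuses to mix a preset with manual column options.

```python
def index_command(path, preset=None, columns=None, csi=False, force=False):
    """Build an argument list that indexes a bgzip-compressed file.

    columns, if given, is a (sequence, begin, end) tuple of column numbers.
    """
    if preset is not None and columns is not None:
        raise ValueError("use either a preset or manual columns, not both")
    argv = ["tabix"]
    if preset is not None:
        argv += ["-p", preset]
    if columns is not None:
        seq_col, begin_col, end_col = columns
        argv += ["-s", str(seq_col), "-b", str(begin_col), "-e", str(end_col)]
    if csi:
        argv.append("-C")  # CSI index instead of TBI
    if force:
        argv.append("-f")  # overwrite an existing index
    argv.append(path)
    return argv


def query_command(path, regions, print_header=False):
    """Build an argument list that retrieves lines overlapping regions."""
    argv = ["tabix"]
    if print_header:
        argv.append("-h")
    argv.append(path)
    argv += list(regions)
    return argv


print(index_command("variants.vcf.gz", preset="vcf"))
print(query_command("variants.vcf.gz", ["chr1:1000-2000"], print_header=True))
```

On a system where tabix is installed, either list could be passed to subprocess.run(argv, check=True).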
CAVEATS
Input must be compressed with bgzip, not plain gzip. The TBI index format supports chromosomes up to 512 Mbp (2^29 bp); use a CSI index (-C) for longer sequences. The -p preset should not be combined with manual column options (-s, -b, -e). The index stores the column settings, so retrieval does not require a format option.
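The choice between TBI and CSI can be made from the contig lengths alone. The sketch below assumes the TBI limit is 2^29 bp (512 Mbp) as stated above; needs_csi is a hypothetical helper, and the contig lengths are illustrative values supplied by the caller.

```python
# Assumed TBI coordinate limit: 2**29 bp, i.e. 512 Mbp.
TBI_MAX_LENGTH = 2 ** 29


def needs_csi(contig_lengths):
    """Return True if any contig is too long for a TBI index.

    contig_lengths maps contig name to length in base pairs.
    """
    return any(length > TBI_MAX_LENGTH for length in contig_lengths.values())


# Illustrative lengths only; a real pipeline would read them from a
# reference index or a VCF header.
short_genome = {"chrA": 250_000_000, "chrB": 150_000_000}
long_genome = {"chrA": 250_000_000, "chrLong": 600_000_000}

assert not needs_csi(short_genome)
assert needs_csi(long_genome)
```

When the check returns True, indexing with -C avoids the TBI length limit.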
HISTORY
tabix was developed by Heng Li and published in the journal Bioinformatics in 2011. It is now part of the HTSlib project maintained by the samtools/htslib team, and it has become a standard component of genomics workflows for efficient data access.
