vcftools
Analyze genomic variant call format files
TLDR
Filter VCF file by chromosome and output new VCF
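The command for this entry is not shown above, so the following is a minimal sketch of one way to do it with the flags documented under PARAMETERS. The file name sample.vcf, the prefix chr1_only, and the two variant records are made up for illustration. The vcftools call is skipped if the program is not installed.

```shell
# Build a tiny two-sample VCF (illustrative, made-up records).
printf '##fileformat=VCFv4.2\n' > sample.vcf
printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n' >> sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> sample.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> sample.vcf
printf '2\t200\t.\tC\tT\t20\tPASS\t.\tGT\t0/0\t0/1\n' >> sample.vcf

# Keep only chromosome 1 and write the result as a new VCF.
if command -v vcftools >/dev/null 2>&1; then
  vcftools --vcf sample.vcf --chr 1 --recode --recode-INFO-all --out chr1_only
else
  echo "vcftools not found; skipping the filter step"
fi
```

When vcftools runs, the filtered records are written to chr1_only.recode.vcf and a run log to chr1_only.log.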
SYNOPSIS
vcftools [--vcf file | --gzvcf file | --bcf file] [--out prefix] [options]
DESCRIPTION
VCFtools is a suite of utilities for analyzing Variant Call Format (VCF) and Binary Call Format (BCF) files, the standard formats for storing genomic sequence variations. It provides comprehensive tools for filtering, manipulating, and computing statistics from variant data.
The tool supports filtering variants by quality scores, allele frequencies, missing data, genomic regions, and individual samples. It calculates population genetics statistics including allele frequencies, nucleotide diversity, Fst, linkage disequilibrium, and relatedness measures.
VCFtools can convert between formats, compare VCF files, and extract subsets of data for downstream analysis. Output files are named with the prefix given by --out (default: out) plus an extension specific to each analysis, such as .frq for --freq or .recode.vcf for --recode.
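To illustrate how the output prefix and per-analysis extensions fit together, here is a small sketch that runs two statistics on the same input. The file name sample.vcf, the prefix stats, and the records are assumptions chosen for the example. The vcftools calls are skipped if the program is not installed.

```shell
# Build a tiny two-sample VCF with genotype and depth fields (made-up records).
printf '##fileformat=VCFv4.2\n' > sample.vcf
printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n' >> sample.vcf
printf '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">\n' >> sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> sample.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:8\n' >> sample.vcf
printf '1\t150\t.\tC\tT\t40\tPASS\t.\tGT:DP\t0/0:10\t0/1:9\n' >> sample.vcf

if command -v vcftools >/dev/null 2>&1; then
  # Allele frequencies per site, written to stats.frq
  vcftools --vcf sample.vcf --freq --out stats
  # Mean depth per individual, written to stats.idepth
  vcftools --vcf sample.vcf --depth --out stats
else
  echo "vcftools not found; skipping the statistics"
fi
```

Because both runs share the prefix stats, their results sit side by side and differ only in extension. Each run also writes stats.log, so the second log replaces the first.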
PARAMETERS
--vcf file
Input VCF file (v4.0, v4.1, or v4.2).
--gzvcf file
Input compressed (gzipped) VCF file.
--bcf file
Input BCF2 format file.
--out prefix
Output file prefix. Results are written to prefix.extension.
--recode
Output a new VCF file after applying filters.
--recode-INFO-all
Retain all INFO fields in recoded output.
--chr name
Process only variants on specified chromosome.
--keep file
Retain only individuals listed in file (one ID per line).
--remove file
Remove individuals listed in file.
--maf float
Filter by minimum minor allele frequency.
--minQ int
Minimum variant quality score.
--freq
Calculate allele frequencies.
--depth
Calculate mean depth per individual.
--relatedness
Calculate pairwise relatedness statistics.
--hap-r2
Calculate linkage disequilibrium statistics using phased haplotypes.
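Several of the filters above can be combined in one run. The sketch below keeps a single sample and applies frequency and quality thresholds before recoding. The file names sample.vcf and keep.txt, the prefix filtered, the sample IDs, and the threshold values 0.05 and 30 are all illustrative assumptions. The vcftools call is skipped if the program is not installed.

```shell
# Build a tiny two-sample VCF (illustrative, made-up records).
printf '##fileformat=VCFv4.2\n' > sample.vcf
printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n' >> sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> sample.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> sample.vcf
printf '2\t200\t.\tC\tT\t20\tPASS\t.\tGT\t0/0\t0/1\n' >> sample.vcf

# List the individuals to retain, one ID per line.
printf 'S1\n' > keep.txt

if command -v vcftools >/dev/null 2>&1; then
  # Keep sample S1 only, then require MAF >= 0.05 and QUAL >= 30.
  vcftools --vcf sample.vcf --keep keep.txt --maf 0.05 --minQ 30 \
           --recode --recode-INFO-all --out filtered
else
  echo "vcftools not found; skipping the filter step"
fi
```

Swapping --keep for --remove would instead drop the listed individuals and retain everyone else.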
CAVEATS
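Large VCF files can be slow to process, and some analyses consume significant memory. Some operations require the input to be sorted by chromosome and position. The --gzvcf option reads both gzip- and bgzip-compressed files; bgzip is preferable when the same file also needs to be indexed (for example with tabix) for use by other tools. Binary BCF input is typically faster to parse for repeated analyses.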
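As a small sketch of working with compressed input, the example below compresses a VCF and reads it back through --gzvcf. Plain gzip is used only because it is widely available; since bgzip output is gzip-compatible, a bgzip-compressed file is read the same way. The file names and the prefix from_gz are assumptions for illustration. The vcftools call is skipped if the program is not installed.

```shell
# Build a tiny two-sample VCF (illustrative, made-up records).
printf '##fileformat=VCFv4.2\n' > sample.vcf
printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n' >> sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> sample.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> sample.vcf
printf '2\t200\t.\tC\tT\t20\tPASS\t.\tGT\t0/0\t0/1\n' >> sample.vcf

# Compress a copy, leaving the original in place.
gzip -c sample.vcf > sample.vcf.gz

if command -v vcftools >/dev/null 2>&1; then
  # Read the compressed file directly and compute allele frequencies.
  vcftools --gzvcf sample.vcf.gz --freq --out from_gz
else
  echo "vcftools not found; skipping the frequency step"
fi
```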
HISTORY
VCFtools was developed by Adam Auton and Anthony Marcketta, alongside the VCF specification work of the 1000 Genomes Project, and was described in a 2011 Bioinformatics paper. It was created to address the need for efficient VCF manipulation as next-generation sequencing became widespread. The tool has become a standard in bioinformatics pipelines for variant analysis and quality control.
