LinuxCommandLibrary

vcftools

Analyze genomic variant call format files

TLDR

Filter VCF file by chromosome and output new VCF
$ vcftools --vcf [input.vcf] --chr [chr1] --recode --out [output]

Calculate allele frequency
$ vcftools --vcf [input.vcf] --freq --out [output]

Extract specific individuals
$ vcftools --vcf [input.vcf] --keep [individuals.txt] --recode --out [output]

Filter by minimum quality score
$ vcftools --vcf [input.vcf] --minQ [30] --recode --out [output]

Calculate depth statistics per individual
$ vcftools --vcf [input.vcf] --depth --out [output]

Filter by minor allele frequency
$ vcftools --vcf [input.vcf] --maf [0.05] --recode --out [output]

Read compressed VCF file
$ vcftools --gzvcf [input.vcf.gz] --freq --out [output]

SYNOPSIS

vcftools [--vcf file | --gzvcf file | --bcf file] [--out prefix] [options]

DESCRIPTION

VCFtools is a suite of utilities for analyzing Variant Call Format (VCF) and Binary Call Format (BCF) files, the standard formats for storing genomic sequence variations. It filters and subsets variant data and computes summary statistics from it.
The tool supports filtering variants by quality scores, allele frequencies, missing data, genomic regions, and individual samples. It calculates population genetics statistics including allele frequencies, nucleotide diversity, Fst, linkage disequilibrium, and relatedness measures.
VCFtools can convert between formats, compare VCF files, and extract subsets of data for downstream analysis. Output files are named from the prefix given by --out plus an extension for each analysis type, for example prefix.frq for --freq and prefix.recode.vcf for --recode. If --out is omitted, the prefix defaults to "out". A log of each run is written to prefix.log.
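As a rough illustration of the output naming, the sketch below writes a small toy VCF and, if vcftools is on the PATH, runs two analyses with different prefixes. The file name toy.vcf, the sample names, and the toy_* prefixes are made up for this example; the file names in the comments are what vcftools would normally produce.

```shell
#!/bin/sh
# Toy input (hypothetical data): two samples, three sites on two chromosomes.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> toy.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> toy.vcf
printf 'chr1\t200\t.\tC\tT\t10\tPASS\t.\tGT\t0/0\t0/1\n' >> toy.vcf
printf 'chr2\t150\t.\tG\tA\t60\tPASS\t.\tGT\t0/1\t0/0\n' >> toy.vcf

if command -v vcftools >/dev/null 2>&1; then
    # Allele frequencies: expect toy_freq.frq and toy_freq.log
    vcftools --vcf toy.vcf --freq --out toy_freq
    # Several filters in one run: expect toy_chr1.recode.vcf and toy_chr1.log
    vcftools --vcf toy.vcf --chr chr1 --minQ 30 --recode --recode-INFO-all --out toy_chr1
else
    echo "vcftools not found; skipping the analysis steps" >&2
fi
```

Using a distinct prefix per analysis keeps each run's log file from overwriting the previous one.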

PARAMETERS

--vcf file
Input VCF file (v4.0, v4.1, or v4.2).
--gzvcf file
Input compressed (gzipped) VCF file.
--bcf file
Input BCF2 format file.
--out prefix
Output file prefix. Results are written to prefix.extension.
--recode
Write the data that pass all filters to a new VCF file (prefix.recode.vcf). INFO fields are dropped by default; see --recode-INFO-all.
--recode-INFO-all
Retain all INFO fields in recoded output.
--chr name
Process only variants on the specified chromosome. May be given more than once to include several chromosomes.
--keep file
Retain only individuals listed in file (one ID per line).
--remove file
Remove individuals listed in file.
--maf float
Keep only sites with a minor allele frequency greater than or equal to this value.
--minQ float
Keep only sites with a QUAL value above this threshold.
--freq
Calculate allele frequencies.
--depth
Calculate mean depth per individual (written to prefix.idepth).
--relatedness
Calculate pairwise relatedness statistics.
--hap-r2
Calculate linkage disequilibrium statistics using phased haplotypes.
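
The --keep and --remove options expect a plain text file with one sample ID per line. One possible way to build such a file is to read the IDs from the #CHROM header line of the VCF, as sketched below. The file names (demo.vcf, all_samples.txt, keep.txt), the sample IDs, and the choice of keeping the first two samples are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical three-sample VCF with a single site, for illustration only.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA001\tNA002\tNA003\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> demo.vcf

# Sample IDs are columns 10 onward of the #CHROM line; write one ID per line.
grep -m1 '^#CHROM' demo.vcf | cut -f10- | tr '\t' '\n' > all_samples.txt

# Keep the first two samples (an arbitrary choice for this example).
head -n 2 all_samples.txt > keep.txt

if command -v vcftools >/dev/null 2>&1; then
    # Subset to the listed samples, then apply a MAF filter in the same run.
    vcftools --vcf demo.vcf --keep keep.txt --maf 0.05 --recode --out demo_subset
fi
```

Filters such as --maf are combined with the sample selection in a single invocation, so the subset and the filtered sites end up in the same recoded file.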

CAVEATS

Large VCF files can consume significant memory in some analyses. Some operations require the input to be sorted by chromosome and position. The --gzvcf option reads files compressed with either gzip or bgzip, but a file must be compressed with bgzip if it is also going to be indexed with tabix. BCF is generally faster to parse than text VCF, which helps when the same file is analyzed repeatedly.
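
A quick way to check the sorting requirement before a long run is sketched below. It uses only standard tools (awk, grep, sort), not vcftools itself. The function name check_sorted and the file names are hypothetical, and the check only looks at positions within consecutive records of the same chromosome; it does not verify any particular chromosome order.

```shell
#!/bin/sh
# Hypothetical input: the second chr1 record is out of order.
printf '##fileformat=VCFv4.2\n' > unsorted.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> unsorted.vcf
printf 'chr1\t200\t.\tA\tG\t50\tPASS\t.\n' >> unsorted.vcf
printf 'chr1\t100\t.\tC\tT\t50\tPASS\t.\n' >> unsorted.vcf

# check_sorted FILE: succeed if positions never decrease within a run of
# consecutive records on the same chromosome.
check_sorted() {
    awk -F '\t' '
        /^#/ { next }                                  # skip header lines
        $1 == chrom && $2 + 0 < pos { bad = 1; exit }  # position went backwards
        { chrom = $1; pos = $2 + 0 }                   # remember previous record
        END { exit bad }
    ' "$1"
}

if check_sorted unsorted.vcf; then
    echo "sorted"
else
    echo "not sorted"    # this branch runs for the toy input
fi

# One way to fix it: keep the header, sort records by chromosome then position.
TAB="$(printf '\t')"
{ grep '^#' unsorted.vcf; grep -v '^#' unsorted.vcf | sort -t "$TAB" -k1,1 -k2,2n; } > sorted.vcf
```

The plain sort orders chromosome names lexically (chr10 before chr2), which may not match the order a reference genome uses; tools such as bcftools sort are an alternative when that matters.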

HISTORY

VCFtools was written by Adam Auton and Anthony Marcketta and grew out of work for the 1000 Genomes Project. The VCF format and the toolkit were described together in a 2011 paper in the journal Bioinformatics (Danecek et al.). It was created to address the need for efficient VCF manipulation as next-generation sequencing became widespread. The tool has become a standard in bioinformatics pipelines for variant analysis and quality control.

SEE ALSO

bcftools(1), tabix(1), bgzip(1), samtools(1)

> TERMINAL_GEAR

Curated for the Linux community
