bcftools
SYNOPSIS
bcftools command [options] [arguments]
General options:
bcftools [--version | --version-only] [--help] [command] [options]
bcftools command --help
Note: bcftools is a collection of subcommands; each subcommand has its own specific set of options and arguments.
PARAMETERS
call
Calls SNPs and indels from the genotype likelihoods produced by `mpileup`, using either the multiallelic caller (`-m`) or the older consensus caller (`-c`).
view
Views, subsets, and filters VCF/BCF files, and converts between VCF and BCF.
filter
Filters sites and samples using a flexible expression language based on annotations or genotypes.
merge
Merges multiple VCF/BCF files from non-overlapping sample sets into a single multi-sample file.
concat
Concatenates VCF/BCF files that contain the same samples in the same order, for example per-chromosome or per-region files given in sorted order.
query
Extracts fields from VCF/BCF files into a user-defined text format, such as tab-separated columns.
stats
Calculates and displays various statistics from VCF/BCF files, useful for quality control.
norm
Left-aligns and normalizes indels, checks REF alleles against the reference, and can split multi-allelic sites into bi-allelic records (or join them back).
index
Creates a CSI or TBI index for a bgzip-compressed VCF or a BCF file for fast random access, and can report record counts from an existing index.
annotate
Adds, removes, or modifies annotations within VCF/BCF files based on external sources or expressions.
gtcheck
Checks sample identity and genotype concordance between VCF/BCF files.
consensus
Generates a consensus sequence from a VCF/BCF file and a reference genome.
mpileup
Generates genotype likelihoods from alignment files (BAM/CRAM) for variant calling.
sort
Sorts VCF/BCF files by coordinate.
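regions
Not a subcommand of its own. Genomic regions are selected within other subcommands through the `-r`/`--regions`, `-R`/`--regions-file`, `-t`/`--targets`, and `-T`/`--targets-file` options.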
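As a sketch of how the subcommands above chain together, the following script wraps the common `mpileup` to `call` to `index` sequence in a shell function. The function name `call_variants` and the file names `ref.fa`, `sample.bam`, and `calls.vcf.gz` are hypothetical; the caller settings shown are one reasonable starting point, not a recommendation for every dataset.

```sh
#!/bin/sh
# Sketch: call variants from an alignment file with mpileup + call.
# ref.fa, sample.bam and calls.vcf.gz are hypothetical file names.
set -eu

call_variants() {
    ref=$1   # reference FASTA
    bam=$2   # coordinate-sorted BAM/CRAM
    out=$3   # output bgzip-compressed VCF

    # mpileup emits genotype likelihoods as uncompressed BCF (-Ou);
    # call -m selects the multiallelic caller, -v keeps variant sites only.
    bcftools mpileup -Ou -f "$ref" "$bam" |
        bcftools call -mv -Oz -o "$out"

    # Index the result so that region queries (-r) work on it.
    bcftools index "$out"
}

# Run only when all three arguments are supplied.
if [ "$#" -eq 3 ]; then
    call_variants "$@"
else
    echo "usage: $0 ref.fa sample.bam calls.vcf.gz" >&2
fi
```

Piping uncompressed BCF between the two steps avoids compressing and re-parsing text VCF in the middle of the pipeline.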
DESCRIPTION
bcftools is a suite of utilities for working with VCF (Variant Call Format) and BCF (Binary Call Format) files, the standard formats for storing genomic variation. It was developed as part of the SAMtools project and is built on HTSlib, the same library that underlies SAMtools, which lets it handle large genomic datasets efficiently. Its subcommands cover viewing and converting files, filtering variants by site or genotype criteria, merging and concatenating VCFs, calling variants from genotype likelihoods, and calculating statistics. Typical uses range from basic file inspection and reformatting to variant calling and population-genetics analysis.
CAVEATS
bcftools has many subcommands, each with its own options, so check the usage of the specific subcommand before relying on a flag. Region-based operations need an index, and plain VCF must be bgzip-compressed before it can be indexed. Most subcommands stream records, but some, such as `sort` or a `merge` of many files, can need substantial memory or temporary disk space. A mistaken filter expression can silently remove wanted variants, so compare record counts before and after filtering. The `call` subcommand's parameters often require tuning for the sequencing technology and dataset characteristics.
FILE FORMATS
bcftools works with VCF (Variant Call Format) and BCF (Binary Call Format) files. BCF is the binary equivalent of VCF; it is smaller and faster to parse, which matters for large-scale genomic datasets. bcftools reads plain VCF, bgzip-compressed VCF, and BCF, and converts between them with the `-O` output-type option (`v` uncompressed VCF, `z` compressed VCF, `u` uncompressed BCF, `b` compressed BCF). Uncompressed BCF is the usual choice when piping between bcftools commands.
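The conversions described above can be sketched as follows. The script writes a three-record demo VCF (invented data, for illustration only) into a hypothetical `./bcftools_demo_formats` directory, then converts it to BCF and back to compressed VCF. It skips the bcftools steps if bcftools is not installed.

```sh
#!/bin/sh
# Sketch: round-trip a small VCF through BCF. All data here is made up.
set -eu

workdir=./bcftools_demo_formats
mkdir -p "$workdir"
vcf="$workdir/demo.vcf"

# Write a minimal single-sample VCF (tab-separated records).
{
    printf '%s\n' \
        '##fileformat=VCFv4.2' \
        '##contig=<ID=chr1,length=1000>' \
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">' \
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    # printf reuses the format for each group of six arguments.
    printf 'chr1\t%s\t.\t%s\t%s\t%s\tPASS\tDP=%s\tGT\t%s\n' \
        100 A G 50 20 0/1 \
        200 C T 10 5 1/1 \
        300 G A 40 30 0/1
} > "$vcf"

if command -v bcftools >/dev/null 2>&1; then
    # VCF -> compressed BCF
    bcftools view -Ob -o "$workdir/demo.bcf" "$vcf"
    # BCF -> bgzip-compressed VCF
    bcftools view -Oz -o "$workdir/demo.vcf.gz" "$workdir/demo.bcf"
    # Print the records (no header) straight from the BCF
    bcftools view -H "$workdir/demo.bcf"
else
    echo "bcftools not found; skipping conversion" >&2
fi
```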
INDEXING
For random access to specific genomic regions, the input must be indexed; VCF files must be bgzip-compressed first, while BCF can be indexed directly. The `bcftools index` subcommand creates a `CSI` (coordinate-sorted index) by default, or a `TBI` (tabix index) with `-t`. The `-r` and `-R` region options jump to the requested positions through the index and therefore require a valid, up-to-date one. The `-t` and `-T` target options select the same kind of positions by streaming through the file and do not need an index.
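A minimal sketch of the indexing workflow, assuming an invented three-record demo VCF and a hypothetical `./bcftools_demo_index` directory. It compresses the file, builds both index types, and runs one region query. The bcftools steps are skipped if bcftools is not installed.

```sh
#!/bin/sh
# Sketch: compress, index, and query a region. All data here is made up.
set -eu

workdir=./bcftools_demo_index
mkdir -p "$workdir"
vcf="$workdir/demo.vcf"

# Write a minimal single-sample VCF with records at 100, 200 and 300.
{
    printf '%s\n' \
        '##fileformat=VCFv4.2' \
        '##contig=<ID=chr1,length=1000>' \
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">' \
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    printf 'chr1\t%s\t.\t%s\t%s\t%s\tPASS\tDP=%s\tGT\t%s\n' \
        100 A G 50 20 0/1 \
        200 C T 10 5 1/1 \
        300 G A 40 30 0/1
} > "$vcf"

if command -v bcftools >/dev/null 2>&1; then
    # Plain VCF cannot be indexed; write a bgzip-compressed copy first.
    bcftools view -Oz -o "$workdir/demo.vcf.gz" "$vcf"

    # CSI is the default index type; -t requests TBI instead.
    # -f overwrites an index left over from an earlier run.
    bcftools index -f "$workdir/demo.vcf.gz"
    bcftools index -f -t "$workdir/demo.vcf.gz"

    # Region query through the index; only the record at 200 falls in range.
    bcftools view -H -r chr1:150-250 "$workdir/demo.vcf.gz" \
        > "$workdir/region.txt"
    cat "$workdir/region.txt"
else
    echo "bcftools not found; skipping indexing" >&2
fi
```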
EXPRESSION LANGUAGE
Many bcftools subcommands, such as `filter` and `view`, accept filtering expressions through the `-i`/`--include` and `-e`/`--exclude` options. An expression can test fixed VCF columns (for example `QUAL`), annotations (INFO fields), and per-sample genotype information (FORMAT fields), combined with comparison and logical operators. This is the main mechanism for quality control and for selecting subsets of variants for downstream analysis.
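The include and exclude forms can be sketched on an invented three-record demo VCF, written to a hypothetical `./bcftools_demo_expr` directory. The thresholds are arbitrary and chosen only so that one record fails. The bcftools steps are skipped if bcftools is not installed.

```sh
#!/bin/sh
# Sketch: hard and soft filtering with expressions. All data here is made up.
set -eu

workdir=./bcftools_demo_expr
mkdir -p "$workdir"
vcf="$workdir/demo.vcf"

# Records: (QUAL 50, DP 20), (QUAL 10, DP 5), (QUAL 40, DP 30).
{
    printf '%s\n' \
        '##fileformat=VCFv4.2' \
        '##contig=<ID=chr1,length=1000>' \
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">' \
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    printf 'chr1\t%s\t.\t%s\t%s\t%s\tPASS\tDP=%s\tGT\t%s\n' \
        100 A G 50 20 0/1 \
        200 C T 10 5 1/1 \
        300 G A 40 30 0/1
} > "$vcf"

if command -v bcftools >/dev/null 2>&1; then
    # Hard filter: keep only records matching the expression (-i).
    bcftools view -H -i 'QUAL>=30 && INFO/DP>=10' "$vcf" \
        > "$workdir/kept.txt"

    # Same selection written as an exclusion (-e).
    bcftools view -H -e 'QUAL<30 || INFO/DP<10' "$vcf" \
        > "$workdir/kept_by_exclude.txt"

    # Soft filter: keep every record but tag failing ones in FILTER.
    bcftools filter -s LowQual -e 'QUAL<30' "$vcf" |
        bcftools view -H -f LowQual - > "$workdir/lowqual.txt"

    echo "kept: $(wc -l < "$workdir/kept.txt" | tr -d ' ')"
    echo "tagged LowQual: $(wc -l < "$workdir/lowqual.txt" | tr -d ' ')"
else
    echo "bcftools not found; skipping filtering" >&2
fi
```

Soft filtering keeps the failing records in the file, which makes it easier to review what an expression removed before discarding anything.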
HISTORY
bcftools originated as part of the samtools project, initially developed by Heng Li. It began as the variant calling and processing component distributed with `samtools`, growing out of the `pileup`/`mpileup` calling workflow. As the scope and complexity of variant data analysis grew, it was separated into its own dedicated suite of tools for VCF and BCF files. This allowed more focused development and optimization for variant manipulation while remaining tightly integrated with the core HTSlib library. It continues to be actively developed and maintained alongside samtools and HTSlib.


