samtools
Manipulate SAM/BAM/CRAM sequence alignment files
TLDR
View a BAM file as SAM text: samtools view file.bam
SYNOPSIS
samtools command [-b] [-o output] [-@ threads] [options] [file] [region]
DESCRIPTION
samtools manipulates alignments in SAM (Sequence Alignment/Map) format, its binary equivalent BAM, and the more compact CRAM format. It is a core tool in next-generation sequencing data analysis.
SAM/BAM files contain sequence reads aligned to reference genomes. Each record includes read name, position, mapping quality, CIGAR string (alignment operations), and optional tags.
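To make the record layout concrete, the sketch below splits one SAM record into columns with cut and tests a single FLAG bit. The record itself is invented for illustration. The column order (read name, FLAG, reference, position, mapping quality, CIGAR, then optional tags from column 12) follows the SAM specification.

```shell
# One made-up SAM record: 11 mandatory tab-separated fields plus one optional tag.
record=$(printf 'read1\t16\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0')

# cut splits on tabs by default, so -fN selects SAM column N.
qname=$(printf '%s\n' "$record" | cut -f1)
flag=$(printf '%s\n' "$record" | cut -f2)
pos=$(printf '%s\n' "$record" | cut -f4)
mapq=$(printf '%s\n' "$record" | cut -f5)
cigar=$(printf '%s\n' "$record" | cut -f6)
tags=$(printf '%s\n' "$record" | cut -f12-)

# FLAG is a bit field; bit 0x10 (16) marks a read aligned to the reverse strand.
if [ $(( flag & 16 )) -ne 0 ]; then
    strand="-"
else
    strand="+"
fi

echo "$qname pos=$pos strand=$strand MAPQ=$mapq CIGAR=$cigar tags=$tags"
```

Here the CIGAR string 10M means ten aligned bases (matches or mismatches), and NM:i:0 is an optional tag recording an edit distance of zero.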
The view command converts between formats and filters alignments. Sorted BAM files with indices enable random access to genomic regions. Most downstream tools require sorted, indexed BAM.
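A typical way to get from raw alignments to a file that supports region queries might look like the sketch below. The function names and file names are hypothetical, and the thread count of 4 is an arbitrary choice.

```shell
# Sketch: turn an alignment file into a coordinate-sorted, indexed BAM.
sort_and_index() {
    input=$1     # SAM or BAM input, e.g. aln.sam
    sorted=$2    # output path, e.g. aln.sorted.bam

    # sort orders records by reference position; -O bam forces BAM output.
    samtools sort -@ 4 -O bam -o "$sorted" "$input" &&
    # index writes the index file next to the sorted BAM.
    samtools index "$sorted"
}

# Sketch: copy the alignments overlapping one region into a new BAM.
extract_region() {
    sorted=$1    # coordinate-sorted, indexed BAM
    region=$2    # e.g. chr1:1000-2000
    out=$3       # e.g. region.bam

    samtools view -b -o "$out" "$sorted" "$region"
}
```

With these defined, `sort_and_index aln.sam aln.sorted.bam` followed by `extract_region aln.sorted.bam chr1:1000-2000 region.bam` would produce a small BAM holding only the reads that overlap that interval.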
Statistics commands (flagstat, stats, idxstats) summarize alignment characteristics: mapping rates, insert sizes, coverage distributions. These quality metrics guide analysis decisions.
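As a rough illustration, the three statistics commands can be run together and their reports saved side by side. The function name and the output file naming scheme are assumptions made for this sketch.

```shell
# Sketch: collect the three summary reports for one BAM file.
alignment_reports() {
    bam=$1       # e.g. aln.sorted.bam (idxstats is normally run on an indexed file)
    prefix=$2    # prefix for the report files, e.g. aln

    # Read counts per FLAG category (mapped, paired, duplicates, ...).
    samtools flagstat "$bam" > "$prefix.flagstat.txt" &&
    # One line per reference: name, length, mapped and unmapped read counts.
    samtools idxstats "$bam" > "$prefix.idxstats.txt" &&
    # Full report; lines starting with "SN" hold the summary numbers.
    samtools stats "$bam" > "$prefix.stats.txt" &&
    grep '^SN' "$prefix.stats.txt" | cut -f 2-
}
```

Calling `alignment_reports aln.sorted.bam aln` would write three text files and print the summary-number section of the stats report to the terminal.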
Pileup output (mpileup) aggregates alignments at each position for variant calling. Coverage commands calculate read depth across regions.
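The pileup and depth commands could be wrapped as in the following sketch. Function names and file names are hypothetical, and the reference FASTA is assumed to have been indexed with samtools faidx beforehand.

```shell
# Sketch: text pileup for one region of a sorted, indexed BAM.
pileup_region() {
    ref=$1       # reference FASTA, e.g. ref.fa
    bam=$2       # coordinate-sorted, indexed BAM
    region=$3    # e.g. chr1:1000-2000

    # One output line per position: reference base, depth, read bases, qualities.
    samtools mpileup -f "$ref" -r "$region" "$bam"
}

# Sketch: per-position depth table plus a per-reference coverage summary.
depth_and_coverage() {
    bam=$1       # coordinate-sorted BAM
    out=$2       # where to write the depth table, e.g. depth.tsv

    # -a reports every position, including those with zero depth.
    samtools depth -a "$bam" > "$out" &&
    # Summary per reference: covered bases, mean depth, mean MAPQ, and more.
    samtools coverage "$bam"
}
```

The depth table has one row per position (reference name, position, depth), so it can grow large on whole genomes; restricting to a region is usually more practical.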
CRAM achieves better compression than BAM by using reference-based encoding. samtools reads and writes CRAM transparently, provided the reference sequence is available.
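Converting between BAM and CRAM is done with the view command. The sketch below assumes hypothetical function and file names; -C selects CRAM output and -T names the reference FASTA used for encoding and decoding.

```shell
# Sketch: compress a BAM to CRAM against a reference FASTA.
bam_to_cram() {
    ref=$1       # reference FASTA the reads were aligned to, e.g. ref.fa
    bam=$2       # input BAM
    cram=$3      # output CRAM

    samtools view -C -T "$ref" -o "$cram" "$bam"
}

# Sketch: convert a CRAM back to BAM using the same reference.
cram_to_bam() {
    ref=$1
    cram=$2
    bam=$3

    samtools view -b -T "$ref" -o "$bam" "$cram"
}
```

For example, `bam_to_cram ref.fa aln.sorted.bam aln.cram` would write aln.cram. The same reference must be available later to decode it.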
PARAMETERS
view
View/convert SAM/BAM/CRAM.
sort
Sort alignments.
index
Create an index for a coordinate-sorted BAM or CRAM file.
merge
Merge sorted files.
flagstat
Statistics from FLAG field.
stats
Comprehensive statistics.
idxstats
Per-reference statistics.
faidx
Index FASTA file.
depth
Compute depth at each position.
mpileup
Generate pileup for variant calling.
coverage
Calculate coverage statistics.
fastq
Extract FASTQ from BAM.
-b
Output BAM format.
-S
Input is SAM (deprecated; the input format is auto-detected).
-o FILE
Output file.
-@ NUM, --threads NUM
Number of threads.
-f FLAGS
Only include reads with all of the FLAGS bits set.
-F FLAGS
Exclude reads with any of the FLAGS bits set.
-q MAPQ
Minimum mapping quality.
-h
Include the header in SAM output.
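The -f, -F, and -q options combine into a single filtering pass. In the sketch below, the FLAG bit values come from the SAM specification; the function name, file names, and the MAPQ threshold of 30 are illustrative choices only.

```shell
# Build one -F mask by OR-ing FLAG bits:
#   0x4   read unmapped
#   0x100 secondary alignment
#   0x400 PCR or optical duplicate
#   0x800 supplementary alignment
exclude=$(( 0x4 | 0x100 | 0x400 | 0x800 ))
echo "$exclude"    # prints 3332

# Sketch: keep properly paired (0x2), well-mapped primary alignments.
filter_reads() {
    input=$1     # e.g. aln.sorted.bam
    out=$2       # e.g. filtered.bam

    # -f requires the bit to be set, -F drops reads with any masked bit set,
    # -q drops reads whose mapping quality is below the threshold.
    samtools view -b -f 2 -F "$exclude" -q 30 -o "$out" "$input"
}
```

Running `filter_reads aln.sorted.bam filtered.bam` would write only the reads that pass all three tests. To inspect the result as text with its header, `samtools view -h filtered.bam` can be piped to a pager.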
CAVEATS
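Some operations on large BAM files, such as sorting, require significant memory. The -@ option speeds up many commands, but some run single-threaded regardless. Unsorted BAM files cannot be indexed, which limits the operations available. Random access to a region requires an index. Reading or writing CRAM files requires the matching reference sequence.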
HISTORY
samtools was originally developed by Heng Li at the Wellcome Sanger Institute and released around 2009 together with the SAM/BAM formats, which became standards for sequence alignment data. The project is now maintained by the samtools/htslib team and is part of the broader bioinformatics ecosystem built on these formats.
