LinuxCommandLibrary

samtools

Manipulate SAM/BAM/CRAM sequence alignment files

TLDR

View BAM file

$ samtools view [alignment.bam]
Convert SAM to BAM
$ samtools view -b [alignment.sam] > [alignment.bam]
Sort BAM file
$ samtools sort -o [sorted.bam] [input.bam]
Index BAM file
$ samtools index [sorted.bam]
View specific region
$ samtools view [sorted.bam] [chr1:1000-2000]
Count alignments
$ samtools view -c [alignment.bam]
Summarize alignment flags
$ samtools flagstat [alignment.bam]
Merge BAM files
$ samtools merge [output.bam] [input1.bam] [input2.bam]

SYNOPSIS

samtools command [-b] [-o output] [-@ threads] [options] [file] [region]

DESCRIPTION

samtools reads, writes, and manipulates alignments in SAM (Sequence Alignment/Map) format, its binary equivalent BAM, and the reference-compressed CRAM format. It is a standard tool in next-generation sequencing analysis pipelines.
SAM/BAM files contain sequence reads aligned to a reference genome. Each record includes the read name, bitwise FLAG, reference name and position, mapping quality, CIGAR string (alignment operations), the read sequence and base qualities, and optional tags.
The view command converts between formats and filters alignments. A coordinate-sorted BAM file with an index allows random access to genomic regions, so most downstream tools require sorted, indexed BAM.
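The convert, sort, and index steps described above can be chained in a small shell function. This is a minimal sketch rather than a canonical recipe: the function name `sam_to_indexed_bam` and its argument order are hypothetical, and it relies on sort accepting SAM input directly.

```shell
# Hypothetical helper: turn a SAM file into a coordinate-sorted, indexed BAM.
# Usage: sam_to_indexed_bam input.sam output.bam [threads]
sam_to_indexed_bam() {
    in_sam=$1
    out_bam=$2
    n_threads=${3:-1}
    # sort reads SAM directly; -O bam requests BAM output.
    samtools sort -@ "$n_threads" -O bam -o "$out_bam" "$in_sam" || return 1
    # Write the index next to the sorted BAM.
    samtools index "$out_bam"
}
```

After it succeeds, a region query such as `samtools view [sorted.bam] [chr1:1000-2000]` can use the index.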
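Statistics commands summarize alignment characteristics: flagstat counts reads by FLAG category (mapped, paired, duplicate), idxstats reports mapped and unmapped counts per reference sequence, and stats adds details such as insert sizes and coverage distributions. These metrics help judge whether a run is good enough for downstream analysis.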
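As an illustration of using these summaries, the sketch below derives an approximate mapped percentage from idxstats output. It assumes the tab-separated layout of reference name, length, mapped count, unmapped count; the function name `mapped_percent` is hypothetical.

```shell
# Hypothetical helper: read `samtools idxstats` output on stdin and
# print the percentage of counted reads that are mapped.
mapped_percent() {
    awk -F '\t' '
        { mapped += $3; unmapped += $4 }
        END {
            total = mapped + unmapped
            if (total > 0) printf "%.2f\n", 100 * mapped / total
            else print "NA"
        }'
}
```

Typical use would be `samtools idxstats [sorted.bam] | mapped_percent`; idxstats needs an indexed BAM to run quickly.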
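Pileup output (mpileup) aggregates the aligned bases at each reference position, which is the input that variant calling works from. The depth and coverage commands report read depth per position and coverage summaries per reference sequence.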
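Per-position depth can be reduced to a single number with a short filter. The sketch below assumes the three-column, tab-separated output of depth (reference, position, depth); `mean_depth` is a hypothetical name.

```shell
# Hypothetical helper: read `samtools depth` output on stdin and
# print the mean depth over the reported positions.
mean_depth() {
    awk -F '\t' '
        { sum += $3; n++ }
        END {
            if (n > 0) printf "%.2f\n", sum / n
            else print "NA"
        }'
}
```

For example, `samtools depth -a [sorted.bam] | mean_depth`, where `-a` also reports zero-depth positions so they count toward the mean.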
CRAM usually compresses better than BAM because it stores reads as differences against the reference sequence. samtools reads and writes CRAM with the same commands, provided the reference is available.
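A BAM-to-CRAM conversion might look like the following sketch. The function name `bam_to_cram` and its argument order are hypothetical; `-C` selects CRAM output and `-T` names the reference FASTA.

```shell
# Hypothetical helper: convert a BAM file to CRAM against a reference FASTA.
# Usage: bam_to_cram reference.fa input.bam output.cram
bam_to_cram() {
    ref_fa=$1
    in_bam=$2
    out_cram=$3
    # -C writes CRAM; -T supplies the reference used for encoding.
    samtools view -C -T "$ref_fa" -o "$out_cram" "$in_bam" || return 1
    # Index the CRAM so regions can be queried later.
    samtools index "$out_cram"
}
```

The same reference must stay available to decode the CRAM file later.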

PARAMETERS

view

View/convert SAM/BAM/CRAM.
sort
Sort alignments.
index
Create an index for a coordinate-sorted BAM or CRAM file.
merge
Merge sorted files.
flagstat
Statistics from FLAG field.
stats
Comprehensive statistics.
idxstats
Per-reference statistics.
faidx
Index FASTA file.
depth
Compute depth at each position.
mpileup
Generate pileup output for variant calling.
coverage
Calculate coverage statistics.
fastq
Extract FASTQ from BAM.
-b
Output BAM format.
-S
Input is SAM (deprecated, auto-detected).
-o FILE
Output file.
-@ NUM, --threads NUM
Number of additional threads to use.
-f FLAGS
Only include reads with all of the FLAGS bits set.
-F FLAGS
Exclude reads with any of the FLAGS bits set.
-q MAPQ
Skip alignments with mapping quality below MAPQ.
-h
Include the header in the output.
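
FLAGS values for -f and -F are bit masks, so several conditions combine into one number. The sketch below builds such a mask in plain shell. The bit values come from the SAM specification, while the lowercase names and the function `flag_mask` are hypothetical labels chosen for this example.

```shell
# Hypothetical helper: combine named SAM FLAG bits into one decimal mask.
# Usage: flag_mask unmapped secondary supplementary
flag_mask() {
    mask=0
    for name in "$@"; do
        case "$name" in
            paired)        bit=1 ;;
            proper_pair)   bit=2 ;;
            unmapped)      bit=4 ;;
            mate_unmapped) bit=8 ;;
            reverse)       bit=16 ;;
            mate_reverse)  bit=32 ;;
            read1)         bit=64 ;;
            read2)         bit=128 ;;
            secondary)     bit=256 ;;
            qc_fail)       bit=512 ;;
            duplicate)     bit=1024 ;;
            supplementary) bit=2048 ;;
            *) echo "unknown flag name: $name" >&2; return 1 ;;
        esac
        # OR the bit into the running mask.
        mask=$((mask | bit))
    done
    echo "$mask"
}
```

A filter for primary, mapped reads with MAPQ of at least 30 could then be written as `samtools view -b -q 30 -F "$(flag_mask unmapped secondary supplementary)" [alignment.bam] > [filtered.bam]`.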

CAVEATS

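Some operations on large BAM files, sorting in particular, need significant memory and temporary disk space. Threading helps, but some commands are single-threaded. Region queries and several other operations require a coordinate-sorted file, and random access additionally requires an index. Reading or writing CRAM requires access to the reference sequence.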
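
One way to guard against the missing-index case is to build the index on demand before a region query. This is a sketch under assumptions: it only looks for an index named after the BAM file with a `.bai` or `.csi` suffix, it assumes the BAM is already coordinate-sorted, and the name `view_region` is hypothetical.

```shell
# Hypothetical helper: query a region, creating the index first if none exists.
# Usage: view_region sorted.bam chr1:1000-2000
view_region() {
    region_bam=$1
    region=$2
    # Only the <file>.bai and <file>.csi naming conventions are checked here.
    if [ ! -e "$region_bam.bai" ] && [ ! -e "$region_bam.csi" ]; then
        samtools index "$region_bam" || return 1
    fi
    samtools view "$region_bam" "$region"
}
```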

HISTORY

samtools was developed by Heng Li at the Wellcome Sanger Institute, released around 2009. It defined the SAM/BAM formats that became standards for sequence alignment. The project is maintained by the samtools/htslib team, part of the broader bioinformatics ecosystem built on these formats.

SEE ALSO

bcftools(1), bwa(1), bedtools(1), tabix(1)

> TERMINAL_GEAR

Curated for the Linux community
