LinuxCommandLibrary

samtools

Manipulate and analyze sequence alignment data

TLDR

Convert a SAM file to BAM and save the result to a file (the input format is detected automatically, so the obsolete -S flag is not needed)

$ samtools view [[-b|--bam]] [input.sam] > [output.bam]

Take input from stdin (-) and print the SAM header and any reads overlapping a specific region to stdout
$ [other_command] | samtools view [[-h|--with-header]] - [chromosome:start-end]

Sort file and save to BAM (the output format is automatically determined from the output file's extension)
$ samtools sort [input] [[-o|--output]] [output.bam]

Index a sorted BAM file (creates sorted_input.bam.bai)
$ samtools index [sorted_input.bam]

Print alignment statistics about a file
$ samtools flagstat [sorted_input]

Count alignments to each reference sequence (chromosome/contig)
$ samtools idxstats [sorted_indexed_input]

Merge multiple files
$ samtools merge [output] [input1 input2 ...]

Split input file according to read groups
$ samtools split [merged_input]

SYNOPSIS

samtools command [options] [arguments]

PARAMETERS

view
    Converts SAM to BAM, BAM to SAM, extracts alignments in a specific region, etc.

sort
    Sorts a SAM, BAM, or CRAM file, by leftmost coordinate by default.

index
    Indexes a coordinate-sorted BAM or CRAM file for fast random access.

merge
    Merges multiple sorted alignment files into one.

mpileup
    Produces text pileup output from alignments. In current releases, genotype likelihoods for variant calling are generated by bcftools mpileup instead.

faidx
    Indexes a FASTA file, or extracts subsequences from an indexed one.

tview
    Text alignment viewer.

depth
    Calculates read depth at each position.

flagstat
    Provides simple statistics from an alignment file.
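
As an illustration of how these subcommands combine with ordinary text tools, the sketch below averages the per-position output of depth. The function name mean_depth is hypothetical, and the sketch assumes a single input file, so that the depth is in the third column.

```shell
# Hypothetical helper: mean read depth over all positions reported for one file.
# Usage: mean_depth sorted.bam
mean_depth() {
    # depth prints tab-separated columns: reference name, position, depth.
    # -a also reports positions with zero coverage.
    samtools depth -a "$1" |
        awk '{ sum += $3 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}
```

Without -a, positions with no coverage are omitted, so the average would be taken over covered positions only.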

DESCRIPTION

samtools is a suite of programs for reading, writing, and manipulating sequence alignment data in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map), and CRAM (Compressed Reference-oriented Alignment Map) formats, which are commonly used to store next-generation sequencing data. It can index alignment files for efficient random access, sort and merge alignment data, filter reads by various criteria, generate summary statistics, and convert between alignment formats.
The toolset is a standard part of bioinformatics workflows that involve read mapping, variant calling, and other downstream analyses of sequencing data. It is widely used and actively maintained.
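
A typical sequence of these operations can be sketched as a small shell function. The name sort_index_stats and the file names in the usage line are hypothetical; the subcommands and the -o option are the ones shown in the TLDR section.

```shell
# Hypothetical helper: sort, index, and summarize one alignment file.
# Usage: sort_index_stats input.sam output.sorted.bam
sort_index_stats() {
    # Coordinate-sort; the .bam extension of the output selects BAM format.
    samtools sort -o "$2" "$1" || return 1
    # Build the index next to the sorted file.
    samtools index "$2" || return 1
    # Print summary counts for the sorted file to stdout.
    samtools flagstat "$2"
}
```

Each step stops the function on failure, so flagstat never runs on a file that could not be sorted or indexed.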

CAVEATS

Several samtools commands, such as view with a region argument and tview, require a coordinate-sorted, indexed alignment file. Sort the file with samtools sort and index it with samtools index before running commands that need random access.
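
One way to guard against a missing index is sketched below. The function name view_region is hypothetical, and the sketch assumes the index uses the default name (the BAM file name with .bai appended); an index stored under a different name would not be detected.

```shell
# Hypothetical helper: print alignments overlapping a region, building the
# index first if it is missing.
# Usage: view_region sorted.bam chromosome:start-end
view_region() {
    # Assumption: the index lives at <file>.bai, the default for samtools index.
    if [ ! -e "$1.bai" ]; then
        samtools index "$1" || return 1
    fi
    # With an index in place, view can seek directly to the region.
    samtools view "$1" "$2"
}
```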

EXIT STATUS

samtools returns 0 on successful completion, and non-zero on failure.
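
In a script, the exit status can be tested directly. The sketch below uses the quickcheck subcommand, which is not listed above, to test whether a file looks intact; the function name check_alignment_file and the message wording are hypothetical.

```shell
# Hypothetical helper: report whether samtools considers a file intact.
# Usage: check_alignment_file reads.bam
check_alignment_file() {
    # quickcheck is silent on success; only its exit status is used here.
    if samtools quickcheck "$1"; then
        echo "OK: $1"
    else
        status=$?
        echo "FAILED (exit status $status): $1" >&2
        return "$status"
    fi
}
```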

FORMAT SPECIFICATIONS

samtools works primarily with the SAM, BAM, and CRAM file formats. The specification for each format is maintained in the hts-specs repository, which is linked from the samtools documentation.
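
Conversion between the formats goes through samtools view. The two wrappers below are a minimal sketch with hypothetical function and file names. The options used are -h (include the header), -C (write CRAM), -T (the reference FASTA that CRAM compresses against), and -o (output file).

```shell
# Hypothetical helper: write an alignment file as SAM text, header included.
# Usage: to_sam input.bam output.sam
to_sam() {
    samtools view -h -o "$2" "$1"
}

# Hypothetical helper: write an alignment file as CRAM.
# Usage: to_cram reference.fa input.bam output.cram
to_cram() {
    samtools view -C -T "$1" -o "$3" "$2"
}
```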

HISTORY

samtools was initially developed by Heng Li at the Sanger Institute and has since evolved significantly with contributions from many developers. It is written in C and designed for performance. It is used in many bioinformatics pipelines to process and analyze sequencing data from a variety of platforms.

SEE ALSO

bcftools(1), tabix(1)
