bwa
Align DNA sequences to a reference genome
TLDR
Index the reference genome
Map single-end reads (sequences) to the indexed genome using 32 [t]hreads and compress the result to save space
Map paired-end reads (sequences) to the indexed genome using 32 [t]hreads and compress the result to save space
Map paired-end reads (sequences) to the indexed genome using 32 [t]hreads, [M]arking shorter split hits as secondary so the output SAM file is compatible with Picard, and compress the result
Map paired-end reads (sequences) to the indexed genome using 32 [t]hreads, appending FASTA/Q [C]omments (e.g. BC:Z:CGTAC) to the output, and compress the result
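The entries above describe commands without showing them, so the following is a sketch of what each one might look like. All file names are hypothetical placeholders, and gzip is assumed for the compression step. The flags (-t, -M, -C) are the ones named in the descriptions. Each command is wrapped in a small function with an illustrative name so that nothing runs until it is called.

```shell
# Hypothetical file names; replace with your own paths.
ref=reference.fa
r1=reads_1.fq.gz
r2=reads_2.fq.gz

# Index the reference genome.
index_ref() { bwa index "$ref"; }

# Single-end reads, 32 threads, gzip-compressed SAM output.
map_single() { bwa mem -t 32 "$ref" "$r1" | gzip > single_end.sam.gz; }

# Paired-end reads, 32 threads, gzip-compressed SAM output.
map_paired() { bwa mem -t 32 "$ref" "$r1" "$r2" | gzip > paired_end.sam.gz; }

# Paired-end reads with -M (shorter split hits marked as secondary).
map_paired_picard() { bwa mem -M -t 32 "$ref" "$r1" "$r2" | gzip > paired_end_M.sam.gz; }

# Paired-end reads with -C (FASTA/Q comments appended to each SAM record).
map_paired_comments() { bwa mem -C -t 32 "$ref" "$r1" "$r2" | gzip > paired_end_C.sam.gz; }
```

Run index_ref once per reference, then call whichever mapping function matches the data.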
SYNOPSIS
bwa <command> [options]
Examples:
bwa index [-p prefix] [-a algo] <ref.fa>
bwa mem [options] <ref> <in1.fq> [<in2.fq>]
PARAMETERS
index
Index reference FASTA sequences into BWT and auxiliary files
mem
Run BWA-MEM algorithm: seed, chain, align reads to SAM
aln
Legacy: BWA-ALN gapped alignment to .sai files (short reads)
samse
Convert single-end .sai alignments to SAM
sampe
Convert paired-end .sai alignments to SAM
bwasw
BWA-SW for long-query gapped alignment
-t N
Number of threads (default: 1)
-R STR
Read group header line (SAM @RG)
-c INT
Skip seeds with more than INT occurrences in the genome (mem)
-M
Mark shorter split hits as secondary (mem; Picard compat)
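As an illustration of -R combined with -t and -M, the sketch below builds a read group line and passes it to bwa mem. The helper names, sample values, and file names are hypothetical. ID, SM, and PL are standard @RG fields, and bwa converts the literal two-character sequence \t in the -R string into real tabs in the SAM header.

```shell
# rg_header ID SAMPLE PLATFORM: print an @RG line suitable for -R.
# Hypothetical helper; the fields are joined by a literal backslash-t,
# which bwa turns into tab characters.
rg_header() {
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s' "$1" "$2" "$3"
}

# Paired-end alignment with a read group and -M; file names are placeholders.
align_with_rg() {
  bwa mem -t 8 -M -R "$(rg_header run1 sampleA ILLUMINA)" \
    ref.fa reads_R1.fq reads_R2.fq > aligned.sam
}
```

Downstream tools that group reads by sample generally expect an @RG line like this one, so setting it at alignment time avoids patching the header later.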
DESCRIPTION
BWA (Burrows-Wheeler Aligner) is a popular open-source tool for mapping DNA sequencing reads to a reference genome. It searches an FM-index built on the Burrows-Wheeler Transform (BWT) of the reference, which keeps lookups fast and the index compact relative to the genome size.
Originally designed for short Illumina reads, BWA now centers on bwa mem, which supports paired-end data and also accepts longer reads, with presets (-x pacbio, -x ont2d) for PacBio and Oxford Nanopore data. It carries base quality scores through to the output, soft-clips read ends that do not align, and performs gapped alignment.
Workflow typically starts with bwa index, which builds the index files (.amb, .ann, .bwt, .pac, .sa) from a reference FASTA file. The alignment subcommands write SAM, which samtools can convert to BAM for GATK and other downstream pipelines.
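Because alignment fails without a complete index, it can help to check for the index files before mapping. The sketch below assumes the index was built with the default prefix, so the five files (.amb, .ann, .bwt, .pac, .sa) sit next to the reference. The function name is hypothetical.

```shell
# index_complete REF: succeed only if all five bwa index files exist
# alongside REF. Hypothetical helper; assumes the default index prefix.
index_complete() {
  for ext in amb ann bwt pac sa; do
    [ -f "$1.$ext" ] || return 1
  done
}

# Typical use (ref.fa is a placeholder):
#   index_complete ref.fa || bwa index ref.fa
```

If the index was built with bwa index -p prefix, pass that prefix instead of the FASTA path.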
bwa mem is the flagship algorithm, combining seeding with maximal exact matches, chaining, and Smith-Waterman extension. The legacy modes (aln with samse or sampe) remain usable for very short reads, roughly under 70 bp, but bwa mem is the recommended choice for new projects.
BWA is a standard alignment step in NGS (Next-Generation Sequencing) analysis, particularly ahead of variant calling. It is not splice-aware, so spliced RNA-seq reads are usually aligned with other tools. It is highly cited and still maintained.
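For comparison with bwa mem, the legacy paired-end path takes three steps: one bwa aln run per mate to produce .sai files, then bwa sampe to combine them into SAM. The wrapper below is a sketch with a hypothetical function name and an arbitrary thread count.

```shell
# legacy_paired REF R1 R2 OUT: hypothetical wrapper around the aln/sampe path.
legacy_paired() {
  bwa aln -t 8 "$1" "$2" > "$2.sai" &&   # alignment coordinates for mate 1
  bwa aln -t 8 "$1" "$3" > "$3.sai" &&   # alignment coordinates for mate 2
  bwa sampe "$1" "$2.sai" "$3.sai" "$2" "$3" > "$4"   # pair mates, write SAM
}

# Typical use (file names are placeholders):
#   legacy_paired ref.fa reads_R1.fq reads_R2.fq aligned.sam
```

For single-end data the same pattern applies with one bwa aln run followed by bwa samse.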
CAVEATS
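Best suited to short reads; use minimap2 for ultra-long reads. The reference must be indexed with bwa index before alignment. The legacy aln/samse/sampe path is generally less accurate than mem for reads of about 70 bp and longer. Indexing and aligning against large genomes such as human requires several gigabytes of RAM.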
QUICK EXAMPLE
Index: bwa index ref.fa
Align PE: bwa mem -t 8 ref.fa reads_R1.fq reads_R2.fq | samtools sort -o aligned.bam
OUTPUT
Produces SAM records with mapping quality (MAPQ), CIGAR strings, and optional tags such as MD. Pipe the output to samtools view or samtools sort to obtain BAM.
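Since SAM is tab-separated text, quick checks on the output need nothing more than awk. The sketch below counts mapped records at or above a MAPQ threshold, relying on the SAM column layout (FLAG in column 2, MAPQ in column 5). The function name and the default threshold of 30 are assumptions chosen for illustration.

```shell
# count_mapq [MIN]: read SAM on stdin and print the number of mapped
# records whose MAPQ (column 5) is at least MIN (default 30).
# Hypothetical helper.
count_mapq() {
  awk -F '\t' -v min="${1:-30}" '
    /^@/                 { next }   # skip header lines
    int($2 / 4) % 2 == 1 { next }   # FLAG bit 0x4 set: read is unmapped
    $5 >= min            { n++ }
    END                  { print n + 0 }
  '
}

# Typical use (file names are placeholders):
#   bwa mem ref.fa reads.fq | count_mapq 30
```

For BAM input, the same function can be fed from samtools view -h.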
HISTORY
Created by Heng Li in 2009 at the Wellcome Sanger Institute, initially with the aln algorithm for short reads. BWA-MEM arrived with v0.7 in 2013 and extended the tool to longer reads through seed chaining and split alignment. Development continues on GitHub; v0.7.17 dates from 2017, and later 0.7.x releases have followed.
SEE ALSO
bowtie2(1), minimap2(1), samtools(1), gatk(1)


