bwa

Align DNA sequences to a reference genome

TLDR

Index the reference genome

$ bwa index [path/to/reference.fa]

Map single-end reads (sequences) to the indexed genome using 32 [t]hreads and compress the result to save space

$ bwa mem -t 32 [path/to/reference.fa] [path/to/read_single_end.fq.gz] | gzip > [path/to/alignment_single_end.sam.gz]

Map paired-end reads (sequences) to the indexed genome using 32 [t]hreads and compress the result to save space

$ bwa mem -t 32 [path/to/reference.fa] [path/to/read_pair_end_1.fq.gz] [path/to/read_pair_end_2.fq.gz] | gzip > [path/to/alignment_pair_end.sam.gz]

Map paired-end reads (sequences) to the indexed genome using 32 [t]hreads, [M]arking shorter split hits as secondary so the SAM output is compatible with Picard, and compress the result

$ bwa mem -M -t 32 [path/to/reference.fa] [path/to/read_pair_end_1.fq.gz] [path/to/read_pair_end_2.fq.gz] | gzip > [path/to/alignment_pair_end.sam.gz]

Map paired-end reads (sequences) to the indexed genome using 32 [t]hreads, appending FASTA/Q [C]omments (e.g. BC:Z:CGTAC) to the SAM output, and compress the result

$ bwa mem -C -t 32 [path/to/reference.fa] [path/to/read_pair_end_1.fq.gz] [path/to/read_pair_end_2.fq.gz] | gzip > [path/to/alignment_pair_end.sam.gz]
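
The examples above compress the SAM text with gzip. A common alternative is to pipe bwa mem straight into samtools sort and write a coordinate-sorted BAM. The sketch below assumes samtools is installed (it is a separate tool, not part of bwa). The function name align_to_bam and the DRY_RUN variable are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of bwa): align paired-end reads and write a sorted BAM.
# Arguments: reference, reads 1, reads 2, output BAM, optional thread count (default 4).
# With DRY_RUN=1 the pipeline is only printed, not executed.
align_to_bam() {
    ref=$1; r1=$2; r2=$3; out=$4; threads=${5:-4}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'bwa mem -t %s %s %s %s | samtools sort -@ %s -o %s -\n' \
            "$threads" "$ref" "$r1" "$r2" "$threads" "$out"
        return 0
    fi
    # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-").
    bwa mem -t "$threads" "$ref" "$r1" "$r2" | samtools sort -@ "$threads" -o "$out" -
}
```

Running it with DRY_RUN=1 first, for example `DRY_RUN=1 align_to_bam ref.fa r1.fq.gz r2.fq.gz out.bam 8`, shows the pipeline before anything is executed.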

SYNOPSIS

bwa <command> [options]

Examples:
bwa index [-p prefix] [-a algo] <ref.fa>
bwa mem [options] <ref> <in1.fq> [<in2.fq>]

PARAMETERS

index
    Index reference FASTA sequences into BWT and auxiliary files

mem
    Run BWA-MEM algorithm: seed, chain, align reads to SAM

aln
    Legacy: BWA-backtrack alignment of short reads to .sai files

samse
    Convert single-end .sai alignments to SAM

sampe
    Convert paired-end .sai alignments to SAM

bwasw
    BWA-SW for long-query gapped alignment

-t N
    Number of threads (default: 1)

-R STR
    Read group header line (SAM @RG)

-c INT
    Skip seeds (MEMs) with more than INT occurrences in the genome (mem)

-M
    Mark shorter split hits as secondary (mem; Picard compat)
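
The -R option expects a complete @RG header line, in which the two-character sequence \t stands for a TAB (bwa mem converts it when writing the SAM header). A minimal sketch of building such a string follows. The helper name make_read_group and the default platform value are assumptions for illustration, not part of bwa.

```shell
#!/bin/sh
# Hypothetical helper: build an @RG line suitable for `bwa mem -R`.
# Arguments: read group ID, sample name, optional platform (default ILLUMINA).
# "\\t" in the printf format emits a literal backslash followed by "t";
# bwa mem itself turns that sequence into a TAB in the output header.
make_read_group() {
    id=$1; sample=$2; platform=${3:-ILLUMINA}
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$id" "$sample" "$platform"
}

# Intended use (requires bwa and an indexed reference):
#   bwa mem -R "$(make_read_group run1 sampleA)" ref.fa reads.fq > aln.sam
```

Keeping the separators as literal \t text avoids quoting problems with real TAB characters on the command line.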

DESCRIPTION

BWA (Burrows-Wheeler Aligner) is an open-source tool for mapping DNA sequencing reads to a reference genome. It uses the Burrows-Wheeler Transform (BWT) and an FM-index to search the reference quickly while keeping memory usage low.

BWA was originally designed for short Illumina reads. bwa mem also handles longer reads, with presets for PacBio and Oxford Nanopore data selected via -x, and supports both single-end and paired-end input. It supports base quality scores, clipping, and gapped alignment.

A typical workflow starts with bwa index, which builds the index files (.amb, .ann, .bwt, .pac, .sa) from a reference FASTA file. The alignment subcommands write SAM to standard output; it can be converted to BAM with samtools and used in GATK pipelines.
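
Because alignment fails without an index, it can help to check for the index files before running bwa mem. The sketch below assumes the index was built without -p, so the files sit next to the reference and share its name. The helper name missing_bwa_index is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every index file that `bwa index` normally creates
# alongside the reference but that does not exist yet (one path per line).
# Assumes the default prefix, i.e. <ref>.amb, <ref>.ann, <ref>.bwt, <ref>.pac, <ref>.sa.
missing_bwa_index() {
    ref=$1
    for ext in amb ann bwt pac sa; do
        [ -f "$ref.$ext" ] || printf '%s\n' "$ref.$ext"
    done
}

# Intended use (requires bwa): build the index only when something is missing.
#   [ -z "$(missing_bwa_index ref.fa)" ] || bwa index ref.fa
```

An empty result means all five files are present; it does not verify that they are complete or match the current reference.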

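bwa mem is the recommended algorithm for most data: it seeds alignments with maximal exact matches (MEMs), chains the seeds, and extends them with affine-gap Smith-Waterman. The legacy aln/samse/sampe workflow targets short reads (up to about 100 bp) and is generally not recommended for new projects.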

BWA is widely used in NGS (Next-Generation Sequencing) analysis, most often as the alignment step ahead of variant calling. It is not splice-aware, so RNA-seq reads are usually better served by a spliced aligner.
