blastn

Search nucleotide databases for similar sequences

TLDR

Align a query against a subject FASTA file using megablast (the default task), with an E-value threshold of 1e-9 and the default pairwise output format

$ blastn -query [query.fa] -subject [subject.fa] -evalue [1e-9]

Align two or more sequences using the blastn task (more sensitive than megablast for divergent sequences)

$ blastn -task blastn -query [query.fa] -subject [subject.fa]

Align two or more sequences, custom tabular output format, output to file

$ blastn -query [query.fa] -subject [subject.fa] -outfmt '[6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident]' -out [output.tsv]
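
The custom column order in the command above fixes which field holds which value, so a small filter can post-process the file. The sketch below assumes exactly that 11-column layout (qlen in column 2, qstart and qend in columns 3 and 4, pident in column 11). filter_hits is a hypothetical helper name, and the 90/80 thresholds are illustrative defaults, not recommendations.

```shell
# Hypothetical helper: keep hits above an identity and query-coverage cutoff.
# Reads tab-separated rows on stdin in the column order
#   qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident
# and prints: qseqid, sseqid, pident, query coverage (percent).
filter_hits() {
    # $1: minimum percent identity (default 90)
    # $2: minimum query coverage in percent (default 80)
    awk -F'\t' -v min_id="${1:-90}" -v min_cov="${2:-80}" '
        {
            # Coverage of the query by this single HSP; assumes qstart <= qend.
            cov = 100 * ($4 - $3 + 1) / $2
            if ($11 + 0 >= min_id + 0 && cov >= min_cov + 0)
                printf "%s\t%s\t%.1f\t%.1f\n", $1, $5, $11, cov
        }'
}
```

Usage: filter_hits 95 50 < output.tsv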

Search a local nucleotide database with a nucleotide query, using 16 CPU threads and keeping at most 10 aligned sequences per query

$ blastn -query [query.fa] -db [path/to/blast_db] -num_threads [16] -max_target_seqs [10]

Search the remote NCBI nt nucleotide database using a nucleotide query (the search runs on NCBI servers)

$ blastn -query [query.fa] -db [nt] -remote

Display a short usage summary (use -help for the full option descriptions)

$ blastn -h

SYNOPSIS

blastn -query <Query_file> -db <Database> [-out <Output_file>] [options]

-query <File_in>
    FASTA file with nucleotide query sequence(s)

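-db <String>
    Name of a pre-formatted nucleotide database (built with makeblastdb); incompatible with -subject

-subject <File_in>
    FASTA file with subject sequence(s) to search, as an alternative to -db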

-out <File_out>
    Output file name; default is stdout

-outfmt <String>
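    Output format; default 0 (pairwise). For tabular output use 6, optionally followed by column names, e.g., '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore' (the default columns for format 6)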

-evalue <Real>
    Expectation value threshold for saving hits; default 10.0

-max_target_seqs <Integer>
    Max number of aligned sequences to keep; default 500

-max_hsps <Integer>
    Max HSPs to save per subject sequence for each query; unlimited if not set

-perc_identity <Real>
    Minimum percent identity (0-100) for reported alignments

-qcov_hsp_perc <Real>
    Minimum percent of the query (0-100) that each HSP must cover

-task <String>
    Task: megablast (default), blastn, dc-megablast, blastn-short, rmblastn

-strand <String>
    Query strand(s) to search against the database: both (default), minus, plus

-dust <String>
    Low-complexity (DUST) filter for the query: 'yes', 'no', or 'level window linker'; default '20 64 1'

-word_size <Integer>
    Word size for initial matching; default varies by task (megablast 28, blastn and dc-megablast 11, blastn-short 7)

-penalty <Integer>
    Mismatch penalty

-reward <Integer>
    Match reward

-gapopen <Integer>
    Gap open cost

-gapextend <Integer>
    Gap extension cost

-use_index <Boolean>
    Use a pre-built MegaBLAST database index (created with makembindex); default false

-num_threads <Integer>
    Number of CPU threads; default 1

-help
    Display detailed help with all option descriptions (-h prints a short usage summary)

-version
    Print version info
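
When -outfmt 6 is used without a column list, each row carries the 12 default columns listed under -outfmt above, with the bit score in the last column. A common follow-up is to keep only the top-scoring row per query. The sketch below assumes that default layout; best_hits is a hypothetical helper name, not part of BLAST+.

```shell
# Hypothetical helper: keep the highest-bitscore row for each query.
# Reads default -outfmt 6 rows on stdin (qseqid in column 1, bitscore in
# column 12) and prints one full row per query, in order of first appearance.
best_hits() {
    awk -F'\t' '
        # Record a row if the query is new or this row scores strictly higher;
        # on ties the earlier row is kept.
        !($1 in best) || $12 + 0 > best[$1] + 0 { best[$1] = $12; line[$1] = $0 }
        # Remember the order in which queries were first seen.
        !seen[$1]++ { order[++n] = $1 }
        END { for (i = 1; i <= n; i++) print line[order[i]] }'
}
```

Usage: best_hits < output.tsv > best.tsv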

DESCRIPTION

blastn is a command-line tool from the NCBI BLAST+ suite that compares nucleotide query sequences against nucleotide subject sequences or databases using the BLAST algorithm. It identifies regions of local similarity by finding high-scoring segment pairs (HSPs), which makes it suitable for discovering homologous DNA sequences, locating genes, and annotating genomes.

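It supports several tasks: megablast (the default) for closely related sequences, dc-megablast (discontiguous megablast) for more divergent sequences such as cross-species comparisons, standard blastn for more sensitive searches, and blastn-short for short queries. Users specify a FASTA query file and either a pre-formatted nucleotide database (-db) or a FASTA subject file (-subject). Results include alignments with scores, E-values, identities, and gaps.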

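blastn is optimized for speed on large datasets, though memory and CPU requirements grow with database size; -num_threads spreads a local search across cores. Output formats range from human-readable pairwise alignments (the default) to tabular formats (-outfmt 6 or 7) intended for parsing. It is a common component of bioinformatics pipelines in genomics and metagenomics.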

CAVEATS

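Requires a BLAST+ installation and, for -db searches, databases formatted with makeblastdb. Large databases demand substantial RAM and disk space (the nt database runs to hundreds of gigabytes). Not for protein searches (use blastp). The E-value threshold needs tuning: too loose a value admits spurious hits, while too strict a value discards true but distant matches.
Database indexing (-use_index) is optional and can speed up local megablast searches; it does not apply to -remote searches, which run on NCBI servers.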
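
Because a -db search needs a database formatted by makeblastdb, a typical local workflow has two steps: format the reference FASTA, then search it. The sketch below wraps both in a hypothetical helper, build_and_search; the file names and the "_db" suffix are assumptions made for illustration, and the function returns 127 if BLAST+ is not on the PATH.

```shell
# Hypothetical helper: format a reference FASTA as a nucleotide database,
# then search it with blastn and write default tabular output.
#   $1: reference FASTA   $2: query FASTA   $3: output TSV
build_and_search() {
    ref_fa=$1
    query_fa=$2
    out_tsv=$3
    # Fail early if BLAST+ is not installed.
    for tool in makeblastdb blastn; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 127
        }
    done
    # Database name: the reference path without its extension, plus "_db".
    db="${ref_fa%.*}_db"
    makeblastdb -in "$ref_fa" -dbtype nucl -out "$db" >/dev/null || return 1
    blastn -query "$query_fa" -db "$db" -outfmt 6 -out "$out_tsv"
}
```

Usage: build_and_search reference.fa query.fa hits.tsv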
