blastn
Search nucleotide databases for similar sequences
TLDR
Align two or more sequences using megablast (default), with an e-value threshold of 1e-9 and pairwise output format (default)
Align two or more sequences using blastn
Align two or more sequences, custom tabular output format, output to file
Search a nucleotide database with a nucleotide query, using 16 threads (CPUs) and keeping at most 10 aligned sequences
Search the remote non-redundant nucleotide database using a nucleotide query
Display help (use -help for detailed help)
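The entries above can be sketched as the following commands. The file names (query.fa, subject.fa, output.tsv) and the database path are placeholders, and run is a hypothetical dry-run helper, not part of BLAST+: it prints each command and only executes it when DRY_RUN=0 is set.

```shell
# Hypothetical dry-run helper: prints the command by default,
# executes it only when DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# megablast (default task), e-value threshold 1e-9, pairwise output (default)
run blastn -query query.fa -subject subject.fa -evalue 1e-9

# same alignment with the traditional blastn task
run blastn -task blastn -query query.fa -subject subject.fa

# custom tabular output written to a file
run blastn -query query.fa -subject subject.fa \
  -outfmt '6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident' \
  -out output.tsv

# database search with 16 threads, keeping at most 10 aligned sequences
run blastn -query query.fa -db path/to/blast_db -num_threads 16 -max_target_seqs 10

# remote search against the nt database on NCBI servers
run blastn -query query.fa -db nt -remote

# brief usage (use -help for the detailed version)
run blastn -h
```

Set DRY_RUN=0 once the placeholder paths point at real files.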
SYNOPSIS
blastn -query <Query_file> -db <Database> [-out <Output_file>] [options]
PARAMETERS
-query <File_in>
FASTA file with nucleotide query sequence(s)
-db <String>
Pre-formatted nucleotide database name
-out <File_out>
Output file name; default is stdout
-outfmt <String>
Output format; e.g., '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore'
-evalue <Real>
Expectation value threshold for saving hits; default 10.0
-max_target_seqs <Integer>
Max number of aligned sequences to keep; default 500
-max_hsps <Integer>
Max HSPs to keep per subject sequence for each query; unlimited when unset (value must be >= 1)
-perc_identity <Real>
Percent identity threshold for filtering alignments
-qcov_hsp_perc <Real>
Query coverage per HSP threshold
-task <String>
Task: blastn, blastn-short, dc-megablast, megablast, rmblastn; default megablast
-strand <String>
Query strand: both, minus, plus
-dust <String>
Low-complexity (DUST) filter for the query: 'yes', 'no', or 'level window linker'; default '20 64 1'
-word_size <Integer>
Word size for initial matching; default varies by task
-penalty <Integer>
Mismatch penalty
-reward <Integer>
Match reward
-gapopen <Integer>
Gap open cost
-gapextend <Integer>
Gap extension cost
-use_index <Boolean>
Use a pre-built MegaBLAST database index (created with makembindex); default false
-num_threads <Integer>
Number of CPU threads; default 1
-help
Display detailed help with option descriptions (-h prints brief usage)
-version
Print version info
DESCRIPTION
blastn is a command-line tool from the NCBI BLAST+ suite for comparing nucleotide query sequences to nucleotide subject databases using the BLAST algorithm. It identifies regions of local similarity by finding high-scoring segment pairs (HSPs), suitable for discovering homologous DNA sequences, finding genes, or annotating genomes.
It supports several tasks: standard blastn for more divergent matches, megablast for closely related sequences, and dc-megablast (discontiguous megablast) for more divergent sequences such as cross-species comparisons. Users specify a FASTA query file and either a pre-formatted nucleotide database (-db) or a subject FASTA file (-subject). Results include alignments with scores, E-values, identities, and gaps.
Optimized for speed on large datasets, it searches megabase-scale inputs efficiently, though RAM and CPU requirements grow with database size. Output formats range from pairwise alignments for reading to tabular for parsing. It is a staple of bioinformatics pipelines in genomics and metagenomics.
CAVEATS
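Requires a BLAST+ installation and, for -db searches, a database formatted with makeblastdb. Large databases demand substantial RAM (for example, the nt database needs hundreds of GB). Not for protein searches (use blastp). The E-value threshold needs tuning to balance false positives against false negatives.
Indexing is optional. A MegaBLAST index built with makembindex can speed up local megablast searches when -use_index true is set; it has no effect on -remote searches, which run on NCBI servers.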
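The makeblastdb requirement noted above can be sketched as follows. The names ref.fa, ref_db, and query.fa are hypothetical, and build_db is an illustrative wrapper rather than a BLAST+ command; the guard skips the run when the tools or input files are missing.

```shell
# Illustrative wrapper: build a nucleotide database from a FASTA file.
# $1 = input FASTA, $2 = output database prefix (both hypothetical names).
build_db() {
  makeblastdb -in "$1" -dbtype nucl -out "$2"
}

# Only run when BLAST+ is installed and the placeholder inputs exist.
if command -v makeblastdb >/dev/null 2>&1 && [ -f ref.fa ] && [ -f query.fa ]; then
  build_db ref.fa ref_db
  # Search the freshly built database with a stricter e-value threshold
  blastn -query query.fa -db ref_db -evalue 1e-5
fi
```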
COMMON TASKS
blastn: General purpose.
megablast: Closely related sequences (faster).
dc-megablast: Discontiguous megablast for more divergent sequences (e.g., cross-species).
blastn-short: Short queries (<50 bases).
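As a rough illustration of choosing a task, the sketch below picks blastn-short for queries under 50 bases and megablast otherwise. The helper names (first_seq_len, pick_task), the 14-base probe, and the two-way rule itself are assumptions for illustration; real pipelines may also want blastn or dc-megablast for divergent targets.

```shell
# Length of the first sequence in a FASTA file (header lines start with '>').
first_seq_len() {
  awk '/^>/ { if (seen) exit; seen = 1; next }
       { gsub(/[ \t\r]/, ""); n += length($0) }
       END { print n + 0 }' "$1"
}

# Assumed rule of thumb: blastn-short below 50 bases, megablast otherwise.
pick_task() {
  if [ "$1" -lt 50 ]; then echo blastn-short; else echo megablast; fi
}

# Made-up 14-base probe used only to exercise the helpers.
probe=$(mktemp)
printf '>probe\nACGTACGTAC\nGGCC\n' > "$probe"
task=$(pick_task "$(first_seq_len "$probe")")

# Print the command that would be run (the database path is a placeholder).
echo "blastn -task $task -query $probe -db path/to/blast_db"
```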
OUTPUT PARSING
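Use -outfmt 6 for tab-separated output without a header line (-outfmt 7 adds comment lines). Columns can be customized by listing field names after the format number. The default, -outfmt 0, is pairwise.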
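A minimal sketch of post-filtering tabular output with awk, assuming the default -outfmt 6 column order (percent identity in column 3, e-value in column 11). The sample rows are made up for illustration, and filter_hits is a hypothetical helper.

```shell
# Made-up sample rows in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
hits=$(mktemp)
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t11\t210\t1e-100\t350\n'  > "$hits"
printf 'q1\ts2\t82.0\t150\t27\t0\t5\t154\t40\t189\t2e-20\t95.3\n' >> "$hits"
printf 'q2\ts3\t98.3\t60\t0\t1\t1\t60\t7\t66\t0.5\t30.1\n'        >> "$hits"

# Keep hits with percent identity >= 90 and e-value <= 1e-5,
# printing query ID, subject ID, identity, and e-value.
filter_hits() {
  awk -F '\t' -v OFS='\t' '$3 >= 90 && $11 <= 1e-5 { print $1, $2, $3, $11 }' "$1"
}

filter_hits "$hits"
```

Only the q1/s1 row passes both thresholds here; the same cutoffs can often be applied at search time with -perc_identity and -evalue.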
PERFORMANCE TIPS
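Set -num_threads to the number of available CPU cores. Use -task megablast for speed on closely related sequences. For local megablast searches, pre-index the database with makembindex and pass -use_index true.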
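One way to match -num_threads to the machine, sketched under the assumption that getconf reports the online core count (it does on Linux and macOS). The query file and database path are placeholders, and the command is only printed, not run.

```shell
# Detect online CPU cores; fall back to 1 if getconf is unavailable.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the command that would be run with the detected thread count.
echo "blastn -task megablast -query query.fa -db path/to/blast_db -num_threads $threads"
```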
HISTORY
Part of BLAST suite developed by NCBI. Original BLAST paper by Altschul et al. (1990). BLAST+ (stand-alone) released 2009, replacing legacy C version. Current 2.15+ supports new features like cloud-optimized indexing and lineage-specific hits.
SEE ALSO
blastp(1), blastx(1), tblastn(1), tblastx(1), makeblastdb(1), blastdbcmd(1)


