blastn
Search nucleotide databases for similar sequences
TLDR
Align two or more sequences using megablast (the default task), with an E-value threshold of 1e-9 and pairwise output (the default format)
Align two or more sequences using the traditional blastn task
Align two or more sequences with a custom tabular output format, writing the results to a file
Search a local nucleotide database with a nucleotide query, using 16 threads (CPUs) and keeping at most 10 aligned sequences per query
Search the remote non-redundant nucleotide database using a nucleotide query
Display help (use -help for detailed help)
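The TLDR list above describes each task without its command line. Below is a minimal sketch of each task as a Bash function. The function names and argument order (query FASTA, then subject FASTA or database name, then output file) are hypothetical conventions, and the remote example assumes NCBI's nt database is the one meant. The -subject option compares the query against the sequences in a FASTA file without building a database.

```shell
# Sketches of the TLDR tasks as Bash functions. Function names and argument
# order (query FASTA, then subject FASTA or database, then output file) are
# hypothetical; file names are supplied by the caller.

# Megablast (default task), E-value cutoff 1e-9, pairwise output (default).
align_megablast() {
  blastn -query "$1" -subject "$2" -evalue 1e-9
}

# Same comparison with the traditional, more sensitive blastn task.
align_blastn() {
  blastn -task blastn -query "$1" -subject "$2"
}

# Custom tabular columns, written to the file named by the third argument.
align_tabular() {
  blastn -query "$1" -subject "$2" \
    -outfmt '6 qseqid sseqid pident length evalue bitscore' \
    -out "$3"
}

# Local database search: 16 threads, at most 10 aligned sequences kept.
search_db() {
  blastn -query "$1" -db "$2" -num_threads 16 -max_target_seqs 10
}

# Remote search of NCBI's nt database (requires network access).
search_remote_nt() {
  blastn -query "$1" -db nt -remote
}
```

For example, `align_tabular query.fa subjects.fa hits.tsv` (hypothetical file names). For help, `blastn -h` prints a brief usage summary and `blastn -help` lists every argument.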
SYNOPSIS
blastn [OPTIONS] -query <query_file> {-db <database_name> | -subject <subject_file>} [-out <output_file>]
PARAMETERS
-query <file>
Path to the input query nucleotide sequence(s) file in FASTA format.
-db <name>
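Name or path of the BLAST nucleotide database to search against. A local database must first be built with makeblastdb. To compare against the sequences in a FASTA file without building a database, use -subject <file> instead of -db.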
-out <file>
Path to the output file where results will be written. If not specified, results go to standard output.
-outfmt <format>
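Specifies the output format. Common values include 0 (pairwise, the default), 6 (tabular, often preferred for parsing), and 7 (tabular with comment lines). Formats 6, 7 and 10 also accept a quoted list of columns, e.g. '6 qseqid sseqid pident evalue bitscore'.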
-evalue <float>
Expectation value (E-value) cutoff for reporting matches (default: 10); hits with a larger E-value are discarded. Lower values indicate higher statistical significance.
-max_target_seqs <int>
Maximum number of aligned sequences to keep for each query.
-num_threads <int>
Number of threads (CPU cores) to use for parallel execution, speeding up searches on multi-core systems.
-task <string>
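Specify a particular BLAST algorithm task. Options include 'megablast' (default; optimized for highly similar sequences), 'dc-megablast' (discontiguous megablast, for more divergent sequences), 'blastn' (traditional blastn, more sensitive but slower), and 'blastn-short' (for sequences shorter than 50 bases).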
-strand <string>
Search for matches on the 'plus' (forward), 'minus' (reverse complement), or 'both' strands of the query sequence.
-perc_identity <float>
Minimum percentage of identical matches for an alignment to be reported (e.g., 90.0 for 90% identity).
-word_size <int>
Length of the initial exact-match word (seed) used to start alignments. Larger words are faster but less sensitive. Default is 11 for the blastn task and 28 for megablast.
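As a sketch of how these options combine, the hypothetical helper below runs the more sensitive blastn task with a seed smaller than its default of 11, searches only the plus strand, and keeps alignments with at least 90% identity and an E-value of at most 1e-5. The function name, argument order, and all numeric values are illustrative assumptions, not recommended settings.

```shell
# Hypothetical helper: a more sensitive, plus-strand-only search of a local
# database. All values are illustrative, not recommended defaults.
sensitive_search() {
  local query=$1 db=$2 out=$3
  # -task blastn: traditional task (default word size 11)
  # -word_size 7: smaller seed, more sensitive but slower
  # -strand plus: search only the plus strand of the query
  # -perc_identity 90 and -evalue 1e-5: report only close, significant hits
  blastn -task blastn -word_size 7 -strand plus \
    -perc_identity 90 -evalue 1e-5 -num_threads 4 -outfmt 6 \
    -query "$query" -db "$db" -out "$out"
}
```

For example, `sensitive_search query.fa refs_db hits.tsv` (hypothetical names). Lowering -word_size trades speed for sensitivity, as described under -word_size.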
DESCRIPTION
The blastn command is a fundamental tool within the NCBI BLAST+ suite, designed for performing nucleotide-nucleotide basic local alignment searches. It efficiently compares a query nucleotide sequence (or multiple sequences) against a pre-formatted nucleotide sequence database. The primary goal is to identify regions of local similarity, which often indicate functional, structural, or evolutionary relationships between sequences.
blastn uses a heuristic algorithm to achieve high search speed, making it suitable for searching large databases. It sacrifices some sensitivity compared to exhaustive dynamic-programming methods such as Smith-Waterman, but offers a practical balance for genomic-scale analysis. Biologists and bioinformaticians widely use blastn for tasks such as gene discovery, identifying homologous sequences, primer design, validating sequencing reads, and phylogenetic analysis. The output typically includes alignment scores, E-values (expected number of random matches), and percent identity, which together support interpretation of each hit.
CAVEATS
Due to its heuristic nature, blastn is not guaranteed to find all optimal alignments, especially between highly divergent sequences. Local database searches (-db) require a database built beforehand with makeblastdb. Large queries or databases can be resource-intensive, requiring significant CPU time and memory. Correct interpretation of E-values and of the chosen output format is crucial for meaningful results.
BLAST DATABASES
Before searching a local database with blastn -db, you must build it from your target sequences with the makeblastdb command. This process indexes the sequences, allowing blastn to search them efficiently; without a properly formatted database, a -db search cannot run. For a one-off comparison against a small set of sequences, -subject <file> avoids this step.
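A minimal sketch of the two-step workflow, wrapped in a hypothetical helper whose name and argument order are assumptions: makeblastdb indexes a FASTA file (-in) as a nucleotide database (-dbtype nucl) under the name given by -out, and blastn then searches it by that same name via -db. makeblastdb reports progress on standard output, so the sketch redirects it to standard error to keep the hit table clean.

```shell
# Hypothetical helper: index a FASTA file as a nucleotide BLAST database,
# then search it. Arguments: reference FASTA, database name, query FASTA.
# Tabular hits go to standard output.
build_and_search() {
  local ref_fasta=$1 db_name=$2 query=$3
  # -dbtype nucl marks the sequences as nucleotides; -out names the database.
  # Progress messages go to stderr; stop if indexing fails.
  makeblastdb -in "$ref_fasta" -dbtype nucl -out "$db_name" >&2 || return 1
  # Search the freshly built database by the same name.
  blastn -query "$query" -db "$db_name" -outfmt 6
}
```

For example, `build_and_search refs.fa refs_db query.fa > hits.tsv` (hypothetical file names). Once built, the database can be reused for later searches without running makeblastdb again.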
E-VALUE SIGNIFICANCE
The E-value (Expectation value) is a critical statistical measure in BLAST results. It is the number of hits with a score at least as good as the observed one that you would expect to see by chance when searching a database of that size. A lower E-value indicates a more statistically significant match, meaning it is less likely to have occurred randomly. Because the E-value scales with database size, the same alignment receives a larger E-value against a larger database. Common cutoffs are 1e-5 or 1e-10 for significant biological relationships.
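For reference, the standard Karlin-Altschul relation behind the E-value is shown below for a raw alignment score S, query length m and database length n (BLAST actually uses length-corrected "effective" lengths), where λ and K depend on the scoring system. Converting S to a bit score S' gives the second form. The numbers in the worked example are hypothetical sizes chosen for illustration.

```latex
\[ E = K\,m\,n\,e^{-\lambda S} \]

\[ S' = \frac{\lambda S - \ln K}{\ln 2}
   \quad\Longrightarrow\quad
   E = m\,n\,2^{-S'} \]

% Worked example (hypothetical): m = 10^3, n = 10^9, bit score S' = 50
\[ E = 10^{12} \cdot 2^{-50} \approx \frac{10^{12}}{1.13 \times 10^{15}} \approx 8.9 \times 10^{-4} \]

% Bit score needed for E <= 10^{-5} in the same search space
\[ 10^{12} \cdot 2^{-S'} \le 10^{-5} \iff S' \ge 17 \log_2 10 \approx 56.5 \]
```

Since E is proportional to m·n, doubling the database size doubles the E-value of an otherwise identical hit.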
OUTPUT FORMATS
Choose the output format (via -outfmt) according to how you intend to use the results. Format 0 (pairwise) is human-readable and shows full alignments. Format 6 (tabular) is recommended for programmatic parsing: it writes one hit per line as a tab-separated table, which is easy to integrate into scripts and data analysis workflows.
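With no custom column list, format 6 writes twelve tab-separated columns per hit: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The hypothetical helper below is a sketch that filters such a file by E-value and percent identity and keeps the highest-scoring hit per query. It assumes the default column order, so a custom -outfmt column list would need the field numbers adjusted.

```shell
# Hypothetical helper: filter default -outfmt 6 (or 7) output by E-value and
# percent identity, then keep the highest-bit-score hit for each query.
# Default columns: 1 qseqid, 2 sseqid, 3 pident, 4 length, 5 mismatch,
# 6 gapopen, 7 qstart, 8 qend, 9 sstart, 10 send, 11 evalue, 12 bitscore.
best_hits() {
  local max_evalue=$1 min_pident=$2 file=$3
  awk -F'\t' -v maxe="$max_evalue" -v minp="$min_pident" '
    /^#/ { next }                      # skip format-7 comment lines
    $11 + 0 <= maxe + 0 && $3 + 0 >= minp + 0 {
      if (!($1 in best) || $12 + 0 > best[$1] + 0) {
        best[$1] = $12                 # best bit score so far for this query
        line[$1] = $0
      }
    }
    END { for (q in line) print line[q] }
  ' "$file" | sort                     # awk array order is unspecified
}
```

For example, `best_hits 1e-5 90 hits.tsv` (hypothetical file name) prints at most one line per query, sorted by query ID.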
HISTORY
The original BLAST algorithm, upon which blastn is based, was first published by Stephen F. Altschul and colleagues in 1990. Developed at the National Center for Biotechnology Information (NCBI), it quickly became a cornerstone tool in bioinformatics. The current command-line utility, part of the BLAST+ suite released around 2009, is a rewrite in C++ of the original C toolkit implementation. This iteration brought improved performance, multi-threading capabilities, and a more modular design, solidifying blastn's role as the go-to tool for rapid nucleotide sequence comparison.
SEE ALSO
makeblastdb(1), blastp(1), blastx(1), tblastn(1), tblastx(1)