blastx
Search translated nucleotide sequence against protein database
SYNOPSIS
blastx [-h] [-help] [-version] [-query <File_in>] [-db <String>] [-out <File_out>] [OPTIONS...]
PARAMETERS
-query <File_in>
Nucleotide query file in FASTA format (FASTQ is not accepted)
-db <String>
Protein database name (pre-formatted)
-out <File_out>
Output file; default stdout
-evalue <Real>
Expectation threshold (default 10.0)
-outfmt <String>
Output format (e.g., 6 for tabular)
-num_threads <Int>
CPU threads (default 1)
-max_target_seqs <Int>
Max target sequences to keep (default 500)
-max_hsps <Int>
Max HSPs per subject (integer >= 1; no limit when unset)
-seg <String>
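Low-complexity (SEG) filter for the translated query: 'yes', 'window locut hicut', or 'no' (default on)
-dust <String>
Nucleotide low-complexity filter (a blastn option; blastx filters with -seg instead)
-soft_masking <Bool>
Apply filtering locations as soft masks (default false)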
-lcase_masking
Treat lower-case letters in the query and subject sequences as masked
-parse_seqids
Parse query Seq-ids
-query_gencode <Int>
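Genetic code used to translate the query (default 1, the standard code)
-strand <String>
Query strand(s) to search: 'both', 'plus', or 'minus' (default both)
-num_alignments <Int>
Number of alignments to show (legacy; applies to -outfmt 0-4, default 250)
-num_descriptions <Int>
Number of one-line descriptions to show (legacy; applies to -outfmt 0-4, default 500)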
-ungapped
Ungapped alignments
-word_size <Int>
Word size (default 3)
-matrix <String>
Scoring matrix (default BLOSUM62)
-threshold <Int>
Neighborhood word threshold
-comp_based_stats <Int>
Composition-based statistics mode (0-3; default 2)
-use_sw_tback
Use Smith-Waterman traceback
DESCRIPTION
blastx is a command-line tool from NCBI's BLAST+ suite for bioinformatics. It translates a nucleotide query sequence in all six reading frames (three forward, three reverse-complement) into hypothetical proteins and searches a protein database for similar sequences using the BLAST algorithm. This identifies potential protein-coding regions in genomic DNA, ESTs, or cDNA even without prior annotation.
Because the comparison is made at the protein level and each reading frame is searched separately, blastx can detect distant homologs even when frameshifts, introns, or sequencing errors interrupt the coding sequence; such hits typically appear as separate HSPs in different frames. Output includes alignments with bit scores, E-values, identities, positives, and gaps. The E-value indicates significance: lower is better (e.g., <0.001).
blastx suits large-scale genomic analysis, metagenomics, and functional annotation. It requires a pre-built protein database such as nr or swissprot. Searches are computationally intensive and benefit from multi-threading (-num_threads). Tabular output (-outfmt 6) is easy to parse in pipelines.
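A minimal sketch of a typical invocation, built only from options listed under PARAMETERS. The names reads.fasta, dbname, and hits.tsv are placeholders, and the thresholds are illustrative rather than recommended values. The command is printed first and executed only when blastx is on PATH and the query file exists.

```shell
#!/bin/sh
# Placeholder inputs (assumptions, not part of blastx itself).
query="reads.fasta"
db="dbname"
out="hits.tsv"

# Assemble the command from the options documented above.
set -- blastx -query "$query" -db "$db" -out "$out" \
    -evalue 1e-5 -outfmt 6 -num_threads 4 -max_target_seqs 10

# Show what would be run.
cmd="$*"
echo "$cmd"

# Run only when the tool and the query file are actually present.
if command -v blastx >/dev/null 2>&1 && [ -f "$query" ]; then
    "$@"
fi
```

Printing the command before running it makes the sketch safe to try on a machine without BLAST+ installed.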
CAVEATS
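Requires BLAST+ and a pre-formatted protein database (build one with makeblastdb). Large databases such as nr need substantial RAM and disk space. Six-frame translation increases compute time compared with blastp. The default low-complexity filter (-seg) and E-value cutoff can hide genuine hits; adjust them deliberately. The legacy options -num_descriptions and -num_alignments may be deprecated in future releases.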
COMMON OUTPUT FORMATS
-outfmt 6: tabular (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
-outfmt 0: pairwise.
-outfmt 7: tabular with comment lines.
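As a rough sketch of how tabular output might be post-processed, the snippet below keeps the best-scoring hit per query among hits with an E-value of at most 1e-5. The three rows are made-up sample data in the default -outfmt 6 column order, and the cutoff is an illustrative assumption.

```shell
#!/bin/sh
# Made-up rows in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# (in the second row qstart > qend, i.e. a hit on the reverse strand)
hits=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    read1 protA 85.0 100 15 0 1   300 1  100 1e-50 200 \
    read1 protB 40.0 90  54 0 301 32  5  94  1e-8  60 \
    read2 protC 30.0 50  35 0 1   150 10 59  0.5   25)

# Column 11 is the E-value, column 12 the bit score.
# Keep, per query, the row with the highest bit score among significant hits.
best=$(printf '%s\n' "$hits" | awk -F '\t' '
    $11 + 0 <= 1e-5 && ($12 + 0 > top[$1] + 0) { top[$1] = $12; row[$1] = $0 }
    END { for (q in row) print row[q] }
')
printf '%s\n' "$best"
```

With real data, replace the sample rows with the file written by -out; the order of queries printed by the END loop is not guaranteed.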
DATABASE PREP
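Pre-formatted databases (e.g., nr) can be downloaded from NCBI. To build a database from your own protein FASTA file, run makeblastdb -in proteins.fasta -dbtype prot -out dbname, then pass dbname to -db.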
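The makeblastdb step can be tried on a toy input, sketched here under the assumption that a throwaway temporary directory is acceptable. The protein record is an arbitrary placeholder, and the database is only built when makeblastdb is installed.

```shell
#!/bin/sh
# Write a tiny protein FASTA; the sequence is an arbitrary placeholder.
workdir=$(mktemp -d)
cat > "$workdir/proteins.fasta" <<'EOF'
>demo_protein_1 hypothetical example
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
EOF

# Count the records that would go into the database.
nseq=$(grep -c '^>' "$workdir/proteins.fasta")
echo "sequences: $nseq"

# Format the database only when makeblastdb is installed.
if command -v makeblastdb >/dev/null 2>&1; then
    makeblastdb -in "$workdir/proteins.fasta" -dbtype prot -out "$workdir/dbname"
fi
```

The resulting database would then be passed to blastx as -db "$workdir/dbname".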
E-VALUE INTERPRETATION
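The E-value is the number of hits with an equal or better score expected by chance in a database of that size. Values below about 10^-5 are commonly treated as significant, although the cutoff is a convention and depends on the application. Lower -evalue to make the search more stringent.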
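The dependence of the E-value on score and search space can be sketched with the standard relation E = m * n * 2^(-S), where S is the bit score and m and n are the effective lengths of the query and the database. The numbers below are made up for illustration; BLAST applies its own length corrections, so reported values will differ.

```shell
#!/bin/sh
# Illustrative values only (assumptions): effective query length m,
# effective database length n, and bit score.
m=300
n=100000000
bits=50

# E = m * n * 2^(-S) for a bit score S.
evalue=$(awk -v m="$m" -v n="$n" -v s="$bits" \
    'BEGIN { printf "%.3e\n", m * n * 2 ^ (-s) }')
echo "E = $evalue"

# Ten more bits shrink E by a factor of 2^10 = 1024.
evalue_hi=$(awk -v m="$m" -v n="$n" -v s="$((bits + 10))" \
    'BEGIN { printf "%.3e\n", m * n * 2 ^ (-s) }')
echo "E = $evalue_hi"
```

The same bit score therefore gives a larger E-value against a larger database, which is why cutoffs tuned on a small database may not transfer to nr.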
HISTORY
BLAST was developed at NCBI and first described by Altschul et al. (1990); gapped BLAST followed in 1997. BLAST+ (2.2.22 and later, around 2009-2010) replaced the legacy C applications with a faster C++ implementation that added better threading and support for modern formats. blastx has been widely used in genomics since.
SEE ALSO
blastp(1), blastn(1), tblastn(1), tblastx(1), makeblastdb(1), blastdbcmd(1)


