tblastx

Search a translated nucleotide database using a translated nucleotide query

SYNOPSIS

tblastx -query <query_file> -db <database_name> [options]

Example:
tblastx -query my_gene.fna -db nr_nucl -out tblastx_results.txt -outfmt 7 -evalue 1e-6 -num_threads 8

PARAMETERS

-query <file>
    Specifies the FASTA file containing the query nucleotide sequence(s).

-db <string>
    Defines the name of the BLAST nucleotide database to search against. This database must be pre-formatted using makeblastdb.

-out <file>
    Redirects the output of the BLAST search to the specified file instead of standard output.

-outfmt <format>
    Controls the output format. Common values include 0 (pairwise), 6 (tabular), 7 (tabular with comment lines), and 11 (BLAST archive in ASN.1).

-evalue <real>
    Sets the expectation value (E-value) threshold for reporting matches. Hits with an E-value above the threshold are not reported, so lower thresholds give more stringent results. The default is 10.

-max_target_seqs <int>
    Specifies the maximum number of aligned sequences to keep and display in the output.

-num_threads <int>
    Sets the number of CPU threads to use for the search. Cannot be combined with -remote.

-matrix <string>
    Specifies the amino acid scoring matrix to use (e.g., BLOSUM62, PAM30). Default is BLOSUM62.

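-gapopen <int>
    Penalty for opening a gap in an alignment. Used by the gapped BLAST programs such as blastx and tblastn; tblastx reports ungapped alignments only, so this option does not apply to it.

-gapextend <int>
    Penalty for extending an existing gap in an alignment. Like -gapopen, it does not apply to the ungapped alignments that tblastx reports.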

-word_size <int>
    Length of the initial word match used to seed an alignment. Because tblastx compares translated sequences, the length is counted in amino acids, not nucleotides.

-remote
    Sends the search to NCBI's public BLAST servers instead of running it locally, so -db must name a database hosted by NCBI.

-help
    Displays the full help message and available options for the command.
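
As a rough illustration of working with the tabular format (-outfmt 6) listed above, the sketch below filters hits by E-value and percent identity with awk. It assumes the default twelve-column layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The rows are made-up sample data, not real search results, and the cutoffs are arbitrary.

```shell
# Made-up rows in the default 12-column -outfmt 6 layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
hits=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  query1 subjA 62.50 48 18 0 10 153 301 444 2e-12 68.5 \
  query1 subjB 35.00 40 26 0 200 319 12 131 0.5 28.1 \
  query2 subjA 80.00 30 6 0 1 90 1000 911 3e-09 55.2)

# Keep hits with E-value <= 1e-6 (column 11) and identity >= 50% (column 3),
# then print query ID, subject ID, and E-value.
strong=$(printf '%s\n' "$hits" |
  awk -F '\t' -v OFS='\t' '$11 + 0 <= 1e-6 && $3 + 0 >= 50 { print $1, $2, $11 }')

printf '%s\n' "$strong"
```

In a real run the rows would come from the file named with -out rather than from a shell variable.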

DESCRIPTION

The tblastx command, part of the NCBI BLAST+ suite, compares nucleotide sequences at the protein level. Both the query and every sequence in the nucleotide database are translated on the fly in all six reading frames (three on the forward strand and three on the reverse complement), and the resulting protein sequences are compared with each other. Because protein sequences stay recognizable over longer evolutionary distances than the DNA that encodes them, this can reveal distant relationships and conserved protein domains that a direct nucleotide comparison would miss.

tblastx is particularly useful for cross-species comparisons and for genomic regions that may contain coding sequences with unknown protein products, where similarity to another translated sequence hints at a possible function. Alignments are ungapped, so the sequence on either side of an insertion, deletion, or frameshift is reported as separate hits, possibly in different frames. The output includes the alignments, statistical scores such as the E-value, and other information about the detected similarities.
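
To make the six reading frames concrete, here is a minimal shell sketch that lists them for a short sequence. It is an illustration only, not how tblastx is implemented: the sequence and the print_frames helper are made up for this example, and the frames are shown as nucleotide strings without translating them to amino acids.

```shell
# Made-up example sequence (12 nucleotides).
seq="ATGGCCATTGTA"

# Reverse complement: reverse the string, then swap A<->T and C<->G.
revcomp=$(printf '%s\n' "$seq" | rev | tr 'ACGT' 'TGCA')

# Print the three reading frames of one strand, labelled with the given sign.
print_frames() {
  sign=$1
  strand=$2
  for offset in 1 2 3; do
    # Frame N starts at nucleotide N of the strand.
    printf '%s%d\t%s\n' "$sign" "$offset" "$(printf '%s' "$strand" | cut -c "${offset}-")"
  done
}

print_frames + "$seq"      # forward frames +1, +2, +3
print_frames - "$revcomp"  # reverse frames -1, -2, -3
```

Each of the six strings would then be read three nucleotides at a time to produce a protein sequence, which is what makes a translated search on both query and database comparatively expensive.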

CAVEATS

tblastx searches are computationally intensive because both the query and the database are translated in six frames, giving 36 frame combinations for each query-subject pair. Expect longer execution times and higher memory consumption than with other BLAST programs, especially with large query files or databases, and make sure sufficient system resources are available. The -db option requires a database built with makeblastdb; to compare against a plain FASTA file without building a database, pass the file with -subject instead.

USAGE CONTEXT

tblastx is suited to finding highly divergent protein-coding sequences and to locating candidate coding regions in newly sequenced genomes whose protein products are unknown. Because every reading frame of both sequences is examined, similarity is still detected when the correct frame is unknown or when a frameshift moves the coding sequence into another frame, although the regions on each side of the shift are reported as separate hits. It is often used in comparative genomics and evolutionary biology studies.

DATABASE REQUIREMENTS

Before running tblastx with -db, build the target nucleotide database with the makeblastdb command, using -dbtype nucl. This creates the index files that tblastx reads when it searches and translates the database sequences. If -db does not name a properly formatted database, the search cannot run.
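
The two steps described above might look like the following sketch. The file and database names (genomes.fna, genomes_db, my_gene.fna, tblastx_results.tsv) are hypothetical, and the run wrapper is an assumption added for this example so that the commands can be inspected before anything is executed.

```shell
# Hypothetical file and database names; replace them with your own.
# DRY_RUN defaults to 1, so commands are only printed. Set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: build a nucleotide BLAST database from a FASTA file.
run makeblastdb -in genomes.fna -dbtype nucl -out genomes_db

# Step 2: search the database with the query, writing tabular output.
run tblastx -query my_gene.fna -db genomes_db -evalue 1e-6 -outfmt 6 -out tblastx_results.tsv
```

For a one-off comparison against a small FASTA file, replacing -db genomes_db with -subject genomes.fna skips the database build.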

HISTORY

tblastx belongs to the BLAST family of programs, which is built on the algorithm published in 1990 by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and David Lipman and is maintained by the National Center for Biotechnology Information (NCBI). It was designed to find protein-level homology between nucleotide sequences, particularly when direct nucleotide similarity is low. With the introduction of the BLAST+ suite, a complete rewrite in C++, tblastx and the other BLAST programs gained performance improvements, better multi-threading, and a more modular design while keeping their core functionality.