blastdbcmd
Extract sequences from BLAST databases
SYNOPSIS
blastdbcmd [-db database] [-entry id] [options]
DESCRIPTION
blastdbcmd is a utility for extracting sequences and metadata from BLAST databases. It can retrieve individual sequences by accession, extract all sequences, display database statistics, and generate custom reports.
The tool is part of the NCBI BLAST+ suite and works with databases created by makeblastdb or downloaded from NCBI.
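The two most common uses described above, fetching one sequence and summarizing a database, can be sketched as small shell functions. The function names, the database name mydb, and the accession ABC12345.1 are placeholders chosen for illustration; only the blastdbcmd flags come from this page.

```shell
#!/bin/sh
# Hypothetical helper names; the flags (-db, -entry, -info) are those documented on this page.

# Print one sequence in FASTA format.
#   $1 = database name or path, $2 = sequence identifier
fetch_entry() {
    blastdbcmd -db "$1" -entry "$2"
}

# Print summary information for a database (type, sequence count, total length, date).
#   $1 = database name or path
db_info() {
    blastdbcmd -db "$1" -info
}

# Example calls (placeholder database and accession):
#   fetch_entry mydb ABC12345.1
#   db_info mydb
```

Wrapping the calls in functions keeps the database name in one place when the same lookups are repeated in a script.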
PARAMETERS
-db name
BLAST database name or path.
-entry id
Sequence identifier(s) to retrieve; use "all" for the entire database.
-entry_batch file
File containing a list of sequence identifiers.
-out file
Output file (default: stdout).
-outfmt format
Custom output format string using % tokens.
-info
Display database information (type, number of sequences, total length, date).
-list path
List databases in the specified path.
-recursive
Search directories recursively (with -list).
-show_blastdb_search_path
Display BLAST database search paths.
-dbtype type
Database type: nucl (nucleotide) or prot (protein). Needed when both types share a name.
-target_only
Retrieve only target sequences (no redundant group members).
-tax_info
Display taxonomy information (requires taxonomy database).
-range start-stop
Extract subsequence range (1-based, inclusive).
-strand strand
Strand to retrieve: plus or minus (nucleotide only).
-line_length N
Line length for FASTA output (default: 80). Use 0 for single-line sequences.
-long_seqids
Use long sequence identifiers including database and accession.version.
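As an illustration of how several of these options combine, the sketch below extracts a subsequence from one entry and writes a batch of entries to a file. The function names and all argument values are hypothetical. The sketch passes -range and -strand together with -entry only, on the assumption that they are not combined with -entry_batch.

```shell
#!/bin/sh
# Hypothetical helper names; flags are those listed under PARAMETERS.

# Extract part of a nucleotide sequence.
#   $1 = database, $2 = sequence identifier,
#   $3 = range as "start-stop" (1-based, inclusive), $4 = plus or minus
fetch_range() {
    blastdbcmd -db "$1" -entry "$2" -range "$3" -strand "$4"
}

# Retrieve every identifier listed in a file and write the result to a file.
#   $1 = database, $2 = file with one identifier per line, $3 = output file
fetch_batch() {
    blastdbcmd -db "$1" -entry_batch "$2" -out "$3"
}

# Example calls (placeholder values):
#   fetch_range mydb ABC12345.1 100-250 minus
#   fetch_batch mydb ids.txt subset.fasta
```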
OUTPUT FORMAT TOKENS
%a - Accession
%g - GI number
%o - OID (ordinal ID)
%t - Title (definition line)
%s - Sequence data
%l - Sequence length
%T - Taxonomy ID
%S - Scientific name
%L - Common name
%m - Masking data
%h - Hash value
%e - Membership integer
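The tokens are combined into a single string passed to -outfmt, with any literal characters between them acting as separators. A minimal sketch, assuming a comma is an acceptable separator for your data (titles may themselves contain commas); the function name and the database name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper name; %a, %l and %t are tokens from the list above.

# Print accession, sequence length and title for every sequence in a database,
# one comma-separated record per line.
#   $1 = database name or path
seq_table() {
    blastdbcmd -db "$1" -entry all -outfmt '%a,%l,%t'
}

# Example call (placeholder database), listing the ten longest sequences first:
#   seq_table mydb | sort -t, -k2,2nr | head -n 10
```

Single quotes around the format string keep the shell from touching the % tokens.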
ENVIRONMENT
BLASTDB
Colon-separated list of directories to search for BLAST databases.
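A short sketch of setting the variable for the current shell session and then asking blastdbcmd which paths it will search. Both directory names and the function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical directories; replace with the locations of your own databases.
BLASTDB="/data/blastdb:/scratch/blastdb"
export BLASTDB

# Show the search paths blastdbcmd resolves, to confirm the setting took effect.
show_search_path() {
    blastdbcmd -show_blastdb_search_path
}

# Example call:
#   show_search_path
```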
CAVEATS
Requires pre-formatted BLAST databases created by makeblastdb or downloaded from NCBI. Taxonomy information requires the BLAST taxonomy database (taxdb.btd/bti) to be installed. Large extractions may require significant time and disk space. The -range option uses 1-based inclusive coordinates.
SEE ALSO
makeblastdb(1), blastn(1), blastp(1), blastx(1), tblastn(1)
