LinuxCommandLibrary

blastdbcmd

Retrieve sequences and metadata from a BLAST database

SYNOPSIS

blastdbcmd [options]

PARAMETERS

-db <String>
    BLAST database name or path. Defaults to nr.

-dbtype {guess|nucl|prot}
    Molecule type of the database: nucleotide (nucl), protein (prot), or guess (the default).

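-entry <String>
    Comma-delimited sequence identifiers (accession, GI, or a local ID such as 'gnl|dbname|tag'), or all to select every sequence in the database.

-entry_batch <File>
    File with one sequence ID per line, each optionally followed by space-delimited range, strand, or masking-algorithm specifiers. Use - to read from stdin.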

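-seqidlist <File>
    Not listed by blastdbcmd -help; -seqidlist belongs to the BLAST search programs (e.g., blastn). For batch retrieval with blastdbcmd, use -entry_batch.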

-out <File>
    Output file; defaults to stdout.

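-outfmt <String>
    Output format string built from % specifiers (e.g., %f for FASTA, %a for accession, %l for sequence length). Defaults to %f.

-range <String>
    Subsequence to extract, given as 1-based start-stop (e.g., 1-100).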

-strand {plus|minus}
    Strand of a nucleotide sequence to extract. Defaults to plus.

-start, -stop, -length
    Not listed by blastdbcmd -help. Specify coordinates with -range start-stop (1-based) instead.

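-lineage, -taxid
    Not listed by blastdbcmd -help. Filter by taxonomy with -taxids or -taxidlist, and report taxonomy fields through -outfmt (e.g., %T for TaxID, %S for scientific name).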

-taxids <String>
    Comma-separated TaxIDs; retrieval is restricted to sequences assigned to them.

-taxidlist <File>
    File with TaxIDs, one per line, used the same way as -taxids.

-title, -header
    Not listed by blastdbcmd -help. Control what is printed with -outfmt: %t prints the sequence title, and %s prints sequence data without the defline.

-info
    Display database summary info.

-show_gis
    Not listed by blastdbcmd -help; it is an option of the BLAST search programs. Use %g in -outfmt to print GI numbers.

-h / -help
    -h prints a brief usage summary; -help also prints the description of every argument.

-version
    Print version information.

DESCRIPTION

blastdbcmd is a command-line utility from the NCBI BLAST+ toolkit for extracting sequence data and metadata from pre-formatted BLAST databases. It fetches sequences by identifier, such as GI numbers, accession numbers, or other SeqIDs, either as single entries or in batches read from a file.

Common use cases include retrieving FASTA sequences for alignment preparation, inspecting database contents, generating custom sequence subsets, and extracting associated metadata such as TaxIDs and titles. Output is controlled by a format string (FASTA by default), and options restrict retrieval to a subsequence range, a strand, or a set of taxa.

blastdbcmd lets pipelines query a database without running a full BLAST search. It works with both protein and nucleotide databases created with makeblastdb or downloaded with update_blastdb.pl. It reads local databases only, so NCBI-hosted databases must be downloaded first.
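Subsequence extraction can be sketched as follows. blastdbcmd takes coordinates only as a start-stop range, so the hypothetical helper range_from_length (not part of BLAST+) converts a start and a length into that form. The database name nt and the accession are placeholders, and the retrieval runs only if blastdbcmd is on PATH.

```shell
# Hypothetical helper: turn a 1-based start and a length into "start-stop".
range_from_length() {
    echo "$1-$(( $1 + $2 - 1 ))"
}

# Extract 50 bases starting at position 101 from the minus strand.
# Assumes a local nucleotide database named "nt" that contains this accession.
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db nt -entry NC_000001.11 \
        -range "$(range_from_length 101 50)" -strand minus \
        || echo "retrieval failed (is the database available locally?)" >&2
else
    echo "blastdbcmd not found; skipping retrieval" >&2
fi
```

The helper keeps the arithmetic in one place, which may help when coordinates come from tools that report a start and a length rather than a start and a stop.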

CAVEATS

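Requires a BLAST+ installation and databases formatted with makeblastdb. Databases must be available locally; blastdbcmd -help lists no remote option. Large batches may consume significant memory and disk space. -outfmt specifiers are case-sensitive (e.g., %t is the title, %T the TaxID).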
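A quick preflight check along these lines can confirm the requirements before a pipeline runs. It is a sketch only: the variable blast_status is introduced here for illustration, and nt is assumed to be the database of interest. BLAST+ looks for databases in the current directory and in the directories named by the BLASTDB environment variable.

```shell
# Record whether blastdbcmd is available (blast_status is an illustrative name).
if command -v blastdbcmd >/dev/null 2>&1; then
    blast_status=present
    blastdbcmd -version
    # Show where BLAST+ will look for databases.
    blastdbcmd -show_blastdb_search_path
    # Print a summary of the database, or report that it could not be opened.
    blastdbcmd -db nt -info \
        || echo "database 'nt' could not be opened from the search path" >&2
else
    blast_status=missing
    echo "blastdbcmd not found; install BLAST+ and add it to PATH" >&2
fi
```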

COMMON EXAMPLE

Single sequence:
blastdbcmd -db nt -entry NC_000001.11 -outfmt '%f'

Batch:
blastdbcmd -db nr -entry_batch ids.txt -outfmt '%f' > output.fasta
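The batch input file can be prepared as in the sketch below. Each line holds one sequence ID, and a line may add a range and strand after the ID. The accessions and the database name nt are placeholders, and the retrieval runs only if blastdbcmd is on PATH.

```shell
# Write the ID list; the second entry requests bases 1-500 on the minus strand.
cat > ids.txt <<'EOF'
NC_000001.11
NC_000002.12 1-500 minus
EOF

# Retrieve the listed sequences from a local database assumed to be named "nt".
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db nt -entry_batch ids.txt -out output.fasta \
        || echo "retrieval failed (is the database available locally?)" >&2
else
    echo "blastdbcmd not found; skipping retrieval" >&2
fi
```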

OUTPUT FORMATS

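blastdbcmd takes a format string of % specifiers rather than numeric codes. Common specifiers: %f=sequence in FASTA format, %s=sequence data without defline, %a=accession, %g=GI, %o=ordinal ID (OID), %i=sequence ID, %t=sequence title, %l=sequence length, %T=TaxID, %S=scientific name, %L=common taxonomic name. See blastdbcmd -help for full list.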
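As an illustration of a custom format string, the sketch below prints accession, length, and TaxID for each sequence and keeps only the longer records. The helper min_length and the database name mydb are hypothetical, and the retrieval runs only if blastdbcmd is on PATH.

```shell
# Hypothetical helper: keep lines whose second column (length) is at least $1.
min_length() {
    awk -v min="$1" '$2 + 0 >= min + 0'
}

# List accession, length, and TaxID for every sequence in a small local
# database assumed to be named "mydb", keeping records of 1000 residues or more.
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db mydb -entry all -outfmt '%a %l %T' | min_length 1000 > long_seqs.txt
else
    echo "blastdbcmd not found; skipping retrieval" >&2
fi
```

Space-separated fields are used because none of the three specifiers produces embedded spaces; a title (%t) would need a different delimiter.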

HISTORY

Introduced in NCBI BLAST+ 2.2.22 (2010) as a replacement for fastacmd from the legacy BLAST 2.2.x toolkit. Continuously updated with BLAST+ releases for improved performance and new options.

SEE ALSO

makeblastdb(1), blastn(1), blastp(1), blastx(1), update_blastdb.pl(1)
