LinuxCommandLibrary

blastp

Search protein databases for sequence similarity

TLDR

Align two or more sequences with an E-value threshold of 1e-9, printing pairwise output to the screen

$ blastp -query [query.fa] -subject [subject.fa] -evalue [1e-9]

Align two or more sequences using blastp-fast
$ blastp -task blastp-fast -query [query.fa] -subject [subject.fa]

Align two or more sequences, writing custom tabular output to a file
$ blastp -query [query.fa] -subject [subject.fa] -outfmt '[6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident]' -out [output.tsv]

Search a protein database with a protein query, using 16 threads and keeping at most 10 aligned sequences
$ blastp -query [query.fa] -db [blast_database_name] -num_threads [16] -max_target_seqs [10]

Search the remote non-redundant protein database using a protein query
$ blastp -query [query.fa] -db [nr] -remote

Display help (use -help for detailed help)
$ blastp -h

SYNOPSIS

blastp -query <File> {-db <String> | -subject <File>} [-out <File>] [options...]

PARAMETERS

-query <File>
    Input query protein FASTA file

-db <String>
    Pre-formatted protein database name

-out <File>
    Output file (default: stdout)

-evalue <Real>
    Expectation value cutoff (default: 10.0)

-outfmt <String>
    Output format (e.g., 0=pairwise, 5=XML, 6=tabular); format 6 also accepts a quoted list of column names

-max_target_seqs <Integer>
    Maximum number of aligned sequences to keep (default: 500)

-num_threads <Integer>
    Number of CPU threads

-max_hsps <Integer>
    Max HSPs to keep per subject sequence (minimum 1; unlimited when not set)

-qcov_hsp_perc <Integer>
    Query coverage per HSP (%)

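-perc_identity <Real>
    Percent identity cutoff (a blastn option; blastp does not accept it, so filter on the pident column of tabular output instead)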

-ungapped
    Perform ungapped alignments

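-dbtype <String>
    Not a blastp option; it belongs to makeblastdb, where prot selects a protein database

-query_gencode <Integer>
    Not a blastp option; genetic code for translated queries in blastx and tblastx (default: 1)

-subject_gencode <Integer>
    Not a blastp option; translated database searches (tblastn, tblastx) use -db_gencode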

-matrix <String>
    Scoring matrix (default: BLOSUM62)

-threshold <Real>
    Minimum score for a word to be added to the lookup table (default: 11)

-word_size <Integer>
    Word size (default: 3)

-gapopen <Integer>
    Gap opening penalty

-gapextend <Integer>
    Gap extension penalty

-seg <String>
    Low-complexity filtering of the query with SEG: yes, 'window locut hicut', or no (default: no)

-soft_masking <Boolean>
    Apply filtering locations as soft masks (default: false)

-help
    Print help

-version
    Print version info

DESCRIPTION

BLASTP (Basic Local Alignment Search Tool for Proteins) is a command-line program from the NCBI BLAST+ suite used to compare a protein query sequence against a protein sequence database, identifying regions of local similarity.

It uses a heuristic algorithm to find statistically significant matches quickly, reporting each alignment with its score, E-value, and percent identity. Typical uses include annotating protein function, finding homologs, and evolutionary studies.

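Provide a FASTA query file and either a pre-formatted protein database via -db (for example nr or swissprot downloaded from NCBI, or a custom database built with makeblastdb) or a FASTA file of subject sequences via -subject. Output formats include pairwise alignments, tabular data, and XML.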

Searches can use multiple threads through -num_threads, but large databases still require significant RAM and time. The E-value threshold controls how many chance matches are reported; lower values are stricter.

CAVEATS

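Searches with -db require a database pre-formatted with makeblastdb (or downloaded pre-formatted); large databases such as nr need hundreds of gigabytes of disk space and substantial RAM. blastp compares protein to protein only; use blastn, blastx, or tblastn for searches involving nucleotide sequences. E-values scale with database size, so the same alignment gets a different E-value against a different database. Multi-threading may not scale linearly with the number of threads.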
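
The effect of database size on E-values can be illustrated with the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total length of the database. The following is a minimal sketch only: it uses raw lengths rather than the effective lengths BLAST computes internally, and the function name expected_hits and all numbers are hypothetical.

```shell
#!/bin/sh
# expected_hits BITSCORE QUERY_LEN DB_LEN
# Prints E = m * n * 2^(-S') using awk floating-point arithmetic.
# Assumption: raw lengths are used; BLAST itself uses effective lengths.
expected_hits() {
    awk -v s="$1" -v m="$2" -v n="$3" \
        'BEGIN { printf "%.3g\n", m * n * exp(-s * log(2)) }'
}

# Same hypothetical alignment (bit score 50, query length 300) scored
# against a small and a large database (total residues, made-up sizes).
expected_hits 50 300 1000000
expected_hits 50 300 1000000000
```

A database 1000 times larger yields an E-value 1000 times larger for the same bit score, which is why a fixed -evalue cutoff is not directly comparable across databases.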

TABULAR OUTPUT EXAMPLE

-outfmt 6 produces tab-separated output with no header row, one line per HSP. The default columns are:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
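
Because the default tabular columns have fixed positions, the output can be filtered with standard text tools. The sketch below assumes the default -outfmt 6 column order (pident in column 3, evalue in column 11); the sample rows, the cutoffs, and the function name filter_hits are made up for illustration.

```shell
#!/bin/sh
# Hypothetical results in the default -outfmt 6 column order (made-up values).
hits_file="$(mktemp)"
printf 'q1\tprotA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-120\t410\n' >  "$hits_file"
printf 'q1\tprotB\t35.0\t180\t110\t4\t10\t189\t2\t180\t2e-05\t48.1\n' >> "$hits_file"
printf 'q2\tprotC\t62.3\t150\t55\t1\t1\t150\t20\t169\t3e-40\t155\n' >> "$hits_file"

# Keep hits with pident >= 50 (column 3) and evalue <= 1e-10 (column 11),
# printing query, subject, percent identity, E-value, and bit score.
filter_hits() {
    awk -F '\t' -v OFS='\t' -v min_id=50 -v max_e=1e-10 \
        '$3 + 0 >= min_id && $11 + 0 <= max_e { print $1, $2, $3, $11, $12 }' "$1"
}

filter_hits "$hits_file"
```

With a real search, the same function could be pointed at the file written by -out, provided the default column order was not changed.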

DATABASE SETUP

Use makeblastdb -in proteins.fasta -dbtype prot -out mydb to create local DBs.
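
The database build and a search against it can be chained in one script. This is a rough sketch under stated assumptions: the two protein records are invented placeholders for a real proteins.fasta, the working directory is temporary, and the search step is skipped when BLAST+ is not on the PATH.

```shell
#!/bin/sh
workdir="$(mktemp -d)"

# Two made-up protein records standing in for a real proteins.fasta.
cat > "$workdir/proteins.fasta" <<'EOF'
>demo_prot_1 hypothetical example sequence
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
>demo_prot_2 hypothetical example sequence
MSDNELKQRLAELEAKLAEAEKRAEEAEARLAELEKQLEEAR
EOF

# Use the first record (header line plus sequence line) as the query.
head -n 2 "$workdir/proteins.fasta" > "$workdir/query.fa"

if command -v makeblastdb >/dev/null 2>&1 && command -v blastp >/dev/null 2>&1; then
    # Build the protein database, then search it with tabular output.
    makeblastdb -in "$workdir/proteins.fasta" -dbtype prot -out "$workdir/mydb"
    blastp -query "$workdir/query.fa" -db "$workdir/mydb" -outfmt 6 -evalue 1e-5
else
    echo "BLAST+ not found on PATH; skipping the search step" >&2
fi
```

The value passed to -db is the same path given to -out in makeblastdb, without any file extension.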

HISTORY

Developed by NCBI. The original BLAST was published in 1990 by Altschul et al.; BLAST+, a C++ rewrite, was released in 2009. Current versions (2.14+) support modern features like better parallelism and cloud integration.

SEE ALSO

blastn(1), blastx(1), tblastn(1), tblastx(1), psiblast(1), makeblastdb(1)
