blastp
Search protein databases for sequence similarity
TLDR
Align two or more sequences using blastp, with the e-value threshold of 1e-9, pairwise output format, output to screen
Align two or more sequences using blastp-fast
Align two or more sequences, custom tabular output format, output to file
Search a protein database using a protein query, with 16 threads and keeping at most 10 aligned sequences
Search the remote non-redundant protein database using a protein query
Display help (use -help for detailed help)
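The command lines that belong to the entries above are not present in this section. The sketch below gives one invocation per entry, in the same order. The file names (query.fa, subject.fa, output.tsv) and the database name are placeholders, and the column list in the tabular example is just one possible choice. -subject, -task and -remote are real blastp options even though they are not listed under PARAMETERS. Each command is wrapped in a shell function so the snippet can be sourced without running anything.

```shell
# Placeholder inputs (assumed names; substitute your own files and database).
query=query.fa
subject=subject.fa
db=blast_database_name

# Align sequences with an E-value threshold of 1e-9; pairwise output
# (format 0) to the screen is the default.
align_pairwise() { blastp -query "$query" -subject "$subject" -evalue 1e-9; }

# Same alignment using the faster blastp-fast task.
align_fast() { blastp -task blastp-fast -query "$query" -subject "$subject"; }

# Custom tabular output (format 6 plus a column list), written to a file.
align_tabular() {
  blastp -query "$query" -subject "$subject" \
    -outfmt '6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident' \
    -out output.tsv
}

# Database search with 16 threads, keeping at most 10 aligned sequences.
search_db() { blastp -query "$query" -db "$db" -num_threads 16 -max_target_seqs 10; }

# Search NCBI's non-redundant protein database on NCBI servers.
search_remote() { blastp -query "$query" -db nr -remote; }

# Short usage summary; `blastp -help` prints the detailed version.
show_help() { blastp -h; }
```

Calling, for example, search_db then runs the corresponding command with the placeholder values set at the top.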
SYNOPSIS
blastp -query <File> -db <String> [-out <File>] [options...]
PARAMETERS
-query <File>
Input query protein FASTA file
-db <String>
Pre-formatted protein database name
-out <File>
Output file (default: stdout)
-evalue <Real>
Expectation value cutoff (default: 10.0)
-outfmt <String>
Output format (e.g., 0=pairwise, 5=XML, 6=tabular); formats such as 6 may be followed by a quoted, space-separated list of column names
-max_target_seqs <Integer>
Maximum number of aligned sequences to keep (default: 500)
-num_threads <Integer>
Number of CPU threads
-max_hsps <Integer>
Maximum number of HSPs to keep per subject sequence (default: no limit)
-qcov_hsp_perc <Integer>
Query coverage per HSP (%)
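-perc_identity <Real>
Percent identity cutoff; this is a blastn option and is not accepted by blastp (filter on the pident column of tabular output instead)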
-ungapped
Perform ungapped alignments
-dbtype <String>
Database type (e.g., prot); this is a makeblastdb option used when building a database, not a blastp option
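-query_gencode <Integer>
Genetic code for translating the query (default: 1); used by blastx and tblastx, not accepted by blastp, which performs no translation
-subject_gencode <Integer>
Genetic code for the database; not a blastp option (translated searches such as tblastn use -db_gencode)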
-matrix <String>
Scoring matrix (default: BLOSUM62)
-threshold <Integer>
Word threshold (default: 11)
-word_size <Integer>
Word size (default: 3)
-gapopen <Integer>
Gap opening penalty
-gapextend <Integer>
Gap extension penalty
-seg <String>
Low-complexity filtering of the query with SEG: 'yes', 'window locut hicut', or 'no' (default for blastp: no)
-soft_masking <Boolean>
Apply soft masking
-help
Print help
-version
Print version info
DESCRIPTION
BLASTP (Basic Local Alignment Search Tool for Proteins) is a command-line program from the NCBI BLAST+ suite used to compare a protein query sequence against a protein sequence database, identifying regions of local similarity.
It uses a heuristic algorithm to find statistically significant matches quickly and reports each alignment with its score, E-value, and percent identity. Typical uses include annotating protein function, finding homologs, and supporting evolutionary studies.
Typical usage: provide a query file in FASTA format and a protein BLAST database, either one downloaded pre-formatted from NCBI (e.g., nr, swissprot) or one built from your own sequences with makeblastdb. Output is available as pairwise alignments, tabular data, or XML.
Searches can use multiple threads, but large databases still require significant RAM and time. The E-value threshold controls how many chance matches are reported; a lower threshold is stricter.
CAVEATS
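Searching with -db requires a BLAST-formatted protein database, either built with makeblastdb or downloaded pre-formatted. Large databases such as nr can require hundreds of gigabytes of disk space and RAM. blastp compares protein against protein only; for nucleotide queries or databases use blastn, blastx, tblastn, or tblastx. E-values depend on database size, so the same alignment can receive different E-values against different databases. Multi-threaded speedup is usually less than linear.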
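To illustrate why E-values depend on database size, the sketch below evaluates the standard relation E = m * n * 2^(-S'), where m is the query length, n the total database length in residues, and S' the bit score. The three input values are made up for illustration. The helper evalue is a hypothetical function, not part of BLAST+, and it ignores the effective-length corrections and composition-based statistics that blastp applies, so real output will not match these numbers exactly.

```shell
# E = m * n * 2^(-S'): m = query length, n = database length in residues,
# S' = bit score. All inputs below are made-up illustration values.
evalue() {
  awk -v m="$1" -v n="$2" -v s="$3" 'BEGIN { printf "%.2e\n", m * n * 2 ^ (-s) }'
}

small=$(evalue 300 100000000 50)   # database of 1e8 residues
large=$(evalue 300 200000000 50)   # same hit, database twice as large

echo "E (small db) = $small"
echo "E (large db) = $large"
```

Doubling the database length doubles the E-value for the same bit score, which is why a fixed -evalue cutoff is stricter against a larger database.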
TABULAR OUTPUT EXAMPLE
-outfmt 6 produces 12 tab-separated columns by default: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
The output has no header line and can be parsed as TSV.
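As a small parsing sketch, the snippet below filters tabular output with awk. The two rows are made-up sample hits in the default 12-column layout (query1, subjA and subjB are hypothetical IDs), and the 90% identity cutoff is an arbitrary choice for illustration.

```shell
# Two made-up hits in the default -outfmt 6 layout (12 tab-separated columns).
hits=$(mktemp)
printf 'query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-120\t410\n'  >  "$hits"
printf 'query1\tsubjB\t35.2\t180\t110\t4\t10\t189\t22\t198\t2e-05\t48.1\n' >> "$hits"

# Keep hits with at least 90% identity (column 3, pident) and print
# the subject ID (column 2), E-value (column 11) and bit score (column 12).
strong=$(awk -F '\t' '$3 >= 90 { print $2, $11, $12 }' "$hits")
echo "$strong"

rm -f "$hits"
```

With real data, replace the temporary file with the file given to -out.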
DATABASE SETUP
Use makeblastdb -in proteins.fasta -dbtype prot -out mydb to create local DBs.
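The database built this way can then be passed to blastp by name. The following is a minimal sketch of the two steps together; query.fa and hits.tsv are assumed placeholder names, and build_and_search is a hypothetical helper. The commands run only if BLAST+ and the input files are present.

```shell
# proteins.fasta, mydb, query.fa and hits.tsv are placeholder names.
build_and_search() {
  # Build a protein database named "mydb" from a FASTA file.
  makeblastdb -in proteins.fasta -dbtype prot -out mydb
  # Search it and write the default 12-column tabular output to hits.tsv.
  blastp -query query.fa -db mydb -outfmt 6 -out hits.tsv
}

# Run only when both tools and both input files are available.
if command -v makeblastdb >/dev/null 2>&1 && command -v blastp >/dev/null 2>&1 \
   && [ -f proteins.fasta ] && [ -f query.fa ]; then
  build_and_search
else
  echo "BLAST+ or input files not found; skipping" >&2
fi
```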
HISTORY
blastp is developed by NCBI. The original BLAST was published in 1990 by Altschul et al. BLAST+, a C++ rewrite of the command-line tools, was released in 2009; current versions (2.14+) add features such as improved parallelism and cloud integration.
SEE ALSO
blastn(1), blastx(1), tblastn(1), tblastx(1), psiblast(1), makeblastdb(1)


