
needle

Needleman-Wunsch global pairwise sequence alignment (EMBOSS)

TLDR

Globally align two sequences from FASTA files
$ needle -asequence [seq1.fasta] -bsequence [seq2.fasta] -gapopen [10] -gapextend [0.5] -outfile [out.needle]
Align two database entries by ID (assumes a database named sp, e.g. UniProt/Swiss-Prot, is configured in EMBOSS)
$ needle -asequence sp:[hba_human] -bsequence sp:[hbb_human] -gapopen [10] -gapextend [0.5] -outfile [result.needle]
Use a specific scoring matrix
$ needle -asequence [a.fa] -bsequence [b.fa] -datafile [EBLOSUM62] -gapopen [10] -gapextend [0.5] -outfile [out.needle]
Choose an alternative output format
$ needle -asequence [a.fa] -bsequence [b.fa] -gapopen [10] -gapextend [0.5] -aformat3 [markx10] -outfile [out.txt]
Run non-interactively (no prompts)
$ needle -auto -asequence [a.fa] -bsequence [b.fa] -gapopen [10] -gapextend [0.5] -outfile [a_vs_b.needle]

SYNOPSIS

needle -asequence seqfile -bsequence seqfile -gapopen f -gapextend f -outfile file [options]

DESCRIPTION

needle computes the optimal global pairwise alignment of two sequences using the Needleman-Wunsch dynamic programming algorithm. It ships as part of EMBOSS (European Molecular Biology Open Software Suite) and is intended for nucleotide or protein sequences of comparable length where the entire sequences should be aligned end-to-end.

Gap-open and gap-extend penalties are mandatory parameters that shape the alignment, and a scoring matrix (BLOSUM, PAM, EDNAFULL, ...) determines how matches and mismatches are weighted. The output is a formatted alignment that reports score, length, percentage identity, similarity, and gap statistics; many alternative formats are available via -aformat3.

For local alignment of subsequences use water; for very long sequences where memory is a concern use stretcher, which implements a linear-space variant of the algorithm.
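
The dynamic programming idea described above can be illustrated with a small score-only sketch. This is not needle's implementation: nw_score is a hypothetical helper, it uses a flat match/mismatch score instead of a scoring matrix, it assumes a gap of length k costs gapopen + (k-1)*gapextend, and it penalizes end gaps like internal gaps (needle leaves end gaps free unless -endweight is set). It prints only the optimal score and does no traceback.

```shell
# nw_score A B [MATCH MISMATCH GAPOPEN GAPEXTEND]
# Hypothetical helper: global alignment score with affine gap penalties.
nw_score() {
    awk -v a="$1" -v b="$2" -v ms="${3:-1}" -v mm="${4:--1}" \
        -v go="${5:-10}" -v ge="${6:-0.5}" '
    function max(x, y) { return x > y ? x : y }
    BEGIN {
        m = length(a); n = length(b); NEG = -1e9
        # M: a[i] aligned to b[j]; X: a[i] aligned to a gap; Y: b[j] aligned to a gap
        M[0,0] = 0; X[0,0] = NEG; Y[0,0] = NEG
        for (i = 1; i <= m; i++) { M[i,0] = NEG; Y[i,0] = NEG; X[i,0] = -(go + (i - 1) * ge) }
        for (j = 1; j <= n; j++) { M[0,j] = NEG; X[0,j] = NEG; Y[0,j] = -(go + (j - 1) * ge) }
        for (i = 1; i <= m; i++)
            for (j = 1; j <= n; j++) {
                s = (substr(a, i, 1) == substr(b, j, 1)) ? ms : mm
                M[i,j] = s + max(M[i-1,j-1], max(X[i-1,j-1], Y[i-1,j-1]))
                # Opening a gap costs go; extending the same gap costs ge
                X[i,j] = max(M[i-1,j] - go, max(X[i-1,j] - ge, Y[i-1,j] - go))
                Y[i,j] = max(M[i,j-1] - go, max(Y[i,j-1] - ge, X[i,j-1] - go))
            }
        print max(M[m,n], max(X[m,n], Y[m,n]))
    }'
}
```

For example, nw_score ACGT AGT 5 -4 10 0.5 prints 5: three matches (15) minus one gap opening (10). Scores reported by needle for the same input can differ because of its scoring matrix and free end gaps.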

PARAMETERS

-asequence file
First input sequence (single sequence, any EMBOSS-supported format).
-bsequence file
Second input sequence (one or many sequences to align against the first).
-gapopen float
Penalty for opening a gap (default: 10.0 for both protein and nucleotide sequences).
-gapextend float
Penalty for extending an existing gap (typical: 0.5).
-datafile matrix
Scoring matrix name (e.g. EBLOSUM62, EDNAFULL).
-endweight
Apply end-gap penalties (default: false; end gaps are free).
-outfile file
Path to the alignment report.
-aformat3 format
Output alignment format (pair, markx0, markx1, markx2, markx3, markx10, msf, fasta, ...).
-brief
Print a brief alignment summary instead of the full pairwise view.
-auto
Skip all interactive prompts (suitable for scripts).
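
The options above can be combined in a small batch wrapper. The following is a minimal sketch, not part of EMBOSS: align_all, the NEEDLE override variable, and the output naming scheme are all hypothetical, and only options listed in this page are used.

```shell
# align_all QUERY TARGET...: align QUERY against each TARGET file in turn.
# NEEDLE can be overridden (e.g. NEEDLE="echo needle") to preview the commands.
align_all() {
    query=$1
    shift
    q=$(basename "$query")
    q=${q%.*}                      # query file name without its extension
    for target in "$@"; do
        t=$(basename "$target")
        t=${t%.*}
        # Hypothetical naming scheme: <query>_vs_<target>.needle in the current directory
        ${NEEDLE:-needle} -auto \
            -asequence "$query" -bsequence "$target" \
            -gapopen 10 -gapextend 0.5 \
            -outfile "${q}_vs_${t}.needle" || return 1
    done
}
```

Running NEEDLE="echo needle" and then align_all q.fa t1.fa t2.fa prints the two needle command lines without executing them, which is a cheap way to check file names before a long run.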

CAVEATS

Time and memory complexity are O(m·n) in the lengths of the two sequences, so needle is not appropriate for very long sequences; use stretcher instead. Option syntax is EMBOSS-specific (long names introduced by a single dash) and is not interchangeable with GNU-style flags. End gaps are free by default; enable -endweight if you want them penalized.
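
To get a feel for the O(m·n) cost, the matrix size can be estimated before running an alignment. This is a rough sketch: dp_cells and dp_mib are hypothetical helpers, and the default of 4 bytes per cell is a placeholder assumption, not a measured figure for needle.

```shell
# dp_cells M N: number of cells in a full M x N dynamic programming matrix.
dp_cells() {
    echo $(( $1 * $2 ))
}

# dp_mib M N [BYTES_PER_CELL]: rough matrix size in MiB (rounded down).
# BYTES_PER_CELL defaults to 4, an assumed value for illustration only.
dp_mib() {
    echo $(( $1 * $2 * ${3:-4} / 1024 / 1024 ))
}
```

Under that assumption, dp_mib 100000 100000 prints 38146 (about 37 GiB) for two sequences of 100,000 residues each, which is the kind of input where stretcher is the better choice.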

HISTORY

needle was written by Alan Bleasby as part of EMBOSS, a project started in 1996 at the Sanger Centre / MRC to provide an open, integrated suite of bioinformatics tools. The Needleman-Wunsch algorithm itself was published in 1970 by Saul B. Needleman and Christian D. Wunsch.

SEE ALSO

water(1), stretcher(1), matcher(1), blastp(1)
