nextclade
Analyze viral sequences to track evolution
TLDR
Align sequences to a user-provided reference, outputting the alignment to a file
Create a TSV report, auto-downloading the latest dataset
List all available datasets
Download the latest SARS-CoV-2 dataset
Use a downloaded dataset, producing all outputs
Run on multiple files
Try reverse complement if sequence does not align
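The download and run steps listed above can be combined in a small wrapper. The following is a minimal sketch, assuming Nextclade 2 or later, where analysis goes through the run subcommand and the input FASTA file is a positional argument. The function name fetch_and_run and all paths are hypothetical.

```shell
# Hypothetical helper: download a named dataset, then analyze one FASTA file
# with it, writing every output type into a single directory.
# Assumes Nextclade 2 or later is installed and available on PATH.
fetch_and_run() {
  dataset_name=$1   # dataset to download, e.g. sars-cov-2
  dataset_dir=$2    # directory where the dataset files are stored
  output_dir=$3     # directory where all results are written
  input_fasta=$4    # sequences to analyze

  # Download the latest version of the dataset.
  nextclade dataset get --name "$dataset_name" --output-dir "$dataset_dir" || return 1

  # Analyze the sequences against the downloaded dataset, producing all outputs.
  nextclade run --input-dataset "$dataset_dir" --output-all "$output_dir" "$input_fasta"
}
```

A call might look like: fetch_and_run sars-cov-2 data/sars-cov-2 results sequences.fasta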
SYNOPSIS
nextclade --input-fasta
PARAMETERS
--input-fasta
Path to the input FASTA file containing the viral genome sequences to analyze (for example, SARS-CoV-2).
--input-tree
Path to the phylogenetic tree file to use.
--input-dataset
Path to directory containing dataset files (e.g., sequences, reference genome, clade definitions).
--output-dir
Path to the output directory where results will be saved.
--output-fasta
Path to the output FASTA file containing the aligned input sequences.
--output-tsv
Path to the output TSV file. Contains per-sequence results (clade, mutations).
--output-json
Path to the output JSON file. Contains per-sequence results in JSON format.
--output-tree
Path to the output tree file in Auspice JSON format, with the new sequences placed on the reference tree. By default, no tree is generated.
--include-reference
Include reference genome sequence in the output FASTA file.
--genes
Comma-separated list of gene names to include in the analysis.
--extend-nucs
Number of nucleotides to extend to each side of the mutations.
--help
Display help message and exit.
--version
Display version information and exit.
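The TSV report written with --output-tsv can be summarized with standard text tools. The following is a minimal sketch, assuming the report is tab-separated and has a header row with a column named clade (check the header of your own report, since column names are an assumption here). The function name count_clades is hypothetical.

```shell
# Hypothetical helper: count how many sequences were assigned to each clade.
# Assumes a tab-separated report whose header row contains a "clade" column.
count_clades() {
  awk -F '\t' '
    # Header row: find the position of the "clade" column.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "clade") col = i; next }
    # Data rows: tally the clade value (skipped if no such column was found).
    col     { n[$col]++ }
    # Print one "clade<TAB>count" line per clade.
    END     { for (c in n) printf "%s\t%d\n", c, n[c] }
  ' "$1" | sort
}
```

A call might look like: count_clades results/nextclade.tsv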
DESCRIPTION
nextclade is a command-line tool for phylogenetic placement and mutation calling of SARS-CoV-2 genomes. It aligns each input sequence to a reference genome, places it on a phylogenetic tree, assigns a clade, and identifies mutations (amino acid changes, insertions, deletions) relative to the reference. It outputs a report covering genome quality, mutations, and the assigned clade for each sequence.
It relies on pre-computed phylogenetic trees and curated datasets (e.g., sequences, annotations) to provide reproducible and standardized results. It is particularly useful for researchers and public health officials involved in genomic surveillance of SARS-CoV-2, providing insights into the evolution and spread of different viral lineages. The analysis is typically performed against a known set of clades, making it straightforward to track variants of concern and variants of interest. nextclade can be customized using configuration files, allowing users to adjust parameters to suit specific research or surveillance requirements.
CAVEATS
Results depend heavily on the quality of the input sequences and the completeness of the reference dataset. Interpreting the results requires an understanding of phylogenetic principles and viral evolution.
INPUT DATA REQUIREMENTS
The input FASTA file should contain sequences that have been quality-controlled and trimmed to remove primer sequences. nextclade works best with complete or near-complete genomes.
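One way to spot incomplete genomes before running the analysis is to list the sequences that fall below a length cutoff. The following is a minimal sketch, assuming a plain FASTA file; the function name short_seqs is hypothetical and the cutoff is the caller's choice, not a nextclade requirement. Every character on a sequence line is counted, including N and gap characters.

```shell
# Hypothetical helper: print the name and length of every FASTA record
# shorter than a given cutoff.
# Usage: short_seqs FILE MIN_LENGTH
short_seqs() {
  awk -v min="$2" '
    # Print the record collected so far if it is shorter than the cutoff.
    function report() { if (name != "" && len < min) printf "%s\t%d\n", name, len }
    # Header line: finish the previous record and start a new one.
    /^>/ { report(); name = substr($0, 2); len = 0; next }
    # Sequence line: strip whitespace and add its length to the running total.
         { gsub(/[ \t\r]/, ""); len += length($0) }
    # End of file: finish the last record.
    END  { report() }
  ' "$1"
}
```

A call might look like: short_seqs sequences.fasta 29000 (the SARS-CoV-2 reference genome is about 29.9 kb, so 29000 is one possible cutoff for that virus).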
DATASET CONFIGURATION
The dataset directory contains essential reference information, such as the reference genome, clade definitions, and the phylogenetic tree. Ensure that the dataset matches the region and timeframe being analyzed.
HISTORY
nextclade was developed to address the need for rapid and accurate analysis of SARS-CoV-2 genomes during the COVID-19 pandemic. Its development and usage have grown substantially with the emergence of new variants and the increasing importance of genomic surveillance. It is actively maintained and updated to incorporate new lineages and improve accuracy.
SEE ALSO
mafft(1), iqtree(1)