nextclade
Analyze viral genome sequences: assign clades, call mutations, and assess sequence quality
TLDR
Analyze sequences
SYNOPSIS
nextclade [run] [dataset] [-i input] [-d dataset] [-o output] [options]
DESCRIPTION
nextclade analyzes viral genome sequences, assigning clades, calling mutations, and assessing sequence quality. It's widely used for SARS-CoV-2 surveillance.
The tool aligns sequences against a reference genome, identifies mutations (substitutions, insertions, deletions), and assigns sequences to clades in the phylogenetic tree.
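To make the mutation-calling step concrete, here is a minimal sketch of how substitutions, deletions, and insertions can be read off a pair of sequences that have already been aligned. It is a toy illustration only, not Nextclade's own algorithm: the function name `call_mutations` and the return format are hypothetical, and the alignment itself is assumed to have been done elsewhere.

```python
def call_mutations(ref_aligned: str, query_aligned: str):
    """Toy mutation caller for two already-aligned sequences of equal length.

    Gaps are written as '-'. Positions are 1-based and refer to the
    ungapped reference. This is an illustration, not Nextclade's method.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")

    substitutions, deletions, insertions = [], [], []
    ref_pos = 0  # position of the last reference base seen

    for ref_base, query_base in zip(ref_aligned.upper(), query_aligned.upper()):
        if ref_base == "-":
            # Gap in the reference: the query carries an extra base
            # inserted after reference position ref_pos.
            if query_base != "-":
                insertions.append((ref_pos, query_base))
            continue
        ref_pos += 1
        if query_base == "-":
            deletions.append(ref_pos)
        elif query_base != ref_base and query_base != "N":
            # 'N' means missing data, so it is not counted as a substitution.
            substitutions.append(f"{ref_base}{ref_pos}{query_base}")

    return substitutions, deletions, insertions


subs, dels, ins = call_mutations("ACG-TAC", "ATGCT-C")
print(subs, dels, ins)
```

Substitutions are reported in the common `<ref><position><alt>` notation, so a C-to-T change at reference position 2 appears as `C2T`.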
Quality control metrics flag potential problems: missing data, mixed bases, frameshifts, stop codons, and unusual mutations. These help identify sequencing errors or contamination.
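Two of the metrics named above, missing data and mixed bases, come down to counting particular characters in a sequence. The sketch below shows that counting under simple assumptions: `N` is treated as missing data, and the IUPAC ambiguity codes other than `N` are treated as mixed bases. The function name is hypothetical, and Nextclade's actual scoring and thresholds are not reproduced here.

```python
# IUPAC nucleotide ambiguity codes other than N (assumed here to mean "mixed").
MIXED_BASES = set("RYSWKMBDHV")


def count_missing_and_mixed(sequence: str):
    """Return (missing, mixed) character counts for a nucleotide sequence."""
    seq = sequence.upper()
    missing = seq.count("N")
    mixed = sum(1 for base in seq if base in MIXED_BASES)
    return missing, mixed


print(count_missing_and_mixed("ACGTNNRYacgn"))
```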
Datasets contain reference sequences, gene annotations, and clade definitions. Pre-built datasets are available for major pathogens. Custom datasets can be created.
Output includes detailed mutation lists, clade assignments, and quality scores. Results can be visualized or processed for epidemiological analysis.
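As an example of downstream processing, the sketch below tallies clade assignments and lists sequences whose overall QC status is not `good`. It assumes the TSV output has columns named `seqName`, `clade`, and `qc.overallStatus`; check the header of your own output file, since column names can vary by version. The rows in the example are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_results(tsv_text: str):
    """Count sequences per clade and collect names of QC-flagged sequences.

    Column names are assumptions about the TSV layout; adjust as needed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    clades = Counter()
    flagged = []
    for row in reader:
        clades[row["clade"]] += 1
        if row["qc.overallStatus"] != "good":
            flagged.append(row["seqName"])
    return clades, flagged


# Made-up rows, not real Nextclade output.
example = (
    "seqName\tclade\tqc.overallStatus\n"
    "sample1\tA\tgood\n"
    "sample2\tA\tbad\n"
    "sample3\tB\tmediocre\n"
)
clades, flagged = summarize_results(example)
print(dict(clades), flagged)
```

For a real run, read the file with `open(path, newline="")` and pass its contents (or adapt the function to take a file object).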
Tree placement shows where sequences fit in the global phylogeny, useful for tracking outbreak origins.
PARAMETERS
run
Analyze sequences.
dataset list
List available datasets.
dataset get
Download dataset.
-i FILE
Input FASTA file.
-d NAME
Dataset name.
-D DIR
Dataset directory.
-o FILE
Output TSV file.
--output-tree FILE
Output tree JSON.
--output-fasta FILE
Output aligned FASTA.
--output-json FILE
Output JSON results.
-j N
Number of threads.
--min-length N
Minimum sequence length.
--include-reference
Include reference in outputs.
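When Nextclade is driven from a script, it can help to assemble the command line in one place. The sketch below builds an argument list using only the options listed above (`-i`, `-D`, `-o`, `-j`, `--include-reference`); it does not run anything. The function name and file names are hypothetical, and option letters may differ between Nextclade releases, so confirm them with `nextclade run --help` before use.

```python
import shlex


def build_run_command(input_fasta, dataset_dir, output_tsv,
                      threads=None, include_reference=False):
    """Build an argv list for 'nextclade run' from the options on this page.

    The flags mirror the PARAMETERS list above and are not verified
    against any particular Nextclade version.
    """
    argv = ["nextclade", "run",
            "-i", input_fasta,
            "-D", dataset_dir,
            "-o", output_tsv]
    if threads is not None:
        argv += ["-j", str(threads)]
    if include_reference:
        argv.append("--include-reference")
    return argv


# Hypothetical file names for illustration.
argv = build_run_command("sequences.fasta", "dataset_dir", "results.tsv", threads=4)
print(shlex.join(argv))
```

Keeping the command as a list, rather than one string, means it can be passed directly to `subprocess.run(argv, check=True)` without shell quoting problems.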
CAVEATS
Results depend on dataset quality. Novel clades may not be assigned correctly. Large datasets need significant memory. Some features are pathogen-specific.
HISTORY
Nextclade was developed at the Nextstrain project by Cornelius Roemer and others, starting around 2020 during the COVID-19 pandemic. It provides rapid clade assignment and quality control for genomic surveillance programs worldwide.
