nextclade
Analyze viral sequences to track evolution
TLDR
Align sequences to a user-provided reference, writing the alignment to a file
Create a TSV report, auto-downloading the latest dataset
List all available datasets
Download the latest SARS-CoV-2 dataset
Use a downloaded dataset, producing all outputs
Run on multiple files
Try reverse complement if sequence does not align
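The command lines for the entries above are not shown on this page. The following is a hedged reconstruction, one shell function per entry, assuming a recent Nextclade (v3; most flags also apply to v2) in which input FASTA files are positional arguments. The function names (nc_align, nc_report, ...) and all paths are placeholders invented for illustration, not part of Nextclade. Check nextclade run --help for the flags of your installed version.

```shell
# Align sequences to a user-provided reference, writing the alignment to a file
# usage: nc_align sequences.fasta reference.fasta aligned.fasta
nc_align() {
  nextclade run "$1" --input-ref "$2" --output-fasta "$3"
}

# Create a TSV report, auto-downloading the latest dataset
# usage: nc_report dataset_name sequences.fasta report.tsv
nc_report() {
  nextclade run --dataset-name "$1" --output-tsv "$3" "$2"
}

# List all available datasets
nc_list() {
  nextclade dataset list
}

# Download the latest SARS-CoV-2 dataset into a directory
# usage: nc_get_sc2 dataset_dir
nc_get_sc2() {
  nextclade dataset get --name sars-cov-2 --output-dir "$1"
}

# Use a downloaded dataset, producing all outputs
# usage: nc_all dataset_dir output_dir sequences.fasta
nc_all() {
  nextclade run --input-dataset "$1" --output-all "$2" "$3"
}

# Run on multiple files
# usage: nc_report_many dataset_name report.tsv a.fasta b.fasta ...
nc_report_many() {
  _nc_dataset=$1; _nc_report=$2; shift 2
  nextclade run --dataset-name "$_nc_dataset" --output-tsv "$_nc_report" -- "$@"
}

# Try the reverse complement if a sequence does not align
# usage: nc_report_rc dataset_name sequences.fasta report.tsv
nc_report_rc() {
  nextclade run --retry-reverse-complement --dataset-name "$1" --output-tsv "$3" "$2"
}
```

For example, nc_get_sc2 data/sars-cov-2 followed by nc_all data/sars-cov-2 results/ sequences.fasta downloads the dataset once and then analyzes the sequences without fetching it again.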
SYNOPSIS
nextclade <SUBCOMMAND> [OPTIONS]
PARAMETERS
run
Main analysis subcommand for sequences
dataset
Manage analysis datasets (list, get)
--help, -h
Print help information
--version, -V
Print version information
--dataset-name <NAME>
Dataset name (e.g. 'sars-cov-2')
--dataset-url <URL>
URL to zip dataset
[INPUT_FASTAS]...
Input FASTA file(s), given as positional arguments in Nextclade v2 and later (FASTQ is not supported)
--input-ref, -r <PATH>
Custom reference sequence in FASTA format
--output-tsv <PATH>
Tab-separated results
--output-csv <PATH>
Comma-separated results
--output-json <PATH>
JSON results
--output-fasta <PATH>
Aligned FASTA output
--output-tree <PATH>
Phylogenetic tree with the input sequences placed on it (Auspice JSON v2)
--jobs, -j <N>
Number of processing jobs (threads); defaults to all available CPU threads
--include-endpoint-mutations
Include mutations outside gene regions
--output-basename <STR>
Base file name for the outputs written with --output-all
DESCRIPTION
Nextclade is a fast, scalable command-line tool for analyzing viral genomes, especially SARS-CoV-2. It processes input sequences in FASTA format to perform:
• Clade assignment (Nextstrain, WHO)
• Mutation calling (nucleotide/aa substitutions)
• Quality control (scoring missing data, divergences)
• Alignment to reference genomes
• Pango lineage inference
Using predefined datasets (genes, references, trees), it outputs results as TSV, CSV, JSON, aligned FASTA, and phylogenetic trees (Auspice JSON; Newick in recent versions). Multi-threading is supported for high-throughput surveillance.
Part of the Nextstrain ecosystem, it is used globally for COVID-19 tracking. When a dataset is requested by name, its latest version is downloaded at run time, so new variants are picked up as datasets are updated. Install via Conda, Cargo, or binaries. Handles thousands of sequences efficiently on standard hardware.
CAVEATS
Fetching a dataset by name requires internet access; download the dataset beforehand for offline use.
Large inputs may need significant RAM (>8GB recommended).
Primarily optimized for SARS-CoV-2; check dataset compatibility for other viruses.
DATASETS
Prebuilt collections (reference, genes, tree) for viruses. Use nextclade dataset get --name sars-cov-2 --output-dir data/sars-cov-2 to download one; an output location (--output-dir or --output-zip) is required.
OUTPUTS
Core TSV columns: seqName, clade, Nextclade_pango (SARS-CoV-2 datasets), qc.overallScore, substitutions, aaSubstitutions. The JSON output carries the same information in nested form.
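To pull a few columns out of a TSV report, a small awk helper can look them up by header name. This is a minimal sketch, assuming the camelCase column names written by recent Nextclade versions (seqName, clade, qc.overallScore); the helper name tsv_cols and the file sample.tsv are hypothetical, and the sample rows are made up to keep the example self-contained.

```shell
# Made-up sample standing in for a real --output-tsv report (values are illustrative only)
printf 'seqName\tclade\tqc.overallScore\n'  > sample.tsv
printf 'seqA\t21K\t3.5\n'                  >> sample.tsv
printf 'seqB\t22B\t41.0\n'                 >> sample.tsv

# Print the named columns, resolving them from the header row so column order does not matter
# usage: tsv_cols file.tsv col1 col2 ...
tsv_cols() {
  _tsv_file=$1; shift
  awk -F '\t' -v cols="$*" '
    BEGIN { OFS = "\t"; n = split(cols, want, " ") }
    NR == 1 {
      # Map each header name to its column index, then verify every requested column exists
      for (i = 1; i <= NF; i++) idx[$i] = i
      for (j = 1; j <= n; j++)
        if (!(want[j] in idx)) { print "missing column: " want[j] > "/dev/stderr"; exit 1 }
    }
    {
      line = ""
      for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(idx[want[j]])
      print line
    }
  ' "$_tsv_file"
}

# Header row plus one line per sequence, restricted to the two requested columns
tsv_cols sample.tsv seqName qc.overallScore
```

Looking columns up by name rather than by position keeps the helper working if the column order differs between Nextclade versions or datasets.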
INSTALLATION
Via Conda: conda install -c bioconda nextclade; Cargo: cargo install nextclade; or prebuilt binaries.
HISTORY
Developed by Nextstrain team in 2020 for COVID-19 surveillance. First release coincided with pandemic; evolved with variants (Alpha to Omicron+). Weekly dataset updates since 2021. Version 2.x introduced multi-virus support.
SEE ALSO
nextalign(1), pangolin(1), iqtree(1)


