LinuxCommandLibrary

nextclade

analyzes viral genome sequences: assigns clades, calls mutations, and assesses sequence quality

TLDR

Analyze sequences and write all outputs to a directory
$ nextclade run -d [sars-cov-2] -O [output_dir/] [sequences.fasta]
Analyze and write results to a TSV file
$ nextclade run -d [sars-cov-2] --output-tsv [output.tsv] [sequences.fasta]
List available datasets
$ nextclade dataset list
Download dataset
$ nextclade dataset get -n [sars-cov-2] -o [dataset/]
Run with local dataset
$ nextclade run -D [dataset/] -O [output_dir/] [sequences.fasta]
Generate tree output
$ nextclade run -d [sars-cov-2] --output-tree [tree.json] [sequences.fasta]
Output aligned sequences
$ nextclade run -d [sars-cov-2] --output-fasta [aligned.fasta] [sequences.fasta]

SYNOPSIS

nextclade run [-d name | -D path] [output options] [options] [sequences.fasta ...]
nextclade dataset list [options]
nextclade dataset get -n name -o dir [options]

DESCRIPTION

nextclade analyzes viral genome sequences, assigning clades, calling mutations, and assessing sequence quality. It's widely used for SARS-CoV-2 surveillance.
The tool aligns each sequence against a reference genome, identifies mutations (substitutions, insertions, deletions), and places the sequence on a reference phylogenetic tree to assign its clade.
Quality control metrics flag potential problems: missing data, mixed bases, frameshifts, premature stop codons, and unusual (private or clustered) mutations. These help identify sequencing errors or contamination.
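As a sketch of how these QC results might be used downstream, the shell function below lists sequences whose overall QC status is not "good". It assumes the TSV output contains seqName and qc.overallStatus columns, as in recent Nextclade releases; the function name flag_qc_failures and the file name results.tsv are hypothetical.

```shell
#!/bin/sh
# Print name and status of every sequence whose overall QC status is not "good".
# Assumption: the TSV has "seqName" and "qc.overallStatus" header columns.
# Columns are located by header name, not by fixed position.
flag_qc_failures() {
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "seqName") name = i
        if ($i == "qc.overallStatus") status = i
      }
      if (!name || !status) {
        print "required columns not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    $status != "good" { print $name "\t" $status }
  ' "$1"
}

# Usage (hypothetical file name):
#   flag_qc_failures results.tsv
```

Looking columns up by header keeps the function working if the column order changes between releases.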
Datasets contain reference sequences, gene annotations, and clade definitions. Pre-built datasets are available for major pathogens. Custom datasets can be created.
Output includes detailed mutation lists, clade assignments, and quality scores. Results can be visualized or processed for epidemiological analysis.
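One simple way to process the tabular output is to tally sequences per clade. The sketch below assumes a TSV results file with a clade header column; the function name count_clades and the file name results.tsv are hypothetical.

```shell
#!/bin/sh
# Count sequences per clade in a Nextclade TSV, most frequent clade first.
# Assumption: the TSV has a "clade" header column.
count_clades() {
  tab=$(printf '\t')
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "clade") col = i
      if (!col) { print "no clade column found" > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$col]++ }
    END { for (c in counts) print counts[c] "\t" c }
  ' "$1" | sort -t "$tab" -k1,1nr -k2,2
}

# Usage (hypothetical file name):
#   count_clades results.tsv
```

The explicit tab separator passed to sort keeps clade names that contain spaces in a single field.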
Tree placement shows where sequences fit in the global phylogeny, useful for tracking outbreak origins.

PARAMETERS

run

Analyze sequences.
dataset list
List available datasets.
dataset get
Download dataset.
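FILE...
Input FASTA file(s), given as positional arguments. Releases before 2.0 used -i, --input-fasta FILE instead.
-d, --dataset-name NAME
Name of a dataset to download and use for the run.
-D, --input-dataset PATH
Path to a local dataset (directory or zip archive).
-t, --output-tsv FILE
Output TSV file. In 2.x and later, -o is the short form of --output-fasta, not of the TSV output.
-O, --output-all DIR
Write all output files into a directory.
-T, --output-tree FILE
Output tree JSON.
-o, --output-fasta FILE
Output aligned FASTA.
-J, --output-json FILE
Output JSON results.
-j, --jobs N
Number of processing jobs (threads).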
--min-length N
Minimum sequence length.
--include-reference
Include reference in outputs.
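
The subcommands and options above can be combined into a small download-then-run workflow. The following is a minimal sketch, assuming the 2.x and later syntax in which the input FASTA is a positional argument and the TSV path is given with --output-tsv. The function names run_step and analyze_with_local_dataset and the DRY_RUN variable are hypothetical helpers, not part of nextclade.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: fetch a dataset once, then analyze a FASTA file with it.
analyze_with_local_dataset() {
  dataset_name=$1   # e.g. sars-cov-2
  dataset_dir=$2    # where the dataset is (or will be) stored
  fasta=$3          # input sequences
  out_tsv=$4        # results table

  # Download the dataset only if the directory does not exist yet.
  if [ ! -d "$dataset_dir" ]; then
    run_step nextclade dataset get -n "$dataset_name" -o "$dataset_dir" || return 1
  fi
  run_step nextclade run -D "$dataset_dir" --output-tsv "$out_tsv" "$fasta"
}

# Usage (hypothetical paths):
#   DRY_RUN=1
#   analyze_with_local_dataset sars-cov-2 dataset/ sequences.fasta results.tsv
```

Reusing a local dataset directory avoids downloading the dataset on every run and keeps results tied to one dataset version.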

CAVEATS

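Results depend on dataset quality: clades that are newer than the dataset's reference tree may not be assigned correctly. Large datasets need significant memory. Some features are pathogen-specific. Command-line options differ between major versions, so check nextclade run --help for the installed release.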

HISTORY

Nextclade was developed within the Nextstrain project by Ivan Aksamentov, Cornelius Roemer, Emma Hodcroft, and Richard Neher, starting in 2020 during the COVID-19 pandemic. It provides rapid clade assignment and quality control for genomic surveillance programs worldwide.

SEE ALSO

nextalign(1), pangolin(1), mafft(1), minimap2(1)
