LinuxCommandLibrary

nextclade

Analyze viral sequences to track evolution

TLDR

Align sequences to a user-provided reference, outputting the alignment to a file

$ nextclade run [path/to/sequences.fa] [[-r|--input-ref]] [path/to/reference.fa] [[-o|--output-fasta]] [path/to/alignment.fa]

Create a TSV report, auto-downloading the latest dataset
$ nextclade run [path/to/fasta] [[-d|--dataset-name]] [dataset_name] [[-t|--output-tsv]] [path/to/report.tsv]

List all available datasets
$ nextclade dataset list

Download the latest SARS-CoV-2 dataset
$ nextclade dataset get [[-n|--name]] sars-cov-2 [[-o|--output-dir]] [path/to/directory]

Use a downloaded dataset, producing all outputs
$ nextclade run [[-D|--input-dataset]] [path/to/dataset_dir] [[-O|--output-all]] [path/to/output_dir] [path/to/sequences.fasta]

Run on multiple files
$ nextclade run [[-d|--dataset-name]] [dataset_name] [[-t|--output-tsv]] [path/to/output_tsv] -- [path/to/input_fasta_1 path/to/input_fasta_2 ...]

Try reverse complement if sequence does not align
$ nextclade run --retry-reverse-complement [[-d|--dataset-name]] [dataset_name] [[-t|--output-tsv]] [path/to/output_tsv] [path/to/input_fasta]
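The last two examples can be combined when each input file should get its own report. The following is a minimal sketch, not part of nextclade itself: report_each is a hypothetical helper name, and the output file naming (input base name plus .tsv) is an assumption made for illustration. It only uses the run options shown in the examples above.

```shell
# Hypothetical helper: write one TSV report per input FASTA file.
# Usage: report_each DATASET_NAME OUTPUT_DIR FASTA...
report_each() (
    dataset=$1; out_dir=$2; shift 2
    mkdir -p "$out_dir" || exit 1
    for fasta in "$@"; do
        base=$(basename "$fasta")
        # Report name is an assumption: sample.fasta -> OUTPUT_DIR/sample.tsv
        nextclade run --retry-reverse-complement \
            --dataset-name "$dataset" \
            --output-tsv "$out_dir/${base%.*}.tsv" \
            "$fasta" || echo "failed: $fasta" >&2
    done
)
```

For example, report_each sars-cov-2 reports batch1.fasta batch2.fasta would write reports/batch1.tsv and reports/batch2.tsv. A failure on one file is reported on standard error and the loop continues with the next file.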

SYNOPSIS

nextclade run [options] [path/to/sequences.fasta ...]

PARAMETERS

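--input-fasta
    Path to the input FASTA file containing the genome sequences to analyze. With the nextclade run form used in the examples above, input files are passed as positional arguments instead.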

--input-tree
    Path to the phylogenetic tree file to use.

--input-dataset
    Path to directory containing dataset files (e.g., sequences, reference genome, clade definitions).

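--output-dir
    Path to the output directory where results will be saved. In the examples above, nextclade run uses --output-all for this purpose, while --output-dir appears with nextclade dataset get.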

--output-fasta
    Path to the output FASTA file containing the aligned input sequences.

--output-tsv
    Path to the output TSV file. Contains per-sequence results (clade, mutations).

--output-json
    Path to the output JSON file. Contains per-sequence results in JSON format.

--output-tree
    Path to the output tree file in Auspice JSON format, with the new sequences placed on the reference tree. No tree is written unless this option is given.

--include-reference
    Include reference genome sequence in the output FASTA file.

--genes
    Comma-separated list of gene names to include in the analysis.

--extend-nucs
    Number of nucleotides to extend to each side of the mutations.

--help
    Display help message and exit.

--version
    Display version information and exit.

DESCRIPTION

nextclade is a command-line tool for fast phylogenetic placement and mutation calling of viral genomes. It was built around SARS-CoV-2, and datasets for other viruses can be listed with nextclade dataset list. It performs sequence alignment, phylogenetic tree placement, clade assignment, and identification of mutations (amino acid changes, insertions, deletions) relative to a reference genome. The tool outputs a report containing the genome quality, the mutations, and the assigned clade for each sequence.
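A quick way to get an overview of such a report is to count sequences per clade. The sketch below assumes the TSV report has a header row with a column named clade; count_by_column is a hypothetical helper, and the column name should be checked against the header of your own report before relying on it.

```shell
# Hypothetical helper: count rows per value of a named column in a TSV report.
# Usage: count_by_column REPORT_TSV [COLUMN_NAME]
# COLUMN_NAME defaults to "clade" (an assumption about the report header).
count_by_column() {
    LC_ALL=C awk -F '\t' -v col="${2:-clade}" '
        NR == 1 {
            # Find the column index by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        {
            key = ($idx == "") ? "(unassigned)" : $idx
            counts[key]++
        }
        END { for (k in counts) printf "%s\t%d\n", k, counts[k] }
    ' "$1" | sort
}
```

Running count_by_column path/to/report.tsv prints one line per clade with the number of sequences assigned to it. Passing a different column name as the second argument tallies that column instead.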

It leverages pre-computed phylogenetic trees and carefully curated datasets (e.g., sequences, annotations) to provide reproducible and standardized results. It's particularly useful for researchers and public health officials involved in genomic surveillance of SARS-CoV-2, providing insights into the evolution and spread of different viral lineages. The analysis is typically performed against a known set of clades, making it straightforward to track variants of concern and variants of interest. nextclade can be customized using configuration files, allowing users to adjust the parameters based on the specific research or surveillance requirements.

CAVEATS

Results depend heavily on the quality of the input sequences and the completeness of the reference dataset. Interpretation of the results requires understanding of phylogenetic principles and viral evolution.

INPUT DATA REQUIREMENTS

The input FASTA file should contain sequences that have been quality-controlled and trimmed to remove primer sequences. nextclade works best with complete or near-complete genomes.
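Before running nextclade, it can help to check how complete each input sequence is. The sketch below is a standalone pre-check, not a nextclade feature: fasta_stats is a hypothetical helper that prints the name, length, and fraction of N bases for every record in a FASTA file.

```shell
# Hypothetical helper: per-record length and N fraction for a FASTA file.
# Usage: fasta_stats SEQUENCES_FASTA
# Output columns (tab-separated): name, length, fraction of N bases.
fasta_stats() {
    LC_ALL=C awk '
        function report() {
            if (seen) printf "%s\t%d\t%.3f\n", name, len, (len > 0 ? n / len : 0)
        }
        /^>/ {
            report()
            # Use the first word of the header line as the record name.
            name = $1; sub(/^>/, "", name)
            seen = 1; len = 0; n = 0
            next
        }
        {
            seq = toupper($0)
            gsub(/[ \t\r]/, "", seq)
            len += length(seq)
            n += gsub(/N/, "N", seq)
        }
        END { report() }
    ' "$1"
}
```

The output can be filtered further, for instance with awk on the second and third columns, to list records that look too short or contain too many ambiguous bases. Suitable cutoffs depend on the virus and the sequencing protocol and are not specified here.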

DATASET CONFIGURATION

The dataset directory contains essential reference information, such as the reference genome, clade definitions, and a phylogenetic tree. Choose a dataset that matches the virus, region, and timeframe being analyzed.
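To keep results reproducible, a dataset can be downloaded once and reused for later runs. The following sketch wraps the dataset get and run commands from the examples above; analyze_with_local_dataset is a hypothetical function name, and skipping the download when the directory already exists is a design choice made for this illustration.

```shell
# Hypothetical wrapper: fetch a dataset once, then analyze with the local copy.
# Usage: analyze_with_local_dataset DATASET_NAME DATASET_DIR OUTPUT_DIR FASTA...
analyze_with_local_dataset() (
    name=$1; dataset_dir=$2; out_dir=$3; shift 3
    # Download only if the dataset directory does not exist yet.
    if [ ! -d "$dataset_dir" ]; then
        nextclade dataset get --name "$name" --output-dir "$dataset_dir" || exit 1
    fi
    nextclade run --input-dataset "$dataset_dir" --output-all "$out_dir" "$@"
)
```

For example, analyze_with_local_dataset sars-cov-2 data/sars-cov-2 results sequences.fasta downloads the dataset on the first call and reuses it afterwards. Because an existing directory is never refreshed, delete it to pick up a newer dataset release.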

HISTORY

nextclade was developed to address the need for rapid and accurate analysis of SARS-CoV-2 genomes during the COVID-19 pandemic. Its development and usage have grown substantially with the emergence of new variants and the increasing importance of genomic surveillance. It is actively maintained and updated to incorporate new lineages and improve accuracy.

SEE ALSO

mafft(1), iqtree(1)
