nextflow
Run and manage computational pipelines
TLDR
Run a pipeline, reusing cached results from previous runs
Run a specific release of a remote workflow from GitHub
Run with a given work directory for intermediate files and save an execution report
Show details of previous runs in current directory
Remove cache and intermediate files for a specific run
List all downloaded projects
Pull the latest version of a remote workflow from Bitbucket
Update Nextflow
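The TLDR entries above might look like the following on the command line; pipeline names, paths, and run names are placeholders:

```shell
# Run a pipeline, reusing cached results from previous runs
nextflow run main.nf -resume

# Run a specific release of a remote workflow hosted on GitHub
nextflow run nextflow-io/hello -r v1.1

# Run with a given work directory and save an execution report
nextflow run main.nf -w /tmp/work -with-report report.html

# Show details of previous runs in the current directory
nextflow log

# Remove cache and intermediate files for a specific run
nextflow clean -f run_name

# List all downloaded projects
nextflow list

# Pull the latest version of a remote workflow from Bitbucket
nextflow pull user/repo -hub bitbucket

# Update Nextflow itself
nextflow self-update
```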
SYNOPSIS
nextflow <command> [<options>] [<arguments>]
nextflow run <pipeline> [<options>] [<params>]
PARAMETERS
-h
Displays help information for the command or global options.
-v, -version
Shows the Nextflow version number.
-C <file>
Uses the specified configuration file(s), ignoring all default config files (use -c to add a file on top of the defaults).
-bg
Runs the Nextflow pipeline execution in the background.
-q
Suppresses informational messages, showing only warnings or errors.
-ansi-log <true|false>
Enables or disables ANSI console log output (the rendered run progress display).
-r <revision>, -revision <revision>
Specifies the pipeline version to execute (e.g., Git branch, tag, or commit ID).
-profile <name>
Specifies one or more configuration profiles to apply (e.g., 'docker', 'conda', 'slurm').
-params-file <file>
Loads pipeline parameters from a YAML or JSON file.
-entry <name>
Specifies the entry point (workflow) to execute within the pipeline.
-resume
Resumes a previous pipeline execution from the last successful checkpoint.
-w <dir>, -work-dir <dir>
Sets the pipeline's work directory where intermediate files are stored.
-name <name>
Assigns a custom name to the pipeline execution, useful for tracking.
-latest
Forces the download of the latest revision of the pipeline from a repository.
-offline
Disables internet access for pipeline execution (e.g., no Git fetch or container pull).
-stub-run, -stub
Creates empty stub output files for processes, useful for testing pipeline logic.
-with-docker <image>
Executes processes using the specified Docker image for containerization.
-with-singularity <image>
Executes processes using the specified Singularity image for containerization.
-with-conda <env>
Enables Conda environment management for process dependencies.
-with-tower
Enables monitoring and logging of the pipeline execution with Nextflow Tower.
-with-podman <image>
Executes processes using the specified Podman image for containerization. There is no single container-engine flag; the engine is selected via the -with-<engine> options or the corresponding configuration scopes (docker, singularity, podman).
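A hypothetical invocation combining several of the run options above; the pipeline name, revision, run name, and parameters file are illustrative:

```shell
# Run a pinned pipeline revision with the 'docker' profile,
# resuming cached work, naming the run, and loading parameters
# from a YAML file
nextflow run nf-core/rnaseq -r 3.14.0 \
    -profile docker \
    -resume \
    -name my_run \
    -params-file params.yaml
```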
DESCRIPTION
Nextflow is an open-source workflow management system designed for building and deploying complex computational pipelines. It excels at handling large-scale data processing in a reproducible and portable manner across various computing environments. Built on the reactive programming paradigm, Nextflow simplifies the development of parallel and distributed processes. It manages dependencies, tracks outputs, and automatically retries failed tasks, ensuring robust execution. A core feature is its ability to seamlessly integrate with container technologies like Docker and Singularity, and resource managers such as Slurm, SGE, Kubernetes, and AWS Batch, enabling pipelines to run consistently from a local machine to a high-performance computing cluster or cloud environment. It's widely adopted in bioinformatics for its efficiency and reproducibility.
CAVEATS
While powerful, Nextflow has a learning curve, especially when writing complex pipelines or debugging issues related to executor and container environments. Resource management and optimization can require significant effort to get optimal performance on diverse infrastructures. The DSL2 (Domain Specific Language 2) for pipeline definition, while expressive, requires familiarity with Groovy-like syntax and concepts.
NEXTFLOW DSL2 AND MODULES
Nextflow pipelines are defined using a Groovy-based Domain Specific Language (DSL). DSL2, introduced in Nextflow version 20.07.0, revolutionized pipeline development by enabling the creation of modular components (processes and workflows) that can be easily shared, reused, and composed into larger pipelines. This greatly enhances pipeline organization, maintainability, and collaboration.
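A minimal DSL2 sketch, with illustrative names: a process can live in a module file, be imported, and be composed into a workflow.

```nextflow
// Illustrative DSL2 process; in practice this would sit in a
// module file and be pulled in with an `include` statement.
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}!"
    """
}

workflow {
    // Feed two values through the process and print the results
    Channel.of('world', 'nextflow') | SAY_HELLO | view
}
```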
CONFIGURATION FILES
Nextflow uses a powerful configuration system, primarily via .config files. These files allow users to define various settings like executor types, container images, resource requests (CPU, memory), and custom parameters. Configurations can be layered, with system-wide, project-specific, and user-specific settings overriding each other, providing immense flexibility for different execution environments.
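A sketch of a nextflow.config illustrating these layers; executor, resource values, and the profile name are assumptions, not defaults:

```nextflow
params.outdir = 'results'          // custom pipeline parameter

process {
    executor = 'slurm'             // submit tasks via Slurm
    cpus     = 4                   // per-task resource requests
    memory   = '8 GB'
}

profiles {
    docker {
        docker.enabled = true      // run tasks in Docker containers
    }
}
```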
CHANNELS
The core mechanism for data flow in Nextflow is the "channel." Channels are asynchronous queues that connect processes, allowing them to exchange data without direct file system interaction. This reactive data-flow model is fundamental to Nextflow's ability to parallelize tasks efficiently and manage dependencies.
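As a sketch, a queue channel can be created from files on disk and transformed through operators before reaching a process; the glob pattern and tuple shape are illustrative:

```nextflow
// One channel item per matching file; each is paired with a
// sample identifier derived from its base name.
Channel
    .fromPath('data/*.fastq.gz')
    .map { file -> tuple(file.baseName, file) }
    .view()
```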
HISTORY
Nextflow was initially developed by Paolo Di Tommaso at the Centre for Genomic Regulation (CRG) in Barcelona, Spain, and publicly released in 2013. It emerged from the need for a more robust and reproducible way to manage complex data analysis pipelines in genomics. Its key innovation was the adoption of a reactive, data-flow programming model, inspired by functional programming, which made pipeline parallelization and error handling more intuitive. The introduction of DSL2 in recent years significantly improved pipeline modularity and reusability, further solidifying its position as a leading tool in scientific computing, particularly in bioinformatics and life sciences.
SEE ALSO
snakemake(1), cwltool(1), make(1), docker(1), singularity(1)