parallel
Run jobs in parallel using multiple CPUs
TLDR
Gzip several files at once, using all cores
Read arguments from stdin, run 4 jobs at once
Convert JPEG images to PNG using replacement strings
Parallel xargs, cram as many args as possible onto one command
Split stdin into blocks of about 1 MB and feed each block to the stdin of a new command
Run on multiple machines via SSH
Download the links listed in a text file, 4 at a time, showing progress
Print each job that parallel runs to stderr
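The task list above carries no commands. The following is a sketch of invocations that would fit most of those descriptions, not a definitive reference. All file names are hypothetical, ImageMagick's convert is only printed via --dry-run, and the SSH and download cases are left out because they need reachable hosts and network access.

```shell
# Scratch directory with hypothetical sample files.
tmp=$(mktemp -d)
printf 'sample\n' > "$tmp/a.txt"
printf 'sample\n' > "$tmp/b.txt"
printf 'sample\n' > "$tmp/c.txt"

# Gzip several files at once (by default one job per CPU core).
parallel gzip ::: "$tmp"/*.txt

# Read arguments from stdin and run 4 jobs at once.
ls "$tmp"/*.gz | parallel -j4 gunzip

# Replacement strings: {} is the input, {.} is the input without its extension.
# --dry-run only prints the commands, so convert does not need to be installed.
parallel -k --dry-run convert {} {.}.png ::: photo1.jpg photo2.jpg

# Like xargs: -X packs as many arguments as fit onto each command line.
seq 1 5 | parallel -j1 -X echo

# Split stdin into blocks of about 1 MB; each block becomes the stdin of one wc -l.
seq 1 300000 | parallel --pipe --block 1M wc -l

# -t prints each command to stderr before running it.
parallel -k -t echo ::: one two

rm -rf "$tmp"
```

The --pipe example prints one line count per block, so the counts add up to the number of input lines.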
SYNOPSIS
parallel [options] [command [arguments...]] ::: [arguments...]
parallel [options] [command [arguments...]] :::+ [arguments...]
parallel [options] [command [arguments...]] :::: [argfile(s)...]
parallel [options] < input_file
some_command | parallel [options] [command [arguments...]]
PARAMETERS
-j N, --jobs N
Run N jobs in parallel. 0 means run as many jobs as possible. The default is one job per CPU core.
-n N, --max-args N
Use at most N arguments per command line. Similar to xargs -n.
--pipe
Split the data on stdin into blocks (about 1 MB by default, set with --block) and pass each block to the stdin of a job, instead of passing input lines as arguments.
--pipe-part
Like --pipe, but the input is read from a seekable file given with -a (--arg-file) rather than from stdin, which is much faster. Each part of the file becomes the stdin of a job.
--rsh shell, --ssh shell
Command used to log in to remote hosts for remote execution (ssh by default). The hosts themselves are listed with -S (--sshlogin).
--workdir dir
Change to this directory on the remote or local host before executing the command.
--results dir
Store output, stderr, and exit status of each job in files within the specified directory.
--colsep regexp
Split each input line into columns using regexp as the separator. The columns are available as {1}, {2}, and so on. Without --colsep, each input line is a single argument.
--group
Group the output of each command together, printing it only after the command finishes.
--keep-order
Output results in the same order as the input arguments, possibly delaying output.
--dry-run
Print the commands that would be executed without actually running them.
--delay N
Wait N seconds between starting new jobs. Useful for preventing resource exhaustion.
--eta
Show the estimated time remaining until all jobs have finished.
--no-notice
Suppress the initial 'To cite GNU Parallel...' notice.
--timeout N
Terminate a job if it runs longer than N seconds.
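A few of the options above can be tried with harmless commands. This is a minimal sketch; the input values and log file names are made up for illustration.

```shell
# -k (--keep-order) prints results in input order even though the job for "1"
# finishes first (it sleeps for the shortest time).
parallel -j4 -k 'sleep 0.{}; echo {}' ::: 3 2 1

# -n 2 passes two arguments to each command line: prints "a b", then "c d".
parallel -k -n 2 echo ::: a b c d

# --colsep splits each input line into columns, addressed as {1}, {2}, ...
printf 'alice,30\nbob,25\n' | parallel -k --colsep ',' echo '{1} is {2}'

# --dry-run shows the commands without running them (the files need not exist).
parallel -k --dry-run gzip -9 {} ::: one.log two.log
```

Combining --dry-run with the real command line is a cheap way to check argument handling before starting a long batch.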
DESCRIPTION
GNU parallel is a shell tool for executing jobs in parallel on one or more computers. It reads arguments from standard input or from the command line and runs a command once for each argument. It can replace xargs and shell for loops, and often speeds up batch work considerably by keeping all available CPU cores busy. Work can be spread across the local processors or across remote machines reached via SSH. Options control how many jobs run at once, how job output and errors are collected, and whether output is kept in input order, which makes parallel suitable for tasks ranging from data processing to batch operations and for combining with other programs in larger workflows.
CAVEATS
Quoting and Escaping: Special characters and spaces in arguments often require careful quoting to ensure they are passed correctly to the executed command, especially when using shell interpretation.
Shell Interpretation: By default, parallel executes commands through a shell. This can lead to unexpected behavior if commands contain shell-specific syntax or unquoted variables. Running with --dry-run to inspect the generated command lines, or using -q (--quote) so that parallel quotes the command itself, can help.
Resource Consumption: While efficient, running too many parallel jobs without sufficient system resources (CPU, RAM, I/O) can lead to performance degradation or instability. Careful tuning of the --jobs parameter is crucial.
Dependency: GNU parallel is primarily written in Perl and is usually a separate package that might not be installed by default on all Linux distributions.
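The quoting caveats are easiest to see with echo. The sketch below uses made-up input values and assumes GNU parallel (not the moreutils program of the same name) is the parallel on PATH.

```shell
# Input values are shell-quoted when substituted for {}, so spaces and
# characters such as & reach the command intact.
parallel -k echo 'got: {}' ::: 'my file.txt' 'a&b'

# Single-quote a command that contains shell syntax (here a pipe) so that the
# calling shell leaves it alone and the pipe runs once per job.
parallel -k 'echo {} | tr a-z A-Z' ::: alpha beta

# Without -q the double space would be collapsed by the shell that parallel
# starts; -q makes parallel quote the command itself.
parallel -q echo 'a  b' ::: x

# --dry-run shows the exact command line the shell will receive.
parallel --dry-run echo {} ::: 'my file.txt'
```

The exact quoting style printed by --dry-run can differ between versions, so it is best read as a debugging aid rather than compared literally.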
INPUT SOURCES AND ARGUMENT HANDLING
parallel accepts input in several ways. Arguments can be given directly on the command line after :::, read from standard input (one argument per line), or read from a file with :::: or -a. Replacement strings control how each argument is inserted into the command: {} (the full argument), {.} (the argument without its extension), {/} (basename), {//} (dirname), {/.} (basename without extension), {#} (job sequence number), and {%} (job slot number). Proper quoting and argument transformation are essential for correctness.
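The input sources and replacement strings described above can be sketched as follows. The path dir/sub/photo.jpg is hypothetical and does not need to exist, because echo only prints the substituted text.

```shell
# Arguments on the command line after :::
parallel -k echo ::: a b

# Arguments from stdin, one per line.
printf 'a\nb\n' | parallel -k echo

# Arguments from a file via :::: (a temporary file stands in for a real list).
list=$(mktemp)
printf 'a\nb\n' > "$list"
parallel -k echo :::: "$list"
rm -f "$list"

# Two ::: sources produce every combination; {1} and {2} pick the source.
parallel -k echo {1}-{2} ::: x y ::: 1 2

# Replacement strings applied to a single argument.
parallel echo 'full={} noext={.} base={/} dir={//} base_noext={/.} seq={#} slot={%}' ::: dir/sub/photo.jpg
```

The first three commands print the same two lines, which shows that the input source does not change how the command is built.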
REMOTE EXECUTION
A significant feature of parallel is its ability to execute commands on remote hosts, by default over ssh. Hosts are named with -S (--sshlogin), and login without a password prompt (for example with SSH keys) is needed for unattended runs. This allows a workload to be spread across multiple machines with ordinary shell commands and little setup.
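As a rough sketch of the host list syntax: -S takes a comma-separated list, and the special host ":" stands for the local machine without ssh, which keeps the first command runnable anywhere. The server names in the comment are hypothetical placeholders.

```shell
# ":" means "run locally, no ssh", so this works without any remote machine.
parallel -S : -k echo 'processed {}' ::: a b c

# With real machines the same jobs would be shared between two servers and
# the local host (hypothetical host names, passwordless ssh assumed):
#   parallel -S server1.example.com,server2.example.com,: echo 'processed {}' ::: a b c
```

Keeping ":" in the list is a simple way to let the local machine take part in the work alongside the remote hosts.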
PROGRESS AND ETA
For long-running tasks, parallel can report on its own progress, including progress output (--progress), a progress bar (--bar), and an estimate of the remaining time (--eta), which helps in monitoring and managing large batches.
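A small sketch of the progress options, using sleep as a stand-in for real work. The progress information goes to stderr, so stdout still carries only the job output and can be redirected or piped as usual.

```shell
# --eta prints an estimate of the remaining time on stderr while jobs run.
parallel --eta -k 'sleep 0.2; echo finished {}' ::: 1 2 3 4

# --bar draws a progress bar on stderr instead.
parallel --bar -k 'sleep 0.2; echo finished {}' ::: 1 2 3 4
```

Because --eta needs to know the total number of jobs, parallel reads the whole input before it starts, which can matter for very long argument lists.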
HISTORY
GNU parallel was created by Ole Tange and first released around 2007. It was designed to fill a gap between xargs and more complex job schedulers, providing a simple yet powerful way to parallelize shell commands. Its development has been continuous, making it a very mature and feature-rich tool. It gained significant popularity due to its flexibility, ease of use, and ability to drastically speed up common command-line tasks by leveraging multi-core processors. It is widely adopted across various fields, from scientific computing to system administration.