parallel
Run jobs in parallel using multiple CPUs
TLDR
Gzip several files at once, using all cores
Read arguments from stdin, run 4 jobs at once
Convert JPEG images to PNG using replacement strings
Act as a parallel xargs, fitting as many arguments as possible onto each command line
Split stdin into blocks of roughly 1 MB and feed each block to the stdin of a new command
Run on multiple machines via SSH
Download 4 files at a time from a text file of links, showing progress
Print each job that parallel runs to stderr
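Some of the tasks listed above can be sketched as command lines. This is a minimal sketch, assuming GNU Parallel (not the unrelated parallel shipped with moreutils) is on the PATH. The file names are hypothetical, and --dry-run makes parallel print each command instead of executing it, so nothing is compressed or converted.

```shell
# Only proceed if the installed `parallel` is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # One gzip job per file; -k keeps the output in input order.
    gzip_cmds=$(parallel -k --dry-run gzip ::: a.log b.log c.log)
    printf '%s\n' "$gzip_cmds"

    # {} is the argument, {.} is the argument without its extension.
    convert_cmds=$(parallel -k --dry-run convert {} {.}.png ::: photo1.jpg photo2.jpg)
    printf '%s\n' "$convert_cmds"

    # Arguments read from stdin, at most 4 jobs at a time.
    stdin_cmds=$(printf '%s\n' one two three | parallel -k -j4 --dry-run echo)
    printf '%s\n' "$stdin_cmds"
fi
```

Dropping --dry-run runs the commands for real. When the command contains no replacement string, the argument is appended to the end of the command.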
SYNOPSIS
parallel [options] command ::: arguments ...
PARAMETERS
--jobs N
Run up to N jobs in parallel. The default is 100%, which runs one job per CPU. N can also be a percentage of the CPU count (e.g. 50%) or relative to it (+N or -N). Using 0 runs as many jobs as possible.
command
The command to be executed in parallel.
::: arguments
The arguments to be passed to the command. By default the command is run once per argument, and the resulting jobs are distributed among the available job slots.
-a file
Read arguments from the specified file, one argument per line.
--dry-run
Print the commands that would be executed without actually running them.
--delay SEC
Wait SEC seconds before starting each job.
--sshlogin server
Run jobs on remote servers via SSH.
--pipe
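Split stdin into blocks and pipe each block to the stdin of a separate job, instead of passing the input as arguments. The block size is set with --block (default 1M).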
--will-cite
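Suppress the citation notice that GNU Parallel prints. By using it you promise to cite GNU Parallel when you publish results obtained with it. To print the citation itself, use --citation.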
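To make the input options concrete, here is a minimal sketch, assuming GNU Parallel is installed. The temporary file and its contents are hypothetical. It illustrates that -a file and ::: supply the same arguments, and that --pipe hands blocks of stdin to each job rather than arguments.

```shell
# Only proceed if the installed `parallel` is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    args_file=$(mktemp)
    printf '%s\n' alpha beta gamma > "$args_file"

    # The same three jobs, once from a file and once from the command line.
    from_file=$(parallel -k -a "$args_file" echo item: {})
    from_cli=$(parallel -k echo item: {} ::: alpha beta gamma)
    printf '%s\n' "$from_file"

    # --pipe: stdin is split into blocks (-N 5 requests 5 records per block)
    # and each block becomes the stdin of one `paste` job, which joins
    # the lines it received with commas.
    blocks=$(seq 1 10 | parallel -k --pipe -N 5 paste -sd, -)
    printf '%s\n' "$blocks"

    rm -f "$args_file"
fi
```

Each line printed for the --pipe run corresponds to one block, so the number of output lines shows how many jobs were started.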
DESCRIPTION
GNU Parallel is a shell tool for running commands in parallel on one or more machines, using multiple CPU cores to speed up work that can be split into independent jobs. Input comes from the command line (after :::), from standard input, or from a file, and each input line is turned into one invocation of the command. If no command is given, each input line is itself treated as a command. Jobs are started as job slots become free, so the available processors stay busy. Typical uses include image processing, video encoding, data analysis, and other batch processing that would be slow if run sequentially.
Parallel provides control over the number of concurrent jobs, resource usage, and error handling, and it can be embedded in larger pipelines and scripts.
CAVEATS
Parallel can consume significant resources (CPU, memory) if not used carefully. Ensure that the number of jobs run concurrently does not overload the system. Also, be mindful of file system limitations (e.g., too many open files) when dealing with a large number of files.
ERROR HANDLING
By default parallel keeps running the remaining jobs when one fails. The '--halt' option changes this: '--halt soon,fail=1' starts no new jobs after the first failure but lets running jobs finish, while '--halt now,fail=1' kills running jobs and exits immediately. Independently of error handling, '--keep-order' prints output in the same order as the input.
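The effect of halting on the first failure can be sketched as follows. This is a minimal sketch, assuming GNU Parallel with the '--halt now,fail=1' syntax available; the argument names ok1, bad, and ok2 are hypothetical.

```shell
# Only proceed if the installed `parallel` is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    status=0
    # -j1 runs one job at a time. The second job fails, so with
    # --halt now,fail=1 the third job (ok2) is never started.
    out=$(parallel -j1 --halt now,fail=1 'echo {}; test {} != bad' ::: ok1 bad ok2 2>/dev/null) || status=$?
    printf 'output:\n%s\nexit status: %s\n' "$out" "$status"
fi
```

Replacing now with soon would make no difference at -j1, since no other job is running when the failure occurs. With more job slots, soon lets the jobs that are already running finish.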
RESOURCE MANAGEMENT
Use '--load N' to prevent starting new jobs when the load average is above N. This helps to avoid overloading the system and ensures that other processes have sufficient resources.
RETURN VALUE
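Without '--halt', parallel exits with 0 if all jobs succeeded, with the number of failed jobs (1 to 100) if some failed, with 101 if more than 100 jobs failed, and with 255 on other errors. When '--halt' with fail=1 stops the run, the exit status is that of the failing job.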
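As a quick check of how the exit status behaves, here is a minimal sketch, assuming GNU Parallel with its default of '--halt never', for which its documentation describes the status as the number of failed jobs. The exit codes 0, 5, and 7 are arbitrary illustration values.

```shell
# Only proceed if the installed `parallel` is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    failed=0
    # Three jobs exit with 0, 5 and 7, so two of them fail.
    parallel 'exit {}' ::: 0 5 7 || failed=$?
    # The status counts the failed jobs; it is not the largest exit code.
    echo "exit status of parallel: $failed"
fi
```

To see the exit code of each individual job, '--joblog FILE' records one line per job, including its exit value.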
HISTORY
GNU Parallel was developed by Ole Tange and is actively maintained. It emerged as an alternative to tools like xargs, offering more features and finer control over parallel execution. Its use has spread across various fields because it speeds up computationally intensive tasks. The project is open source, with ongoing improvements and community contributions.