LinuxCommandLibrary

parallel

Run jobs in parallel using multiple CPUs

TLDR

Gzip several files at once, using all cores

$ parallel gzip ::: [path/to/file1 path/to/file2 ...]

Read arguments from stdin, run 4 jobs at once
$ ls *.txt | parallel [[-j|--jobs]] 4 gzip

Convert JPEG images to PNG using replacement strings
$ parallel convert {} {.}.png ::: *.jpg

Act as a parallel xargs, fitting as many arguments as possible onto each command line
$ [args] | parallel -X [command]

Break stdin into blocks of about 1 MB each, feeding each block to the stdin of a new command
$ cat [big_file.txt] | parallel --pipe --block 1M [command]

Run on multiple machines via SSH
$ parallel [[-S|--sshlogin]] [machine1],[machine2] [command] ::: [arg1] [arg2]

Download 4 files simultaneously from a text file containing links, showing progress
$ parallel [[-j|--jobs]] 4 --bar --eta wget [[-q|--quiet]] {} :::: [path/to/links.txt]

Print the jobs which parallel is running on stderr
$ parallel [[-t|--verbose]] [command] ::: [args]

SYNOPSIS

parallel [options] command ::: arguments ...

PARAMETERS

--jobs N
    Run N jobs in parallel. Defaults to one job per CPU core. Using 0 runs as many jobs as possible, and values such as '50%' or '+2' are interpreted relative to the number of CPU threads (see the combined example after this list).

command
    The command to be executed in parallel.

::: arguments
    The arguments to be passed to the command. These are distributed among the parallel processes.

-a file
    Read arguments from the specified file, one argument per line.

--dry-run
    Print the commands that would be executed without actually running them.

--delay SEC
    Wait SEC seconds before starting each job.

--sshlogin server
    Run jobs on remote servers via SSH.

--pipe
    Split standard input into blocks and pipe one block into the standard input of each job.

--will-cite
    Silence the citation notice. Using this option means you agree to cite GNU Parallel in publications when using it for research.
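
A minimal sketch combining several of the parameters above; 'path/to/args.txt' is a hypothetical file with one argument per line:

$ parallel --dry-run -j 2 --delay 1 -a [path/to/args.txt] gzip {}

This prints the gzip commands that would run, one per argument from the file. Removing --dry-run executes them, two at a time, with a one-second delay between job starts.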

DESCRIPTION

The parallel command is a shell tool for executing commands in parallel, using multiple CPU cores to speed up tasks that can be split into smaller, independent workloads. It reads commands or arguments from standard input, a file, or the command line and runs them concurrently, distributing work across the available processors. This is particularly useful for tasks such as image processing, video encoding, data analysis, and batch processing, which would be time-consuming if executed sequentially.
Parallel provides fine-grained control over the number of concurrent jobs, resource usage, and error handling, and it integrates readily into complex pipelines and scripts to improve overall performance.
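
For example, when given no command, parallel treats each line of standard input as a complete command to run ('commands.txt' here is a hypothetical file with one command per line):

$ cat [path/to/commands.txt] | parallel

Each line is executed as an independent job, with one job per CPU core running at a time by default.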

CAVEATS

Parallel can consume significant resources (CPU, memory) if not used carefully. Ensure that the number of jobs run concurrently does not overload the system. Also, be mindful of file system limitations (e.g., too many open files) when dealing with a large number of files.
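
One way to leave headroom for the rest of the system is to reserve only a fraction of the CPU threads, as in this sketch:

$ parallel -j 50% gzip ::: [path/to/file1 path/to/file2 ...]

Here '-j 50%' caps the number of simultaneous jobs at half the number of CPU threads.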

ERROR HANDLING

Parallel provides options for handling errors, such as stopping on the first failure or continuing despite errors. Use '--halt now,fail=1' to kill all running jobs and exit as soon as one job fails, or '--halt soon,fail=1' to stop starting new jobs while letting the running ones finish. Independently, '--keep-order' ('-k') ensures output is printed in the same order as the input.
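
A small illustration of the halting behavior, using 'true', 'false', and 'sleep' as stand-in jobs:

$ parallel --halt now,fail=1 ::: true false 'sleep 10'

Each argument is run as its own command; as soon as false exits non-zero, the still-running 'sleep 10' job is killed and parallel exits with a failing status.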

RESOURCE MANAGEMENT

Use '--load N' to prevent starting new jobs when the load average is above N. This helps to avoid overloading the system and ensures that other processes have sufficient resources.
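
For instance, to hold back new jobs on a busy machine:

$ parallel --load 80% [command] ::: [args]

New jobs are started only while the load stays below the threshold; '--load' accepts the same syntax as '--jobs', so percentages of the CPU count work here as well.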

RETURN VALUE

When --halt is not set, parallel exits with 0 if all jobs succeeded, with the number of failed jobs (1-253) if some failed, with 254 if more than 253 jobs failed, and with 255 on other errors.
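
A quick way to observe this convention (two of the three jobs below fail):

$ parallel ::: true false false; echo $?
2

Since two jobs exited non-zero, parallel itself exits with status 2.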

HISTORY

Parallel was developed by Ole Tange and is actively maintained. It emerged as a powerful alternative to tools like xargs, offering more advanced features and control over parallel execution. Its usage has grown significantly in various fields due to its ability to accelerate computationally intensive tasks. The project is open-source, with ongoing improvements and community contributions.

SEE ALSO

xargs(1), make(1), find(1)
