pdsh

Execute commands on multiple remote hosts

SYNOPSIS

pdsh [OPTIONS] [COMMAND]
pdsh -w host[,host...] [OPTIONS] [COMMAND]
pdsh -a [OPTIONS] [COMMAND]
pdsh -g groupname [OPTIONS] [COMMAND]

-w hosts
    Specify the target hosts as a comma-separated list. To read the hosts from a file instead, prefix the path with ^ (for example, -w ^/path/to/hostfile).

-a
    Execute the command on all known hosts, as defined by the host-selection module in use (for example, a machines file or a genders database).

-g groupname
    Execute the command on all hosts belonging to the specified group. How groups are defined depends on the module that provides -g (for example, per-group files or genders attributes).

-f fanout
    Set the maximum number of simultaneous remote commands (default 32). Lower values reduce load on the local machine and the network.

-R rcmd_type
    Specify the remote command service (rcmd) module to use (e.g., 'ssh' or 'rsh'). The modules available and the default depend on how pdsh was built; ssh is the most common choice.

-l login_name
    Specify the username to log in as on the remote hosts.

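-t seconds
    Set the connect timeout in seconds (default 10). This bounds how long pdsh waits to establish each connection, not how long the command runs.

-u seconds
    Set a limit in seconds on how long the remote command may run (no limit by default). Commands that exceed the limit are terminated.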

-x exclude_hosts
    Exclude the specified hosts from the target list, typically in combination with -a or -g. Takes a comma-separated list, or ^path to read the hosts from a file.

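-L
    List information on all loaded pdsh modules and exit. This option does not change output formatting: pdsh always prints output line by line as it arrives, prefixing each line with the originating hostname (use -N to suppress the prefix).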

-S
    Return the largest exit status of all remote commands as pdsh's own exit status. If every command succeeds, pdsh returns 0.
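
As an illustration of how the options above combine, the following Python sketch assembles a pdsh command line and, optionally, runs it. The helper names (build_pdsh_argv, run_pdsh) and the host names are hypothetical; only the -w, -x, -f, -l, and -S options described above are used, and run_pdsh assumes pdsh is installed and on PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_pdsh_argv(
    command: str,
    hosts: Sequence[str],
    exclude: Sequence[str] = (),
    fanout: Optional[int] = None,
    login: Optional[str] = None,
    max_status: bool = False,
) -> List[str]:
    """Assemble a pdsh argument vector (hypothetical helper, not part of pdsh)."""
    if not hosts:
        raise ValueError("at least one target host is required")
    argv = ["pdsh", "-w", ",".join(hosts)]
    if exclude:
        argv += ["-x", ",".join(exclude)]
    if fanout is not None:
        argv += ["-f", str(fanout)]
    if login:
        argv += ["-l", login]
    if max_status:
        # -S makes pdsh exit with the largest remote exit status.
        argv.append("-S")
    # The remote command is passed as a single trailing argument.
    argv.append(command)
    return argv


def run_pdsh(argv: List[str]) -> subprocess.CompletedProcess:
    """Run pdsh and capture its output; requires pdsh on PATH."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For example, build_pdsh_argv("uptime", ["node1", "node2"], fanout=16) yields the equivalent of `pdsh -w node1,node2 -f 16 uptime`.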

DESCRIPTION

pdsh is a parallel remote shell utility that executes a command on multiple remote hosts simultaneously. It is used primarily in cluster management and high-performance computing (HPC) environments. Compared with a sequential `ssh` loop, pdsh finishes sooner because it runs a bounded number of connections concurrently (see -f), labels each line of output with the host that produced it, and offers control over timeouts and target selection.

It supports several remote command services (rcmd modules) through a modular architecture; ssh is the one most commonly used. Typical tasks include deploying software, collecting system information, and performing maintenance operations across a large number of machines in parallel.
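
To make the difference from a sequential loop concrete, the sketch below shows bounded concurrency in Python: at most `fanout` hosts are processed at once, and a new host starts as soon as a worker becomes free. This illustrates the idea only and is not pdsh's implementation; run_with_fanout and the run_on_host callback are hypothetical names, and the default of 32 is chosen to mirror pdsh's documented default fanout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_with_fanout(
    hosts: Iterable[str],
    run_on_host: Callable[[str], int],
    fanout: int = 32,
) -> Dict[str, int]:
    """Call run_on_host for every host, at most `fanout` at a time.

    Returns a mapping of host name to the status run_on_host returned.
    """
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        # map() returns results in input order, regardless of finish order.
        statuses = list(pool.map(run_on_host, host_list))
    return dict(zip(host_list, statuses))
```

With a callback that takes about one second per host, 64 hosts take roughly 64 seconds sequentially but only about 2 seconds with a fanout of 32.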

CAVEATS

pdsh relies on pre-configured passwordless remote access (e.g., SSH keys) to the target hosts; without proper authentication setup, commands will fail. Output from different hosts is interleaved as it arrives, so a failure on a single host can be hard to spot; each line carries a hostname prefix, and piping the output through dshbak(1) groups it by host. pdsh is a command execution tool, not a configuration management system: it does not track or manage host state or ensure idempotency.
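
When per-host results need to be inspected programmatically, captured standard output can be regrouped by its hostname prefix. The sketch below assumes lines of the form "host: text", which is pdsh's default output format; group_by_host is a hypothetical helper, and lines without the prefix are collected under an empty key rather than discarded.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_host(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group pdsh output lines of the form 'host: text' by host."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        stripped = line.rstrip("\n")
        # Split on the first ': ' only, so colons in the payload survive.
        host, sep, text = stripped.partition(": ")
        if not sep:
            # No 'host: ' prefix found; keep the line under an empty key.
            host, text = "", stripped
        grouped[host].append(text)
    return dict(grouped)
```

Feeding it the lines "n1: up 3 days" and "n2: up 1 day" produces one list per host, which makes it straightforward to check which hosts returned nothing.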

HOST SPECIFICATION

pdsh provides flexible ways to specify target hosts. The most common methods are a comma-separated list of hosts with -w, or pre-defined host groups with -g. The host list used by -a typically comes from a file such as /etc/pdsh/machines, and groups for -g from per-group files (for example, /etc/dsh/group/groupname or ~/.dsh/group/groupname with the dshgroup module). Exact locations depend on how pdsh was built and which modules are loaded.
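
A host file can also be turned into a -w argument by a small wrapper script. The Python sketch below assumes a file with one hostname per line and treats blank lines and '#' comments as ignorable; that format, and the helper names read_host_file and target_list, are assumptions for illustration rather than something pdsh defines here.

```python
from pathlib import Path
from typing import List, Sequence


def read_host_file(path: str) -> List[str]:
    """Read one hostname per line, skipping blanks and '#' comments (assumed format)."""
    hosts = []
    for raw in Path(path).read_text().splitlines():
        name = raw.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def target_list(hosts: Sequence[str], exclude: Sequence[str] = ()) -> str:
    """Return a comma-separated, de-duplicated host list suitable for -w."""
    skipped = set(exclude)
    seen = set()
    kept = []
    for host in hosts:
        if host not in skipped and host not in seen:
            seen.add(host)
            kept.append(host)
    return ",".join(kept)
```

The resulting string can be passed directly as the argument to -w; filtering in the wrapper has the same effect as passing the excluded hosts to -x.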

RCMD MODULES

pdsh uses a modular architecture for its remote command services (rcmd modules), which lets it support different underlying protocols such as ssh and rsh and, in some builds, Kerberos-based services. Select a module with the -R option; otherwise pdsh uses its configured default, commonly ssh because of its security and widespread adoption.
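
A wrapper script can select the module either on the command line or through the environment. The sketch below shows both; the PDSH_RCMD_TYPE environment variable comes from pdsh's own manual page rather than from the text above, so treat it as an assumption to verify against the installed version. select_rcmd is a hypothetical helper.

```python
import os
from typing import Dict, List, Optional, Tuple


def select_rcmd(
    module: str,
    use_env: bool = False,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv prefix, environment) that selects an rcmd module."""
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    if use_env:
        # Assumed variable name: PDSH_RCMD_TYPE sets the module for this process.
        env["PDSH_RCMD_TYPE"] = module
        return ["pdsh"], env
    # -R selects the module explicitly on the command line.
    return ["pdsh", "-R", module], env
```

The returned pair is meant to be extended and passed on, for example as subprocess.run(argv + ["-w", hosts, command], env=env).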

HISTORY

pdsh was developed at Lawrence Livermore National Laboratory (LLNL) primarily for managing large-scale high-performance computing (HPC) clusters. Its design favors efficiency and scalability: it runs remote commands in a sliding window of threads whose size is the fanout, which lets it work through thousands of hosts far faster than a simple loop of `ssh` commands while conserving resources on the initiating host.