sbcast
Transmit a file to the nodes allocated to a Slurm job
TLDR
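Send a file to all nodes allocated to the current job:
sbcast path/to/file path/to/destination
Autodetect shared libraries the transmitted file depends upon and transmit them as well:
sbcast --send-libs=yes path/to/executable path/to/destination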
SYNOPSIS
sbcast [OPTIONS] <source_file> <destination_file>
Examples:
sbcast -f --preserve /path/to/my_executable /tmp/my_executable_local
sbcast --compress my_large_data.txt /scratch/local/data.txt
PARAMETERS
-j, --jobid=<jobID[+hetjobOffset][.stepID]>
Specifies the job whose allocated nodes receive the file, with an optional heterogeneous-job offset and/or step ID. When omitted, sbcast targets the job it is running in.
-f, --force
Overwrites the destination file if it already exists on the target nodes.
-p, --preserve
Preserves the source file's modification time, access time, and permission modes on the destination nodes.
-v, --verbose
Increases the verbosity of the output, providing more information about the transfer process.
-s, --size=<size>
Sets the block size used for the broadcast. The value is in bytes unless it carries a k or m suffix (kilobytes or megabytes).
-t, --timeout=<seconds>
Sets the message timeout in seconds. The default comes from the MessageTimeout setting in slurm.conf.
-C, --compress[=<library>]
Compresses the file during transfer to reduce network bandwidth usage. The optional argument names the compression library to use (for example, lz4); which libraries are available depends on how Slurm was built.
--send-libs[=yes|no]
Autodetects the shared libraries that the transmitted executable depends on and broadcasts them as well, placing them in a directory alongside the executable on each node. Giving the option without a value is equivalent to yes.
--help
Displays a help message and exits.
--version
Shows version information and exits.
DESCRIPTION
sbcast is a command-line utility that ships with the Slurm Workload Manager. It transmits a single file from the host where it is run to every compute node allocated to a Slurm job. Rather than having each node copy the file with a tool such as scp or read it from a shared file system, sbcast forwards the file through a tree of nodes, which reduces network congestion and load on shared file systems in large parallel jobs. This suits cases where a large executable, input file, or library must reach hundreds or thousands of nodes before a parallel application starts. Each node ends up with its own local copy, so processes avoid the I/O bottleneck that arises when many of them read the same file from a network file system at once.
CAVEATS
sbcast only delivers files to nodes that belong to a Slurm job allocation; it cannot transfer files to arbitrary hosts. The transfer is carried out by the slurmd daemons on the allocated nodes, so it depends on the cluster's Slurm configuration rather than on SSH or a shared file system. It pays off most when a large file goes to many nodes, where other methods would suffer from I/O contention. For small files or single-node jobs, it offers little advantage over a plain copy.
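The allocation requirement above can be checked before calling sbcast. The following is a minimal sketch, assuming the script runs in an environment where Slurm exports SLURM_JOB_ID for jobs; the helper name in_slurm_allocation is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when Slurm has exported a job ID
# into the environment, which it does inside salloc, sbatch, and srun.
in_slurm_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_allocation; then
    echo "job ${SLURM_JOB_ID}: sbcast can target this job's nodes"
else
    echo "no Slurm allocation detected; start one with salloc or sbatch" >&2
fi
```

The check only inspects the environment; it does not confirm that the job is still running.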
USAGE CONTEXT
sbcast is normally run from within an active Slurm job allocation: either in an interactive session (started with salloc or srun) or as a command in an sbatch script, placed before the srun step that needs the file. It uses Slurm's own communication layer to reach the nodes assigned to the job.
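A batch script that follows this pattern might look like the sketch below. The node count, time limit, and file paths are assumptions chosen for illustration, and the run and stage_and_launch helpers are hypothetical. When sbcast is not installed, the script prints the commands instead of running them, so it can be read through outside a cluster.

```shell
#!/bin/bash
#SBATCH --nodes=4          # assumption: a four-node job
#SBATCH --time=00:10:00    # assumption: ten-minute limit

# Hypothetical paths; override through the environment if needed.
SRC=${SRC:-./my_executable}
DEST=${DEST:-/tmp/my_executable_local}

# Fall back to printing the commands when sbcast is not installed.
command -v sbcast >/dev/null 2>&1 || DRY_RUN=1

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stage_and_launch() {
    # Copy the executable to node-local storage on every allocated node,
    # then launch the job step from the local copy.
    run sbcast --force --preserve "$SRC" "$DEST" || return 1
    run srun "$DEST"
}

stage_and_launch
```

Stopping when sbcast fails avoids launching a step on nodes that may be missing the file.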
PERFORMANCE ADVANTAGES
sbcast's main strength is its tree-based broadcast. Instead of every node fetching the file from one central source, the transfer fans out hierarchically: nodes that have received the data forward it to others. The source therefore serves only a few direct transfers, and the number of forwarding stages grows roughly logarithmically with the node count, which keeps distribution of large files to many nodes fast and shortens job startup.
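To get a feel for why a fan-out tree scales, the arithmetic can be sketched as follows. This is an idealized model, not Slurm's actual routing: it assumes a full tree in which every node that holds the file forwards it to a fixed number of others. The function name tree_levels and the fanout value of 8 are illustrative assumptions.

```shell
#!/bin/bash
# tree_levels NODES FANOUT
# Idealized model (an assumption, not Slurm's real routing): counts how many
# forwarding levels a full tree needs before NODES nodes hold the file.
tree_levels() {
    local nodes=$1 fanout=$2
    local reached=0 level_size=1 levels=0
    if [ "$fanout" -lt 1 ]; then
        echo "fanout must be at least 1" >&2
        return 1
    fi
    while [ "$reached" -lt "$nodes" ]; do
        level_size=$((level_size * fanout))   # nodes served at this level
        reached=$((reached + level_size))     # nodes served so far
        levels=$((levels + 1))
    done
    echo "$levels"
}

tree_levels 1000 8   # prints 4, since 8 + 64 + 512 = 584 < 1000 <= 4680
```

Under this model the source sends only 8 direct copies, however many nodes the job has.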
SLURM CONFIGURATION
sbcast ships with Slurm and does not need a separate plugin: the slurmd daemon on each allocated node receives and writes the file. Administrators can set site-wide defaults, such as the destination directory and compression, through the BcastParameters option in slurm.conf.
HISTORY
sbcast was added to the Slurm Workload Manager to address a common bottleneck in large-scale high-performance computing (HPC): distributing files efficiently to hundreds or thousands of compute nodes. Shared file systems often become points of I/O contention when many parallel processes access the same file concurrently. sbcast mitigates this with a tree-based broadcast that distributes files quickly while placing little load on central file servers. Its importance grew as clusters became larger and applications more I/O intensive.