slurmd
Manage compute node resources for Slurm
TLDR
Report node rebooted when daemon restarted (used for testing purposes):
slurmd -b
Run the daemon with the given nodename:
slurmd -N nodename
Write log messages to the specified file:
slurmd -L path/to/output_file
Read configuration from the specified file:
slurmd -f path/to/file
Display help:
slurmd -h
SYNOPSIS
slurmd [OPTIONS]
PARAMETERS
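-c
Clear system locks as needed. This may be required if slurmd terminated abnormally. This option is unrelated to the --conf long option, which supplies node parameters for dynamically registered nodes.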
-D
Run slurmd in the foreground instead of daemonizing. Error and debug messages are copied to stderr, which makes this option useful for debugging.
-f <file>
Read configuration from the specified file instead of the default slurm.conf.
-h
Print a brief summary of command options and exit.
-L <file>
Write log messages to the specified file.
-M
Lock slurmd pages into system memory using mlockall(2) so that the daemon is not paged out.
-v
Verbose operation. Repeating the option (for example, -vv) increases verbosity.
-V, --version
Print version information and exit.
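When slurmd is launched from a test harness or provisioning script rather than by hand, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of Slurm: the function name build_slurmd_argv and its keyword names are hypothetical, and it covers only four options from the slurmd manual page (-D to stay in the foreground, -f for the configuration file, -L for the log file, and -N for the nodename).

```python
import shlex


def build_slurmd_argv(nodename=None, log_file=None, conf_file=None,
                      foreground=False):
    """Assemble an argument list for slurmd (hypothetical helper).

    Only options that are set are emitted, in the fixed order -D, -f, -L, -N.
    """
    argv = ["slurmd"]
    if foreground:
        argv.append("-D")              # run in the foreground, do not daemonize
    if conf_file is not None:
        argv += ["-f", conf_file]      # read configuration from this file
    if log_file is not None:
        argv += ["-L", log_file]       # write log messages to this file
    if nodename is not None:
        argv += ["-N", nodename]       # run the daemon with this nodename
    return argv


def as_shell_command(argv):
    """Render the argument list as a shell-quoted string for logging."""
    return shlex.join(argv)
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces; as_shell_command is intended only for display.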
DESCRIPTION
slurmd is the Slurm compute node daemon; one instance runs on every compute node in a Slurm cluster. It manages and monitors work on its own node, accepting jobs and job steps dispatched by slurmctld (the Slurm controller daemon). It runs continuously in the background and is typically started at boot by a service manager such as systemd.
slurmd is responsible for launching user applications, enforcing resource limits defined in the job script or partition configuration, and tracking the status of running jobs and job steps. It communicates regularly with slurmctld to report node health, job status, and resource utilization. Without a running slurmd, a node cannot accept or execute Slurm jobs and contributes no compute capacity to the cluster. Scheduling and allocation decisions are made by slurmctld; slurmd carries them out on the node.
CAVEATS
Proper configuration via slurm.conf(5) is paramount for slurmd to function correctly. Ensure network connectivity to the slurmctld daemon and appropriate permissions for Slurm's log and spool directories. Misconfigurations can lead to nodes not joining the cluster or jobs failing to launch.
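A quick way to rule out basic network problems is to test whether the controller's TCP port accepts connections from the compute node. The sketch below is an illustration only: the helper name can_reach is hypothetical, and it assumes the stock SlurmctldPort value of 6817, which a site may override in slurm.conf. A successful TCP connection shows only that the port is open; it does not prove that authentication or the Slurm protocol exchange will succeed.

```python
import socket

# Assumed default for SlurmctldPort; check slurm.conf for the site's actual value.
SLURMCTLD_DEFAULT_PORT = 6817


def can_reach(host, port=SLURMCTLD_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_reach("head01") would test the default port on a controller host named head01; the host name here is a placeholder.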
PRIMARY CONFIGURATION
While slurmd has command-line options, most of its operational parameters, such as node resources, logging, and communication with slurmctld, are configured through the slurm.conf(5) file. This file is typically located at /etc/slurm/slurm.conf and must be consistent across all Slurm daemons.
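One simple way to check that copies of slurm.conf agree is to compare a digest of each copy. The sketch below is an assumption-laden illustration, not Slurm's own consistency check: the function names are hypothetical, and the normalization (dropping blank lines and anything after a # character) is a simplification that does not handle escaped # characters or Include directives.

```python
import hashlib


def normalized_lines(text):
    """Return configuration lines with comments and blank lines removed."""
    lines = []
    for raw in text.splitlines():
        # Treat everything after the first '#' as a comment (simplification).
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def config_digest(text):
    """Return a SHA-256 hex digest of the normalized configuration text."""
    joined = "\n".join(normalized_lines(text))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Running config_digest on the contents of slurm.conf from each node and comparing the results flags copies that differ in substance while ignoring comment-only changes.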
TYPICAL STARTUP
slurmd is almost always started as a system service at boot time (for example, by systemd on modern Linux distributions). Administrators rarely invoke it directly from the command line except when debugging.
HISTORY
Slurm, whose name originally stood for Simple Linux Utility for Resource Management, was first developed at Lawrence Livermore National Laboratory. slurmd has been a core component of the Slurm workload manager since its inception and has evolved with the project to support modern cluster architectures.