slurmd

Manage compute node resources for Slurm

TLDR

Report the node as rebooted when the daemon restarts (used for testing purposes)
$ slurmd -b

Run the daemon with the given node name (used to emulate several nodes on one host)
$ slurmd -N [nodename]

Write log messages to the specified file
$ slurmd -L [path/to/output_file]

Read configuration from the specified file
$ slurmd -f [path/to/file]

Display help
$ slurmd -h

SYNOPSIS

slurmd [OPTIONS]

PARAMETERS

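-b
    Report the node as rebooted when the daemon restarts. Used for testing purposes.

-c
    Clear system locks as needed. This may be required if slurmd terminated abnormally.

-C
    Print the node's actual hardware configuration (not the one from slurm.conf), in slurm.conf syntax plus its uptime, and exit.

-D
    Run in the foreground instead of daemonizing; error and debug messages are also copied to stderr. Useful for debugging.

-f file
    Read configuration from the specified file instead of the default slurm.conf.

-h, --help
    Display a brief help message and exit.

-L file
    Write log messages to the specified file.

-M
    Lock slurmd's pages into memory with mlockall(2) so the daemon is not paged out.

-N nodename
    Run the daemon with the given node name. Used to emulate a larger system with more than one slurmd per host; requires a Slurm build configured with --enable-multiple-slurmd.

-v
    Increase logging verbosity; repeat the option (e.g. -vvv) for more detail.

-V, --version
    Display the slurmd version number and exit.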
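The -C option prints the detected hardware in slurm.conf syntax, which is handy when writing a node's NodeName line. A minimal sketch for pulling one field out of that output, assuming the usual whitespace-separated Key=Value layout; the function name hw_field and the sample text are illustrative, not captured from a real node.

```shell
# hw_field KEY: read `slurmd -C` style output on stdin and print the value of KEY.
# Assumes whitespace-separated Key=Value tokens (verify against your Slurm version).
hw_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == key) { print substr($i, length(key) + 2); exit }
        }
    }'
}

# Illustrative sample; on a real node you would pipe `slurmd -C` instead.
sample='NodeName=node01 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=31845
UpTime=3-04:05:06'

printf '%s\n' "$sample" | hw_field RealMemory    # prints 31845
```

On a compute node the same function would be used as `slurmd -C | hw_field CPUs`. Matching on the key before the first "=" avoids false hits on values that happen to contain the key name.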

DESCRIPTION

The slurmd command launches the Slurm daemon on a compute node, making it an essential component of any Slurm cluster. Its primary role is to launch, monitor, and terminate tasks on that node, accepting work from slurmctld (the Slurm controller daemon) and from client commands such as srun. It runs continuously in the background and is typically started at boot by a service manager such as systemd.

slurmd launches user applications (spawning a slurmstepd process to manage each job step), enforces the resource limits requested by the job or imposed by the partition configuration, and tracks the status of running jobs and job steps. It registers with slurmctld when it starts and then answers the controller's periodic pings, reporting node state and job status. Scheduling decisions are made by slurmctld; slurmd carries them out on its node. Without a running slurmd, a node cannot accept or execute Slurm jobs, and slurmctld eventually marks it DOWN once it stops responding (see SlurmdTimeout in slurm.conf(5)).

CAVEATS

Proper configuration via slurm.conf(5) is paramount for slurmd to function correctly. Ensure network connectivity to the slurmctld daemon and appropriate permissions for Slurm's log and spool directories. Misconfigurations can lead to nodes not joining the cluster or jobs failing to launch.
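Permission problems on the spool and log directories can keep slurmd from starting. A minimal pre-flight check, assuming it runs as the same user slurmd will run as (usually root); the function name check_slurm_dir is hypothetical, and /var/spool/slurmd is only the common default for SlurmdSpoolDir, so substitute your site's paths.

```shell
# check_slurm_dir DIR: report whether DIR exists, is a directory, and is
# writable by the current user. Returns 0 when usable, 1 otherwise.
check_slurm_dir() {
    dir=$1
    if [ ! -e "$dir" ]; then
        echo "missing: $dir"
        return 1
    elif [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Example: check the usual default spool directory (site paths may differ).
check_slurm_dir /var/spool/slurmd || echo "fix before starting slurmd"
```

The same function can be pointed at the directory that holds SlurmdLogFile; checking as the daemon's user matters because a directory writable by an administrator's account may not be writable by the service.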

PRIMARY CONFIGURATION

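While slurmd has command-line options, most of its operational parameters, such as node resources, logging, and how it communicates with slurmctld, are set in the slurm.conf(5) file. The file is commonly installed as /etc/slurm/slurm.conf (the built-in default depends on how Slurm was compiled) and can be overridden with -f or the SLURM_CONF environment variable. Its contents must be consistent across all nodes and Slurm daemons in the cluster.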
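Because slurmd takes most settings from slurm.conf, it can help to confirm what a node's copy of the file actually says. A minimal sketch that reads one single-valued parameter, assuming the simple one-Key=Value-per-line form; it strips comments and matches names case-insensitively, but does not handle Include files or multi-parameter lines such as NodeName definitions. conf_get and the demo file are hypothetical. On a live cluster, `scontrol show config` reports the values the daemons are actually using.

```shell
# conf_get FILE KEY: print the value of a single-valued KEY from a slurm.conf-style file.
# Comments are stripped, the key match is case-insensitive, and if the key
# repeats this sketch prints the last occurrence.
conf_get() {
    awk -v key="$2" '
        { sub(/#.*/, "") }                       # drop comments
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim key
            if (tolower(k) == tolower(key)) {
                v = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim value
                val = v
            }
        }
        END { if (val != "") print val }
    ' "$1"
}

# Demo with an illustrative fragment (not a complete slurm.conf).
demo=$(mktemp)
cat > "$demo" <<'EOF'
# Example slurm.conf fragment
ClusterName=demo
SlurmdSpoolDir=/var/spool/slurmd   # per-node state
slurmdlogfile=/var/log/slurmd.log
EOF
conf_get "$demo" SlurmdLogFile    # prints /var/log/slurmd.log
rm -f "$demo"
```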

TYPICAL STARTUP

slurmd is almost always started as a system service at boot (e.g., by systemd on modern Linux distributions). Administrators rarely invoke it directly except for debugging, when running it in the foreground with -D and extra verbosity (-v, repeated as needed) shows startup errors on the terminal.
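A minimal wrapper sketch for that debugging case, assuming a POSIX shell, that the slurmd service has already been stopped, and that the configuration lives at /etc/slurm/slurm.conf unless another path is given. run_slurmd_debug and the DRY_RUN variable are hypothetical conveniences; -D (foreground), -v (more verbose, repeatable) and -f (configuration file) are slurmd's own options.

```shell
# run_slurmd_debug [CONF]: start slurmd in the foreground with extra logging.
# Set DRY_RUN=1 to print the command instead of executing it.
run_slurmd_debug() {
    conf=${1:-/etc/slurm/slurm.conf}    # assumed default path; adjust per site
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "slurmd -D -vvv -f $conf"
    else
        exec slurmd -D -vvv -f "$conf"  # replaces the shell; Ctrl-C stops the daemon
    fi
}

# Typical use on a node (as root, with the service stopped first), e.g.:
#   systemctl stop slurmd
#   run_slurmd_debug /etc/slurm/slurm.conf
DRY_RUN=1 run_slurmd_debug
```

Using exec means the shell is replaced by slurmd, so signals from the terminal go straight to the daemon; start the service again with the service manager once debugging is finished.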

HISTORY

Slurm (Simple Linux Utility for Resource Management) was originally developed at Lawrence Livermore National Laboratory. slurmd has been a core component of the Slurm workload manager since its inception, evolving with the project to support modern cluster architectures and resource management paradigms.

SEE ALSO

slurmctld(8), slurm.conf(5), sbatch(1), srun(1), squeue(1), sacct(1)
