LinuxCommandLibrary

scontrol

Manage and monitor a Slurm cluster

TLDR

Show information for a job
$ scontrol show job [job_id]

Suspend a comma-separated list of running jobs
$ scontrol suspend [job_id1,job_id2,...]

Resume a comma-separated list of suspended jobs
$ scontrol resume [job_id1,job_id2,...]

Hold a comma-separated list of queued jobs (use the release command to let them be scheduled again)
$ scontrol hold [job_id1,job_id2,...]

Release a comma-separated list of held jobs
$ scontrol release [job_id1,job_id2,...]
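
The job commands above take their job IDs as one comma-separated argument. The following is a minimal sketch of building that argument from separate IDs in a script. The helper name join_job_ids is hypothetical (it is not part of Slurm), the job IDs are placeholders, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# join_job_ids: hypothetical helper (not part of Slurm) that joins its
# arguments with commas, e.g. "101 102 103" becomes "101,102,103".
join_job_ids() {
    # "$*" joins the arguments using the first character of IFS;
    # the subshell keeps the IFS change local to this function.
    (IFS=,; printf '%s\n' "$*")
}

# Placeholder job IDs; the command is printed, not executed.
echo "scontrol hold $(join_job_ids 101 102 103)"
```

Printing the command first makes it easy to review the job list before handing it to scontrol on a real cluster.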

SYNOPSIS

scontrol takes optional global options followed by a command and that command's arguments:
scontrol [OPTIONS...] <COMMAND> [COMMAND_OPTIONS...] [ARGUMENTS...]
Common commands include show, update, hold, release, and reconfigure.

PARAMETERS

-h, --help
    Print a help message describing the usage of scontrol.

-V, --version
    Display version information.

-Q, --quiet
    Do not print informational messages, only error messages.

-M, --clusters=<string>
    The cluster to issue commands to. Only one cluster name may be specified.

-d, --details
    Make the show command print additional details where available.

show <object>
    Display information about various Slurm objects (e.g., show node, show job, show part).

update <object>
    Modify the state of Slurm objects (e.g., update node, update job, update reservation).

hold <jobid>
    Place a job or a set of jobs in a held state.

release <jobid>
    Release a held job or set of jobs.

reconfigure
    Force slurmctld and slurmd daemons to re-read their configuration files.

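update NodeName=<node_name> State=DRAIN Reason=<reason>
    Drain a node: running jobs are allowed to finish, but no new jobs are scheduled on it. scontrol has no separate drain command, and a Reason must be given.

update NodeName=<node_name> State=RESUME
    Return a drained or down node to service so it can accept new jobs. The standalone resume command applies to suspended jobs, not nodes.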
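
Node state changes go through the update command with NodeName and State fields. As a rough sketch, the helpers below only print the command lines for draining a node and returning it to service, so they can be reviewed before an administrator runs them. The function names drain_cmd and resume_cmd, the node name node01, and the reason text are all hypothetical.

```shell
#!/bin/sh
# drain_cmd / resume_cmd: hypothetical helpers (not part of Slurm) that
# print a scontrol command line instead of executing it.
drain_cmd() {
    # $1 = node name, $2 = reason recorded for the drained node
    printf 'scontrol update NodeName=%s State=DRAIN Reason="%s"\n' "$1" "$2"
}

resume_cmd() {
    # $1 = node name
    printf 'scontrol update NodeName=%s State=RESUME\n' "$1"
}

# Placeholder node name and reason.
drain_cmd node01 "disk replacement"
resume_cmd node01
```

Running the printed commands requires administrator privileges on the cluster.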

DESCRIPTION

scontrol is a powerful command-line utility used to view and modify the state of the Slurm Workload Manager. It allows administrators and privileged users to inspect various Slurm objects such as nodes, partitions, jobs, steps, and reservations. Furthermore, scontrol can be used to update object states (e.g., drain a node, hold a job, create a reservation) or reconfigure Slurm components dynamically. It serves as a primary interface for fine-grained control over the Slurm cluster, essential for cluster management and troubleshooting.

CAVEATS

scontrol requires appropriate permissions (typically root or Slurm administrator privileges) to modify cluster state. Misuse can disrupt cluster operations or affect running jobs. Its default output is verbose and usually needs parsing in scripts, though newer versions offer a --json option for some subcommands.

INTERACTIVE VS. SCRIPTING USE

scontrol is used both interactively by administrators for immediate cluster management tasks and within scripts for automated operations like node maintenance, job management, or cluster health checks.

OUTPUT FORMATS

The default output is human-readable Key=Value text. In newer Slurm versions, many scontrol show commands also accept --json, which produces structured data better suited to automation.
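
Where --json is not available, scripts often fall back to pulling individual fields out of the Key=Value text. The sketch below shows one way to do that with standard tools. The helper name get_field is hypothetical, the sample text is illustrative rather than captured from a real cluster, and the approach assumes that field values contain no whitespace.

```shell
#!/bin/sh
# get_field: hypothetical helper (not part of Slurm) that reads
# "Key=Value" tokens from stdin and prints the value of the first
# token whose key equals $1.
# Assumption: values contain no spaces or tabs.
get_field() {
    # Put each whitespace-separated token on its own line, then match the key.
    tr -s ' \t' '\n\n' | awk -F= -v key="$1" '
        $1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Illustrative sample in the style of "scontrol show job" output.
sample='JobId=1234 JobName=demo
   UserId=alice(1000) GroupId=users(100)
   JobState=RUNNING Reason=None'

printf '%s\n' "$sample" | get_field JobState
```

On a cluster, the input would come from a command such as scontrol show job [job_id] instead of the sample string. Fields whose values contain spaces would need a more careful parser.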

PERMISSIONS

Most scontrol modification commands require administrator permissions (root, the SlurmUser account, or an equivalent privileged role), while show commands can usually be run by any user to query public information.

HISTORY

scontrol is an integral part of the Slurm Workload Manager, which originated in 2002 at Lawrence Livermore National Laboratory (LLNL) as a free, open-source cluster management and job scheduling system. It was designed to provide a highly scalable and fault-tolerant alternative to existing proprietary solutions. scontrol has evolved alongside Slurm, gaining new functionality and command-line improvements to manage increasingly complex cluster environments and features such as federated clusters and burst buffers. Its development focuses on robust control and monitoring capabilities for high-performance computing (HPC) clusters.

SEE ALSO

sbatch(1), srun(1), salloc(1), sacct(1), squeue(1), sinfo(1), slurm.conf(5)
