sinfo
Report cluster partition and node information
TLDR
Show a quick summary overview of the cluster
View the detailed status of all partitions across the entire cluster
View the detailed status of a specific partition
View information about idle nodes
Summarise dead nodes
List dead nodes and the reasons why
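One plausible set of invocations for the tasks above, with partition names as placeholders:
    sinfo --summarize
    sinfo
    sinfo --partition partition_name
    sinfo --states idle
    sinfo --dead
    sinfo --dead --list-reasons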
SYNOPSIS
sinfo [OPTIONS]
sinfo provides a flexible interface to query the state of Slurm partitions and nodes. Users typically combine options to filter and format the output according to their needs.
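For example, a filter and a custom format can be combined in a single call; the partition name "debug" and the chosen fields here are illustrative only:
    sinfo --partition debug --states idle,mixed --format "%P %D %t"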
PARAMETERS
-a, --all
Displays information about all partitions, including hidden or unavailable ones.
-d, --dead
Only show information about dead nodes.
-e, --exact
Report node information exactly; group nodes only when their configurations are identical, rather than aggregating values across heterogeneous nodes.
-i, --iterate
Print the state repeatedly, refreshing the output at the specified interval in seconds.
-l, --long
Provides a long format output, including partition time limits, job limits, and node counts.
--local
Show only the local cluster in a federated setup, ignoring other clusters in the federation.
-M, --clusters
Specify the cluster(s) to query in a multi-cluster or federated setup.
-n, --nodes
Filter output to specific nodes by name.
-N, --Node
Displays information in a node-oriented format rather than a partition-oriented one.
-o, --format
Specify a custom output format using percent-sign field specifiers.
-p, --partition
Filter output to specific partition(s) by name.
-r, --responding
Only display information about responding nodes.
-R, --list-reasons
List the reasons nodes are in the down, drained, fail, or failing state.
-s, --summarize
Provides a compact partition-level summary, with no node state details.
-t, --states
Filter output to nodes in specific state(s) (e.g., idle, alloc, down).
-v, --verbose
Increases the verbosity of the output, showing more details.
-V, --version
Prints the version number of sinfo.
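As a rough sketch of how these parameters combine (the partition name is a placeholder), the following prints a node-oriented long listing of one partition and refreshes it every 30 seconds:
    sinfo -N -l -p partition_name -i 30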
DESCRIPTION
sinfo is a command-line utility within the Slurm Workload Manager suite that provides detailed information about partitions and nodes managed by Slurm. It allows users and administrators to quickly assess the current state, availability, capacity, and configuration of computing resources. This command is crucial for understanding where jobs can be submitted, identifying resource bottlenecks, and monitoring the health of the Slurm cluster. It can display aggregated partition information or detailed node-level specifics, making it an indispensable tool for cluster management and job planning. Its flexibility in filtering and formatting output makes it valuable for both interactive queries and scripting.
CAVEATS
sinfo is part of the Slurm Workload Manager and will only function on systems where Slurm is installed and configured. Its output is highly dependent on the current state and configuration of the Slurm cluster. Users may have restricted views based on Slurm's security policies. The interpretation of node states can sometimes be nuanced, requiring familiarity with Slurm's state definitions.
CUSTOM OUTPUT FORMATS (-o AND -O OPTIONS)
The -o (--format) option is extremely powerful, allowing users to define exactly which information fields are displayed and in what order. This is achieved by providing a format string of percent-sign field specifiers (e.g., %P for partition name, %D for node count, %t for compact node state); the related -O (--Format) option accepts named fields instead. This flexibility enables scripting and integration with other monitoring tools, making sinfo highly adaptable for custom reporting.
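For instance, the following illustrative commands print, respectively, partition name, node count, CPUs per node, memory, and compact node state, and then a headerless node/state listing suitable for scripting (-h, --noheader suppresses the header line):
    sinfo -o "%P %D %c %m %t"
    sinfo -h -o "%N %t"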
UNDERSTANDING NODE STATES
sinfo reports various node states that are critical for understanding the cluster. Common states include IDLE (ready to accept jobs), ALLOCATED or ALLOC (running one or more jobs), MIXED (some CPUs allocated, some idle), DOWN (not available), DRAIN/DRAINING/DRAINED (new jobs not accepted; running jobs may complete), and MAINT (reserved for maintenance). An asterisk (*) appended to a state indicates that the node is not responding.
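To see why particular nodes are unavailable, one approach is to filter on state and print the node names, extended state, and reason field (%E), or simply list reasons with -R; the state list below is illustrative:
    sinfo -t drain,down -o "%N %T %E"
    sinfo -R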
HISTORY
The Slurm Workload Manager, which includes sinfo, originated at Lawrence Livermore National Laboratory (LLNL) in 2002. It was designed as a lightweight, scalable, and fault-tolerant job scheduler for large Linux clusters. sinfo specifically emerged as a core utility to provide users and administrators with real-time insights into the state of the cluster's partitions and nodes, a fundamental requirement for effective high-performance computing (HPC) resource management.