sinfo

Report cluster partition and node information

TLDR

Show a quick summary of the cluster

$ sinfo [[-s|--summarize]]

View the detailed status of all partitions across the entire cluster

$ sinfo

View the detailed status of a specific partition

$ sinfo [[-p|--partition]] [partition_name]

View information about idle nodes

$ sinfo [[-t|--states]] [idle]

Show only dead (non-responding) nodes

$ sinfo [[-d|--dead]]

List the reasons why nodes are down, drained, or failing

$ sinfo [[-R|--list-reasons]]

SYNOPSIS

sinfo [OPTIONS]

sinfo provides a flexible interface to query the state of Slurm partitions and nodes. Users typically combine options to filter and format the output according to their needs.

-a, --all
    Displays information about all partitions, including hidden or unavailable ones.

-d, --dead
    Only show information about dead nodes.

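-e, --exact
    Does not group nodes on one line unless their reported configurations are identical. Without this option, CPU count, memory size, and disk space are shown as a minimum value followed by "+".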

-i <seconds>, --iterate=<seconds>
    Prints the state repeatedly, pausing the given number of seconds between reports.

-l, --long
    Provides a long format output, including partition time limits, job limits, and node counts.

-M <clusters>, --clusters=<clusters>
    Specifies the cluster(s) to query in a multi-cluster setup, as a comma-separated list of names.

-n <nodes>, --nodes=<nodes>
    Filter output to specific nodes by name.

-N, --Node
    Displays information in a node-centric format rather than partition-centric.

-o <output_format>, --format=<output_format>
    Specify a custom output format using field specifiers.

-p <partition>, --partition=<partition>
    Filter output to specific partition(s) by name.

-r, --responding
    Only display information for responding nodes.

-R, --list-reasons
    Lists the reasons why nodes are in the down, drained, fail, or failing state.

-s, --summarize
    Lists only a partition state summary, without node state details.

-t <states>, --states=<states>
    Filter output to nodes in the given state(s) (e.g., idle, alloc, down). Multiple states may be comma-separated, and matching is case-insensitive.

-v, --verbose
    Increases the verbosity of the output, showing more details.

-V, --version
    Prints the version number of sinfo.
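
The options above can be combined in a single call. As a minimal sketch, the wrapper below asks for a node-centric, header-free list of idle nodes in one partition. The function name idle_nodes and the SINFO override variable are assumptions made for this illustration, not part of Slurm.

```shell
# Hypothetical helper: print the names of idle nodes in one partition,
# one per line. The SINFO variable is an assumption of this sketch; it lets
# a stub stand in for the real binary when Slurm is not available.
idle_nodes() {
    "${SINFO:-sinfo}" --noheader --Node --partition="$1" --states=idle --format='%N'
}

# Usage, assuming a partition called "debug" exists on your cluster:
#   idle_nodes debug
```

Long option names are used here for readability; the short forms (-h, -N, -p, -t, -o) are equivalent.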

DESCRIPTION

sinfo is a command-line utility in the Slurm Workload Manager suite that reports on the partitions and nodes managed by Slurm. Users and administrators use it to check the current state, availability, capacity, and configuration of computing resources: where jobs can be submitted, where resources are scarce, and whether the cluster is healthy. It can display aggregated partition information or node-level detail, and its filtering and formatting options make it suitable for both interactive queries and scripting.

CAVEATS

sinfo is part of the Slurm Workload Manager and will only function on systems where Slurm is installed and configured. Its output is highly dependent on the current state and configuration of the Slurm cluster. Users may have restricted views based on Slurm's security policies. The interpretation of node states can sometimes be nuanced, requiring familiarity with Slurm's state definitions.
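
Because sinfo exists only where Slurm is installed, a script may want to check for it before relying on its output. The guard below is one possible sketch; the function name require_sinfo and the wording of the message are assumptions.

```shell
# Hypothetical guard: succeed only if a sinfo command can be found.
require_sinfo() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found: is Slurm installed and on PATH?" >&2
        return 1
    fi
}

# Usage inside a script:
#   require_sinfo || exit 1
```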

CUSTOM OUTPUT FORMATS (-o OPTION)

The -o or --format option lets users define exactly which fields are displayed and in what order. The format is a string of field specifiers, each introduced by a percent sign (e.g., %P for partition name, %E for the reason a node is unavailable, %t for node state in compact form). This makes sinfo output easy to consume from scripts and other monitoring tools.
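
As an illustration of scripting against a custom format, the sketch below counts nodes per state. It expects one "<node> <state>" pair per line, as produced by a node-centric query with the format string '%N %t'. Node-centric output prints one line per node and partition, so duplicates are removed first. The function name count_states and the sample lines are hypothetical.

```shell
# Count nodes per state from lines of the form "<node> <state>".
# Duplicate lines (a node listed once per partition) are collapsed first.
count_states() {
    sort -u | awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# On a live cluster:
#   sinfo --noheader --Node --format='%N %t' | count_states

# Hypothetical sample input standing in for real sinfo output:
printf '%s\n' 'node01 idle' 'node02 alloc' 'node02 alloc' \
    'node03 idle' 'node04 down*' | count_states
```

The final sort only makes the order of the summary lines stable, since awk does not guarantee an iteration order for arrays.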

UNDERSTANDING NODE STATES

sinfo reports a range of node states. Common ones include IDLE (ready to accept jobs), ALLOCATED or ALLOC (running one or more jobs), MIXED (some CPUs allocated, some idle), DOWN (not available), DRAINING or DRAINED (not accepting new jobs; a DRAINING node is still finishing running jobs), and MAINT (in maintenance mode). An asterisk (*) appended to a state indicates that the node is not responding.
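
When processing states in a script, it helps to separate the base state from a trailing asterisk, which marks a node that is not responding. The following is a minimal sketch: the function name parse_state and its output wording are assumptions, and other suffix characters that sinfo may append are deliberately left untouched.

```shell
# Split a sinfo state such as "idle*" into its base state and a
# responding flag. Only the "*" suffix is handled in this sketch.
parse_state() {
    case $1 in
        *\*) printf '%s not-responding\n' "${1%\*}" ;;
        *)   printf '%s responding\n' "$1" ;;
    esac
}

parse_state 'idle*'
parse_state 'mix'
```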

HISTORY

The Slurm Workload Manager, which includes sinfo, originated at Lawrence Livermore National Laboratory (LLNL) in 2002. It was designed as a lightweight, scalable, and fault-tolerant job scheduler for large Linux clusters. sinfo specifically emerged as a core utility to provide users and administrators with real-time insights into the state of the cluster's partitions and nodes, a fundamental requirement for effective high-performance computing (HPC) resource management.