sstat

Display status information for running Slurm jobs and job steps

TLDR

Display status information of a comma-separated list of jobs

$ sstat [[-j|--jobs]] [job_id]

Display job ID, average CPU and average virtual memory size of a comma-separated list of jobs, with pipes as column delimiters

$ sstat [[-p|--parsable]] [[-j|--jobs]] [job_id] [[-o|--format]] [JobID,AveCPU,AveVMSize]

Display list of fields available

$ sstat [[-e|--helpformat]]

SYNOPSIS

sstat [OPTIONS]
Example:
sstat -j 12345
sstat -j 12345.0 -o Pids,MaxRSS,MaxVMSize
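
The -j option takes a single comma-separated argument, so a script that holds job or step IDs as separate words has to join them first. A minimal sketch, assuming a POSIX shell; the helper name join_ids is hypothetical and not part of Slurm:

```shell
# join_ids: join all arguments with commas, e.g. for use with `sstat -j`.
# The function body runs in a subshell so the IFS change does not leak out.
join_ids() (
    IFS=,
    printf '%s\n' "$*"
)

# Possible usage on a cluster (left commented out because it needs Slurm):
#   sstat -j "$(join_ids 12345 12346.0 12347.batch)" -o JobID,AveCPU
```

The IDs are passed through unchanged, so both job.step forms and bare job IDs work.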

-a, --allsteps
    Print all running steps of the given job(s) when no step is specified.

-e, --helpformat
    Print the list of fields that can be specified with --format.

-h, --help
    Display a help message and exit.

-i, --pidformat
    Use a predefined format that lists the PIDs running in each job step (JobID, Nodes, Pids).

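-j, --jobs=
    Job or job step to report on, in the form job[.step], or a comma-separated list of them. This option is required. If no step is given, the lowest-numbered running step is used unless --allsteps is set.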

-n, --noheader
    Do not print the header line.

--noconvert
    Do not convert units from their original type (for example, 2048M is not converted to 2G).

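-o, --format=
    Comma-separated list of fields to output. Common fields include: JobID, Pids, NTasks, Nodelist, AveCPU, MinCPU, AveRSS, MaxRSS, AveVMSize, MaxVMSize, AvePages, MaxPages, AveDiskRead, MaxDiskRead, AveDiskWrite, MaxDiskWrite, AveCPUFreq, ConsumedEnergy. Append %NUMBER to a field to set its column width.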

-p, --parsable
    Output is delimited with '|', with a trailing '|' at the end of each line.

-P, --parsable2
    Output is delimited with '|', without a trailing '|'.

--usage
    Display a brief usage message and exit.

-v, --verbose
    Report additional detail about what sstat is doing, mainly for debugging.

-V, --version
    Print version information and exit.
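
The pipe-delimited output is meant for scripts. As a rough sketch, the function below reads such output on stdin, with the header line first, and prints one name=value line per field. The function name sstat_kv is hypothetical, and the sample data in the comment is made up for illustration:

```shell
# sstat_kv: turn '|'-delimited sstat output (header line first) into
# name=value lines, followed by a "--" line after each job step.
# Empty header cells, such as the one produced by a trailing '|',
# are skipped.
sstat_kv() {
    awk -F'|' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF; next }
        {
            for (i = 1; i <= n; i++)
                if (name[i] != "") printf "%s=%s\n", name[i], $i
            print "--"
        }
    '
}

# Possible usage on a cluster (commented out because it needs Slurm):
#   sstat -p -j 12345 -o JobID,AveCPU | sstat_kv
# Offline check with made-up data:
#   printf 'JobID|AveCPU|\n12345.0|00:01:02|\n' | sstat_kv
```

Keying on the header names rather than on column positions keeps the script working if the field list passed to -o changes.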

DESCRIPTION

sstat is a command-line utility in the Slurm Workload Manager. It displays status information for running jobs and job steps, covering CPU time, task counts, nodes, resident set size (RSS), virtual memory, paging, and disk I/O. It is useful for monitoring a job while it runs, diagnosing problems, and understanding resource consumption. Each invocation reports on the job steps named with --jobs; there are no filters by time range or node. Output is a fixed-width table by default, or '|'-delimited with --parsable or --parsable2 for scripting. For jobs that have already finished, use sacct instead.

CAVEATS

sstat reports only on job steps that are currently running; for completed jobs, query the accounting records with sacct instead.
The data comes from the job accounting gather plugin on the compute nodes (JobAcctGatherType in slurm.conf), not from slurmdbd. If no gather plugin is configured, sstat has no usage data to report, and the set of available metrics depends on that configuration.
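
Since sstat is aimed at running steps, a wrapper script may want to choose between sstat and sacct based on the job's state. The sketch below maps a state string to a tool name. It assumes the state comes from `squeue -h -j <job_id> -o %T` for a single job, and treats an empty string as "no longer listed by squeue". The function name stats_tool_for_state is hypothetical:

```shell
# stats_tool_for_state: print which tool is likely to have usage data
# for a job in the given state (assumed to be an squeue %T state string).
stats_tool_for_state() {
    case "$1" in
        RUNNING) echo sstat ;;   # live statistics from the running steps
        PENDING) echo none ;;    # not started yet, nothing to report
        *)       echo sacct ;;   # finished or unknown: use accounting data
    esac
}

# Possible usage on a cluster (commented out because it needs Slurm):
#   state=$(squeue -h -j 12345 -o %T 2>/dev/null)
#   tool=$(stats_tool_for_state "$state")
```

This mapping is deliberately coarse; states such as SUSPENDED or COMPLETING may deserve their own handling on a real cluster.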

OUTPUT FIELDS (-o OPTION)

The -o or --format option selects exactly which metrics to display, as a comma-separated list. Available fields include Pids (process IDs in the step), AveRSS/MaxRSS (resident set size), AveVMSize/MaxVMSize (virtual memory size), AveCPU/MinCPU (CPU time), AvePages/MaxPages (page faults), AveDiskRead/MaxDiskRead and AveDiskWrite/MaxDiskWrite (bytes read and written), and AveCPUFreq (CPU frequency). Most Max and Min fields have companion fields with Node and Task suffixes (for example MaxRSSNode and MaxRSSTask) that identify where the extreme value occurred. Run sstat --helpformat for the full list supported by the installed version.
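
Field names vary between Slurm versions, so a script can check a requested field list against the output of `sstat -e` before running the real query. A minimal sketch, assuming the valid names arrive whitespace-separated on stdin. The function name unknown_fields is hypothetical, and the comparison is an exact, case-insensitive match, which is a simplifying assumption about how sstat matches names:

```shell
# unknown_fields: print each requested field that is not in the valid list.
#   $1    comma-separated field list, as it would be given to -o
#   stdin valid field names, whitespace-separated (e.g. from `sstat -e`)
unknown_fields() {
    awk -v want="$1" '
        { for (i = 1; i <= NF; i++) valid[tolower($i)] = 1 }
        END {
            n = split(want, f, ",")
            for (i = 1; i <= n; i++) {
                sub(/%.*/, "", f[i])   # drop an optional %NUMBER width
                if (!(tolower(f[i]) in valid)) print f[i]
            }
        }
    '
}

# Possible usage on a cluster (commented out because it needs Slurm):
#   sstat -e | unknown_fields "JobID,MaxRSS%20,RSS"
```

An empty result means every requested field was recognized.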

HISTORY

sstat is part of the Slurm Workload Manager, which originated at Lawrence Livermore National Laboratory (LLNL) in 2002. Slurm was developed as a free, open-source cluster management and job scheduling system for Linux clusters and was inspired by the closed-source Quadrics RMS. As Slurm grew to support larger and more complex high-performance computing (HPC) environments, sstat provided a way to inspect the resource usage of running job steps, complementing sacct for completed jobs. Development continues as part of the broader Slurm project, maintained by SchedMD with community contributions.