LinuxCommandLibrary

sh5util

Merge Slurm HDF5 profiling data

TLDR

Merge HDF5 files for a job
$ sh5util -j [job_id]
Merge HDF5 files for a specific job step
$ sh5util -j [job_id.step_id]
Extract data series from a merged job file
$ sh5util -j [job_id] -E -i [path/to/file.h5] -s [Energy|Filesystem|Network|Task]
Extract a specific data item from all nodes
$ sh5util -j [job_id] -I -s [series] -d [data_item]
List available data items in a series
$ sh5util -j [job_id] -I -s [series] -L
Keep node files after merging
$ sh5util -j [job_id] -S

SYNOPSIS

sh5util [-j job[.step]] [-E|-I] [OPTIONS]

DESCRIPTION

sh5util merges the HDF5 profiling files produced on each compute node by Slurm's acct_gather_profile plugin into a single consolidated file for analysis. It supports three modes: merging node files (the default), extracting a data series to CSV (-E), and extracting a single data item across all nodes (-I).
The tool works with Slurm job profiling data that tracks energy consumption, filesystem I/O, network activity, and task-level metrics. Output files can be analyzed with HDF5 tools or converted to CSV for use with standard data analysis applications.
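Once a series has been extracted with -E, the resulting CSV can be processed with standard tooling. A minimal Python sketch; the embedded sample and its column names (ElapsedTime, Power) are hypothetical, since the actual columns depend on the series and the Slurm version:

```python
import csv
import io

# Hypothetical excerpt of an Energy series extracted with `sh5util -E`.
# Real column names depend on the Slurm version and plugin configuration.
sample = """ElapsedTime,Power
0,110
10,115
20,120
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Average power across the sampled interval.
avg_power = sum(float(r["Power"]) for r in rows) / len(rows)
print(avg_power)  # 115.0
```

The same pattern applies to Filesystem, Network, and Task series; only the column names change.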

PARAMETERS

-j, --jobs job[.step]
Merge HDF5 files for the specified job or job step
-p, --profiledir dir
Directory containing node-step HDF5 files
-o, --output path
Output file path (default: ./job_$jobid.h5)
-S, --savefiles
Keep node-step files after merging
-u, --user user
User who ran the profiled job
-E, --extract
Extract data series to CSV format
-i, --input path
Input merged HDF5 file for extraction
-N, --node nodename
Extract data for specific node only
-l, --level level
Data level: Node:Totals or Node:TimeSeries
-s, --series series
Data series: Energy, Filesystem, Network, Task, or Task_#
-I, --item-extract
Extract single data item from all samples
-d, --data item
Specific data item name to extract
-L, --list
List available data items in a series
-h, --help
Display usage information
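Taken together, these options support a merge-then-inspect workflow. A sketch using the same bracketed placeholders as the TLDR; the item name comes from the -L listing:

```shell
# Merge the per-node HDF5 files for a job
$ sh5util -j [job_id]

# List the data items recorded in the Energy series of the merged file
$ sh5util -j [job_id] -I -s Energy -L

# Extract one of the listed items across all nodes
$ sh5util -j [job_id] -I -s Energy -d [data_item]
```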

CAVEATS

Requires HDF5 profiling to be enabled in the Slurm configuration via the acct_gather_profile plugin. Node-step files must exist in the profile directory at merge time. Large jobs with many nodes and long runtimes can produce substantial HDF5 files.
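Enabling profiling happens in Slurm's configuration files rather than through sh5util itself. A sketch of the relevant settings; the directory path is an example, so consult your site's slurm.conf and acct_gather.conf documentation:

```shell
# slurm.conf: select the HDF5 profiling plugin
AcctGatherProfileType=acct_gather_profile/hdf5

# acct_gather.conf: where node-step profile files are written (example path)
ProfileHDF5Dir=/var/spool/slurm/profile
```

Jobs must also request profiling, for example with srun --profile=Energy,Task, unless a site-wide default is set with ProfileHDF5Default.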

HISTORY

sh5util is part of the Slurm (Simple Linux Utility for Resource Management) workload manager, developed at Lawrence Livermore National Laboratory. Slurm was first released in 2002 and has become one of the most widely used HPC job schedulers. HDF5 profiling support was added to provide detailed job performance analysis capabilities.

SEE ALSO

sacct(1), sstat(1), srun(1), sbatch(1)

> TERMINAL_GEAR

Curated for the Linux community
