LinuxCommandLibrary

salloc

Allocate resources for interactive parallel computing jobs

TLDR

Start an interactive shell session on a node in the cluster

$ salloc

Execute the specified command synchronously on a node in the cluster
$ salloc [ls --all]

Only allocate nodes fulfilling the specified constraints
$ salloc [[-C|--constraint]] [(amd|intel)&gpu]

SYNOPSIS

salloc [OPTIONS] [COMMAND [ARGS...]]

PARAMETERS

-A, --account=account
    Charge resources used by this job to the specified account.

-C, --constraint=list
    Request nodes with a specific feature (e.g., K80, V100).

-d, --dependency=type:job_id[:job_id...]
    Defer the allocation until the specified dependency is satisfied (e.g., 'afterok:12345').

-G, --gpus=count
    Request a total number of GPUs for the job (use --gpus-per-node for a per-node count).

-J, --job-name=name
    Specify a name for the job.

-L, --licenses=licenses
    Request a specific set of licenses.

-N, --nodes=min_nodes[-max_nodes]
    Request a specific number of nodes.

-n, --ntasks=number
    Request a specific number of tasks (processes).

-p, --partition=partition_name
    Request a specific partition.

-t, --time=time
    Set a time limit for the job (e.g., '1:00:00' for 1 hour).

--exclusive
    Allocate nodes exclusively for this job.

--mem=size
    Specify maximum real memory per node (e.g., '10G', '500M').

--mem-per-cpu=size
    Specify maximum real memory per allocated CPU (e.g., '100M').

--cpus-per-task=count
    Request 'count' number of CPUs per task.
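
These options are typically combined in a single request. A sketch of a common invocation ('short' and 'myproject' are hypothetical partition and account names):

```shell
# Request 2 nodes, 8 tasks, 4 GB per CPU, for 2 hours, on the
# (hypothetical) "short" partition, charged to "myproject":
salloc --nodes=2 --ntasks=8 --mem-per-cpu=4G --time=2:00:00 \
       --partition=short --account=myproject --job-name=interactive-test
```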

DESCRIPTION

salloc is a Slurm command used to allocate resources for an interactive job on a cluster. Unlike sbatch, which submits a job script for batch execution, salloc blocks until the allocation is granted and then starts an interactive shell; by default the shell runs on the host where salloc was invoked, though many sites configure Slurm so that it starts on the first allocated node instead. This lets users interact directly with the allocated resources: run commands, debug code, and test applications in real time. It is particularly useful for development, short-term testing, or whenever an immediate interactive environment is required. The allocation remains active until the user exits the shell or the requested time limit is reached, and the allocated resources stay reserved for the job's exclusive use for the entire duration, even while the user is not actively executing commands in the shell.

CAVEATS

salloc allocates resources but does not automatically execute commands across all allocated nodes or tasks. To run parallel tasks or commands on multiple nodes within an salloc session, you must use srun. It's crucial to exit the salloc shell (e.g., using 'exit' or Ctrl+D) when done to release resources; otherwise, the allocation will persist until its time limit is reached, potentially impacting fair share policies for other users.
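Releasing the allocation is a matter of ending the shell; a sketch of the cleanup options (the job ID 12345 is a placeholder you would read from squeue):

```shell
# From inside the salloc shell, end the session to release resources:
exit          # or press Ctrl+D

# If an allocation lingers, cancel it by job ID from the login node:
scancel 12345   # placeholder ID; find yours with: squeue --me
```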

HOW IT WORKS

When you execute salloc, it requests resources from the Slurm scheduler according to the options you specify (nodes, time, memory, etc.). Once the resources are available, Slurm grants the allocation, assigns a job ID, and spawns a shell (usually your default shell) with the job's environment variables set. By default this shell runs on the host where salloc was invoked; sites that enable the interactive-step launch configuration start it on the first allocated compute node instead. To utilize the full allocation (e.g., run a parallel MPI job across multiple nodes), you typically invoke srun from within this interactive shell.
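The allocate-then-srun workflow described above can be sketched as follows (my_mpi_app is a hypothetical parallel binary; hostname simply stands in for a real program):

```shell
# Allocate 2 nodes with 4 tasks; salloc drops you into a shell
# once the allocation is granted:
salloc --nodes=2 --ntasks=4

# Inside that shell, srun launches across the whole allocation:
srun hostname        # runs 4 copies, spread over the 2 nodes
srun ./my_mpi_app    # hypothetical MPI binary; srun acts as the launcher

exit                 # release the allocation
```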

ENVIRONMENT VARIABLES

Upon a successful salloc allocation, several Slurm-specific environment variables are set within the interactive shell, providing information about the job. Key variables include:
SLURM_JOB_ID: The unique ID of the allocated job.
SLURM_NNODES: The total number of nodes allocated.
SLURM_CPUS_ON_NODE: Number of CPUs on the current node.
SLURM_NTASKS: Total number of tasks requested for the job.
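Inside the allocation these can be read like any other environment variables. A quick sanity check (the ':-n/a' defaults make the lines safe to run outside an allocation as well; actual values depend on what was requested):

```shell
# Print key Slurm allocation variables, falling back to "n/a"
# when run outside an salloc session:
echo "Job ID:       ${SLURM_JOB_ID:-n/a}"
echo "Nodes:        ${SLURM_NNODES:-n/a}"
echo "CPUs on node: ${SLURM_CPUS_ON_NODE:-n/a}"
echo "Tasks:        ${SLURM_NTASKS:-n/a}"
```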

HISTORY

salloc is an integral part of the Slurm Workload Manager, which originated in 2002. Slurm (Simple Linux Utility for Resource Management) was developed as a free, open-source job scheduler for Linux clusters. salloc was introduced early in Slurm's development to provide interactive resource allocation, addressing the need for immediate, on-demand access to cluster resources for debugging and development tasks, complementing the traditional batch processing model.

SEE ALSO

sbatch(1), srun(1), sinfo(1), squeue(1), scancel(1)
