salloc
Allocate resources for interactive parallel computing jobs
TLDR
Start an interactive shell session on a node in the cluster
Execute the specified command synchronously on a node in the cluster
Only allocate nodes fulfilling the specified constraints
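Illustrative invocations for the items above; the command to run and the feature name are placeholders, and where the command actually executes depends on the cluster's salloc configuration:

    salloc                            # allocate default resources and start an interactive shell
    salloc ls -a                      # run a single command under the allocation, then release it
    salloc --constraint=V100 -N 1     # only allocate a node that advertises the V100 feature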
SYNOPSIS
salloc [OPTIONS] [COMMAND [ARGS...]]
PARAMETERS
-A, --account=account
Charge resources used by this job to the specified account.
-C, --constraint=list
Request nodes with a specific feature (e.g., K80, V100).
-d, --dependency=type:job_id[:job_id...]
Defer the allocation until the specified job(s) complete.
-G, --gpus=count
Specify the total number of GPUs required for the job (use --gpus-per-node for a per-node count).
-J, --job-name=name
Specify a name for the job.
-L, --licenses=licenses
Request a specific set of licenses.
-N, --nodes=min_nodes[-max_nodes]
Request a minimum (and optionally maximum) number of nodes.
-n, --ntasks=number
Request a specific number of tasks (processes).
-p, --partition=partition_name
Request a specific partition.
-t, --time=time
Set a time limit for the job (e.g., '1:00:00' for 1 hour).
--exclusive
Allocate nodes exclusively for this job.
--mem=size
Specify maximum real memory per node (e.g., '10G', '500M').
--mem-per-cpu=size
Specify maximum real memory per allocated CPU (e.g., '100M').
--cpus-per-task=count
Request 'count' number of CPUs per task.
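A request combining several of these options might look like the following sketch; the account myproj and partition gpu are placeholders for site-specific names:

    salloc -A myproj -p gpu -J debug-session \
           -N 2 -n 8 --cpus-per-task=4 --mem=16G -t 2:00:00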
DESCRIPTION
salloc is a Slurm command used to allocate resources for an interactive job on a cluster. Unlike sbatch, which submits a job script for later batch execution, salloc blocks until the allocation is granted and then runs the specified command, or the user's default shell if no command is given. By default the shell runs on the host where salloc was invoked; some clusters are configured (e.g., via LaunchParameters=use_interactive_step) to start it directly on the first allocated compute node. Either way, the user can interact with the allocated resources, execute commands, debug code, and test applications in real time, which makes salloc particularly useful for development, short-term testing, or whenever an immediate interactive environment is required. The allocation remains active until the user exits the shell or the requested time limit is reached, and the resources stay reserved for the job for that entire period, even if no commands are being executed.
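A minimal interactive session, assuming srun is used to reach the allocated node, might look like:

    salloc -N 1 -t 1:00:00    # blocks until the allocation is granted, then opens a shell
    srun hostname             # runs on the allocated compute node
    exit                      # leaves the shell and releases the allocation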
CAVEATS
salloc allocates resources but does not automatically run commands across all allocated nodes or tasks. To launch parallel tasks on multiple nodes from within an salloc session, you must use srun. It's also crucial to exit the salloc shell (e.g., using 'exit' or Ctrl+D) when done so the resources are released; otherwise the allocation persists until its time limit is reached, tying up nodes that other jobs could use and counting against your fair-share usage.
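If a session is accidentally left open, the allocation can be found and cancelled from another shell; the job ID shown here is hypothetical:

    squeue -u "$USER"     # list your jobs, including any lingering salloc allocation
    scancel 123456        # cancel it by job ID to free the resources immediately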
HOW IT WORKS
When you execute salloc, it requests resources from the Slurm controller based on your specified options (nodes, time, memory, etc.). Once the resources are available, Slurm grants the allocation, assigns a job ID, and salloc runs the given command or spawns a shell (usually your default shell). By default that shell runs on the host where salloc was invoked, although some clusters are configured to start it on the first allocated compute node. To utilize the full allocation (e.g., to run a parallel MPI job across multiple nodes), you typically invoke srun from within this interactive session.
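For example, a parallel job can be launched from within the interactive shell; ./mpi_app is a placeholder for your own MPI executable:

    salloc -N 4 --ntasks-per-node=4 -t 30:00   # allocate 4 nodes with 4 tasks each
    srun ./mpi_app                             # srun launches the 16 tasks across the allocation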
ENVIRONMENT VARIABLES
Upon a successful salloc allocation, several Slurm-specific environment variables are set within the interactive shell, providing information about the job. Key variables include:
SLURM_JOB_ID: The unique ID of the allocated job.
SLURM_NNODES: The total number of nodes allocated.
SLURM_CPUS_ON_NODE: Number of CPUs allocated to the job on the current node.
SLURM_NTASKS: Total number of tasks requested for the job (set only when a task count is specified, e.g. with -n).
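These variables can be used directly inside the session, for example to scale commands to the allocation; ./my_task is a placeholder:

    echo "Job $SLURM_JOB_ID is using $SLURM_NNODES node(s)"
    srun -n "$SLURM_NTASKS" ./my_task    # reuse the requested task count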
HISTORY
salloc is an integral part of the Slurm Workload Manager, which originated in 2002. Slurm (Simple Linux Utility for Resource Management) was developed as a free, open-source job scheduler for Linux clusters. salloc was introduced early in Slurm's development to provide interactive resource allocation, addressing the need for immediate, on-demand access to cluster resources for debugging and development tasks, complementing the traditional batch processing model.