scancel
Cancel pending or running Slurm jobs
TLDR
Cancel a job using its ID
Cancel all jobs from a user
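The two tasks above correspond to the invocations sketched below. This is a dry run: the job ID 12345 and the user name alice are placeholders, and the SCANCEL variable is a helper introduced for this example (it is not part of Slurm) so that the commands are printed rather than executed.

```shell
# Dry run: print each command instead of executing it.
# Set SCANCEL=scancel to run the commands on a real Slurm cluster.
SCANCEL="echo scancel"

$SCANCEL 12345        # cancel a job using its ID (12345 is a placeholder)
$SCANCEL -u alice     # cancel all jobs from a user (alice is a placeholder)
```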
SYNOPSIS
scancel [OPTIONS...] [job_id[_array_id][.step_id]] [job_id[_array_id][.step_id]...]
PARAMETERS
-M cluster_name
Direct the request to the named cluster in a multi-cluster or federated Slurm setup.
-n job_name
Cancel jobs with the specified name.
-p partition_name
Cancel jobs associated with the specified partition.
-s signal_name | signal_number
Send the specified signal to the job or job step instead of running the default termination sequence. The signal can be given by name (e.g., KILL, TERM) or by number.
-t job_state_name
Restrict the operation to jobs in the specified state (PENDING, RUNNING, or SUSPENDED).
-u user_name
Cancel jobs belonging to the specified user.
--me
Cancel jobs belonging to the calling user (equivalent to -u with the caller's own user name).
--batch
When used with a job ID, signal only the batch step (the job's shell script), not its other job steps. This is useful when the script itself traps the signal.
--reservation reservation_name
Restrict the operation to jobs that belong to the specified reservation. The reservation itself is not removed.
-i, --interactive
Prompt for confirmation before canceling jobs.
--verbose
Display more detailed information about the cancellation process.
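The options above can be combined, in which case only jobs matching every filter are affected. The following dry-run sketch shows a few combinations; the user name, partition, job name, and job ID are placeholders, and the SCANCEL variable is a helper introduced for this example, not part of Slurm.

```shell
# Dry run: print each command instead of executing it.
# Set SCANCEL=scancel to run the commands on a real Slurm cluster.
SCANCEL="echo scancel"

# Pending jobs owned by alice in the partition named debug.
$SCANCEL -u alice -t PENDING -p debug

# Jobs named nightly_build, asking for confirmation before each one.
$SCANCEL -n nightly_build -i

# Send SIGUSR1 to the batch script of job 12345 instead of cancelling it.
$SCANCEL -s USR1 --batch 12345
```

Printing the command first is a cheap safeguard, because a filter such as -u can match far more jobs than intended.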
DESCRIPTION
scancel is a command-line utility for canceling or signaling jobs, job arrays, and job steps managed by the Slurm Workload Manager. It removes pending jobs from the queue and stops running ones. By default, the processes of a canceled job receive SIGTERM, followed by SIGKILL if they have not exited once the configurable KillWait interval (30 seconds by default) has elapsed. This two-phase termination gives applications a chance to clean up before they are forcibly killed. Jobs can be selected by ID or filtered by user, name, partition, or state, which lets both users and administrators free resources that are no longer needed.
CAVEATS
Permissions: Users can only cancel their own jobs. Slurm administrators (e.g., root or users with SlurmUser privileges) can cancel any job.
Signal Handling: While scancel defaults to SIGTERM then SIGKILL, applications can catch SIGTERM to perform graceful shutdowns. If an application does not handle SIGTERM or fails to exit, SIGKILL will forcibly terminate it.
Targeting: Be precise when specifying job IDs. job_id cancels the entire job, job_id.step_id targets a specific job step, job_id_array_id targets a single task of a job array, and job_id.batch targets only the batch script itself. Filter options such as -u or -n can match many jobs at once.
DEFAULT TERMINATION BEHAVIOR
When no signal is specified, scancel first sends SIGTERM to allow for graceful shutdown. If the job or step does not terminate within the configured KillWait (default 30 seconds), a SIGKILL signal is sent to forcibly terminate it.
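A process that wants to use the grace period has to install a SIGTERM handler. The sketch below is a local simulation that needs no Slurm: the worker function is a hypothetical stand-in for an application running inside a job, and a plain kill stands in for the SIGTERM phase. Which processes of a real job receive the signal depends on how the job was launched and on options such as --batch.

```shell
# Hypothetical stand-in for an application running inside a job.
worker() {
    # On SIGTERM: clean up, then exit before SIGKILL would arrive.
    trap 'echo "SIGTERM received, cleaning up"; exit 0' TERM
    while :; do sleep 1; done
}

log=$(mktemp)             # capture the worker's output
worker > "$log" &         # start the worker in the background
pid=$!
sleep 2                   # give the worker time to install its trap
kill -TERM "$pid"         # simulate the SIGTERM phase locally
wait "$pid"               # returns once the handler has called exit
status=$?
cat "$log"
echo "worker exit status: $status"
```

The shell runs the trap only after the current foreground command (here sleep 1) has finished, so long-running commands inside a real script delay the handler and eat into the KillWait interval.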
CANCELING JOB STEPS
To cancel only a specific job step (e.g., a process launched by srun within a batch job), specify the job ID followed by a period and the step ID (e.g., scancel 12345.0). The underscore form (e.g., 12345_0) is reserved for job array tasks. Omitting the step ID cancels the entire job, including all its running steps.
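As a dry-run sketch of the targeting forms, using the period-separated job_id.step_id syntax that Slurm documents for steps: the job ID 12345 is a placeholder, and the SCANCEL variable is a helper introduced for this example, not part of Slurm.

```shell
# Dry run: print each command instead of executing it.
# Set SCANCEL=scancel to run the commands on a real Slurm cluster.
SCANCEL="echo scancel"
job_id=12345                 # placeholder job ID

$SCANCEL "${job_id}.0"       # step 0 of the job only
$SCANCEL "${job_id}_7"       # task 7 of a job array (underscore form)
$SCANCEL "${job_id}"         # the entire job, including all running steps
```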
HISTORY
scancel is a fundamental command within the Slurm Workload Manager suite, which began its development in the early 2000s at Lawrence Livermore National Laboratory. As Slurm evolved into one of the most widely adopted workload managers in high-performance computing (HPC), scancel has remained a core utility. Its design reflects the need for robust and flexible job control in large-scale cluster environments, allowing for precise termination of computational tasks and efficient resource management. Subsequent versions of Slurm have incrementally improved its capabilities, particularly in handling federated clusters and more complex job structures.