
scancel

Cancel pending or running Slurm jobs

TLDR

Cancel a job using its ID

$ scancel [job_id]

Cancel all jobs from a user

$ scancel -u [user_name]

SYNOPSIS

scancel [options] [job_id_list]

PARAMETERS

-A, --account=
    Cancel jobs associated with the specified account.

-f, --full
    Also signal the batch script and its child processes. By default, signals other than SIGKILL are not sent to the batch step.

-g, --gid=
    Cancel jobs associated with the specified group ID.

-h, --help
    Display help message and exit.

-i, --interactive
    Request confirmation before cancelling each job.

-n, --name=
    Cancel jobs with the specified name.

-p, --partition=
    Cancel jobs in the specified partition, whether pending or running.

-q, --qos=
    Cancel jobs associated with the specified quality of service (QOS).

-Q, --quiet
    Do not report an error if the specified job has already completed.

-s, --signal=
    Name or number of the signal to send. If this option is not used, the job or step is terminated.

-t, --state=
    Cancel jobs in the specified state.

-u, --user=
    Cancel jobs owned by the specified user.

--step=
    Cancel a specific step of a job.


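job_id_list
    List of job IDs to cancel. Multiple job IDs can be specified, separated by spaces or commas. A single job array task is written as job_id_task_id and a single job step as job_id.step_id.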
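
The filter options can be combined, in which case only jobs that match every filter are selected. As a minimal sketch, the hypothetical helper below (its name and argument order are assumptions, not part of Slurm) assembles such a command line and prints it for review instead of running it:

```shell
# Hypothetical helper: build a scancel command line from optional filters.
# Arguments: user, job state, partition. An empty string skips that filter.
# The command is only printed, so it can be inspected before it is run.
print_scancel_cmd() {
    local user="$1" state="$2" partition="$3"
    set -- scancel
    if [ -n "$user" ]; then set -- "$@" "--user=$user"; fi
    if [ -n "$state" ]; then set -- "$@" "--state=$state"; fi
    if [ -n "$partition" ]; then set -- "$@" "--partition=$partition"; fi
    echo "$*"
}

# Pending jobs of user alice in partition debug (example values)
print_scancel_cmd alice PENDING debug
# prints: scancel --user=alice --state=PENDING --partition=debug
```

Printing first is a cautious choice: a filter that is broader than intended can cancel many jobs at once.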

DESCRIPTION

The scancel command signals or cancels jobs, job arrays, and job steps managed by the Slurm workload manager. Users and administrators use it to terminate running jobs and to remove pending jobs from the queue.

Jobs can be selected by job ID or by filters such as user, partition, account, QOS, job name, or state. This makes scancel the usual tool for stopping jobs that are malfunctioning or no longer needed, which frees their resources for other work. Administrators can also use it to manage jobs across the entire cluster.

When no signal is specified, scancel terminates the job: its processes typically receive SIGTERM first, followed by SIGKILL if they are still running after the grace period configured for the cluster (the KillWait setting in slurm.conf). With --signal, scancel delivers only the named signal, which allows a job to be notified without necessarily ending it.

CAVEATS

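Regular users can cancel only their own jobs; cancelling another user's jobs requires operator or administrator privileges. When several filter options are given, only jobs that match all of them are cancelled. For job arrays, specifying the array's job ID affects every task in the array, while the job_id_task_id form targets individual tasks.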
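
The job array forms can be sketched as follows. The job ID 20 and the task numbers are example values, and run is a hypothetical dry-run wrapper (an assumption for illustration) that prints each command unless DRY_RUN=0 is set:

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

run scancel 20            # every task of job array 20
run scancel 20_5          # only task 5 of job array 20
run scancel '20_[1-3]'    # tasks 1 to 3; quoted so the shell does not expand the brackets
run scancel 20_4 20_7     # tasks 4 and 7
```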

RETURN CODES

scancel returns 0 on success and a non-zero exit code on failure. Scripts should test for a non-zero status rather than for specific values, because the meaning of individual codes is not guaranteed.
Common causes of failure include an invalid option or argument, a job ID that does not exist, and insufficient permissions to cancel the specified job.
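
In a script, it is usually enough to distinguish zero from non-zero. The sketch below assumes a hypothetical wrapper named cancel_job; the SCANCEL variable is likewise an assumption, added only so the wrapper can be exercised with a stand-in command on a machine without Slurm:

```shell
# Hypothetical wrapper: cancel one job and report the outcome.
# SCANCEL may point at a stand-in command for testing; it defaults to scancel.
cancel_job() {
    local job_id="$1"
    if "${SCANCEL:-scancel}" "$job_id"; then
        echo "cancel request sent for job $job_id"
    else
        # Inside the else branch, $? still holds the failed command's status.
        local rc=$?
        echo "scancel failed for job $job_id (exit status $rc)" >&2
        return "$rc"
    fi
}
```

A call such as `cancel_job 1234 || exit 1` then stops the calling script on any failure without depending on a particular code.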

SIGNAL HANDLING

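When no signal is specified, scancel asks Slurm to terminate the job: its processes typically receive SIGTERM, which lets them shut down gracefully, and any that remain after the cluster's grace period are sent SIGKILL. To send a specific signal instead, use --signal. For example, --signal=KILL stops the job immediately, at the risk of losing unsaved data. Signals other than SIGKILL are not delivered to the batch script unless --batch or --full is given.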
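
A job can also react to a signal instead of simply dying. The fragment below is a minimal sketch of a batch-script handler; the handler name, the variable, and the message are illustrative assumptions, and a real job would save its state where the comment indicates:

```shell
# Sketch of a batch-script fragment that reacts to SIGUSR1.
checkpoint_requested=0

on_usr1() {
    # A real job would write its state to disk here.
    checkpoint_requested=1
    echo "SIGUSR1 received: checkpoint requested"
}

# Run on_usr1 whenever the batch shell receives SIGUSR1.
trap on_usr1 USR1
```

With such a handler in place, `scancel --signal=USR1 --batch [job_id]` sends SIGUSR1 to the batch shell of the job. Note that the shell runs a trap only after the foreground command it is waiting for has finished, so long-running work is often started in the background and followed by `wait`.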

HISTORY

scancel is part of the Slurm workload manager (the name was originally an acronym for Simple Linux Utility for Resource Management), which originated as an open-source project. Slurm has been developed and refined over many years, with scancel evolving alongside other components to provide increasingly sophisticated job management capabilities. Its development has been driven by the needs of high-performance computing environments.

SEE ALSO

squeue(1), sbatch(1), sinfo(1)
