
nvidia-smi-mig

Manage NVIDIA MIG partitions

TLDR

Create a GPU instance with a given profile on device 0, along with its default compute instance
$ nvidia-smi mig [[-i|--id]] [0] [[-cgi|--create-gpu-instance]] [profile] [[-C|--default-compute-instance]]

List GPU instances
$ nvidia-smi mig [[-lgi|--list-gpu-instances]]

Display help
$ nvidia-smi mig [[-h|--help]]

SYNOPSIS

nvidia-smi mig [options]

nvidia-smi mig -lgip [-i GPU_ID]
nvidia-smi mig -lgipp [-i GPU_ID]
nvidia-smi mig -lgi [-i GPU_ID]
nvidia-smi mig -cgi PROFILE[,PROFILE...] [-i GPU_ID] [-C]
nvidia-smi mig -dgi [-i GPU_ID] [-gi GI_ID[,GI_ID...]]
nvidia-smi mig -lcip [-i GPU_ID] [-gi GI_ID]
nvidia-smi mig -lci [-i GPU_ID]
nvidia-smi mig -cci [PROFILE[,PROFILE...]] [-i GPU_ID] [-gi GI_ID]
nvidia-smi mig -dci [-i GPU_ID] [-gi GI_ID] [-ci CI_ID[,CI_ID...]]

PARAMETERS

-lgip
    Lists the GPU Instance (GI) profiles supported by the system or the specified GPU.

-lgipp
    Lists the possible placements of the supported GPU Instance profiles.

-lgi
    Lists the GPU Instances that currently exist on the system or on the specified GPU.

-lcip
    Lists the Compute Instance (CI) profiles supported within a GPU Instance.

-lci
    Lists the Compute Instances that currently exist on the system or on the specified GPU.

-cgi PROFILE[,PROFILE...]
    Creates one GPU Instance per listed profile. Profiles can be given by ID or by name (e.g., 9 or 3g.20gb); use -lgip to find the profiles the GPU supports.

-dgi
    Destroys GPU Instances. Combine with -i and -gi to limit the operation to specific GPUs and GPU Instance IDs.

-cci [PROFILE[,PROFILE...]]
    Creates Compute Instances inside a GPU Instance (selected with -gi), one per listed profile. If no profile is given, the default Compute Instance profile is used; use -lcip to find the supported profiles.

-dci
    Destroys Compute Instances. Combine with -i, -gi and -ci to limit the operation to specific Compute Instance IDs (see the teardown sketch after this list).

-C, --default-compute-instance
    When used with -cgi, also creates the default Compute Instance for each newly created GPU Instance.

-i GPU_ID
    Selects the target GPU by enumeration index, PCI bus ID, or UUID; comma-separated values select several GPUs.

-gi GI_ID
    Selects the target GPU Instance ID(s) for -dgi, -lcip, -cci, and -dci; comma-separated values select several instances.

-ci CI_ID
    Selects the target Compute Instance ID(s) for -dci; comma-separated values select several instances.

Note that MIG mode itself is not toggled through the mig subcommand: it is enabled or disabled per GPU with the top-level option nvidia-smi -i GPU_ID -mig 1 (or 0), and the change takes effect only after a GPU reset.

DESCRIPTION

The `nvidia-smi mig` command is the subcommand of `nvidia-smi` used to manage NVIDIA's Multi-Instance GPU (MIG) feature. MIG allows a single supported GPU based on the Ampere architecture or newer (for example the A100, A30, or H100) to be partitioned into up to seven independent GPU Instances (GIs), each of which can be further subdivided into Compute Instances (CIs). Each GI has its own dedicated memory, cache, and compute cores, providing fault isolation and guaranteed quality of service (QoS) for its workloads.

This subcommand handles the creation, destruction, and listing of these MIG objects, enabling users to optimize GPU utilization by running diverse workloads concurrently with strong isolation. It is particularly useful for cloud providers and AI/ML teams that need to share high-end GPUs among multiple users or applications.
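For example, a typical partitioning session on a supported GPU might look like the following sketch (the GPU index 0 and the profile name 1g.10gb are placeholders; the profiles actually available depend on the GPU model):

$ sudo nvidia-smi -i 0 -mig 1            # enable MIG mode (takes effect after a GPU reset)
$ sudo nvidia-smi mig -lgip -i 0         # inspect the supported GPU Instance profiles
$ sudo nvidia-smi mig -cgi 1g.10gb -C    # create a GPU Instance plus its default Compute Instance
$ sudo nvidia-smi mig -lgi               # verify the result
$ sudo nvidia-smi mig -lci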

CAVEATS

The MIG feature is only available on NVIDIA GPUs based on the Ampere architecture (e.g., A100, A30) or newer. Before using MIG, MIG mode must be enabled on the GPU (nvidia-smi -i GPU_ID -mig 1); the change takes effect only after a GPU reset and, on some systems, a reboot. Toggling MIG mode disrupts running workloads, and instances can only be created or destroyed while no processes are using them. MIG management operations require root privileges (or appropriately configured MIG capability device permissions) to execute successfully.
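Whether MIG mode is currently enabled, and whether a pending change is still waiting for a GPU reset, can be checked with a query such as:

$ nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv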

NOTE ON COMMAND NAME

The command discussed is specifically `nvidia-smi mig`, which is a subcommand of the primary `nvidia-smi` utility. While some references might conceptually use `nvidia-smi-mig`, the literal executable command is invoked by passing `mig` as an argument to `nvidia-smi`.

GPU INSTANCE (GI) AND COMPUTE INSTANCE (CI) PROFILES

MIG utilizes predefined profiles that dictate the memory, compute, and video decode/encode capabilities allocated to each instance. GPU Instance profiles define the partitioning of the physical GPU, while Compute Instance profiles define the compute resources within a created GPU Instance. Users select these profiles based on their workload requirements and the available GPU resources.
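To see what a particular GPU or GPU Instance can offer before creating anything, the profile listings can be consulted directly; for example (GPU index 0 and GPU Instance ID 1 are placeholders):

$ sudo nvidia-smi mig -lcip -gi 1 -i 0   # Compute Instance profiles inside GPU Instance 1
$ sudo nvidia-smi mig -lgipp -i 0        # possible placements of the GPU Instance profiles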

DEVICE FILES

When Compute Instances are created, CUDA enumerates them as separate MIG devices identified by their own UUIDs (reported by nvidia-smi -L), with access mediated by the MIG capability device nodes under /dev/nvidia-caps. This allows applications to target a specific MIG instance, typically via CUDA_VISIBLE_DEVICES, as if it were a distinct physical GPU, simplifying workload deployment.
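For example, a CUDA application can be pinned to one MIG device by exporting that device's UUID, as reported by nvidia-smi -L (the application name below is a placeholder):

$ nvidia-smi -L                                   # lists GPUs and their MIG devices with UUIDs
$ CUDA_VISIBLE_DEVICES=MIG-<UUID> ./my_cuda_app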

HISTORY

Multi-Instance GPU (MIG) technology was introduced by NVIDIA with its Ampere architecture, first showcased with the A100 GPU in 2020. The `nvidia-smi mig` subcommand was subsequently added to the `nvidia-smi` utility to provide a programmatic interface for managing these new GPU partitioning capabilities, evolving with new GPU generations.

SEE ALSO

nvidia-smi(1), nvcc(1), nvidia-persistenced(8)
