accelerate
The command 'accelerate' is not a standard Linux utility; it is provided by the Hugging Face accelerate Python library (see CAVEATS)
TLDR
Print environment information
Interactively create a configuration file
Print the estimated GPU memory cost of running a Hugging Face model with different data types
Test an Accelerate configuration file
Run a model on CPU with Accelerate
Run a model on multiple GPUs across 2 machines with Accelerate
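One plausible invocation for each of the items above (angle brackets mark placeholder values):
    accelerate env
    accelerate config
    accelerate estimate-memory <model_name>
    accelerate test --config_file <path/to/config.yaml>
    accelerate launch --cpu <path/to/script.py>
    accelerate launch --multi_gpu --num_machines 2 <path/to/script.py>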
SYNOPSIS
accelerate config [--config_file CONFIG_FILE]
accelerate launch [OPTIONS] <script_name> [SCRIPT_ARGS...]
accelerate estimate-memory <model_name> [OPTIONS]
accelerate test [--config_file CONFIG_FILE]
accelerate env
PARAMETERS
--config_file CONFIG_FILE
Path to a custom configuration file for accelerate settings.
--mixed_precision {no,fp16,bf16}
Enables training with mixed precision (e.g., half-precision floats for speed/memory).
--num_processes NUM
Specifies the total number of processes to launch across all machines.
--num_machines NUM
Defines the total number of machines involved in a multi-node distributed setup.
--gpu_ids IDS
Comma-separated list of GPU IDs to use (e.g., '0,1,2').
--cpu
Forces the training script to run exclusively on the CPU.
--use_deepspeed
Activates DeepSpeed integration for advanced model parallelism and optimization.
--use_fsdp
Enables PyTorch's Fully Sharded Data Parallel (FSDP) for training large models.
--main_process_ip IP_ADDRESS
IP address of the main process machine for multi-node communication.
--main_process_port PORT
Port used by the main process for multi-node communication.
--log_with {all,tensorboard,wandb,comet_ml,clearml}
Specifies the logging backend for tracking experiment metrics. Note that this is set as an argument of the Accelerator class (log_with=...) inside the training script rather than as a flag of accelerate launch.
--project_dir DIR
Sets the project directory for storing experiment tracking data; likewise an Accelerator argument (project_dir=...).
--output_dir DIR
Specifies the directory for saving model checkpoints and training outputs; conventionally an argument of the training script itself, forwarded by accelerate launch after the script name.
--debug
Enables debug mode, providing more verbose output for troubleshooting.
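For orientation, a multi-node launch combining several of these options might look like the following sketch; the script name, address, and counts are illustrative, and each machine must additionally know its own rank (set during accelerate config or with the --machine_rank flag):
    accelerate launch \
        --multi_gpu \
        --num_machines 2 \
        --num_processes 8 \
        --mixed_precision bf16 \
        --main_process_ip 10.0.0.1 \
        --main_process_port 29500 \
        train.py --epochs 3
The same command is run on every participating machine; only the rank differs. Arguments placed after the script name (here, --epochs 3) are passed to the script, not to accelerate.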
DESCRIPTION
accelerate is the command-line interface (CLI) of the Hugging Face accelerate Python library. It abstracts the complexity of running deep learning training scripts across hardware configurations, including single-GPU, multi-GPU, distributed (DDP), and mixed-precision (FP16/BF16) environments. Rather than requiring developers to hand-write distributed boilerplate, accelerate lets a standard PyTorch training loop scale with minimal changes. The CLI configures and launches these training scripts, handling the underlying distributed communication, hardware allocation, and precision settings, so that a single script can run unchanged from a local CPU to a multi-node GPU cluster.
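The typical workflow is a one-time interactive configuration followed by launches; the script name below is a placeholder:
    accelerate config          # answer the interactive prompts once per machine
    accelerate test            # optionally verify the saved configuration
    accelerate launch train.py # run the training script under that configuration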
CAVEATS
accelerate is not a core Linux utility. It requires Python and must be installed via pip (e.g., pip install accelerate). It is designed for PyTorch, and its functionality is tied to training scripts written with PyTorch (typically via the library's Accelerator API). While it simplifies distributed training, advanced configurations may still require an understanding of the underlying concepts.
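A minimal install-and-verify sequence, assuming pip and a working Python environment:
    pip install accelerate
    accelerate env   # confirms the installation and reports detected hardware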
HISTORY
Developed by Hugging Face, the accelerate library and its CLI were introduced to address the growing complexity of distributed deep learning training. Its initial release, in early 2021, provided a high-level API that abstracts away the boilerplate code typically required for multi-GPU, multi-node, and mixed-precision training. It has since grown to support distributed strategies such as DeepSpeed and FSDP, becoming a standard tool for scaling Hugging Face Transformers models and other deep learning projects efficiently across diverse hardware setups.