accelerate
The command 'accelerate' is not a standard Linux utility; it is provided by the Hugging Face accelerate Python library (see CAVEATS)
TLDR
Print environment information
Interactively create a configuration file
Print the estimated GPU memory cost of running a Hugging Face model with different data types
Test an Accelerate configuration file
Run a model on CPU with Accelerate
Run a model on multiple GPUs across 2 machines with Accelerate
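One plausible invocation for each of the items above (angle brackets mark placeholder values):
    accelerate env
    accelerate config
    accelerate estimate-memory <model_name>
    accelerate test --config_file <path/to/config.yaml>
    accelerate launch --cpu <path/to/script.py>
    accelerate launch --multi_gpu --num_machines 2 <path/to/script.py>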
SYNOPSIS
accelerate config [--config_file CONFIG_FILE]
accelerate launch [OPTIONS] <script_name> [SCRIPT_ARGS...]
accelerate estimate-memory <model_name> [OPTIONS]
accelerate test [--config_file CONFIG_FILE]
accelerate env
PARAMETERS
--config_file CONFIG_FILE
Path to a custom configuration file for accelerate settings.
--mixed_precision {no,fp16,bf16}
Enables training with mixed precision (e.g., half-precision floats for speed/memory).
--num_processes NUM
Specifies the total number of processes to launch across all machines.
--num_machines NUM
Defines the total number of machines involved in a multi-node distributed setup.
--gpu_ids IDS
Comma-separated list of GPU IDs to use (e.g., '0,1,2').
--cpu
Forces the training script to run exclusively on the CPU.
--use_deepspeed
Activates DeepSpeed integration for advanced model parallelism and optimization.
--use_fsdp
Enables PyTorch's Fully Sharded Data Parallel (FSDP) for training large models.
--main_process_ip IP_ADDRESS
IP address of the main process machine for multi-node communication.
--main_process_port PORT
Port used by the main process for multi-node communication.
--log_with {all,tensorboard,wandb,comet_ml,clearml}
Specifies the logging backend for tracking experiment metrics. Note that this is set as an argument of the Accelerator class (log_with=...) inside the training script rather than as a flag of accelerate launch.
--project_dir DIR
Sets the project directory for storing experiment tracking data; likewise an Accelerator argument (project_dir=...).
--output_dir DIR
Specifies the directory for saving model checkpoints and training outputs; conventionally an argument of the training script itself, forwarded by accelerate launch after the script name.
--debug
Enables debug mode, providing more verbose output for troubleshooting.
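For orientation, a multi-node launch combining several of these options might look like the following sketch; the script name, address, and counts are illustrative, and each machine must additionally know its own rank (set during accelerate config or with the --machine_rank flag):
    accelerate launch \
        --multi_gpu \
        --num_machines 2 \
        --num_processes 8 \
        --mixed_precision bf16 \
        --main_process_ip 10.0.0.1 \
        --main_process_port 29500 \
        train.py --epochs 3
The same command is run on every participating machine; only the rank differs. Arguments placed after the script name (here, --epochs 3) are passed to the script, not to accelerate.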
DESCRIPTION
accelerate is the command-line interface (CLI) of the Hugging Face accelerate Python library. It abstracts the complexity of running deep learning training scripts across hardware configurations, including single-GPU, multi-GPU, distributed (DDP), and mixed-precision (FP16/BF16) environments. Rather than requiring developers to hand-write distributed boilerplate, accelerate lets a standard PyTorch training loop scale with minimal changes. The CLI configures and launches these training scripts, handling the underlying distributed communication, hardware allocation, and precision settings, so that a single script can run unchanged from a local CPU to a multi-node GPU cluster.
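The typical workflow is a one-time interactive configuration followed by launches; the script name below is a placeholder:
    accelerate config          # answer the interactive prompts once per machine
    accelerate test            # optionally verify the saved configuration
    accelerate launch train.py # run the training script under that configuration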
CAVEATS
accelerate is not a core Linux utility. It requires Python and must be installed via pip (e.g., pip install accelerate). It is designed for PyTorch, and its functionality is tied to training scripts written with PyTorch (typically via the library's Accelerator API). While it simplifies distributed training, advanced configurations may still require an understanding of the underlying concepts.
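A minimal install-and-verify sequence, assuming pip and a working Python environment:
    pip install accelerate
    accelerate env   # confirms the installation and reports detected hardware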
HISTORY
Developed by Hugging Face, the accelerate library and its CLI were introduced to address the growing complexity of distributed deep learning training. Its initial release, in early 2021, provided a high-level API that abstracts away the boilerplate code typically required for multi-GPU, multi-node, and mixed-precision training. It has since grown to support distributed strategies such as DeepSpeed and FSDP, becoming a standard tool for scaling Hugging Face Transformers models and other deep learning projects efficiently across diverse hardware setups.