faster-whisper
TLDR
Transcribe an audio file
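A minimal invocation matching the synopsis below (the file name is a placeholder):

    faster-whisper recording.mp3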
SYNOPSIS
faster-whisper AUDIO [--model SIZE] [--language LANG] [--task TASK] [options]
DESCRIPTION
faster-whisper is a reimplementation of OpenAI's Whisper using CTranslate2, a fast inference engine for Transformer models. It provides up to 4x faster transcription than the original Whisper while using less memory.
The tool supports all Whisper model sizes; larger models are more accurate but slower. The --compute_type parameter controls numeric precision: int8 is fastest and most memory-efficient, float16 is a good balance on GPU, and float32 is the most precise.
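A sketch of this trade-off using the options documented under PARAMETERS (file names are placeholders):

    # Fast, low-memory run on CPU with a small model
    faster-whisper interview.wav --model small --device cpu --compute_type int8

    # Higher-accuracy run on GPU in half precision
    faster-whisper interview.wav --model large-v3 --device cuda --compute_type float16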
Voice activity detection (VAD) filtering skips silent sections, improving both speed and accuracy. Language detection is automatic, but specifying the language with --language avoids the detection overhead.
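For example, to skip silence and avoid language detection (the file name is a placeholder; the exact boolean spelling accepted by --vad_filter may depend on the build):

    faster-whisper podcast.mp3 --language en --vad_filter True --beam_size 5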
Install via pip (`pip install faster-whisper`). Models converted to the CTranslate2 format are downloaded automatically on first use. GPU acceleration requires NVIDIA's CUDA libraries (cuBLAS and cuDNN).
PARAMETERS
--model SIZE
Model size: tiny, base, small, medium, large-v1, large-v2, large-v3 (default: small).
--language LANG
Language code (en, de, fr, etc.) or auto-detect.
--task TASK
Task: transcribe or translate.
--output_format FORMAT
Output format: txt, vtt, srt, tsv, json, all.
--output_dir DIR
Output directory for results.
--word_timestamps BOOL
Include word-level timestamps.
--device DEVICE
Device: cpu, cuda, auto (default: auto).
--compute_type TYPE
Compute type: int8, float16, float32 (default: int8 on CPU).
--beam_size N
Beam search size (default: 5).
--vad_filter BOOL
Enable voice activity detection filter.
--threads N
Number of CPU threads.
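A combined invocation using several of the options above (paths are illustrative; boolean values follow the BOOL placeholders shown):

    faster-whisper lecture.mp4 --model medium --task transcribe \
        --output_format srt --output_dir ./transcripts \
        --word_timestamps True --vad_filter True --threads 4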
CAVEATS
Large models require significant memory. GPU use requires NVIDIA's CUDA libraries. The first run downloads the model, which takes time. Accuracy varies with audio quality. No speaker diarization in the CLI (available via the API).
HISTORY
faster-whisper was created by Guillaume Klein (SYSTRAN) in 2023, using CTranslate2 to optimize Whisper inference. Its speed and memory advantages made it a popular choice for production transcription workflows.
SEE ALSO
whisper(1), deepspeech(1), ffmpeg(1)


