LinuxCommandLibrary

faster-whisper

TLDR

Transcribe an audio file

$ faster-whisper [audio.mp3]
Transcribe with a specific model
$ faster-whisper [audio.mp3] --model [large-v3]
Transcribe with language hint
$ faster-whisper [audio.mp3] --language [en]
Output as SRT subtitles
$ faster-whisper [audio.mp3] --output_format [srt]
Translate to English
$ faster-whisper [audio.mp3] --task [translate]
Save output to directory
$ faster-whisper [audio.mp3] --output_dir [/path/to/output]
Transcribe with word timestamps
$ faster-whisper [audio.mp3] --word_timestamps [true]

SYNOPSIS

faster-whisper audio [--model size] [--language lang] [--task task] [options]

DESCRIPTION

faster-whisper is a reimplementation of OpenAI's Whisper using CTranslate2, a fast inference engine for Transformer models. It provides up to 4x faster transcription than the original Whisper while using less memory.
The tool supports all Whisper model sizes; larger models are more accurate but slower. The compute type parameter controls precision: int8 is the fastest and most memory-efficient, float16 offers a good balance on GPU, and float32 gives the highest precision.
Voice activity detection (VAD) filtering skips silent sections, improving both speed and accuracy. Language detection is automatic but specifying the language avoids detection overhead.
Install via pip (`pip install faster-whisper`). Pre-converted CTranslate2 models are downloaded automatically on first use; custom models can be converted with CTranslate2's converter. GPU acceleration requires the CUDA toolkit.
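Beyond the CLI, the same engine is available through the Python API (this is what the CAVEATS section refers to). The sketch below assumes `faster-whisper` is installed via pip; it mirrors the CLI defaults described above (beam size 5, int8 compute, VAD filtering):

```python
def transcribe_file(path, model_size="small", device="auto", compute_type="int8"):
    """Return a list of (start, end, text) segments for an audio file."""
    # Lazy import: requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    # int8 keeps memory low on CPU; prefer float16 on a CUDA GPU.
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # vad_filter=True skips silent sections, improving speed and accuracy.
    segments, info = model.transcribe(path, beam_size=5, vad_filter=True)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    # segments is a generator; iterating it runs the actual transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Note that `transcribe()` returns a generator, so transcription only runs as segments are consumed.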

PARAMETERS

--model SIZE

Model size: tiny, base, small, medium, large-v1, large-v2, large-v3 (default: small).
--language LANG
Language code (en, de, fr, etc.); if omitted, the language is auto-detected.
--task TASK
Task: transcribe or translate.
--output_format FORMAT
Output format: txt, vtt, srt, tsv, json, all.
--output_dir DIR
Output directory for results.
--word_timestamps BOOL
Include word-level timestamps.
--device DEVICE
Device: cpu, cuda, auto (default: auto).
--compute_type TYPE
Compute type: int8, float16, float32 (default: int8 on CPU).
--beam_size N
Beam search size (default: 5).
--vad_filter BOOL
Enable voice activity detection filter.
--threads N
Number of CPU threads.
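To illustrate what `--output_format srt` produces from transcription segments, here is a small helper (hypothetical, not part of faster-whisper) that renders (start, end, text) tuples in SRT's `HH:MM:SS,mmm` layout:

```python
def srt_timestamp(seconds):
    """Format a time offset in SRT's HH:MM:SS,mmm notation."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as a numbered SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 2.5, "Hello world")])` yields a block whose cue line reads `00:00:00,000 --> 00:00:02,500`.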

CAVEATS

Large models require significant memory. The CUDA toolkit is needed for GPU acceleration. The first run downloads and converts the selected model. Accuracy varies with audio quality. Speaker diarization is not available in the CLI (only via the Python API).

HISTORY

faster-whisper was created by Guillaume Klein (SYSTRAN) in 2023 using CTranslate2 to optimize Whisper inference. It became the preferred Whisper implementation for production use due to its speed and memory advantages. The project achieved wide adoption in transcription workflows.

SEE ALSO

whisper