bark
TLDR
Generate speech from text
SYNOPSIS
python -m bark --text text --output_filename file [options]
DESCRIPTION
Bark is a transformer-based text-to-audio model by Suno AI. Unlike traditional TTS, Bark generates highly expressive speech including laughter, sighs, breathing, crying, and even music.
Bracketed tokens in the text trigger non-speech sounds: `[laughs]`, `[sighs]`, `[gasps]`, `[clears throat]`, and `[music]`. Wrapping lyrics in `♪` symbols prompts the model to sing them. Capitalizing a word adds emphasis, and `...` adds hesitation.
Speaker presets select voice characteristics. Presets are available for multiple languages: English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish, and Chinese.
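The prompt conventions and speaker presets above can also be tried from Python. The following is a minimal sketch, not a definitive recipe: it assumes the installed package exposes `generate_audio` and `SAMPLE_RATE` as shown in the project's README, and that SciPy is available for writing the WAV file. The helper names `build_prompt` and `synthesize` are hypothetical and exist only for this illustration.

```python
def build_prompt(sentence: str, sound: str = "") -> str:
    """Append an optional non-speech token such as "[laughs]" to a sentence."""
    return f"{sentence} {sound}".strip()


def synthesize(prompt: str, out_path: str, preset: str = "v2/en_speaker_6") -> None:
    """Generate audio for `prompt` with a speaker preset and save it as WAV.

    Imports are deferred so this module still loads when Bark is not installed.
    """
    # Assumption: these names are exported by the bark package.
    from bark import SAMPLE_RATE, generate_audio
    from scipy.io.wavfile import write as write_wav

    audio = generate_audio(prompt, history_prompt=preset)
    write_wav(out_path, SAMPLE_RATE, audio)
```

Calling `synthesize(build_prompt("Well... that was UNEXPECTED.", "[laughs]"), "out.wav")` would then combine hesitation, emphasis, and a non-speech token in one prompt.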
Install with `pip install suno-bark`. Models are downloaded automatically on first use. GPU (CUDA) is strongly recommended for reasonable generation speed.
PARAMETERS
--text TEXT
Input text to synthesize.
--output_filename FILE
Output audio file path (.wav).
--history_prompt PRESET
Speaker voice preset (e.g., v2/en_speaker_0 through v2/en_speaker_9).
--text_temp FLOAT
Text generation temperature (default: 0.7).
--waveform_temp FLOAT
Waveform generation temperature (default: 0.7).
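When the command is driven from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags listed in this section; the function name `bark_command` is hypothetical, and the defaults simply mirror the documented values.

```python
import sys


def bark_command(text, output_filename, history_prompt=None,
                 text_temp=0.7, waveform_temp=0.7):
    """Build the argument list for `python -m bark` from the documented flags."""
    cmd = [
        sys.executable, "-m", "bark",
        "--text", text,
        "--output_filename", output_filename,
        "--text_temp", str(text_temp),
        "--waveform_temp", str(waveform_temp),
    ]
    # The preset is optional, so the flag is added only when one is given.
    if history_prompt is not None:
        cmd += ["--history_prompt", history_prompt]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with brackets and `♪` symbols in the text.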
CAVEATS
Generation is slow on CPU; a GPU is strongly recommended. The first run downloads large model files (~5GB). Output quality varies between runs, and the audio may contain unexpected artifacts. Long text should be split into sentences and synthesized piece by piece. Bark is not suitable for real-time synthesis.
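Splitting long text into sentences can be sketched with the standard library alone. This is a naive illustration rather than a robust tokenizer: the name `split_sentences` is hypothetical, and the rule (break after `.`, `!`, or `?` followed by whitespace) also breaks after a `...` hesitation and after abbreviations.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces, which occur when the input is empty or blank.
    return [p for p in parts if p]
```

Each returned piece can be synthesized with its own `--text` and `--output_filename`, and the resulting WAV files concatenated afterwards.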
HISTORY
Bark was released by Suno AI in April 2023 as an open-source text-to-audio model. Its ability to generate expressive speech with emotions and non-verbal sounds set it apart from conventional TTS systems. The model quickly gained popularity for creative audio generation.


