espeak
Convert text to speech
TLDR
Speak a phrase aloud
Speak text from stdin
Speak the contents of a [f]ile
Speak using a specific [v]oice
Speak at a specific [s]peed (default is 175) and [p]itch (default is 50)
Output the audio to a [w]AV file instead of speaking it directly
List all available voices
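The invocations behind the one-liners above can be sketched as small shell functions. The function names and the `ESPEAK` override variable are hypothetical and exist only so the sketch can be dry-run (for example with `ESPEAK=echo`, which prints the arguments instead of speaking); the flags themselves are taken from the espeak help text.

```shell
#!/bin/sh
# ESPEAK is a hypothetical override: set ESPEAK=echo to print the arguments instead of speaking.
ESPEAK=${ESPEAK:-espeak}

# Speak a phrase aloud.
speak_phrase() { "$ESPEAK" "$1"; }

# Speak text read from stdin.
speak_stdin() { "$ESPEAK" --stdin; }

# Speak the contents of a file.
speak_file() { "$ESPEAK" -f "$1"; }

# Speak using a specific voice, e.g. en-us.
speak_voice() { "$ESPEAK" -v "$1" "$2"; }

# Speak at a given speed (words per minute) and pitch (0-99).
speak_tuned() { "$ESPEAK" -s "$1" -p "$2" "$3"; }

# Write a WAV file instead of playing audio.
speak_to_wav() { "$ESPEAK" -w "$1" "$2"; }

# List all available voices.
list_voices() { "$ESPEAK" --voices; }
```

After sourcing the sketch, a call such as `speak_tuned 120 70 "Hello"` would run `espeak -s 120 -p 70 "Hello"`.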
SYNOPSIS
espeak [options] [<text>]
PARAMETERS
-a <integer>
Amplitude (volume), 0 to 200, default 100
-b <integer>
Input text encoding: 1=UTF-8, 2=8-bit, 4=16-bit
-d <device>
Audio output device to use (eSpeak NG only)
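-f <file>
Speak text from a file instead of the command line
-g <integer>
Word gap: pause between words, in units of 10 ms at the default speed
-h, --help
Display help and exit
-k <integer>
Indicate capital letters: 1=sound, 2=the word "capitals", higher values raise the pitch (try -k20)
-l <integer>
Line length: if not zero (the default), lines shorter than this are treated as end-of-clause
-m
Interpret SSML markup and ignore other < > tags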
-p <integer>
Pitch adjustment, 0 to 99, default 50
-q
Quiet: don't produce any speech (may be useful with -x)
-s <integer>
Speed in words per minute, 80 to 450, default 175
-z
No final sentence pause at the end of the text
-v <voice name>
Voice name, e.g. en-us; list the available voices with --voices
-w <wave file>
Write speech to a WAV file (mono, 22050 Hz) instead of speaking it directly
-x
Write phoneme mnemonics to stdout
-X
Write phoneme mnemonics and a translation trace to stdout
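--path=<path>
Directory that contains the espeak-data directory
--phonout=<filename>
Write phoneme output (from -x or -X) to this file
--stdout
Write speech output to stdout as WAV data (16-bit mono, 22050 Hz) instead of speaking it
--version
Print version and exit
--voices[=<language>]
List available voices, filtered by language if specified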
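A few of the options above combined, as a minimal sketch. The function names and the `ESPEAK` override variable are hypothetical (set `ESPEAK=echo` to print the arguments rather than run espeak); only the flags come from the option list.

```shell
#!/bin/sh
# ESPEAK is a hypothetical override: set ESPEAK=echo to print the arguments instead of running espeak.
ESPEAK=${ESPEAK:-espeak}

# List only the voices for one language code, e.g. "en" or "de".
voices_for() { "$ESPEAK" --voices="$1"; }

# Speak the text and write its phoneme mnemonics (-x) to a file (--phonout).
phonemes_to_file() { "$ESPEAK" -x --phonout="$1" "$2"; }

# Speak with a chosen amplitude (-a, 0 to 200) and voice (-v).
speak_loud() { "$ESPEAK" -a "$1" -v "$2" "$3"; }
```

For instance, `voices_for de` would run `espeak --voices=de`.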
DESCRIPTION
eSpeak is a lightweight, open-source speech synthesizer for Linux and other platforms that converts text to speech using formant synthesis. It supports over 100 languages and accents, with voices defined in compact text files for easy customization. At under 2 MB, it suits embedded systems, accessibility tools, automation, and scripting.
eSpeak reads text from stdin, files, or the command line and outputs to the speakers or to WAV files. Key features include adjustable speed, pitch, and volume; SSML support; phoneme output; and splitting of text into words and sentences. Though robotic-sounding compared to neural TTS, it is fast, portable, and free. Development continues in the eSpeak NG fork, which is widely used by screen readers such as Orca and NVDA.
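The SSML support mentioned above can be sketched as follows. The helper names and the `ESPEAK` override variable are hypothetical; the sketch assumes the text contains no characters that need XML escaping (such as `<` or `&`).

```shell
#!/bin/sh
# ESPEAK is a hypothetical override: set ESPEAK=echo to print the arguments instead of speaking.
ESPEAK=${ESPEAK:-espeak}

# Wrap plain text in a minimal SSML document (no XML escaping is done here).
ssml_wrap() { printf '<speak>%s</speak>\n' "$1"; }

# Speak the wrapped text; -m tells espeak to interpret the SSML tags.
speak_ssml() { "$ESPEAK" -m "$(ssml_wrap "$1")"; }
```

Calling `speak_ssml "Hello"` would run `espeak -m "<speak>Hello</speak>"`.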
CAVEATS
Robotic voice quality; requires espeak-data package for voices. Limited prosody compared to newer synthesizers such as Piper (neural) or RHVoice. UTF-8 input recommended.
VOICE SELECTION
Use espeak --voices to list the available voices. Format: language+variant, e.g. espeak -v en+f3 (English with the f3 variant).
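A voice specification such as en+f3 can be assembled in the shell as sketched below. The function names and the `ESPEAK` override variable are hypothetical; the variant name is passed through unchanged, so it is assumed to be one that the installed espeak actually provides.

```shell
#!/bin/sh
# ESPEAK is a hypothetical override: set ESPEAK=echo to print the arguments instead of speaking.
ESPEAK=${ESPEAK:-espeak}

# Join a language code and an optional variant name with "+", e.g. en + f3 -> en+f3.
voice_spec() {
    if [ -n "${2:-}" ]; then
        printf '%s+%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Speak text with the assembled voice specification.
speak_with_variant() { "$ESPEAK" -v "$(voice_spec "$1" "$2")" "$3"; }
```

With this sketch, `speak_with_variant en f3 "Hello"` would run `espeak -v en+f3 "Hello"`, and an empty variant falls back to the plain language voice.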
PUNCTUATION READING
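Controls: --punct[=<characters>] speaks the names of the given punctuation characters; if <characters> is omitted, all punctuation is spoken. By default punctuation is not spoken.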
HISTORY
Originated as "speak" for RISC OS by Jonathan Duddington in 1995. Renamed eSpeak around 2007, by which time it supported many languages. The original project later became inactive; the eSpeak NG fork, started by volunteers in 2015, continues development.
SEE ALSO
festival(1), flite(1), pico2wave(1)


