espeak-ng
Synthesize text to speech
TLDR
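Speak a phrase aloud
espeak-ng "<text>"
Speak text from stdin
echo "<text>" | espeak-ng
Speak the contents of a [f]ile
espeak-ng -f <file>
Speak using a specific [v]oice
espeak-ng -v <VOICE> "<text>"
Speak at a specific [s]peed (default is 175) and [p]itch (default is 50)
espeak-ng -s <SPEED> -p <PITCH> "<text>"
Output the audio to a [w]AV file instead of speaking it directly
espeak-ng -w <wavefile> "<text>"
List all available voices
espeak-ng --voices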
SYNOPSIS
espeak-ng [options] [<text> | -f <file> | -]
PARAMETERS
-f <file>
Text file to speak
-w <wavefile>
Write speech to a WAV file instead of playing it
-v <VOICE>
Voice name, e.g., 'en-us', 'fr'; list with 'espeak-ng --voices'
-s <SPEED>
Speed in approximate words per minute (minimum 80, practical maximum about 500; default 175)
-p <PITCH>
Pitch change (0-99, default 50)
-a <VOLUME>
Amplitude/volume (0-200, default 100)
-g <GAP>
Word gap: pause between words, in units of 10 ms at the default speed
-l <LINE-LENGTH>
Line length: if non-zero, lines shorter than this are treated as the end of a clause (default 0, which disables this)
-m
Recognize SSML markup tags
-x
Write phoneme mnemonics to stdout (speech is still produced unless -q is also given)
--path=<DIR>
Directory that contains the espeak-ng-data directory (voices and dictionaries)
--voices[=<LANG>]
List available voices (filter by language)
--version
Print version info
-h, --help
Show usage summary
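The options above combine freely. As a minimal sketch in Python, assuming espeak-ng is installed and on the PATH, the hypothetical helper build_command below assembles an argument list from the flags documented on this page and rejects values outside the documented ranges before anything runs. It is an illustration, not part of eSpeak NG itself.

```python
import subprocess
from typing import List, Optional


def build_command(text: str, voice: Optional[str] = None,
                  speed: Optional[int] = None, pitch: Optional[int] = None,
                  amplitude: Optional[int] = None, word_gap: Optional[int] = None,
                  wav_path: Optional[str] = None, ssml: bool = False) -> List[str]:
    """Hypothetical helper: assemble an espeak-ng argument list."""
    args = ["espeak-ng"]
    if voice is not None:
        args += ["-v", voice]
    if speed is not None:
        # 80 wpm is the documented lower limit; about 500 is a practical maximum.
        if speed < 80:
            raise ValueError("speed must be at least 80 wpm")
        args += ["-s", str(speed)]
    if pitch is not None:
        if not 0 <= pitch <= 99:
            raise ValueError("pitch must be in the range 0-99")
        args += ["-p", str(pitch)]
    if amplitude is not None:
        if not 0 <= amplitude <= 200:
            raise ValueError("amplitude must be in the range 0-200")
        args += ["-a", str(amplitude)]
    if word_gap is not None:
        # Units of 10 ms at the default speed, so 5 is roughly 50 ms.
        args += ["-g", str(word_gap)]
    if ssml:
        args.append("-m")  # interpret SSML markup in the text
    if wav_path is not None:
        args += ["-w", wav_path]  # write a WAV file instead of playing
    args.append(text)
    return args


def speak(text: str, **options) -> None:
    """Run espeak-ng; raises FileNotFoundError if it is not installed."""
    subprocess.run(build_command(text, **options), check=True)
```

For example, speak("Hello", voice="en-gb", speed=160) plays the phrase, while adding wav_path="hello.wav" writes a file instead. Passing a list to subprocess.run, rather than a shell string, avoids quoting problems with the spoken text.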
DESCRIPTION
eSpeak NG is a compact, open-source, formant-based speech synthesizer that supports more than 100 languages and accents. Its small footprint makes it well suited to embedded systems, accessibility tools, and screen readers on Linux and other platforms.
It converts text to phonemes using per-language pronunciation rules and dictionaries, then renders them by formant synthesis, which gives a somewhat robotic but intelligible voice. Its main strengths are speed, low resource usage (no recorded speech database is needed), and extensibility through user-defined voices and dictionaries.
Output can be sent to the audio device, written to a WAV file, or piped to another program for further processing. Input can be plain text, SSML markup, or phoneme codes. It can also drive external MBROLA diphone voices (installed separately) for a more natural sound.
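It is commonly used as a speech back end for the Orca screen reader (through Speech Dispatcher) and for scripted TTS tasks. Voice and dictionary data are typically installed in /usr/share/espeak-ng-data, although the exact path depends on the distribution.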
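Piping output for further processing can be sketched in Python as follows. The sketch assumes espeak-ng's --stdout option, which writes WAV data to standard output; synthesize_wav and wav_duration_seconds are hypothetical helper names. Because a pipe cannot be rewound, the length fields in a streamed WAV header may hold placeholder values, so the duration is computed from the frames actually read rather than taken from the header.

```python
import io
import subprocess
import wave


def synthesize_wav(text: str, voice: str = "en") -> bytes:
    """Run espeak-ng (assumed on PATH) and return the WAV bytes it writes to stdout."""
    result = subprocess.run(["espeak-ng", "-v", voice, "--stdout", text],
                            check=True, capture_output=True)
    return result.stdout


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration from the frames present, since streamed headers may carry placeholder sizes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        frame_size = w.getsampwidth() * w.getnchannels()
        rate = w.getframerate()
        # Reading stops at the real end of the data even if the header overstates it.
        frames = w.readframes(w.getnframes())
    return len(frames) / frame_size / rate
```

Usage: wav_duration_seconds(synthesize_wav("Hello")) gives the length of the synthesized phrase in seconds; the same bytes can be written to disk or passed to another audio tool.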
CAVEATS
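Voice quality is robotic (formant synthesis) and less natural than neural TTS engines such as Piper. CPU load can still be noticeable on very slow hardware. Fluency varies between languages and accents. Voices and dictionaries come from the espeak-ng-data package, which must be installed.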
VOICE SELECTION
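Run espeak-ng --voices to list voices; each row shows the priority, language code, age/gender, voice name, and data file. Pass the language code to -v, e.g. en-gb or de. A variant can be appended with +, e.g. en+f3.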
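For scripts that choose a voice automatically, the listing can be parsed. This minimal Python sketch assumes the column layout of recent releases (Pty, Language, Age/Gender, VoiceName, File, Other Languages), with spaces inside voice names printed as underscores; parse_voices is a hypothetical helper, and the layout may differ between versions.

```python
from typing import Dict, List


def parse_voices(listing: str) -> List[Dict[str, str]]:
    """Parse the table printed by espeak-ng --voices (assumed column layout)."""
    voices = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a data row.
        if len(fields) < 5 or fields[0] == "Pty":
            continue
        voices.append({
            "priority": fields[0],
            "language": fields[1],           # code to pass to -v
            "gender": fields[2].split("/")[-1],
            "name": fields[3],               # underscores stand in for spaces
            "file": fields[4],
        })
    return voices
```

The listing itself can be captured with subprocess.run(["espeak-ng", "--voices"], capture_output=True, text=True).stdout; each returned language value is suitable for -v.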
PHONEMES
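Use -q -x to print eSpeak phoneme mnemonics without speaking, or -q --ipa to print IPA. Custom pronunciation rules and dictionaries are compiled with espeak-ng --compile.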
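The same flags can be driven from a script. In this minimal Python sketch, assuming espeak-ng is on the PATH, phoneme_command and transcribe are hypothetical helpers: -q suppresses audio, -x selects eSpeak's own phoneme mnemonics, and --ipa selects IPA. Output is decoded as UTF-8 because IPA symbols are not ASCII.

```python
import shutil
import subprocess
from typing import List, Optional


def phoneme_command(text: str, voice: str = "en", ipa: bool = True) -> List[str]:
    """Build a quiet (-q) espeak-ng call that prints phonemes instead of speaking."""
    return ["espeak-ng", "-q", "-v", voice, "--ipa" if ipa else "-x", text]


def transcribe(text: str, voice: str = "en", ipa: bool = True) -> Optional[str]:
    """Return the phoneme string, or None if espeak-ng is not installed."""
    if shutil.which("espeak-ng") is None:
        return None
    result = subprocess.run(phoneme_command(text, voice, ipa),
                            check=True, capture_output=True, encoding="utf-8")
    return result.stdout.strip()
```

Comparing transcribe(word, ipa=False) with transcribe(word) is a quick way to relate eSpeak's mnemonics to IPA while writing custom rules.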
HISTORY
Forked from eSpeak in 2015 as eSpeak NG ("Next Generation") to continue development after the original project stalled. It is actively maintained by the community, with releases in a 1.x series (e.g., 1.51), and has been widely adopted in Debian and Ubuntu since 2016.
SEE ALSO
festival(1), flite(1), pico2wave(1)


