espeak-ng

Synthesize text to speech

TLDR

Speak a phrase aloud

$ espeak-ng "[text]"

Speak text from stdin
$ echo "[text]" | espeak-ng

Speak the contents of a [f]ile
$ espeak-ng -f [path/to/file]

Speak using a specific [v]oice
$ espeak-ng -v [voice] "[text]"

Speak at a specific [s]peed (default is 175) and [p]itch (default is 50)
$ espeak-ng -s [speed] -p [pitch] "[text]"

Output the audio to a [w]AV file instead of speaking it directly
$ espeak-ng -w [path/to/output.wav] "[text]"

List all available voices
$ espeak-ng --voices

SYNOPSIS

espeak-ng [options] [<text> | -f <file> | -]

PARAMETERS

-f <file>
    Specify file(s) containing text to speak

-w <wavefile>
    Write speech as WAV file instead of playing

-v <VOICE>
    Voice name, e.g., 'en-us', 'fr'; list with 'espeak-ng --voices'

-s <SPEED>
    Speed in approximate words per minute (80-500, default 175)

-p <PITCH>
    Pitch change (0-99, default 50)

-a <VOLUME>
    Amplitude/volume (0-200, default 100)

-g <GAP>
    Word gap: pause between words, in units of 10 ms at the default speed

-l <LINE-LENGTH>
    Line length: if non-zero, lines shorter than this are treated as end-of-clause (default 0)

-m
    Recognize SSML markup tags

-x
    Write phoneme mnemonics to stdout; combine with -q to suppress speech

--path=<DIR>
    Search for voices/dictionaries in directory

--voices[=<LANG>]
    List available voices (filter by language)

--version
    Print version info

-h, --help
    Show usage summary
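
The voice, speed, and pitch options are often combined in scripts. The following is a minimal sketch of a wrapper that checks the ranges listed above before calling espeak-ng. The names speak_tuned, is_uint, and ESPEAK_BIN are hypothetical helpers introduced for illustration; they are not part of espeak-ng.

```shell
#!/bin/sh
# ESPEAK_BIN is a hypothetical override so the wrapper can be dry-run
# (for example ESPEAK_BIN=echo); it is not an espeak-ng feature.
ESPEAK_BIN="${ESPEAK_BIN:-espeak-ng}"

# is_uint VALUE: succeed only if VALUE is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# speak_tuned VOICE SPEED PITCH TEXT
# Validates SPEED (80-500) and PITCH (0-99), then speaks TEXT.
speak_tuned() {
    voice=$1; speed=$2; pitch=$3; text=$4
    if ! is_uint "$speed" || ! is_uint "$pitch"; then
        echo "speak_tuned: speed and pitch must be non-negative integers" >&2
        return 2
    fi
    if [ "$speed" -lt 80 ] || [ "$speed" -gt 500 ]; then
        echo "speak_tuned: speed must be between 80 and 500" >&2
        return 2
    fi
    if [ "$pitch" -gt 99 ]; then
        echo "speak_tuned: pitch must be between 0 and 99" >&2
        return 2
    fi
    "$ESPEAK_BIN" -v "$voice" -s "$speed" -p "$pitch" "$text"
}
```

For example, speak_tuned en-us 140 60 "Hello" would run espeak-ng -v en-us -s 140 -p 60 "Hello".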

DESCRIPTION

eSpeak NG is a lightweight, formant-based speech synthesizer that converts text to speech for over 100 languages and accents. It produces clear pronunciation with a compact footprint, making it ideal for embedded systems, accessibility tools, and screen readers on Linux and other platforms.

It translates text into phonemes using per-language pronunciation rules and dictionaries, then generates speech via formant synthesis, resulting in a somewhat robotic but intelligible voice. Key strengths include speed, low resource usage (no large speech files needed), and extensibility through user-defined voices and dictionaries.

Output can be played on an audio device, written to a WAV file with -w, or sent to stdout with --stdout for further processing. It accepts plain text, SSML markup (with -m), and phoneme input. It can also act as a front end for MBROLA diphone voices when those are installed.
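
As a rough sketch of the file and pipe output modes, the two functions below render a text file to a WAV file and stream speech to an external player. The function names and the ESPEAK_BIN and PLAYER variables are hypothetical; aplay is assumed as the player only because it reads WAV data from stdin on many ALSA systems, and any equivalent player could be substituted.

```shell
#!/bin/sh
# ESPEAK_BIN and PLAYER are hypothetical overrides for illustration.
ESPEAK_BIN="${ESPEAK_BIN:-espeak-ng}"
PLAYER="${PLAYER:-aplay}"

# txt_to_wav FILE: read text from FILE (-f) and write speech to a WAV
# file (-w) with the same base name; a trailing .txt is replaced by .wav.
txt_to_wav() {
    src=$1
    dst="${src%.txt}.wav"
    "$ESPEAK_BIN" -f "$src" -w "$dst"
}

# speak_via_pipe TEXT: write WAV data to stdout (--stdout) and hand it
# to the player instead of letting espeak-ng open the audio device.
speak_via_pipe() {
    "$ESPEAK_BIN" --stdout "$1" | "$PLAYER"
}
```

Looping txt_to_wav over a directory of .txt files is one simple way to batch-generate audio without playing anything.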

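It is commonly used behind screen readers such as Orca and for scripted TTS tasks. Voice data is typically installed under /usr/share/espeak-ng-data, though the exact location varies by distribution; --path points espeak-ng at a different location.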

CAVEATS

Voice quality is robotic because of formant synthesis and is less natural than neural TTS such as piper. CPU usage can be noticeable on very slow hardware. Fluency varies between languages and accents. On some distributions the voices ship in a separate espeak-ng-data package.

VOICE SELECTION

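Run espeak-ng --voices to list the available voices. A voice is selected by its language code, optionally with an accent, for example en-gb or de. A variant can be appended with +, for example en+f3.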
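
To pick a voice from a script, the listing can be reduced to the codes accepted by -v. This sketch assumes the listing starts with one header row and carries the language code in the second whitespace-separated column; check the output of your installed version before relying on it. The names voice_codes, list_voice_codes, and ESPEAK_BIN are hypothetical.

```shell
#!/bin/sh
# ESPEAK_BIN is a hypothetical override for illustration.
ESPEAK_BIN="${ESPEAK_BIN:-espeak-ng}"

# voice_codes: read a voice listing on stdin and print the second column
# (assumed to be the language code), skipping the header row.
voice_codes() {
    awk 'NR > 1 && NF >= 2 { print $2 }'
}

# list_voice_codes [LANG]: list voice codes, optionally filtered by language.
list_voice_codes() {
    if [ -n "$1" ]; then
        "$ESPEAK_BIN" --voices="$1" | voice_codes
    else
        "$ESPEAK_BIN" --voices | voice_codes
    fi
}
```

For example, list_voice_codes en would print one code per line for the English voices.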

PHONEMES

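Use -x -q to print eSpeak phoneme mnemonics without speaking, or --ipa -q for International Phonetic Alphabet output. Custom pronunciation rules are compiled with espeak-ng --compile=<voice>.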
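
A small sketch of how phoneme output might be captured in a script, using -q to suppress audio. The function names and ESPEAK_BIN are hypothetical; -x prints eSpeak's own mnemonics, while --ipa prints IPA symbols.

```shell
#!/bin/sh
# ESPEAK_BIN is a hypothetical override for illustration.
ESPEAK_BIN="${ESPEAK_BIN:-espeak-ng}"

# to_mnemonics TEXT: print eSpeak phoneme mnemonics without speaking.
to_mnemonics() {
    "$ESPEAK_BIN" -q -x "$1"
}

# to_ipa TEXT: print the IPA transcription without speaking.
to_ipa() {
    "$ESPEAK_BIN" -q --ipa "$1"
}
```

Adding -v [voice] to either call would transcribe the text with that language's rules.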

HISTORY

eSpeak NG was forked from eSpeak in 2015 as the 'Next Generation' to revive development after the original project stalled. It is actively maintained by the community, with numbered releases such as 1.51, and has been widely adopted in Debian and Ubuntu since 2016.

SEE ALSO

festival(1), flite(1), pico2wave(1)