LinuxCommandLibrary

espeak

Convert text to speech

TLDR

Speak a phrase aloud
$ espeak "[text]"

Speak text from stdin
$ echo "[text]" | espeak

Speak the contents of a [f]ile
$ espeak -f [path/to/file]

Speak using a specific [v]oice
$ espeak -v [voice] "[text]"

Speak at a specific [s]peed (default is 175) and [p]itch (default is 50)
$ espeak -s [speed] -p [pitch] "[text]"

Output the audio to a [w]AV file instead of speaking it directly
$ espeak -w [path/to/output.wav] "[text]"

List all available voices
$ espeak --voices

SYNOPSIS

espeak [options] ["<words>"]

PARAMETERS

-a <integer>
    Amplitude (volume), 0 to 200, default 100

-b <integer>
    Input text encoding: 1=UTF-8, 2=8-bit, 4=16-bit

-d <device>
    Audio device to speak on (eSpeak NG only); the default audio device is used if omitted

-f <text file>
    Speak text from a file instead of the command line

-g <integer>
    Word gap: pause between words, in units of 10 ms at the default speed

-h, --help
    Display help and exit

-k <integer>
    Indicate capital letters: 1=sound, 2=the word "capitals", higher values=pitch increase (try -k20)

-l <integer>
    Line length: if non-zero, lines shorter than this are treated as end-of-clause (default 0)

-m
    Interpret SSML markup and ignore other < > tags

-p <integer>
    Pitch adjustment, 0 to 99, default 50

-q
    Quiet: produce no speech (useful with -x)

-s <integer>
    Speed in words per minute, 80 to 450, default 175

-z
    No final sentence pause at the end of the text

-v <voice name>
    Voice name, e.g. en-us; list voices with --voices

-w <wave file name>
    Write speech to this WAV file (mono, 22050 Hz) instead of speaking it directly

-x
    Write phoneme mnemonics to stdout

-X
    Write phoneme mnemonics and a translation trace to stdout

--path=<path>
    Directory containing the espeak-data directory

--phonout=<filename>
    Write phoneme output from -x, -X, --ipa, and --pho to this file

--stdout
    Write speech output to stdout (WAV data, 16-bit, 22050 Hz) instead of speaking it directly

--version
    Print version and exit

--voices[=<language>]
    List available voices; if a language is given, list only the voices for that language
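
Several of these options are commonly combined in one call. The following is a minimal sketch, not part of espeak itself: the helper name speak_to_wav and the DRY_RUN switch are hypothetical, and only -v, -s, -p, and -w are assumed from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of espeak): render text to a WAV file
# with a chosen voice, speed, and pitch.
# Usage: speak_to_wav <voice> <speed> <pitch> <output.wav> <text>
# When DRY_RUN=1, the command is printed instead of being run.
speak_to_wav() {
    voice=$1
    speed=$2
    pitch=$3
    out=$4
    text=$5
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'espeak -v %s -s %s -p %s -w %s "%s"\n' \
            "$voice" "$speed" "$pitch" "$out" "$text"
    else
        espeak -v "$voice" -s "$speed" -p "$pitch" -w "$out" "$text"
    fi
}

# Print the command only; set DRY_RUN=0 to actually synthesize the file.
DRY_RUN=1
speak_to_wav en-us 140 60 hello.wav "Hello from the command line"
```

Printing the command first makes it easy to check the values before a batch job writes many files.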

DESCRIPTION

eSpeak is a lightweight, open-source speech synthesizer for Linux and other platforms, converting text to speech using formant synthesis. It supports over 100 languages and accents, with voices defined in compact text files for easy customization. At under 2 MB, it's ideal for embedded systems, accessibility tools, automation, and scripting.

eSpeak reads text from stdin, files, or the command line and outputs to the speakers or to WAV files. Key features include adjustable speed, pitch, and volume; SSML support; and phoneme output. Though it sounds robotic compared to neural TTS, it is fast, portable, and free. Its maintained fork, eSpeak NG, is widely used with screen readers such as Orca and NVDA.

CAVEATS

Robotic voice quality; requires espeak-data package for voices. Limited prosody compared to neural synthesizers like Piper or RHVoice. UTF-8 input recommended.

VOICE SELECTION

Use espeak --voices to list the available voices. A variant can be appended to a voice name with +, in the form voice+variant, e.g. espeak -v en+f3 (English with female variant 3).
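
A minimal sketch of building a voice+variant name in a script. The helper voice_spec and the output file name are hypothetical; espeak is only called if it is installed, and the result is written to a WAV file so that no audio device is needed.

```shell
#!/bin/sh
# Hypothetical helper: join a base voice and an optional variant as "voice+variant".
voice_spec() {
    if [ -n "$2" ]; then
        printf '%s+%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

spec=$(voice_spec en f3)
echo "voice: $spec"

# Only call espeak if it is installed.
if command -v espeak >/dev/null 2>&1; then
    # List the voices available for English.
    espeak --voices=en
    # Render a sample with the chosen variant to a WAV file.
    espeak -v "$spec" -w "${TMPDIR:-/tmp}/voice_demo.wav" "This is a voice variant test"
fi
```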

PUNCTUATION READING

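Use --punct[="<characters>"] to speak the names of punctuation characters. If the character list is omitted, all punctuation is spoken; by default punctuation is not announced.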
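
A minimal sketch of punctuation announcement, assuming the --punct option as listed by espeak --help. The helper punct_flag and the output file names are hypothetical; espeak is only called if it is installed.

```shell
#!/bin/sh
# Hypothetical helper: build the --punct argument.
# With no argument, all punctuation is announced;
# with an argument, only the listed characters are announced.
punct_flag() {
    if [ -n "$1" ]; then
        printf '%s\n' "--punct=$1"
    else
        printf '%s\n' "--punct"
    fi
}

echo "all punctuation:  $(punct_flag)"
echo "selected only:    $(punct_flag ',?')"

# Only call espeak if it is installed; output goes to WAV files.
if command -v espeak >/dev/null 2>&1; then
    espeak "$(punct_flag)" -w "${TMPDIR:-/tmp}/punct_all.wav" "Wait, what? Really!"
    espeak "$(punct_flag ',?')" -w "${TMPDIR:-/tmp}/punct_some.wav" "Wait, what? Really!"
fi
```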

HISTORY

Originated as "speak", written by Jonathan Duddington for RISC OS in 1995. It was later ported to Linux and renamed eSpeak around 2006, gaining support for many languages. After development of the original project stalled, volunteers started the eSpeak NG fork in 2015.

SEE ALSO

festival(1), flite(1), pico2wave(1)
