LinuxCommandLibrary

tts

Convert text to speech

TLDR

Run text-to-speech with the default models, writing the output to "tts_output.wav"

$ tts --text "[text]"

List provided models
$ tts --list_models

Query info for a model by index
$ tts --model_info_by_idx [model_type/model_query_idx]

Query info for a model by name
$ tts --model_info_by_name [model_type/language/dataset/model_name]

Run a text-to-speech model with its default vocoder model
$ tts --text "[text]" --model_name [model_type/language/dataset/model_name]

Run your own text-to-speech model (using the Griffin-Lim vocoder)
$ tts --text "[text]" --model_path [path/to/model.pth] --config_path [path/to/config.json] --out_path [path/to/file.wav]

SYNOPSIS

espeak [options] ["text"]
espeak [options] -f <file>

PARAMETERS

-v <voice>
    Selects a specific voice for synthesis. Examples: en+f3 (English, female 3), de (German).

-s <wpm>
    Sets the speaking speed in words per minute (WPM). Range typically 80-450, default is 175.

-p <pitch>
    Sets the base pitch of the voice. Range 0-99, default is 50.

-a <amplitude>
    Sets the amplitude (volume) of the speech. Range 0-200, default is 100.

-g <gap>
    Sets the pause between words in units of 10 milliseconds (at the default speed); for example, -g 10 adds a pause of about 100 ms.

-w <file>
    Writes the synthesized speech to a WAV audio file instead of playing it directly.

-f <file>
    Reads input text from the specified file.

--stdin
    Reads input text from standard input (stdin).

-q
    Quiet mode: produces no speech. Mostly useful together with -x, so that only the phoneme mnemonics written to standard output by -x are produced.

--voices[=<language>]
    Lists available voices. Optionally specify a language to filter the list.

DESCRIPTION

The term "tts" usually refers to Text-to-Speech functionality in general rather than to a single standard Linux command. Some users create aliases or scripts named tts, and the Coqui TTS Python package installs a tts executable (the one used in the TLDR examples), but on most systems speech is produced by dedicated command-line engines such as espeak, flite, and festival. These engines synthesize spoken audio from input text and are commonly used for accessibility, notification systems, and voice feedback in scripts. They typically accept text as an argument or on standard input and send the audio either to the system's sound device or to a WAV file. Voice quality, speed, and language support vary considerably between engines. The rest of this page uses espeak as a representative example, since it is compact, widely packaged, and supports many languages.
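
Because the installed engine differs from system to system, a script that just wants to "say this" can probe for whichever engine is available. The following is a minimal POSIX shell sketch, not a standard tool: the function name speak is hypothetical (chosen so it does not shadow a tts executable), and it assumes each engine's plain playback form, namely espeak with the text as an argument, flite -t, and festival --tts reading from standard input.

```sh
# speak: hypothetical wrapper that says its arguments aloud with the first
# TTS engine found on PATH (espeak, then flite, then festival).
speak() {
    text=$*    # join all arguments into one string
    if command -v espeak >/dev/null 2>&1; then
        espeak "$text"
    elif command -v flite >/dev/null 2>&1; then
        flite -t "$text"
    elif command -v festival >/dev/null 2>&1; then
        # festival --tts reads the text from standard input when no file is given
        printf '%s\n' "$text" | festival --tts
    else
        echo "speak: no TTS engine (espeak, flite, festival) found" >&2
        return 127
    fi
}
```

For example, speak Backup finished says the phrase with espeak when it is installed and falls back to flite or festival otherwise. The order of the checks is an arbitrary preference; reorder them to favor a different engine.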

CAVEATS

The command tts is not part of a standard Linux installation; the name usually refers to the Text-to-Speech concept, and what actually happens depends entirely on the installed TTS engine, such as espeak, flite, or festival. Voice quality and naturalness vary greatly between engines, and the output often does not sound human-like. Speaking aloud requires working audio output (a sound card and drivers); on systems without it, write the speech to a WAV file instead.

COMMON USAGE PATTERNS

A common pattern is piping text directly to a TTS engine or using the -f option to read from a file.
Example: echo "Hello, world!" | espeak
Example: espeak -f my_document.txt -v en-us+f2 -s 150 -w output.wav
For systems without direct audio output or for later playback, generating a WAV file is often preferred.
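
When speech is rendered to files for later playback, several documents can be processed in one pass. Below is a minimal POSIX shell sketch wrapped in a hypothetical helper named txt_to_wav; it relies only on the -f and -w options described above, and the voice (en-us) and speed (150 WPM) simply mirror the previous example and are arbitrary choices.

```sh
# txt_to_wav: hypothetical helper that renders every .txt file in a directory
# to a WAV file with the same base name, using espeak's -f (read text from a
# file) and -w (write a WAV file instead of playing) options.
txt_to_wav() {
    dir=${1:-.}    # default to the current directory
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue    # the glob matched nothing
        espeak -v en-us -s 150 -f "$txt" -w "${txt%.txt}.wav" || return 1
    done
}
```

For example, txt_to_wav notes would turn notes/intro.txt into notes/intro.wav (the directory and file names are only illustrations), and aplay notes/intro.wav would play the result later.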

INTEGRATION WITH SHELL SCRIPTS

TTS commands are frequently integrated into shell scripts for audible notifications, interactive prompts, or logging.
For instance, a script might announce the completion of a long-running task: myscript && espeak "Task completed."
They can also speak the output of other commands; for example, date +%H:%M | espeak announces the current time.
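
The && form in the myscript example stays silent when the task fails. A minimal sketch of a helper that reports either outcome follows; announce is a hypothetical name, and when espeak is missing or fails it prints the message instead so the result is never lost.

```sh
# announce: hypothetical helper that runs a command, then speaks whether it
# succeeded or failed. If espeak is missing or fails, the message is printed
# to stdout instead. The command's exit status is passed through.
announce() {
    if "$@"; then
        rc=0
        msg="Task completed."
    else
        rc=$?    # exit status of the command that just failed
        msg="Task failed with status $rc."
    fi
    espeak "$msg" 2>/dev/null || printf '%s\n' "$msg"
    return "$rc"
}
```

For example, announce tar -czf backup.tar.gz ~/Documents runs the archive command and then says whether it worked; because the exit status is returned unchanged, announce can still be chained with && or || in larger scripts.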

HISTORY

The development of Text-to-Speech (TTS) systems on Unix-like operating systems dates back decades, with early academic and research projects laying the groundwork. Festival, developed at the University of Edinburgh, emerged in the mid-1990s and provided robust, research-oriented TTS capabilities. Later, lighter and more compact engines such as Flite (Festival-Lite, from Carnegie Mellon University) and eSpeak (originally written by Jonathan Duddington) gained prominence for their efficiency and suitability for embedded systems and command-line use. These tools have continued to add languages, improve voice quality, and integrate with modern audio systems, and they remain important building blocks for accessibility and automation on Linux.

SEE ALSO

espeak(1): A compact open-source software text-to-speech synthesizer.
flite(1): A small, fast run-time speech synthesis engine (from CMU Speech Group).
festival(1): A general multi-lingual speech synthesis system.
aplay(1): Command-line sound player, often used to play WAV files generated by TTS engines.
