LinuxCommandLibrary

piper

Fast local neural text-to-speech engine

TLDR

Synthesize speech
$ echo "Hello world" | piper --model [en_US-lessac-medium] --output_file [output.wav]
List models
$ piper --list-models
Use specific model
$ piper -m [model.onnx] -c [model.json] < [input.txt] > [output.wav]
Set speaker
$ piper --model [model.onnx] --speaker [0] < [input.txt]
Adjust speaking rate
$ piper --model [model.onnx] --length_scale [1.5] < [input.txt]
JSON input mode
$ echo '{"text": "Hello"}' | piper --model [model.onnx] --json-input

SYNOPSIS

piper [--model file] [--output_file file] [options]

DESCRIPTION

piper is a fast, local neural text-to-speech system that generates natural-sounding speech from text using ONNX-based voice models. Once a model has been downloaded, it runs entirely offline, with no internet connection or cloud API required.

Each model is trained for a specific language and voice. Multi-speaker models let you select a different voice variant by speaker ID. The length_scale and noise_scale parameters control speaking rate and variation, so output characteristics can be fine-tuned.

Input text is read from stdin, and the result is written as WAV audio. JSON input mode accepts structured input with per-utterance settings. Models for many languages are available from the Piper project's model repository.
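JSON input mode is easiest to understand by generating its input. The following is a minimal bash sketch, not part of piper itself: the helper names json_escape and json_line are hypothetical, and the optional fields speaker_id and output_file are assumed from the Piper README rather than taken from this page (only the text field appears in the TLDR example), so check them against your piper version.

```shell
#!/usr/bin/env bash
# Sketch: emit one JSON object per line for `piper --json-input`.
# ASSUMPTION: field names "speaker_id" and "output_file" as documented in the
# Piper README; verify them for your version.

# Escape backslashes and double quotes so the value is a valid JSON string.
# Control characters (tabs, embedded newlines) are NOT handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# json_line TEXT [SPEAKER_ID] [OUTPUT_FILE]   (hypothetical helper)
json_line() {
  local line
  line="{\"text\": \"$(json_escape "$1")\""
  if [ -n "${2:-}" ]; then
    line+=", \"speaker_id\": $2"
  fi
  if [ -n "${3:-}" ]; then
    line+=", \"output_file\": \"$(json_escape "$3")\""
  fi
  printf '%s}\n' "$line"
}
```

For example, two utterances spoken by different speakers of a multi-speaker model:
$ { json_line "Hello there." 0 hello.wav; json_line "Goodbye." 1 bye.wav; } | piper --model model.onnx --json-input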

PARAMETERS

-m, --model FILE
ONNX model file.
-c, --config FILE
JSON config file.
--output_file FILE
Output WAV file.
--output_dir DIR
Output directory.
--speaker ID
Speaker ID for multi-speaker models.
--length_scale FLOAT
Speaking rate as a phoneme length multiplier (higher = slower).
--noise_scale FLOAT
Generator noise, which controls variation in the speech.
--json-input
Read one JSON object per input line instead of plain text.
--list-models
Show available models.
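To hear what --length_scale does, it can help to render one sentence at several rates. The bash function below is a hypothetical helper (rate_sweep is not a piper command) that uses only the --model, --length_scale and --output_file flags listed above; the rate values and the rate_<value>.wav naming are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: render the same text once per speaking rate for comparison.
# rate_sweep MODEL TEXT RATE...   (hypothetical helper, not part of piper)
rate_sweep() {
  local model=$1 text=$2 rate
  shift 2
  for rate in "$@"; do
    # length_scale > 1 stretches phonemes (slower speech); < 1 speeds it up.
    printf '%s\n' "$text" |
      piper --model "$model" --length_scale "$rate" \
        --output_file "rate_${rate}.wav" || return 1
  done
}
```

$ rate_sweep model.onnx "The quick brown fox." 0.8 1.0 1.5
This writes rate_0.8.wav, rate_1.0.wav and rate_1.5.wav. The same loop works for --noise_scale by swapping the flag.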

CAVEATS

Models must be downloaded before use (typically 15-75 MB each). Speech quality varies significantly between models and languages. GPU acceleration via CUDA is optional but improves performance for batch processing. Output is always WAV format; convert with ffmpeg for other formats.
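Because piper writes WAV, a batch of files produced with --output_dir can be converted afterwards. This is a minimal bash sketch assuming ffmpeg is installed; the helper name wav_to is hypothetical. ffmpeg chooses the output format (and its default encoder) from the target file extension, so, for instance, mp3 output requires an ffmpeg build with an MP3 encoder.

```shell
#!/usr/bin/env bash
# Sketch: convert every .wav file in a directory with ffmpeg.
# wav_to EXT [DIR]   (hypothetical helper; DIR defaults to the current directory)
wav_to() {
  local ext=$1 dir=${2:-.} wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # no .wav files: the glob stays unexpanded
    # -y overwrites existing targets; -loglevel error keeps output quiet.
    ffmpeg -loglevel error -y -i "$wav" "${wav%.wav}.$ext" || return 1
  done
}
```

$ wav_to ogg ./out
This converts out/*.wav into .ogg files next to the originals.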

HISTORY

Piper was created by Michael Hansen (rhasspy) for offline voice assistants. It provides fast, high-quality TTS suitable for embedded and edge devices.

SEE ALSO

espeak(1), festival(1), mimic(1), say(1)

Kai