piper
Fast local neural text-to-speech engine
SYNOPSIS
piper [--model file] [--output_file file] [options]
DESCRIPTION
piper is a fast, local neural text-to-speech system that generates natural-sounding speech from text using ONNX-based voice models. It runs entirely offline after model download, requiring no internet connection or cloud API.

Each model is trained for a specific language and voice. Multi-speaker models support selecting different voice variants via speaker ID. The --length_scale and --noise_scale parameters control speaking rate and variation, allowing fine-tuning of output characteristics.

Input is read from stdin and output as WAV audio. JSON input mode enables structured text processing with per-utterance settings. Models are available for many languages through the Piper project's model repository.
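The stdin-to-WAV pipeline described above looks like this in practice. The voice model filename is an example; substitute any downloaded Piper voice (an .onnx file with its matching .json config alongside it).

```shell
# Synthesize one sentence to a WAV file.
# en_US-lessac-medium.onnx is an example voice model; use any you have downloaded.
echo 'Welcome to the world of speech synthesis!' | \
  piper --model en_US-lessac-medium.onnx --output_file welcome.wav

# Slow the speech down and add a little more variation.
echo 'A slower, more varied reading.' | \
  piper --model en_US-lessac-medium.onnx \
        --length_scale 1.3 --noise_scale 0.7 \
        --output_file slow.wav
```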
PARAMETERS
-m, --model FILE
ONNX voice model file.

-c, --config FILE
JSON model config file.

--output_file FILE
Output WAV file.

--output_dir DIR
Output directory for WAV files.

--speaker ID
Speaker ID for multi-speaker models.

--length_scale FLOAT
Speaking rate (higher = slower).

--noise_scale FLOAT
Amount of variation in the speech.

--json-input
Read stdin as JSON instead of plain text.

--list-models
Show available models.
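With --json-input, each line of stdin is a JSON object carrying the text and optional per-utterance settings. A minimal sketch, assuming the line-delimited JSON format with "text" and "output_file" fields (model filename is an example):

```shell
# Two utterances, each written to its own WAV file named in the JSON.
printf '%s\n' \
  '{"text": "First utterance.", "output_file": "first.wav"}' \
  '{"text": "Second utterance.", "output_file": "second.wav"}' | \
  piper --model en_US-lessac-medium.onnx --json-input
```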
CAVEATS
Models must be downloaded before use (typically 15-75 MB each). Speech quality varies significantly between models and languages. GPU acceleration via CUDA is optional but improves performance for batch processing. Output is always WAV format; convert with ffmpeg for other formats.
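Since output is always WAV, a typical post-processing step is a format conversion with ffmpeg, as noted above. A sketch (model filename and encoder settings are examples):

```shell
# Synthesize to WAV, then encode to MP3 with ffmpeg.
echo 'Hello from piper.' | \
  piper --model en_US-lessac-medium.onnx --output_file hello.wav
ffmpeg -i hello.wav -codec:a libmp3lame -qscale:a 2 hello.mp3
```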
HISTORY
Piper was created by Michael Hansen (rhasspy) for offline voice assistants. It provides fast, high-quality TTS suitable for embedded and edge devices.
