piper
Fast local neural text-to-speech engine
TLDR
Synthesize speech
SYNOPSIS
piper [--model file] [--output_file file] [options]
DESCRIPTION
piper is a fast, local neural text-to-speech system that generates natural-sounding speech from text using ONNX-based voice models. It runs entirely offline after model download, requiring no internet connection or cloud API.
Each model is trained for a specific language and voice. Multi-speaker models support selecting different voice variants via a speaker ID. The --length_scale and --noise_scale parameters control speaking rate and variation, allowing fine-tuning of the output.
Input is read from stdin and output as WAV audio. JSON input mode enables structured text processing with per-utterance settings. Models are available for many languages through the Piper project's model repository.
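The basic workflow described above can be sketched as follows; the voice model filename is an assumption here (substitute whatever model you have downloaded):

```shell
# Hypothetical voice model file -- use a model you have downloaded.
MODEL=en_US-lessac-medium.onnx

# Text on stdin, WAV audio written to a file.
echo 'Hello from piper.' | piper --model "$MODEL" --output_file hello.wav
```

With --output_dir instead of --output_file, piper writes one WAV file per utterance into the given directory.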
PARAMETERS
-m, --model FILE
ONNX model file.
-c, --config FILE
JSON config file.
--output_file FILE
Output WAV file.
--output_dir DIR
Output directory.
--speaker ID
Speaker ID for multi-speaker models.
--length_scale FLOAT
Speaking rate (higher = slower).
--noise_scale FLOAT
Variation in speech.
--json-input
Read input in JSON format, one object per line.
--list-models
Show available models.
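JSON input mode accepts one JSON object per line on stdin, which allows per-utterance settings. A minimal sketch, assuming a "text" field plus a "speaker_id" field for multi-speaker models (supported keys may vary by piper version) and a hypothetical model filename:

```shell
# Hypothetical voice model file -- substitute a downloaded multi-speaker model.
MODEL=en_US-libritts-high.onnx

# One JSON object per line; "speaker_id" selects the voice variant.
printf '%s\n' \
  '{"text": "First utterance.", "speaker_id": 0}' \
  '{"text": "Second utterance.", "speaker_id": 1}' \
  | piper --model "$MODEL" --json-input --output_dir out/
```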
CAVEATS
A voice model must be downloaded before first use. Output quality varies by model. GPU acceleration is optional.
HISTORY
Piper was created by Michael Hansen (rhasspy) for offline voice assistants. It provides fast, high-quality TTS suitable for embedded and edge devices.
