
piper

A fast, local neural text-to-speech system

TLDR

Output a WAV [f]ile using a text-to-speech [m]odel (assuming a configuration file at model_path + .json)

$ echo [Thing to say] | piper -m [path/to/model.onnx] -f [outputfile.wav]

Output a WAV [f]ile using a [m]odel and specifying its JSON [c]onfig file
$ echo [Thing to say] | piper -m [path/to/model.onnx] -c [path/to/model.onnx.json] -f [outputfile.wav]

Select a particular speaker in a voice with multiple speakers by specifying the speaker's ID number
$ echo [Warum?] | piper -m [de_DE-thorsten_emotional-medium.onnx] --speaker [1] -f [angry.wav]

Stream the output to the mpv media player
$ echo [Hello world] | piper -m [en_GB-northern_english_male-medium.onnx] --output-raw | mpv -

Speak twice as fast, with huge gaps between sentences
$ echo [Speaking twice the speed. With added drama!] | piper -m [file.onnx] --length_scale [0.5] --sentence_silence [2] -f [drama.wav]

SYNOPSIS

piper [OPTIONS] --model <MODEL_PATH>

Text to synthesize is read from stdin.

PARAMETERS

-m, --model <MODEL>
    Path to .onnx model file (required)

-c, --config <CONFIG>
    Path to the model's JSON config file; default: model path + .json

-f, --output_file <OUTPUT>
    Output WAV/MP3 file path; default: output.wav

--speaker <ID>
    Numeric speaker ID; default: 0

--speaker-name <NAME>
    Speaker name for multi-speaker models

--length_scale <SCALE>
    Phoneme length, which controls speech speed; default: 1.0 (lower = faster; 0.5 speaks twice as fast)

--noise_scale <SCALE>
    Phoneme-level variance; default: 0.667

--noise_w <SCALE>
    Word-level prosody noise; default: 0.8

--sentence_silence <SECONDS>
    Seconds of silence after each sentence; default: 0.2

--split-sentences
    Split input on punctuation; default: on

--no-split-sentences
    Disable sentence splitting

--cuda
    Enable CUDA acceleration (if available)

--precision <fp32|fp16>
    Model precision; default: fp32

--output-format <wav|mp3>
    Output format; default: wav

--threads <NUM>
    Worker threads; default: 1

-h, --help
    Show help

-V, --version
    Print version
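
Combining the prosody flags gives finer control over delivery. As a sketch (flag spellings as in this list; some builds hyphenate them instead), lowering both variance scales produces a flatter, more deterministic read:

$ echo [Calm announcement.] | piper -m [path/to/model.onnx] --noise_scale [0.333] --noise_w [0.4] -f [calm.wav]

For multi-speaker voices, the valid --speaker IDs are listed in the voice's JSON config under its speaker_id_map field; assuming jq is available, inspect them with:

$ jq '.speaker_id_map' [path/to/model.onnx.json]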

DESCRIPTION

Piper is a high-quality, lightweight neural text-to-speech (TTS) system designed for low-resource devices like the Raspberry Pi, while remaining performant on desktops.

It uses VITS-based models trained on datasets like LJSpeech, supporting dozens of languages and voices. Piper synthesizes speech entirely on CPU (with optional CUDA), producing natural-sounding audio from text input.
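
For GPU inference, pass the CUDA flag described above (a sketch; the C++ build spells it --use-cuda, and a GPU-enabled ONNX Runtime must be installed):

$ echo [Hello] | piper -m [path/to/model.onnx] --cuda -f [hello.wav]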

Key features include multi-speaker support, customizable prosody via length/noise scales, sentence splitting, and output in WAV or MP3. Models are compact (~50-150MB), enabling offline use. Ideal for embedded apps, assistants, or accessibility tools.

Unlike cloud TTS, Piper ensures privacy and low latency (<200ms on capable hardware). Voices sound expressive with good intonation. Install via pip (pip install piper-tts) or distro packages; download models from rhasspy.github.io/piper.
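
A minimal install-and-smoke-test sequence (assuming a voice has already been downloaded into the working directory; see MODEL SOURCES below):

$ pip install piper-tts
$ echo [It works.] | piper -m [en_US-lessac-medium.onnx] -f [test.wav]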

CAVEATS

Requires .onnx model files (downloaded separately, together with their .onnx.json configs); CPU-only by default (can be slow on very low-end hardware); MP3 output needs libsndfile; phonemization relies on espeak-ng, which lacks support for some languages.

EXAMPLE USAGE

echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file speech.wav
Play the result with: aplay speech.wav
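
To skip the intermediate file, pipe the raw stream straight to aplay, following the pattern from the upstream README (medium-quality voices emit 16-bit mono PCM at 22050 Hz; match the rate to your model):

echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output-raw | aplay -r 22050 -f S16_LE -t raw -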

MODEL SOURCES

Download from https://rhasspy.github.io/piper/; multilingual support (en, de, fr, es, etc.).
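
Each voice is a pair of files that must sit side by side: the .onnx weights and the .onnx.json config. A generic download sketch (substitute the real URLs listed for the chosen voice):

curl -L -o [voice.onnx] [url/to/voice.onnx]
curl -L -o [voice.onnx.json] [url/to/voice.onnx.json]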

HISTORY

Developed by Michael Hansen and the Rhasspy team, first released in 2022 as part of open-source voice assistants for Home Assistant. Evolved from the earlier Larynx TTS (Piper began as "Larynx 2"), built on ONNX Runtime for speed and portability. Active community contributions; now at v1.2+ with more voices.

SEE ALSO

espeak-ng(1), festival(1), flite(1)
