
piper

A fast, local neural text-to-speech system

TLDR

Output a WAV [f]ile using a text-to-speech [m]odel (assuming a configuration file at model_path + .json)

$ echo [Thing to say] | piper -m [path/to/model.onnx] -f [outputfile.wav]

Output a WAV [f]ile using a [m]odel and specifying its JSON [c]onfig file
$ echo [Thing to say] | piper -m [path/to/model.onnx] -c [path/to/model.onnx.json] -f [outputfile.wav]

Select a particular speaker in a voice with multiple speakers by specifying the speaker's ID number
$ echo [Warum?] | piper -m [de_DE-thorsten_emotional-medium.onnx] --speaker [1] -f [angry.wav]

Stream the output to the mpv media player
$ echo [Hello world] | piper -m [en_GB-northern_english_male-medium.onnx] --output-raw | mpv -

Speak twice as fast, with huge gaps between sentences
$ echo [Speaking twice the speed. With added drama!] | piper -m [file.onnx] --length_scale [0.5] --sentence_silence [2] -f [drama.wav]

SYNOPSIS

piper [OPTIONS] --model <MODEL_PATH>

Text to synthesize is read from stdin.

PARAMETERS

-m, --model <MODEL>
    Path to .onnx model file (required)

-c, --config <CONFIG>
    Path to the model's JSON config file; default: model path + .json

-f, --output_file <OUTPUT>
    Output WAV/MP3 file path; default: output.wav

--speaker <ID>
    Numeric speaker ID; default: 0

--speaker-name <NAME>
    Speaker name for multi-speaker models

--length_scale <SCALE>
    Phoneme length, which controls speech speed; default: 1.0 (lower = faster; 0.5 speaks twice as fast)

--noise_scale <SCALE>
    Phoneme-level variance; default: 0.667

--noise_w <SCALE>
    Word-level prosody noise; default: 0.8

--sentence_silence <SECONDS>
    Seconds of silence after each sentence; default: 0.2

--split-sentences
    Split input on punctuation; default: on

--no-split-sentences
    Disable sentence splitting

--cuda
    Enable CUDA acceleration (if available)

--precision <fp32|fp16>
    Model precision; default: fp32

--output-format <wav|mp3>
    Output format; default: wav

--threads <NUM>
    Worker threads; default: 1

-h, --help
    Show help

-V, --version
    Print version
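
Combining the prosody flags gives finer control over delivery. As a sketch (flag spellings as in this list; some builds hyphenate them instead), lowering both variance scales produces a flatter, more deterministic read:

$ echo [Calm announcement.] | piper -m [path/to/model.onnx] --noise_scale [0.333] --noise_w [0.4] -f [calm.wav]

For multi-speaker voices, the valid --speaker IDs are listed in the voice's JSON config under its speaker_id_map field; assuming jq is available, inspect them with:

$ jq '.speaker_id_map' [path/to/model.onnx.json]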

DESCRIPTION

Piper is a high-quality, lightweight neural text-to-speech (TTS) system designed for low-resource devices like the Raspberry Pi, while remaining performant on desktops.

It uses VITS-based models trained on datasets like LJSpeech, supporting dozens of languages and voices. Piper synthesizes speech entirely on CPU (with optional CUDA), producing natural-sounding audio from text input.
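
For GPU inference, pass the CUDA flag described above (a sketch; the C++ build spells it --use-cuda, and a GPU-enabled ONNX Runtime must be installed):

$ echo [Hello] | piper -m [path/to/model.onnx] --cuda -f [hello.wav]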

Key features include multi-speaker support, customizable prosody via length/noise scales, sentence splitting, and output in WAV or MP3. Models are compact (~50-150MB), enabling offline use. Ideal for embedded apps, assistants, or accessibility tools.

Unlike cloud TTS, Piper ensures privacy and low latency (<200ms on capable hardware). Voices sound expressive with good intonation. Install via pip (pip install piper-tts) or distro packages; download models from rhasspy.github.io/piper.
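
A minimal install-and-smoke-test sequence (assuming a voice has already been downloaded into the working directory; see MODEL SOURCES below):

$ pip install piper-tts
$ echo [It works.] | piper -m [en_US-lessac-medium.onnx] -f [test.wav]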

CAVEATS

Requires .onnx model files (downloaded separately, together with their .onnx.json configs); CPU-only by default (can be slow on very low-end hardware); MP3 output needs libsndfile; phonemization relies on espeak-ng, which lacks support for some languages.

EXAMPLE USAGE

echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file speech.wav
Play the result with: aplay speech.wav
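
To skip the intermediate file, pipe the raw stream straight to aplay, following the pattern from the upstream README (medium-quality voices emit 16-bit mono PCM at 22050 Hz; match the rate to your model):

echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output-raw | aplay -r 22050 -f S16_LE -t raw -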

MODEL SOURCES

Download from https://rhasspy.github.io/piper/; multilingual support (en, de, fr, es, etc.).
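
Each voice is a pair of files that must sit side by side: the .onnx weights and the .onnx.json config. A generic download sketch (substitute the real URLs listed for the chosen voice):

curl -L -o [voice.onnx] [url/to/voice.onnx]
curl -L -o [voice.onnx.json] [url/to/voice.onnx.json]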

HISTORY

Developed by Michael Hansen and the Rhasspy team, first released in 2022 as part of open-source voice assistants for Home Assistant. Evolved from the earlier Larynx TTS (Piper began as "Larynx 2"), built on ONNX Runtime for speed and portability. Active community contributions; now at v1.2+ with more voices.

SEE ALSO

espeak-ng(1), festival(1), flite(1)
