piper
A fast, local neural text to speech system.
TLDR
Output a WAV [f]ile using a text-to-speech [m]odel (assuming a config file at model_path + .json)
$ echo [Thing to say] | piper -m [path/to/model.onnx] -f [outputfile.wav]
Output a WAV [f]ile using a [m]odel and specifying its JSON [c]onfig file
$ echo ['Thing to say'] | piper -m [path/to/model.onnx] -c [path/to/model.onnx.json] -f [outputfile.wav]
Select a particular speaker in a voice with multiple speakers by specifying the speaker's ID number
$ echo ['Warum?'] | piper -m [de_DE-thorsten_emotional-medium.onnx] --speaker [1] -f [angry.wav]
Stream the output to the mpv media player
$ echo ['Hello world'] | piper -m [en_GB-northern_english_male-medium.onnx] --output-raw -f - | mpv -
Speak twice as fast, with huge gaps between sentences
$ echo ['Speaking twice the speed. With added drama!'] | piper -m [foo.onnx] --length_scale [0.5] --sentence_silence [2] -f [drama.wav]