sussurro

Offline neural machine translation and voice for English and the Romance languages, built from scratch on ggml. Speak or type in one language, read or hear it in another — across English, Spanish, French and Italian, fully on-device: no server, no network at runtime.

The name is Italian for whisper.

Status — v0.7

A multilingual, offline voice-to-voice interpreter with a native desktop app: choose a source and a target language, speak (or type), read the result, hear it spoken — and every stage is also usable from the command line.

Translate — English, Spanish, French, Italian, in every direction. Encoder–decoder Transformers (OPUS-MT / Marian) reimplemented on ggml, with greedy and beam-search decoding, an incremental KV cache, sentence splitting, and q8_0 / q4_0 / f16 weights.
Listen — multilingual speech-to-text via whisper.cpp.
Speak — text-to-speech via sherpa-onnx running a Piper voice per language, played through miniaudio.
Desktop app — a Tauri UI (Liquid Glass) wrapping all of the above.

Everything runs locally (Metal + Accelerate on Apple Silicon).
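
The difference between the greedy and beam-search decoding listed under Translate can be illustrated on a toy model. The sketch below is not sussurro_core's implementation: `beam_search`, `toy_model` and the token names are hypothetical, there is no length normalization, and the "model" is a hand-written table of log-probabilities rather than a Transformer decoder.

```python
import math

def beam_search(step_logprobs, bos, eos, beam_size=4, max_len=16):
    """Keep the `beam_size` best partial hypotheses at every step.

    `step_logprobs(prefix)` returns {token: log-probability} for the next
    token. With beam_size=1 this reduces to greedy decoding.
    """
    beams = [(0.0, [bos])]          # (cumulative log-prob, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = [
            (score + lp, seq + [tok])
            for score, seq in beams
            for tok, lp in step_logprobs(seq).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)          # hypotheses cut off at max_len
    if not finished:
        raise ValueError("model produced no candidates")
    return max(finished, key=lambda c: c[0])[1]

def toy_model(prefix):
    """Hypothetical next-token table in which the greedy choice is a trap."""
    last = prefix[-1]
    if last == "<s>":
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if last == "a":
        return {"x": math.log(0.5), "y": math.log(0.5)}
    if last == "b":
        return {"z": math.log(0.9), "w": math.log(0.1)}
    return {"</s>": 0.0}

greedy = beam_search(toy_model, "<s>", "</s>", beam_size=1)
beam = beam_search(toy_model, "<s>", "</s>", beam_size=2)
```

On this toy table, greedy decoding commits to `a` and ends with a sequence of probability 0.30, while a beam of two keeps `b` alive and finds `b z` at 0.36.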

Languages

Two kinds of model cover all twelve directions among en / es / fr / it, with no pivoting:

a single multilingual model for Romance ↔ Romance, where the target language is chosen by a sentence-initial token (>>fra<<, >>spa<<, >>ita<<);
bilingual models for English ↔ Romance (no token needed).

From → To	Model	`-l` token
en → fr / es / it	`tc-en-fr` / `tc-en-es` / `tc-en-it`	—
fr → en	`tc-fr-en`	—
it → en	`tc-it-en`	—
es → en	`es-en`	—
it/es → fr	`tc-itc-itc`	`fra`
fr/es → it	`tc-itc-itc`	`ita`
fr/it → es	`tc-itc-itc`	`spa`

The desktop app picks the right model and token automatically from the chosen languages; from the CLI you select them yourself.
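
The table above can be expressed as a small lookup. This is a sketch of the selection rule only, not the app's actual code: `pick_model` and `translate_cmd` are hypothetical names, and the model paths assume the `models/<name>.gguf` layout used in the conversion commands further down.

```python
from typing import List, Optional, Tuple

LANGS = ("en", "es", "fr", "it")
# Target tokens for the multilingual Romance model (the `-l` column).
TOKENS = {"fr": "fra", "es": "spa", "it": "ita"}

def pick_model(src: str, tgt: str) -> Tuple[str, Optional[str]]:
    """Return (model name, target token or None) for a language pair."""
    if src not in LANGS or tgt not in LANGS or src == tgt:
        raise ValueError(f"unsupported pair: {src!r} -> {tgt!r}")
    if (src, tgt) == ("es", "en"):
        return "es-en", None               # classic OPUS-MT model
    if "en" in (src, tgt):
        return f"tc-{src}-{tgt}", None     # bilingual model, no token
    return "tc-itc-itc", TOKENS[tgt]       # multilingual, token picks target

def translate_cmd(src: str, tgt: str, text: str) -> List[str]:
    """Build the sussurro CLI invocation for one translation."""
    model, token = pick_model(src, tgt)
    cmd = ["./build/sussurro", "-m", f"models/{model}.gguf", "-p", text]
    if token:
        cmd += ["-l", token]
    return cmd
```

For example, `translate_cmd("it", "fr", "Ciao")` yields the multilingual model with `-l fra`, while any pair involving English yields a bilingual model and no `-l` flag.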

Components

sussurro_core — translation library (model loading, SentencePiece tokenizer, encoder, decoder).
sussurro — CLI: translate text (-l <lang> selects the target on multilingual models).
sussurro-quantize — quantize a model to q8_0 / q4_0.
sussurro-interpret — speech → text (whisper); add -m to also translate, or omit it to transcribe only.
sussurro-speak — text → speech (WAV, and --play to play it).
scripts/loop.sh — a simple voice-to-voice demo (Italian audio in → English spoken out).
app/ — the Tauri desktop application.

Dependencies

ggml, whisper.cpp, miniaudio — git submodules under third_party/.
SentencePiece — fetched and built automatically by CMake (FetchContent, v0.2.0).
sherpa-onnx — prebuilt C API library, downloaded manually; only needed for sussurro-speak.
Tauri v2 (Rust + Node), cpal, hound — for the desktop app in app/.

Build (engine)

git clone --recurse-submodules https://github.com/whispem/sussurro.cpp.git
cd sussurro.cpp
cmake -B build && cmake --build build -j

The first build also fetches and builds SentencePiece and compiles whisper.cpp; this takes a few minutes, but only once. If you cloned without --recurse-submodules, run git submodule update --init --recursive first.

sussurro-speak is built only if sherpa-onnx is present under third_party/ (see Text-to-speech below).

Models & voices

Translation models

pip install -r requirements.txt

# Romance <-> Romance (target chosen at run time by the -l token)
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-itc-itc --outfile models/tc-itc-itc.gguf

# English <-> Romance (bilingual)
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-fr --outfile models/tc-en-fr.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-es --outfile models/tc-en-es.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-it --outfile models/tc-en-it.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-fr-en --outfile models/tc-fr-en.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-it-en --outfile models/tc-it-en.gguf

# Spanish -> English (classic OPUS-MT)
python scripts/convert.py --model Helsinki-NLP/opus-mt-es-en --outfile models/es-en.gguf

Each .gguf is self-contained (weights, hyper-parameters, and SentencePiece tokenizers) and stored as f16 by default (add --dtype f32 for full precision). To shrink a model further, quantize it from an f32 export:

python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-fr --outfile models/tc-en-fr.f32.gguf --dtype f32
./build/sussurro-quantize models/tc-en-fr.f32.gguf models/tc-en-fr.q8_0.gguf q8_0
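
To give a feel for what q8_0 does to the weights, here is a minimal sketch of block-wise 8-bit quantization. It assumes sussurro-quantize follows ggml's standard q8_0 layout (blocks of 32 values, one f16 scale per block), which this README does not spell out. The function names are hypothetical, and Python's `round` differs slightly from C's `roundf` on exact ties.

```python
import struct

BLOCK = 32  # values per block in ggml's q8_0 layout

def to_f16(x):
    """Round a Python float through IEEE half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_q8_0(values):
    """Return a list of (scale, [32 int8 values]) blocks."""
    if len(values) % BLOCK:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        d = amax / 127.0                  # largest magnitude maps to +/-127
        inv = 1.0 / d if d else 0.0
        qs = [round(v * inv) for v in chunk]
        blocks.append((to_f16(d), qs))    # the scale is stored as f16
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from the quantized blocks."""
    return [d * q for d, qs in blocks for q in qs]

weights = [i / 10 - 1.6 for i in range(32)]
restored = dequantize_q8_0(quantize_q8_0(weights))
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In this layout each block of 32 weights takes 34 bytes (2 for the scale, 32 for the values), compared with 128 bytes in f32 or 64 in f16.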

Speech-to-text model (whisper)

bash third_party/whisper.cpp/models/download-ggml-model.sh small

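whisper is already multilingual; the source language is selected at run time with sussurro-interpret's -l option (for example -l es). Note that -l on the sussurro CLI selects the target token instead.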

Text-to-speech: sherpa-onnx + a voice per language

Download the prebuilt sherpa-onnx C API library (macOS arm64 shown; pick your platform's -shared asset from the releases):

cd third_party
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.13.3/sherpa-onnx-v1.13.3-osx-arm64-shared.tar.bz2
tar xf sherpa-onnx-v1.13.3-osx-arm64-shared.tar.bz2
mv sherpa-onnx-v1.13.3-osx-arm64-shared sherpa-onnx
rm sherpa-onnx-v1.13.3-osx-arm64-shared.tar.bz2
xattr -dr com.apple.quarantine sherpa-onnx   # macOS only
cd ..

Then one Piper voice per output language:

cd models
for v in fr_FR-tom-medium en_US-ryan-medium es_ES-davefx-medium it_IT-paola-medium; do
  curl -SL -O "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-$v.tar.bz2"
  tar xf "vits-piper-$v.tar.bz2" && rm "vits-piper-$v.tar.bz2"
done
cd ..

Re-run cmake -B build && cmake --build build -j so sussurro-speak gets built.

Command-line usage

Translate text (the multilingual model needs a target token; bilingual models do not):

./build/sussurro -m models/tc-itc-itc.gguf -p "Ciao, come stai?" -l fra   # -> French
./build/sussurro -m models/tc-en-it.gguf   -p "Hello, how are you?"       # -> Italian

Transcribe speech (16 kHz mono WAV), optionally translating in the same pass:

./build/sussurro-interpret -w third_party/whisper.cpp/models/ggml-small.bin -a clip.wav -l es
./build/sussurro-interpret -w third_party/whisper.cpp/models/ggml-small.bin -a clip.wav -l it -m models/tc-it-en.gguf
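
Because the input must be a 16 kHz mono WAV, it can help to check a clip before transcribing it. The sketch below uses only Python's standard `wave` module; `wav_problems` is a hypothetical helper, not part of the repo, and it checks nothing beyond the sample rate and channel count stated above.

```python
import wave

def wav_problems(path):
    """Return a list of reasons the file is not 16 kHz mono (empty if fine)."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 16000")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected 1 (mono)")
    return problems
```

An empty list means the clip matches the stated format; anything else should be resampled or downmixed with an audio tool of your choice first.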

Synthesize speech (and play it):

./build/sussurro-speak -k models/vits-piper-es_ES-davefx-medium -t "Hola, ¿cómo estás?" --play

Voice-to-voice demo — Italian audio in, English spoken out:

./scripts/loop.sh clip.wav
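
The same chain can be sketched from Python using the two commands shown above. This is not a transcription of scripts/loop.sh, whose contents are not shown here. In particular, it ASSUMES that sussurro-interpret prints the translation as the last non-empty line of its standard output, so verify that against the real tool before relying on it. All function names are hypothetical.

```python
import subprocess

WHISPER = "third_party/whisper.cpp/models/ggml-small.bin"

def interpret_cmd(wav, src="it", model="models/tc-it-en.gguf"):
    """Speech -> translated text, using only flags shown in this README."""
    return ["./build/sussurro-interpret", "-w", WHISPER,
            "-a", wav, "-l", src, "-m", model]

def speak_cmd(text, voice="models/vits-piper-en_US-ryan-medium"):
    """Text -> speech, played immediately."""
    return ["./build/sussurro-speak", "-k", voice, "-t", text, "--play"]

def last_nonempty_line(output):
    """ASSUMPTION: the translation is the last non-empty stdout line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("sussurro-interpret produced no output")
    return lines[-1]

def voice_to_voice(wav):
    """Italian audio in, English speech out (run from the repo root)."""
    result = subprocess.run(interpret_cmd(wav), check=True,
                            capture_output=True, text=True)
    subprocess.run(speak_cmd(last_nonempty_line(result.stdout)), check=True)
```

Calling `voice_to_voice("clip.wav")` requires the binaries, the whisper model, the it-en translation model and the English voice to be in place.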

Desktop app (Tauri)

app/ is a desktop front-end built on Tauri v2 (Rust backend + a vanilla web UI) and styled as a Liquid Glass interface: pick a source and a target language, then speak or type, read the result, and hear it in the target language's voice. The swap button reverses the two languages.

Prerequisites: Rust 1.77+ and Node.js 20+ (plus Xcode Command Line Tools on macOS).

cd app
npm install
npm run tauri dev

The app calls the compiled engine binaries directly, so three things must be in place before you run it: the binaries are built (cmake --build build -j at the repo root), the models and voices are downloaded (see Models & voices above), and the REPO constant in app/src-tauri/src/lib.rs is set to this repo's absolute path. Microphone capture is native (via cpal); macOS asks for permission on first use (the permission is declared in app/src-tauri/Info.plist).

Note: this is an early build. The app shells out to the local engine binaries using an absolute path, so it runs only on the machine where the repo lives; it is not yet a self-contained, shareable bundle. Bundling the engine, models and voices into the app and code-signing it is planned (see Roadmap).

Roadmap

A self-contained, code-signed desktop bundle.
More languages and pairs (Portuguese is one token away on the Romance model).
A more expressive voice (e.g. Qwen3-TTS).
Keyboard navigation in the app's language pickers.

License & credits

sussurro's own source: released under the MIT License.

Built on the work of others, each under its own license:

Helsinki-NLP OPUS-MT models — CC-BY 4.0.
ggml / whisper.cpp (ggml-org) — MIT.
SentencePiece (Google) — Apache-2.0.
sherpa-onnx (k2-fsa) — Apache-2.0.
miniaudio (David Reid) — public domain / MIT-0.
Piper voices (OHF-Voice / rhasspy) — see each voice's model card.
Tauri (Tauri Programme / CommonsConservancy), cpal, hound — MIT / Apache-2.0.
