Offline neural machine translation and voice for English and the Romance languages, built from scratch on ggml. Speak or type in one language, read or hear it in another — across English, Spanish, French and Italian, fully on-device: no server, no network at runtime.
The name is Italian for whisper.
A multilingual, offline voice-to-voice interpreter with a native desktop app: choose a source and a target language, speak (or type), read the result, hear it spoken — and every stage is also usable from the command line.
- Translate — English, Spanish, French, Italian, in every direction. Encoder–decoder Transformers (OPUS-MT / Marian) reimplemented on ggml, with greedy and beam-search decoding, an incremental KV cache, sentence splitting, and q8_0 / q4_0 / f16 weights.
- Listen — multilingual speech-to-text via whisper.cpp.
- Speak — text-to-speech via sherpa-onnx running a Piper voice per language, played through miniaudio.
- Desktop app — a Tauri UI (Liquid Glass) wrapping all of the above.
Everything runs locally (Metal + Accelerate on Apple Silicon).
Two kinds of model cover all twelve directions among en / es / fr / it, with no pivoting:
- a single multilingual model for Romance ↔ Romance, where the target language is chosen by a sentence-initial token (>>fra<<, >>spa<<, >>ita<<);
- bilingual models for English ↔ Romance (no token needed).
| From → To | Model | -l token |
|---|---|---|
| en → fr / es / it | tc-en-fr / tc-en-es / tc-en-it | none |
| fr → en | tc-fr-en | none |
| it → en | tc-it-en | none |
| es → en | es-en | none |
| it/es → fr | tc-itc-itc | fra |
| fr/es → it | tc-itc-itc | ita |
| fr/it → es | tc-itc-itc | spa |
The desktop app picks the right model and token automatically from the chosen languages; from the CLI you select them yourself.
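For CLI use, the table above can be encoded as a small lookup. The sketch below is one possible way to do that in a shell script; pick_model is a hypothetical helper (not part of sussurro), and the models/ paths assume the output file names used by the conversion commands in this README.

```shell
# Sketch only: pick_model is a hypothetical helper, not shipped with sussurro.
# It prints "<model path>" for bilingual pairs and "<model path> <token>"
# for Romance <-> Romance pairs, following the table above.
pick_model() {
  case "$1-$2" in
    en-fr|en-es|en-it|fr-en|it-en) echo "models/tc-$1-$2.gguf" ;;
    es-en)                         echo "models/es-en.gguf" ;;
    it-fr|es-fr)                   echo "models/tc-itc-itc.gguf fra" ;;
    fr-it|es-it)                   echo "models/tc-itc-itc.gguf ita" ;;
    fr-es|it-es)                   echo "models/tc-itc-itc.gguf spa" ;;
    *) echo "unsupported pair: $1 -> $2" >&2; return 1 ;;
  esac
}

# Example use (needs the built CLI and the converted models):
#   set -- $(pick_model it fr)
#   ./build/sussurro -m "$1" ${2:+-l "$2"} -p "Ciao, come stai?"
```

The second field is present only for the multilingual model, so a wrapper can pass -l exactly when a token is needed.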
- sussurro_core: translation library (model loading, SentencePiece tokenizer, encoder, decoder).
- sussurro: CLI to translate text (-l <lang> selects the target on multilingual models).
- sussurro-quantize: quantize a model to q8_0 / q4_0.
- sussurro-interpret: speech → text (whisper); add -m to also translate, or omit it to transcribe only.
- sussurro-speak: text → speech (WAV, and --play to play it).
- scripts/loop.sh: a simple voice-to-voice demo (Italian audio in → English spoken out).
- app/: the Tauri desktop application.
- ggml, whisper.cpp, miniaudio: git submodules under third_party/.
- SentencePiece: fetched and built automatically by CMake (FetchContent, v0.2.0).
- sherpa-onnx: prebuilt C API library, downloaded manually; only needed for sussurro-speak.
- Tauri v2 (Rust + Node), cpal, hound: for the desktop app in app/.
git clone --recurse-submodules https://github.com/whispem/sussurro.cpp.git
cd sussurro.cpp
cmake -B build && cmake --build build -j
The first build also fetches and builds SentencePiece and compiles whisper.cpp; this takes a few minutes, once.
If you cloned without --recurse-submodules: git submodule update --init --recursive.
sussurro-speak is built only once sherpa-onnx is present (see Text-to-speech below).
pip install -r requirements.txt
# Romance <-> Romance (target chosen at run time by the -l token)
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-itc-itc --outfile models/tc-itc-itc.gguf
# English <-> Romance (bilingual)
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-fr --outfile models/tc-en-fr.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-es --outfile models/tc-en-es.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-it --outfile models/tc-en-it.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-fr-en --outfile models/tc-fr-en.gguf
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-it-en --outfile models/tc-it-en.gguf
# Spanish -> English (classic OPUS-MT)
python scripts/convert.py --model Helsinki-NLP/opus-mt-es-en --outfile models/es-en.gguf
Each .gguf is self-contained (weights, hyper-parameters, and SentencePiece tokenizers), f16 by default (add --dtype f32 for full precision). To shrink them, quantize from an f32 export:
python scripts/convert.py --model Helsinki-NLP/opus-mt-tc-big-en-fr --outfile models/tc-en-fr.f32.gguf --dtype f32
./build/sussurro-quantize models/tc-en-fr.f32.gguf models/tc-en-fr.q8_0.gguf q8_0

bash third_party/whisper.cpp/models/download-ggml-model.sh small
whisper is already multilingual; the source language is selected at run time (-l).
Download the prebuilt sherpa-onnx C API library (macOS arm64 shown; pick your platform's -shared asset from the releases):
cd third_party
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.13.3/sherpa-onnx-v1.13.3-osx-arm64-shared.tar.bz2
tar xf sherpa-onnx-v1.13.3-osx-arm64-shared.tar.bz2
mv sherpa-onnx-v1.13.3-osx-arm64-shared sherpa-onnx
rm sherpa-onnx-v1.13.3-osx-arm64-shared.tar.bz2
xattr -dr com.apple.quarantine sherpa-onnx # macOS only
cd ..
Then one Piper voice per output language:
cd models
for v in fr_FR-tom-medium en_US-ryan-medium es_ES-davefx-medium it_IT-paola-medium; do
curl -SL -O "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-$v.tar.bz2"
tar xf "vits-piper-$v.tar.bz2" && rm "vits-piper-$v.tar.bz2"
done
cd ..
Re-run cmake -B build && cmake --build build -j so sussurro-speak gets built.
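Since each output language has its own Piper voice, scripts that call sussurro-speak need a language-to-voice mapping. A minimal sketch follows; voice_dir and missing_voices are hypothetical helpers, and the directory names assume each archive unpacks into models/vits-piper-<voice>, which is what the -k path used with sussurro-speak in this README suggests.

```shell
# Sketch only: both helpers are hypothetical, not part of sussurro.
# voice_dir LANG [MODELS_DIR] prints the Piper voice directory for a language.
voice_dir() {
  models_dir="${2:-models}"
  case "$1" in
    en) echo "$models_dir/vits-piper-en_US-ryan-medium" ;;
    es) echo "$models_dir/vits-piper-es_ES-davefx-medium" ;;
    fr) echo "$models_dir/vits-piper-fr_FR-tom-medium" ;;
    it) echo "$models_dir/vits-piper-it_IT-paola-medium" ;;
    *)  echo "no voice for language: $1" >&2; return 1 ;;
  esac
}

# missing_voices [MODELS_DIR] prints one language code per line for every
# voice directory that has not been downloaded and unpacked yet.
missing_voices() {
  for lang in en es fr it; do
    dir="$(voice_dir "$lang" "${1:-models}")"
    [ -d "$dir" ] || echo "$lang"
  done
}

# Example use (needs sussurro-speak built and the voice in place):
#   ./build/sussurro-speak -k "$(voice_dir es)" -t "Hola, ¿cómo estás?" --play
```

Running missing_voices from the repo root before the first synthesis shows which of the four downloads still needs to be done.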
Translate text (multilingual model needs a target token; bilingual models do not):
./build/sussurro -m models/tc-itc-itc.gguf -p "Ciao, come stai?" -l fra # -> French
./build/sussurro -m models/tc-en-it.gguf -p "Hello, how are you?" # -> Italian
Transcribe speech (16 kHz mono WAV), optionally translating in the same pass:
./build/sussurro-interpret -w third_party/whisper.cpp/models/ggml-small.bin -a clip.wav -l es
./build/sussurro-interpret -w third_party/whisper.cpp/models/ggml-small.bin -a clip.wav -l it -m models/tc-it-en.gguf
Synthesize speech (and play it):
./build/sussurro-speak -k models/vits-piper-es_ES-davefx-medium -t "Hola, ¿cómo estás?" --play
Voice-to-voice demo (Italian audio in, English spoken out):
./scripts/loop.sh clip.wav

app/ is a desktop front-end built with Tauri v2 (Rust backend + a vanilla web UI) and styled as a Liquid Glass interface: pick a source and target language, then speak or type, read the result, and hear it in that language's voice.
The swap button reverses the two languages.
Prerequisites: Rust 1.77+ and Node.js 20+ (plus Xcode Command Line Tools on macOS).
cd app
npm install
npm run tauri dev
The app calls the compiled engine binaries directly, so before running you need: the binaries built (cmake --build build -j at the repo root), the models and voices in place (above), and the REPO constant in app/src-tauri/src/lib.rs set to this repo's absolute path.
Microphone capture is native (via cpal); macOS asks for permission on first use (declared in app/src-tauri/Info.plist).
Note — early build. The app shells out to the local engine binaries using an absolute path, so it runs on the machine where the repo lives; it is not yet a self-contained, shareable bundle. Bundling the engine, models and voices into the app and code-signing it are possible future work.
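Because the app depends on binaries at a fixed location, a quick pre-flight check can catch a missing build before launch. The sketch below rests on stated assumptions: check_engine is a hypothetical helper, and it assumes the app needs the three CLI binaries under build/ that the command-line examples in this README use.

```shell
# Sketch only: check_engine is a hypothetical helper, not part of sussurro.
# check_engine REPO_DIR reports every expected engine binary that is missing
# or not executable, and returns non-zero if any is.
check_engine() {
  repo="$1"
  status=0
  # Assumed set of binaries; adjust if the app calls a different set.
  for bin in sussurro sussurro-interpret sussurro-speak; do
    if [ ! -x "$repo/build/$bin" ]; then
      echo "missing: $repo/build/$bin" >&2
      status=1
    fi
  done
  return "$status"
}

# Example use, with the same absolute path as the REPO constant:
#   check_engine /absolute/path/to/sussurro.cpp && (cd app && npm run tauri dev)
```

The path in the example is a placeholder; it should match whatever REPO is set to in app/src-tauri/src/lib.rs.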
- A self-contained, code-signed desktop bundle.
- More languages and pairs (Portuguese is one token away on the Romance model).
- A more expressive voice (e.g. Qwen3-TTS).
- Keyboard navigation in the app's language pickers.
sussurro's own source: released under the MIT License.
Built on the work of others, each under its own license:
- Helsinki-NLP OPUS-MT models — CC-BY 4.0.
- ggml / whisper.cpp (ggml-org) — MIT.
- SentencePiece (Google) — Apache-2.0.
- sherpa-onnx (k2-fsa) — Apache-2.0.
- miniaudio (David Reid) — public domain / MIT-0.
- Piper voices (OHF-Voice / rhasspy) — see each voice's model card.
- Tauri (Tauri Programme / CommonsConservancy), cpal, hound — MIT / Apache-2.0.