[RNE Rewrite] feat: add voice activity detection pipeline by msluszniak · Pull Request #1298 · software-mansion/react-native-executorch

msluszniak · 2026-07-02T15:52:14Z

Description

Adds a Voice Activity Detection (VAD) task pipeline and a corresponding speech example app. The whole pipeline (feature extraction, chunked inference, segment postprocessing and streaming) runs in TypeScript on top of the core model.execute primitive — no new C++.

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Build the speech app on iOS and Android
Test the Voice Activity Detection screen on a physical device (mic + xnnpack; simulator can't record)
Verify speech toggles SPEAKING/SILENT and logs begin/end events
Check the HF repo: https://huggingface.co/software-mansion/react-native-executorch-fsmn-vad
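
For reviewers who want to reason about the SPEAKING/SILENT toggling before testing on a device, here is a minimal sketch of a begin/end state machine over per-frame speech probabilities. It is not the vadStreamer.ts from this PR: `createStreamer`, `threshold`, and `hangoverFrames` are hypothetical names and values, and events are reported as frame indices rather than seconds.

```typescript
type VadState = 'SILENT' | 'SPEAKING';

interface StreamerCallbacks {
  onSpeechBegin: (frame: number) => void;
  onSpeechEnd: (frame: number) => void;
}

// Hypothetical sketch: flips to SPEAKING on the first speech frame and back to
// SILENT after `hangoverFrames` consecutive non-speech frames.
function createStreamer(
  callbacks: StreamerCallbacks,
  threshold = 0.5,
  hangoverFrames = 3
) {
  let state: VadState = 'SILENT';
  let silentRun = 0; // consecutive non-speech frames seen while SPEAKING
  let frameIndex = 0; // absolute frame counter across all push() calls

  return {
    get state(): VadState {
      return state;
    },
    // Feed the next batch of per-frame speech probabilities.
    push(probs: ArrayLike<number>): void {
      for (let i = 0; i < probs.length; i++, frameIndex++) {
        const isSpeech = probs[i] >= threshold;
        if (state === 'SILENT') {
          if (isSpeech) {
            state = 'SPEAKING';
            silentRun = 0;
            callbacks.onSpeechBegin(frameIndex);
          }
        } else if (isSpeech) {
          silentRun = 0;
        } else if (++silentRun >= hangoverFrames) {
          state = 'SILENT';
          // Report the first frame of the silent run as the end of speech.
          callbacks.onSpeechEnd(frameIndex - silentRun + 1);
          silentRun = 0;
        }
      }
    },
  };
}
```

Because the frame counter persists across `push` calls, a pause shorter than `hangoverFrames` does not emit an end event even if it straddles two batches.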

Screenshots

Related issues

Closes #1249

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

Depends on the relaxed input validation via get_dynamic_dims from the text-embeddings PR ([RNE Rewrite] Add image and text embeddings pipelines #1247): VAD feeds a variable-length [frames, 512] input tensor per chunk. Outputs are still validated exactly, so the output tensor is pre-allocated at the model-declared shape. Before this PR can merge, #1247 must land and the fsmn-vad model must be re-exported with a get_dynamic_dims method.
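
To illustrate what a variable-length [frames, 512] input per chunk looks like, here is a minimal sketch that splits a flat feature buffer into chunks whose frame count may differ. The helper name `splitIntoChunks` and the `maxFrames` parameter are hypothetical, and the call into model.execute is deliberately left out since its signature is not shown here.

```typescript
const FEATURE_DIM = 512; // feature width stated in the note above

interface FeatureChunk {
  data: Float32Array; // view over the parent buffer, no copy
  shape: [number, number]; // [frames, FEATURE_DIM]
}

// Hypothetical helper: splits row-major features into chunks of at most
// `maxFrames` frames. The last chunk may be shorter, which is why the input
// frame dimension has to be dynamic.
function splitIntoChunks(features: Float32Array, maxFrames: number): FeatureChunk[] {
  if (features.length % FEATURE_DIM !== 0) {
    throw new RangeError('feature buffer length must be a multiple of 512');
  }
  if (!Number.isInteger(maxFrames) || maxFrames <= 0) {
    throw new RangeError('maxFrames must be a positive integer');
  }
  const totalFrames = features.length / FEATURE_DIM;
  const chunks: FeatureChunk[] = [];
  for (let start = 0; start < totalFrames; start += maxFrames) {
    const frames = Math.min(maxFrames, totalFrames - start);
    const data = features.subarray(start * FEATURE_DIM, (start + frames) * FEATURE_DIM);
    chunks.push({ data, shape: [frames, FEATURE_DIM] });
  }
  return chunks;
}
```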
Segments are returned in seconds (the old native path returned raw sample indices).
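
For callers migrating from the old sample-index output, the conversion is a division by the sample rate. The sketch below assumes 16 kHz audio purely for illustration; the type and function names are hypothetical.

```typescript
interface SampleSegment {
  startSample: number;
  endSample: number;
}

interface SecondsSegment {
  start: number; // seconds
  end: number; // seconds
}

// Hypothetical helper: converts raw sample indices to seconds.
// The 16000 Hz default is an assumption, not a value taken from this PR.
function samplesToSeconds(seg: SampleSegment, sampleRate = 16000): SecondsSegment {
  return {
    start: seg.startSample / sampleRate,
    end: seg.endSample / sampleRate,
  };
}
```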
The FSMN output contract is assumed to be [1, frames, classes] with class 0 = non-speech (speech = 1 - p0), matching the current native implementation.
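
The output contract above can be sketched as follows, assuming the model output arrives as a flat, row-major Float32Array of shape [1, frames, classes]. The helper name `speechProbabilities` is hypothetical.

```typescript
// Hypothetical helper: reads class 0 (non-speech) for each frame and returns
// the speech probability as 1 - p0.
function speechProbabilities(
  output: Float32Array,
  frames: number,
  classes: number
): Float32Array {
  if (output.length !== frames * classes) {
    throw new RangeError('output length does not match [1, frames, classes]');
  }
  const probs = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    const p0 = output[f * classes]; // class 0 = non-speech
    probs[f] = 1 - p0;
  }
  return probs;
}
```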

Port the VAD feature to the rewrite as a pure-TypeScript pipeline on top of the core model.execute primitive (no new C++):
- src/extensions/speech/tasks/vad.ts: createVAD runner replicating the native FSMN-VAD algorithm (framing + Hann window + pre-emphasis, chunked inference, thresholding / min-duration / padding / merge). Segments are returned in seconds. Relies on the get_dynamic_dims relaxed input validation for the dynamic frame dimension; the fsmn-vad model is re-exported with it.
- src/extensions/speech/vadStreamer.ts: pure streaming state machine driving onSpeechBegin / onSpeechEnd over an accumulating buffer.
- src/hooks/useVAD.ts: hook wrapping createVAD + streamer lifecycle.
- Register models.vad.FSMN_VAD and export the speech extension.
- apps/speech: expo-router demo (mirrors apps/nlp) with a real-time mic VAD screen via react-native-audio-api.
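
The thresholding / min-duration / padding / merge stage named in the commit message could look roughly like the sketch below. It is an illustration under assumptions, not the code in vad.ts: the option names, the order of the steps, and the rule that touching segments are merged are all hypothetical.

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
}

interface PostprocessOptions {
  threshold: number; // speech probability cut-off
  frameSec: number; // duration of one frame in seconds
  minDurationSec: number; // segments shorter than this are dropped
  padSec: number; // padding added to both sides of each segment
}

// Hypothetical sketch of segment postprocessing over per-frame probabilities.
function probabilitiesToSegments(
  probs: ArrayLike<number>,
  { threshold, frameSec, minDurationSec, padSec }: PostprocessOptions
): Segment[] {
  const totalSec = probs.length * frameSec;

  // 1. Thresholding: collect runs of consecutive speech frames.
  const raw: Segment[] = [];
  let startFrame = -1;
  for (let i = 0; i <= probs.length; i++) {
    const isSpeech = i < probs.length && probs[i] >= threshold;
    if (isSpeech && startFrame < 0) {
      startFrame = i;
    } else if (!isSpeech && startFrame >= 0) {
      raw.push({ start: startFrame * frameSec, end: i * frameSec });
      startFrame = -1;
    }
  }

  // 2. Min-duration: drop runs that are too short.
  const longEnough = raw.filter((s) => s.end - s.start >= minDurationSec);

  // 3. Padding: widen each segment, clamped to the audio bounds.
  const padded = longEnough.map((s) => ({
    start: Math.max(0, s.start - padSec),
    end: Math.min(totalSec, s.end + padSec),
  }));

  // 4. Merge: fuse segments that overlap or touch after padding.
  const merged: Segment[] = [];
  for (const seg of padded) {
    const last = merged[merged.length - 1];
    if (last && seg.start <= last.end) {
      last.end = Math.max(last.end, seg.end);
    } else {
      merged.push({ ...seg });
    }
  }
  return merged;
}
```

Filtering by duration before padding means padding cannot rescue a run that was too short on its own; swapping steps 2 and 3 would give a more permissive result.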

msluszniak self-assigned this Jul 2, 2026

msluszniak added the refactoring and feature labels Jul 2, 2026

msluszniak linked an issue Jul 2, 2026 that may be closed by this pull request

[RNE Rewrite] Speech - add VAD pipeline implementation #1249

Open

msluszniak wants to merge 1 commit into rne-rewrite from @ms/rewrite-vad
