[RNE Rewrite] feat: add voice activity detection pipeline #1298
Draft
msluszniak wants to merge 1 commit into
Port the VAD feature to the rewrite as a pure-TypeScript pipeline on top of the core model.execute primitive (no new C++):

- src/extensions/speech/tasks/vad.ts: createVAD runner replicating the native FSMN-VAD algorithm (framing + Hann window + pre-emphasis, chunked inference, thresholding / min-duration / padding / merge). Segments are returned in seconds. Relies on the get_dynamic_dims relaxed input validation for the dynamic frame dimension; the fsmn-vad model is re-exported with it.
- src/extensions/speech/vadStreamer.ts: pure streaming state machine driving onSpeechBegin / onSpeechEnd over an accumulating buffer.
- src/hooks/useVAD.ts: hook wrapping createVAD + streamer lifecycle.
- Register models.vad.FSMN_VAD and export the speech extension.
- apps/speech: expo-router demo (mirrors apps/nlp) with a real-time mic VAD screen via react-native-audio-api.
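The thresholding / min-duration / padding / merge stage named in the commit message can be illustrated with a minimal sketch. This is not the code from vad.ts: the function name `probsToSegments`, the option names, the order of the steps, and all parameter values are assumptions chosen for illustration, and the native FSMN-VAD implementation may differ in each of them.

```typescript
// Illustrative sketch only: names, step order and defaults are hypothetical,
// not taken from src/extensions/speech/tasks/vad.ts.
interface Segment {
  start: number; // seconds
  end: number; // seconds
}

interface PostprocessOptions {
  threshold: number; // speech-probability cut-off per frame
  frameSec: number; // duration covered by one frame, in seconds
  minSpeechSec: number; // speech runs shorter than this are dropped
  padSec: number; // padding added on both sides of each segment
  maxGapSec: number; // segments separated by at most this gap are merged
}

function probsToSegments(
  probs: ArrayLike<number>,
  o: PostprocessOptions,
): Segment[] {
  const totalSec = probs.length * o.frameSec;
  const runs: Segment[] = [];
  let runStart = -1;

  // 1. Thresholding: collect runs of consecutive speech frames.
  //    The extra iteration at i === probs.length closes a trailing run.
  for (let i = 0; i <= probs.length; i++) {
    const isSpeech = i < probs.length && (probs[i] ?? 0) >= o.threshold;
    if (isSpeech && runStart < 0) runStart = i;
    if (!isSpeech && runStart >= 0) {
      runs.push({ start: runStart * o.frameSec, end: i * o.frameSec });
      runStart = -1;
    }
  }

  // 2. Min duration, then 3. padding clamped to the audio bounds.
  const padded = runs
    .filter((s) => s.end - s.start >= o.minSpeechSec)
    .map((s) => ({
      start: Math.max(0, s.start - o.padSec),
      end: Math.min(totalSec, s.end + o.padSec),
    }));

  // 4. Merge segments whose gap is small (or that overlap after padding).
  const merged: Segment[] = [];
  for (const s of padded) {
    const last = merged[merged.length - 1];
    if (last && s.start - last.end <= o.maxGapSec) {
      last.end = Math.max(last.end, s.end);
    } else {
      merged.push({ ...s });
    }
  }
  return merged;
}

// Usage with 12 frames of 0.25 s each (values picked to be exact in binary).
const demoProbs = [0.1, 0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1];
const demoSegments = probsToSegments(demoProbs, {
  threshold: 0.5,
  frameSec: 0.25,
  minSpeechSec: 0.5,
  padSec: 0.25,
  maxGapSec: 0.5,
});
console.log(demoSegments);
// → [ { start: 0, end: 1 }, { start: 1.75, end: 3 } ]
```

In the example, the single-frame run at 1.0 s is dropped by the minimum-duration filter, and the two surviving segments stay separate because the 0.75 s gap between them exceeds `maxGapSec`.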
Description
Adds a Voice Activity Detection (VAD) task pipeline and a corresponding `speech` example app. The whole pipeline (feature extraction, chunked inference, segment postprocessing and streaming) runs in TypeScript on top of the core `model.execute` primitive, with no new C++.
Introduces a breaking change?
Type of change
Tested on
Testing instructions
`speech` app on iOS and Android
Screenshots
Related issues
Closes #1249
Checklist
Additional notes
- Relies on the `get_dynamic_dims` relaxed input validation from the text-embeddings PR ([RNE Rewrite] Add image and text embeddings pipelines #1247): VAD feeds a variable-length `[frames, 512]` input tensor per chunk. Outputs are still validated exactly, so the output tensor is pre-allocated at the model-declared shape. Requires [RNE Rewrite] Add image and text embeddings pipelines #1247 to land and the `fsmn-vad` model to be re-exported with a `get_dynamic_dims` method.
- The model output is `[1, frames, classes]` with class 0 = non-speech (speech = 1 - p0), matching the current native implementation.
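To make the speech = 1 - p0 convention concrete, here is a small sketch of how per-frame speech probabilities could be read out of a `[1, frames, classes]` output. The helper name `speechProbabilities` is hypothetical, and the sketch assumes the output arrives as a flat, row-major `Float32Array` whose values are already probabilities rather than logits; neither assumption is confirmed by this PR.

```typescript
// Hypothetical helper, not part of the PR. Assumes `output` is a flat,
// row-major view of a [1, frames, classes] tensor holding probabilities.
function speechProbabilities(
  output: Float32Array,
  frames: number,
  classes: number,
): Float32Array {
  if (output.length !== frames * classes) {
    throw new Error(
      `expected ${frames * classes} values, got ${output.length}`,
    );
  }
  const speech = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    // Class 0 is non-speech, so speech probability is its complement.
    const p0 = output[f * classes] ?? 0;
    speech[f] = 1 - p0;
  }
  return speech;
}

// Two frames, two classes: p0 = 0.75 for frame 0 and 0.25 for frame 1.
const demoOutput = new Float32Array([0.75, 0.25, 0.25, 0.75]);
console.log(Array.from(speechProbabilities(demoOutput, 2, 2)));
// → [ 0.25, 0.75 ]
```

Taking the complement of class 0 rather than reading a dedicated speech class keeps the result well defined even if the model declares more than two classes.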