Fix Voice-Activated mode not scrolling with low-gain microphones #61
Open
Ritavdas wants to merge 1 commit into
The Voice-Activated (silence-paused) listening mode only advances the teleprompter while SpeechRecognizer.isSpeaking is true. That flag used a single hard-coded absolute threshold (average level > 0.08) over recent audio samples. On common microphones (built-in MacBook mic, AirPods at a normal distance) speech rarely averages that high, so isSpeaking stayed false and the text never scrolled even while the user was clearly speaking. Classic and Word Tracking modes don't depend on isSpeaking, which is why only this mode was affected.

Replace the fixed threshold with adaptive voice-activity detection:
- Continuously track a noise floor that rises slowly and falls fast, so steady ambient noise is absorbed (no false 'always speaking') while the floor stays low for quiet mics.
- Trigger speech on a gain-relative threshold (max(0.025, floor * 1.8)), so detection works across very different mic gains.
- Add a 0.25s hangover so natural pauses between words don't stall scrolling.
- Reset VAD state on each start(with:).

Fixes #49

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes the Voice-Activated listening mode (silencePaused: "Scrolls while you speak, pauses when you're silent"), which did not scroll at all for many users even while speaking clearly.
Fixes #49
Root cause
Voice-Activated mode only advances the teleprompter while SpeechRecognizer.isSpeaking is true. That property used a single hard-coded absolute threshold over recent audio levels (average level > 0.08).
0.08 is too high for common microphones (built-in MacBook mic, AirPods at a normal distance), where speech rarely averages that level. So isSpeaking stayed false and the text never scrolled. Classic and Word Tracking modes don't depend on isSpeaking, which is why only Voice-Activated mode was affected.
Fix
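Replace the fixed threshold with adaptive voice-activity detection in SpeechRecognizer:
- Continuously track a noise floor that rises slowly and falls fast, so steady ambient noise is absorbed (no false "always speaking") while the floor stays low for quiet mics.
- Trigger speech on a gain-relative threshold, max(0.025, noiseFloor * 1.8), so detection works across very different mic gains (quiet built-in mic through loud external mic).
- Add a 0.25s hangover so natural pauses between words don't stall scrolling.
- Reset VAD state on each start(with:).
The notch overlay, external display, and browser teleprompters all consume isSpeaking, so they all benefit with no changes to their scroll logic.
Testing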
The fix is pure logic, so I validated it with a deterministic harness that runs both the old and new algorithms over simulated audio-level streams at the real audio-buffer rate (~0.02s/frame).
The app also builds cleanly (xcodebuild ... build → BUILD SUCCEEDED).
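For readers who want to reproduce the comparison, the following is a minimal sketch of such a harness, not the code from this PR. The constants 0.08, 0.025, 1.8, 0.25s, and the ~0.02s frame duration come from the description above. Everything else is an assumption made for illustration: the type names `FixedThresholdDetector` and `AdaptiveVAD`, the 10-sample averaging window, the rise and fall rates of the noise floor, and the simulated levels.

```swift
import Foundation

/// Old behaviour as described above: average of recent levels against a fixed 0.08.
/// The window size is a hypothetical choice; the PR does not state it.
struct FixedThresholdDetector {
    var recent: [Double] = []
    let window = 10          // assumption
    let threshold = 0.08     // from the PR description

    mutating func process(level: Double) -> Bool {
        recent.append(level)
        if recent.count > window { recent.removeFirst() }
        let average = recent.reduce(0, +) / Double(recent.count)
        return average > threshold
    }
}

/// Sketch of the adaptive detector: slow-rising, fast-falling noise floor,
/// a gain-relative threshold, and a hangover timer.
struct AdaptiveVAD {
    let minThreshold = 0.025     // from the PR description
    let floorMultiplier = 1.8    // from the PR description
    let hangover = 0.25          // seconds, from the PR description
    let riseRate = 0.005         // assumption: floor rises slowly
    let fallRate = 0.5           // assumption: floor falls fast

    var noiseFloor = 0.0
    var hangoverRemaining = 0.0

    /// Mirrors "reset VAD state on each start(with:)".
    mutating func reset() {
        noiseFloor = 0.0
        hangoverRemaining = 0.0
    }

    mutating func process(level: Double, frameDuration: Double) -> Bool {
        // Move the floor toward the current level, quickly downward and slowly upward.
        let rate = level < noiseFloor ? fallRate : riseRate
        noiseFloor += (level - noiseFloor) * rate

        let threshold = max(minThreshold, noiseFloor * floorMultiplier)
        if level > threshold {
            hangoverRemaining = hangover   // re-arm the hangover on every speech frame
            return true
        }
        // Below threshold: keep reporting speech until the hangover runs out.
        hangoverRemaining = max(0, hangoverRemaining - frameDuration)
        return hangoverRemaining > 0
    }
}

// Simulated low-gain microphone: 1s of ambient noise, 2s of quiet speech, 1s of ambient noise.
let frameDuration = 0.02
let ambient: [Double] = Array(repeating: 0.01, count: 50)
let speech: [Double] = Array(repeating: 0.05, count: 100)
let levels: [Double] = ambient + speech + ambient

var fixed = FixedThresholdDetector()
var adaptive = AdaptiveVAD()
var fixedCount = 0
var adaptiveCount = 0
for level in levels {
    if fixed.process(level: level) { fixedCount += 1 }
    if adaptive.process(level: level, frameDuration: frameDuration) { adaptiveCount += 1 }
}
print("speaking frames, fixed threshold: \(fixedCount), adaptive: \(adaptiveCount)")
```

With these simulated levels the fixed detector never reports speech, because the average never reaches 0.08, while the adaptive one reports speech for the whole spoken stretch plus a short hangover. Note that in this simple form a very long unbroken stretch of speech would eventually pull the floor up to the speech level; how the real implementation bounds that is not stated in the description.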