Fix Voice-Activated mode not scrolling with low-gain microphones #61

Open: Ritavdas wants to merge 1 commit into f:master from Ritavdas:fix/voice-activated-scroll-vad

@Ritavdas

Summary

Fixes the Voice-Activated listening mode (silencePaused — "Scrolls while you speak, pauses when you're silent"), which did not scroll at all for many users even while speaking clearly.

Fixes #49

Root cause

Voice-Activated mode only advances the teleprompter while SpeechRecognizer.isSpeaking is true. That property used a single hard-coded absolute threshold over recent audio levels:

var isSpeaking: Bool {
    let recent = audioLevels.suffix(10)
    let avg = recent.reduce(0, +) / CGFloat(recent.count)
    return avg > 0.08
}

0.08 is too high for common microphones (built-in MacBook mic, AirPods at a normal distance), where speech rarely averages that level. So isSpeaking stayed false and the text never scrolled. Classic and Word Tracking modes don't depend on isSpeaking, which is why only the third mode was affected.
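
To make the failure concrete, here is a minimal standalone sketch of the old check. It assumes `Double` instead of `CGFloat` and a hypothetical free function `oldIsSpeaking` in place of the real property; the 0.06 and 0.45 levels are the low-gain and high-gain figures used in the Testing section.

```swift
import Foundation

/// Hypothetical free-function version of the old `isSpeaking` check:
/// average of the last 10 levels compared against the fixed 0.08 threshold.
func oldIsSpeaking(_ audioLevels: [Double]) -> Bool {
    let recent = audioLevels.suffix(10)
    guard !recent.isEmpty else { return false }  // avoid dividing by zero
    let avg = recent.reduce(0, +) / Double(recent.count)
    return avg > 0.08
}

// Simulated levels: a quiet mic averaging ~0.06 and a loud mic averaging ~0.45.
let quietMicLevels = Array(repeating: 0.06, count: 10)
let loudMicLevels = Array(repeating: 0.45, count: 10)

print(oldIsSpeaking(quietMicLevels))  // false: speech never crosses 0.08
print(oldIsSpeaking(loudMicLevels))   // true
```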

Fix

Replace the fixed threshold with adaptive voice-activity detection in SpeechRecognizer:

  • Adaptive noise floor — rises slowly, falls fast. Steady ambient noise gets absorbed into the floor (prevents false "always speaking"), while the floor stays low for quiet mics.
  • Gain-relative threshold: max(0.025, noiseFloor * 1.8), so detection works across very different mic gains (quiet built-in mic through loud external mic).
  • 0.25s hangover — natural pauses between words no longer stall scrolling.
  • VAD state resets on each start(with:).
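
The four points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `AdaptiveVAD` type, its method names, and the rise/fall rates (0.01 and 0.5) are hypothetical. Only the `max(0.025, noiseFloor * 1.8)` threshold and the 0.25s hangover come from the description.

```swift
import Foundation

/// Hypothetical standalone detector; the PR keeps this state inside SpeechRecognizer.
struct AdaptiveVAD {
    // Assumed smoothing rates (not stated in the PR).
    var riseRate = 0.01              // floor creeps up slowly toward louder input
    var fallRate = 0.5               // floor drops quickly toward quieter input
    let minThreshold = 0.025         // absolute lower bound on the threshold
    let floorMultiplier = 1.8        // speech must exceed the floor by this factor
    let hangover: TimeInterval = 0.25

    private(set) var noiseFloor = 0.0
    private var lastSpeechTime: TimeInterval? = nil

    /// Clear all state, as would happen on each start(with:).
    mutating func reset() {
        noiseFloor = 0.0
        lastSpeechTime = nil
    }

    /// Feed one audio level with its timestamp; returns whether we count as speaking.
    mutating func process(level: Double, at time: TimeInterval) -> Bool {
        // Asymmetric smoothing: rise slowly, fall fast.
        let rate = level > noiseFloor ? riseRate : fallRate
        noiseFloor += (level - noiseFloor) * rate

        // Gain-relative threshold with an absolute minimum.
        let threshold = max(minThreshold, noiseFloor * floorMultiplier)
        if level > threshold { lastSpeechTime = time }

        // Hangover: stay "speaking" for a short while after the last loud frame.
        guard let last = lastSpeechTime else { return false }
        return time - last <= hangover
    }
}

var vad = AdaptiveVAD()
print(vad.process(level: 0.00, at: 0.00))  // false: silence
print(vad.process(level: 0.06, at: 0.02))  // true: low-gain speech clears 0.025
print(vad.process(level: 0.00, at: 0.04))  // true: still inside the hangover
print(vad.process(level: 0.00, at: 0.30))  // false: hangover has expired
```

With these assumed rates, a perfectly constant input is eventually absorbed into the floor and stops counting as speech, which is the intended behavior for steady ambient noise.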

The notch overlay, external display, and browser teleprompters all consume isSpeaking, so they all benefit with no changes to their scroll logic.

Testing

The fix is pure logic, so I validated it with a deterministic harness that runs both the old and new algorithms over simulated audio-level streams at the real audio-buffer rate (~0.02s/frame):

| Scenario | Old | New |
|---|---|---|
| Low-gain speech (~0.06), the bug | 0% (never scrolls) | 74% ✅ |
| Pure silence | 0% | 0% ✅ |
| Speech with word gaps | 55% (stutters) | 100% ✅ |
| Noisy mic + speech burst | 18% | detects burst, settles in ambient ✅ |
| Stop-after-silence latency | | 0.22s ✅ |
| High-gain speech (~0.45) | 100% | 87% ✅ |

The app also builds cleanly (xcodebuild ... build reports BUILD SUCCEEDED).

Note: the harness covers the detection logic, not the live AVAudioEngine tap / Speech framework wiring (unchanged). A quick real-mic run of Voice-Activated mode is still recommended.
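
For readers who want to try something similar, a harness of this kind might look like the sketch below. It is not the harness used for the numbers above: the `Detector` alias, the factory functions, and the adaptive rise/fall rates (0.01 and 0.5) are assumptions, so its output will not match the table.

```swift
import Foundation

/// A detector receives one audio level per frame plus its timestamp.
typealias Detector = (_ level: Double, _ time: TimeInterval) -> Bool

/// Old behavior: average of the last 10 levels against the fixed 0.08 threshold.
func makeFixedDetector() -> Detector {
    var levels: [Double] = []
    return { level, _ in
        levels.append(level)
        let recent = levels.suffix(10)
        return recent.reduce(0, +) / Double(recent.count) > 0.08
    }
}

/// New behavior with assumed smoothing rates (0.01 rise, 0.5 fall are illustrative).
func makeAdaptiveDetector() -> Detector {
    var noiseFloor = 0.0
    var lastSpeech: TimeInterval? = nil
    return { level, time in
        let rate = level > noiseFloor ? 0.01 : 0.5
        noiseFloor += (level - noiseFloor) * rate
        if level > max(0.025, noiseFloor * 1.8) { lastSpeech = time }
        guard let last = lastSpeech else { return false }
        return time - last <= 0.25  // 0.25s hangover
    }
}

/// Fraction of frames in `stream` that `detector` reports as speaking.
func speakingFraction(of stream: [Double],
                      frameDuration: TimeInterval = 0.02,
                      using detector: Detector) -> Double {
    guard !stream.isEmpty else { return 0 }
    var speakingFrames = 0
    for (index, level) in stream.enumerated() {
        if detector(level, Double(index) * frameDuration) { speakingFrames += 1 }
    }
    return Double(speakingFrames) / Double(stream.count)
}

// One second of simulated low-gain speech and one second of silence.
let lowGainStream = Array(repeating: 0.06, count: 50)
let silentStream = Array(repeating: 0.0, count: 50)

print("old, low gain:", speakingFraction(of: lowGainStream, using: makeFixedDetector()))
print("new, low gain:", speakingFraction(of: lowGainStream, using: makeAdaptiveDetector()))
print("new, silence: ", speakingFraction(of: silentStream, using: makeAdaptiveDetector()))
```

Each factory returns a fresh closure with its own state, so every scenario starts from a clean detector, mirroring the reset on start(with:).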

The Voice-Activated (silence-paused) listening mode only advances the
teleprompter while SpeechRecognizer.isSpeaking is true. That flag used a
single hard-coded absolute threshold (average level > 0.08) over recent
audio samples. On common microphones (built-in MacBook mic, AirPods at a
normal distance) speech rarely averages that high, so isSpeaking stayed
false and the text never scrolled even while the user was clearly
speaking. Classic and Word Tracking modes don't depend on isSpeaking,
which is why only this mode was affected.

Replace the fixed threshold with adaptive voice-activity detection:
- Continuously track a noise floor that rises slowly and falls fast, so
  steady ambient noise is absorbed (no false 'always speaking') while the
  floor stays low for quiet mics.
- Trigger speech on a gain-relative threshold (max(0.025, floor * 1.8)),
  so detection works across very different mic gains.
- Add a 0.25s hangover so natural pauses between words don't stall
  scrolling.
- Reset VAD state on each start(with:).

Fixes f#49

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

The necessary settings and selections have been made; it does not scroll with my speech。