fix: #2010 Improve PCM duration calculation and handle VAD truncation (#2009) #2059
|
Greetings, @seratch.
- Updated calculate_audio_length_ms to return correct PCM16 durations (fixes #2010)
- Made the VAD speech-start event emit conversation.item.truncate so the server properly records interruptions (fixes #2009)
Here’s the test plan:
- uv run pytest tests/realtime/test_playback_tracker_manual_unit.py tests/realtime/test_playback_tracker.py tests/realtime/test_openai_realtime.py
- uv run pytest
Let me know if you’d like me to tweak or add anything. |
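For readers unfamiliar with the truncate event mentioned above, here is a minimal sketch of the payload a client could send when speech start interrupts playback. The event type and field names follow the OpenAI Realtime API's conversation.item.truncate client event; the helper name build_truncate_event and the example item ID are hypothetical and are not taken from this PR.

```python
import json
from typing import Any, Dict


def build_truncate_event(item_id: str, content_index: int, audio_end_ms: float) -> Dict[str, Any]:
    """Build a conversation.item.truncate payload (hypothetical helper).

    audio_end_ms is how much of the assistant audio was actually played
    before the user started speaking; it is clamped to a non-negative int.
    """
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": max(0, int(audio_end_ms)),
    }


# Example: the user interrupted after roughly 1.5 seconds of playback.
event = build_truncate_event("item_123", 0, 1500.7)
payload = json.dumps(event)  # what would be sent over the realtime connection
```

How the played duration is tracked, and when the event is sent, depends on the playback tracker in the SDK and is not shown here.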
|
@seratch do you have any updates on this one!? |
|
Thanks for working on this. To decide whether to accept this change, I need to test it thoroughly to understand the visible changes (and confirm there are no regressions), but I don't have the bandwidth for it right now. |
|
@seratch I understand, totally alright. No rush from my side. By the way, I was wondering if there’s any Slack or Discord community for contributors that I could join? I’ve really enjoyed working on this project in the past few weeks and it’s been a great learning experience. |
|
This PR is stale because it has been open for 10 days with no activity. |
Summary
In #2010 we noticed realtime PCM16 audio was being reported in microseconds. The root cause was our fallback branch in calculate_audio_length_ms, which divided by 24 * 2 and then by 1000, effectively treating every byte as a millisecond. I normalized the format first, split g711 from PCM handling, and introduced explicit constants for the sample rate and sample width so PCM16 now returns the true millisecond duration. This also covers empty buffers and any future lowercase/uppercase format variants.
To keep coverage honest, I updated the realtime playback tracker tests to assert against the corrected math using pytest.approx, and refreshed the manual tracker test to reflect the new millisecond totals.
Test plan
Fixes #2010.
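As a rough illustration of the duration math described in the summary, the sketch below normalizes the format string, handles g711 separately from PCM16, and uses named constants. It is not the code from this PR: the function name audio_length_ms and the constant names are hypothetical, and the 24 kHz mono PCM16 rate and 8 kHz one-byte-per-sample g711 rate are assumptions based on the usual realtime audio formats.

```python
from typing import Optional

# Assumed audio parameters (not copied from the PR's diff).
PCM16_SAMPLE_RATE_HZ = 24_000   # samples per second, mono
PCM16_SAMPLE_WIDTH_BYTES = 2    # 16-bit samples
G711_SAMPLE_RATE_HZ = 8_000     # g711 u-law/a-law: 1 byte per sample


def audio_length_ms(audio_format: Optional[str], audio_bytes: bytes) -> float:
    """Return the duration of a raw audio buffer in milliseconds."""
    if not audio_bytes:
        return 0.0

    # Normalize so "PCM16", "pcm16", and " g711_ulaw " are treated consistently.
    normalized = (audio_format or "").strip().lower()

    if normalized.startswith("g711"):
        # One byte per sample at 8 kHz.
        return len(audio_bytes) / G711_SAMPLE_RATE_HZ * 1000

    # Default: PCM16, two bytes per sample at 24 kHz.
    samples = len(audio_bytes) / PCM16_SAMPLE_WIDTH_BYTES
    return samples / PCM16_SAMPLE_RATE_HZ * 1000
```

Under these assumptions, 48,000 bytes of PCM16 is 24,000 samples, which is one second of audio, so the function returns 1000 ms. Tests that compare against such values can use pytest.approx to tolerate floating-point rounding.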