fix: support ElevenLabs non legacy voices #1054
adriablancafort wants to merge 1 commit into livekit:main
🔴 isFinal not checked as is_final, causing stream to never complete for non-legacy voices
The PR fixes contextId to also check context_id (snake_case) for non-legacy ElevenLabs voices, but the same snake_case issue is not addressed for data.isFinal on line 554. If the non-legacy API returns is_final (snake_case, consistent with returning context_id), the if (data.isFinal) check will always be falsy.
Root cause and impact
When data.isFinal is never truthy:
- stream.markDone() is never called, so #streamDone stays false
- ctx.waiter.resolve() is never called, so the waiterPromise in Promise.all at plugins/elevenlabs/src/tts.ts:1080 never resolves
- #cleanupContext(contextId!) is never called, leaking context data
- The audioProcessTask loop at plugins/elevenlabs/src/tts.ts:1041-1063 spins indefinitely because #streamDone is never set to true
Audio data may still play (since data.audio is processed before the isFinal check), but the stream never properly terminates. The Promise.all hangs, leading to resource leaks and the synthesize stream never completing.
The fix should mirror the contextId fix:
if (data.isFinal ?? data.is_final) {
(Refers to line 554)
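A minimal sketch of that fallback in isolation, assuming a message may carry either casing of the key. The `TtsMessage` type and the `isFinalMessage` helper are hypothetical names used for illustration only; they are not part of the plugin.

```typescript
// Hypothetical message shape: the API may send either casing of the flag.
interface TtsMessage {
  isFinal?: boolean | null;
  is_final?: boolean | null;
}

// Returns true when either casing marks the message as final.
// `??` falls through to the snake_case key only when the camelCase key
// is null or undefined, so an explicit `isFinal: false` is respected.
function isFinalMessage(data: TtsMessage): boolean {
  return Boolean(data.isFinal ?? data.is_final);
}
```

With a check like this, a message containing only `is_final: true` would still reach the branch that marks the stream as done.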
@adriablancafort should we also handle snake_case for other fields to keep things consistent? Or is this only an issue for context_id?
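One possible way to keep every field consistent, sketched under the assumption that the only difference between the two response shapes is the casing of top-level keys: normalize each incoming message once, then read camelCase fields everywhere else. `toCamelCase` and `normalizeKeys` are hypothetical helpers, not existing plugin functions, and the copy is shallow, so nested objects are left untouched.

```typescript
// Convert one snake_case key to camelCase, e.g. "context_id" -> "contextId".
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Return a shallow copy of the message with camelCase top-level keys.
// If both casings are present, a non-nullish camelCase value wins.
function normalizeKeys(data: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(data)) {
    const camel = toCamelCase(key);
    const isNullish = value === undefined || value === null;
    const alreadySet = out[camel] !== undefined && out[camel] !== null;
    // A non-nullish camelCase value always wins; otherwise fill only empty slots.
    if ((camel === key && !isNullish) || !alreadySet) {
      out[camel] = value;
    }
  }
  return out;
}
```

The trade-off is one extra object allocation per message in exchange for not repeating a `??` fallback at every read site.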
🔴 normalizedAlignment not checked as normalized_alignment for non-legacy voices
Following the same pattern as the contextId/context_id fix, data.normalizedAlignment on line 488 may be returned as normalized_alignment by non-legacy ElevenLabs voices. When the user has preferredAlignment: 'normalized' (the default per line 695), the alignment data will be undefined and no timed word transcripts will be generated.
Root cause and impact
At plugins/elevenlabs/src/tts.ts:486-489:
const alignment =
  this.#opts.preferredAlignment === 'normalized'
    ? (data.normalizedAlignment as Record<string, unknown>)
    : (data.alignment as Record<string, unknown>);
Since preferredAlignment defaults to 'normalized' (plugins/elevenlabs/src/tts.ts:695), non-legacy voices that return normalized_alignment instead of normalizedAlignment will have alignment resolve to undefined. This means the entire alignment processing block at lines 491-546 is skipped, and no timed word transcripts are produced. While audio still plays, transcript synchronization features (word timing) will silently fail.
(Refers to lines 488-489)
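A sketch of how the alignment lookup could accept both casings, assuming the non-legacy response really does use `normalized_alignment`. The `AlignmentMessage` type, the `pickAlignment` helper, and the `preferNormalized` flag are hypothetical stand-ins for the plugin's own message handling and its `preferredAlignment` option.

```typescript
type Alignment = Record<string, unknown>;

// Hypothetical message shape covering both casings of the normalized field.
interface AlignmentMessage {
  alignment?: Alignment | null;
  normalizedAlignment?: Alignment | null;
  normalized_alignment?: Alignment | null;
}

// Pick the alignment payload to use for word timing.
// When the normalized variant is preferred, try camelCase first and
// fall back to snake_case; returns undefined if nothing is present.
function pickAlignment(
  data: AlignmentMessage,
  preferNormalized: boolean,
): Alignment | undefined {
  const picked = preferNormalized
    ? (data.normalizedAlignment ?? data.normalized_alignment)
    : data.alignment;
  return picked ?? undefined;
}
```

Returning `undefined` when no variant is present keeps the existing behavior of skipping alignment processing for messages that carry only audio.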
Thanks for catching this issue!
Description
The current LiveKit ElevenLabs plugin only supports legacy voices.
When I tried to use it with a default ElevenLabs voice, it didn't play the audio.
I realized that with this simple fix it would work with non legacy voices.
Root cause: The ElevenLabs WebSocket API can return the context id as context_id (snake_case) instead of contextId (camelCase). The plugin only read data.contextId, so when the API sent context_id the lookup failed and messages were dropped (no audio).
Changes Made
- src/tts.ts: read context id from both data.contextId and data.context_id when handling WebSocket messages (contextId = data.contextId ?? data.context_id).
Pre-Review Checklist
Testing
- restaurant_agent.ts and realtime_agent.ts work properly (for major changes)
Tested with a non-legacy default ElevenLabs voice; TTS audio now plays. Legacy voices still work.