Skip to content

fix(agents): bound AgentSession close cleanup#5602

Open
shizhigu wants to merge 2 commits intolivekit:mainfrom
shizhigu:codex/bounded-agent-session-close
Open

fix(agents): bound AgentSession close cleanup#5602
shizhigu wants to merge 2 commits intolivekit:mainfrom
shizhigu:codex/bounded-agent-session-close

Conversation

@shizhigu
Copy link
Copy Markdown

@shizhigu shizhigu commented Apr 29, 2026

Summary

This bounds AgentSession shutdown cleanup so a stuck close step does not prevent the session from emitting CloseEvent.

The motivating case is #5497: a participant can disconnect while the agent is still speaking, then cleanup can hang before downstream shutdown callbacks ever run. This PR does not try to prove the exact native/audio playout root cause. It makes the close path defensive: once session close starts, cleanup is best effort, but the close event still gets emitted within a bounded deadline.

Fixes #5497.

Invariant

aclose() emits CloseEvent within _AGENT_SESSION_CLOSE_TIMEOUT regardless of which cleanup step blocks. Steps that still need to release resources after the shared deadline are started in the background with their own bounded timeout and logged failures.

What changed

  • Added an internal close timeout for AgentSession cleanup.
  • Added a close-step helper that logs the step name and close reason, cancels timed-out work, and drains late exceptions so timed-out cleanup does not produce unhandled task warnings.
  • Wrapped close steps that can block: AMD close, AgentTask unwind, activity interrupt/drain/current speech/activity close, forward audio cancellation, recorder/IVR/toolset close, session host close, and room IO close.
  • Bound AgentTask unwind with a visited-activity guard and iteration cap so bad task chains cannot close the same activity repeatedly.
  • Kept the timeout internal. There is no new public API in this PR.

current_speech is awaited through asyncio.shield(...) so the close timeout does not cancel the speech future before activity.aclose() owns speech teardown. Late-result callbacks avoid retaining the AgentSession instance after close.

Testing

  • uv run pytest tests/test_agent_session_close.py
    • 9 passed
  • uv run ruff check livekit-agents/livekit/agents/voice/agent_session.py tests/test_agent_session_close.py
  • make check
  • Earlier branch run: PYTHONPATH="$PWD" uv run pytest tests/test_agent_session.py tests/test_agent_session_close.py -q
    • 27 passed
  • Earlier branch run: PYTHONPATH="$PWD" uv run pytest tests/test_ipc.py -q
    • 8 passed
  • Earlier make unit-tests: 618 passed, 2 skipped; tests/test_room.py setup failed locally because this machine does not have the livekit-server binary on PATH.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AgentSession close path hangs indefinitely when participant disconnects mid-speech

1 participant