[codex] Enforce first-token timeout from attempt start#62

Merged: jmlago merged 2 commits into main from codex/first-token-wall-clock on Jul 1, 2026

@MuncleUscles

Member

Summary

  • Enforce first_token_timeout_ms as a wall-clock deadline from provider attempt start, including opening the HTTP stream and waiting for the first SSE output delta.
  • Apply the contract to OpenAI-compatible providers, Codex streaming, and the Codex aggregate call provider path.
  • Add regression coverage for slow stream-open and slow first-output cases.
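The contract in the list above can be sketched as follows. This is a minimal illustration rather than the PR's code: it assumes a hypothetical `open_stream` coroutine factory that returns an async iterator of chunks, and a hypothetical `FirstTokenTimeout` exception. The point it shows is that a single clock, started at attempt start, bounds both opening the stream and waiting for the first chunk.

```python
import asyncio
import time


class FirstTokenTimeout(Exception):
    """Hypothetical error: no output arrived before the wall-clock deadline."""


async def stream_with_first_token_deadline(open_stream, timeout_s):
    """Yield chunks from open_stream(), bounding time-to-first-chunk from attempt start."""
    t0 = time.monotonic()  # attempt start; the deadline is measured from here

    async def bounded(awaitable):
        # Only the time left on the shared clock is granted to each step.
        remaining = max(timeout_s - (time.monotonic() - t0), 0.0)
        try:
            return await asyncio.wait_for(awaitable, remaining)
        except asyncio.TimeoutError:
            waited_ms = (time.monotonic() - t0) * 1000.0
            raise FirstTokenTimeout(f"no output after {waited_ms:.0f} ms") from None

    stream = await bounded(open_stream())  # opening the stream counts
    iterator = stream.__aiter__()
    try:
        first = await bounded(iterator.__anext__())  # so does the first chunk
    except StopAsyncIteration:
        return  # the stream ended without producing output
    yield first
    async for chunk in iterator:  # after first output the deadline no longer applies
        yield chunk


async def fake_open(open_delay, first_delay):
    """Stand-in provider: slow to open, slow to emit, or both."""
    await asyncio.sleep(open_delay)

    async def chunks():
        await asyncio.sleep(first_delay)
        yield "a"
        yield "b"

    return chunks()


async def collect(open_delay, first_delay, timeout_s):
    """Drain the bounded stream into a list."""
    gen = stream_with_first_token_deadline(
        lambda: fake_open(open_delay, first_delay), timeout_s
    )
    return [chunk async for chunk in gen]
```

With these stand-ins, `asyncio.run(collect(0.1, 0.1, 0.15))` raises `FirstTokenTimeout` even though neither delay alone exceeds the limit, because both delays draw on the same deadline.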

Why

Live AntSeed tests showed that a route could sit for the provider/httpx timeout before the router observed the first SSE line, so first_token_timeout_ms did not actually bound the attempt from its start. This change makes the timeout mean: the first output token arrives within the configured deadline, or the candidate fails and fallback can proceed.
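To illustrate the "candidate fails and fallback can proceed" half of that contract, here is a small sketch. The `route` function, the `(name, attempt)` candidate shape, and the `FirstTokenTimeout` exception are assumptions made for this example and are not taken from the router's code.

```python
import asyncio


class FirstTokenTimeout(Exception):
    """Hypothetical error for a candidate that produced no output in time."""


async def first_token(attempt, timeout_s):
    """Await a candidate's first token, failing the candidate on timeout."""
    try:
        return await asyncio.wait_for(attempt(), timeout_s)
    except asyncio.TimeoutError:
        raise FirstTokenTimeout(f"no first token within {timeout_s}s") from None


async def route(candidates, timeout_s):
    """Try candidates in order; a first-token timeout moves on to the next one."""
    failures = []
    for name, attempt in candidates:
        try:
            return name, await first_token(attempt, timeout_s)
        except FirstTokenTimeout as exc:
            failures.append((name, str(exc)))
    raise RuntimeError(f"all candidates timed out: {failures}")


async def slow_provider():
    await asyncio.sleep(0.5)  # stalls well past the deadline
    return "late"


async def fast_provider():
    return "hello"
```

Calling `asyncio.run(route([("slow", slow_provider), ("fast", fast_provider)], 0.05))` gives up on the slow candidate after about 50 ms and returns the fast candidate's token instead of waiting out the slow provider.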

Validation

  • .venv/bin/python -m pytest tests/test_streaming.py tests/test_codex.py tests/test_antseed_concurrency.py::test_first_token_timeout_uses_internal_streaming_for_json_calls -q (35 passed; pytest cache write warning only because this checkout is outside the writable sandbox root)
  • git diff --check
  • PYTHONPYCACHEPREFIX=/private/tmp/unhardcoded-pycache .venv/bin/python -m py_compile streaming.py codex_backend.py provider_adapters/openai_compatible.py tests/test_streaming.py tests/test_codex.py

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Warning

Review limit reached

@jmlago, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 40 minutes

Review details
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c966d4fa-60cd-408e-9dc8-06c18e8a5028

📥 Commits

Reviewing files that changed from the base of the PR and between f0688cf and 86f143f.

📒 Files selected for processing (6)
  • codex_backend.py
  • provider_adapters/common.py
  • provider_adapters/openai_compatible.py
  • streaming.py
  • tests/test_codex.py
  • tests/test_streaming.py

…mmon.py

The first-token deadline (parse first_token_timeout_ms, the before-first-output
await wrapper, and the timeout error) was inlined near-verbatim in three call
sites: codex_backend.call, openai_compatible.stream_openai_compatible, and
streaming.stream_codex. Extract the three helpers once —
first_token_timeout_s(request), before_first_output(awaitable, timeout_s, t0,
saw_output) and first_token_timeout_err(timeout_s, latency_ms) — and have all
three sites use them. saw_output is passed as a callable so the codex/openai
sites gate on their saw_output flag and the pseudo-stream gates on emitted, with
no change in semantics.

Behaviour-preserving: full suite 449 passed / 2 skipped against compose Postgres,
identical before and after. Net -7 lines; kills the duplication the native-provider
streaming PR would otherwise fork a fourth time.
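
A rough sketch of what the three extracted helpers might look like, using the names and parameter lists from the commit message. The bodies are assumptions: the request is treated as a plain dict, `t0` as a `time.monotonic()` reading, and `FirstTokenTimeoutError` is a hypothetical exception type. The code in provider_adapters/common.py may differ.

```python
import asyncio
import time


class FirstTokenTimeoutError(Exception):
    """Hypothetical error type; the PR's actual exception class is not shown."""


def first_token_timeout_s(request):
    """Read first_token_timeout_ms from the request; None means no deadline."""
    raw = request.get("first_token_timeout_ms")  # assumes a dict-like request
    try:
        ms = float(raw)
    except (TypeError, ValueError):
        return None
    return ms / 1000.0 if ms > 0 else None


def first_token_timeout_err(timeout_s, latency_ms):
    """Build the error raised when the first-token deadline is missed."""
    return FirstTokenTimeoutError(
        f"no first token within {timeout_s:.3f}s (waited {latency_ms:.0f} ms)"
    )


async def before_first_output(awaitable, timeout_s, t0, saw_output):
    """Await `awaitable`, enforcing the deadline only until output has been seen.

    saw_output is a zero-argument callable, so each call site can gate on its
    own flag (for example saw_output or emitted) without sharing state.
    """
    if timeout_s is None or saw_output():
        return await awaitable
    remaining = max(timeout_s - (time.monotonic() - t0), 0.0)
    try:
        return await asyncio.wait_for(awaitable, remaining)
    except asyncio.TimeoutError:
        latency_ms = (time.monotonic() - t0) * 1000.0
        raise first_token_timeout_err(timeout_s, latency_ms) from None
```

A call site would record `t0 = time.monotonic()` at attempt start and wrap each await that can happen before the first output, such as `await before_first_output(next_line, timeout_s, t0, lambda: saw_output)`, where `next_line` stands for whatever awaitable the site is waiting on.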
@jmlago jmlago marked this pull request as ready for review July 1, 2026 15:43
@jmlago jmlago merged commit 0d06b88 into main Jul 1, 2026
1 check passed