[codex] Enforce first-token timeout from attempt start #62
…mmon.py

The first-token deadline logic (parsing `first_token_timeout_ms`, the before-first-output await wrapper, and the timeout error) was inlined near-verbatim at three call sites: `codex_backend.call`, `openai_compatible.stream_openai_compatible`, and `streaming.stream_codex`. Extract the three helpers once, as `first_token_timeout_s(request)`, `before_first_output(awaitable, timeout_s, t0, saw_output)`, and `first_token_timeout_err(timeout_s, latency_ms)`, and have all three sites use them. `saw_output` is passed as a callable so that the codex/openai sites gate on their `saw_output` flag and the pseudo-stream gates on `emitted`, with no change in semantics.

Behaviour-preserving: the full suite reports 449 passed / 2 skipped against compose Postgres, identical before and after. Net -7 lines; this removes the duplication that the native-provider streaming PR would otherwise fork a fourth time.
Summary
Enforce `first_token_timeout_ms` as a wall-clock deadline from provider attempt start, including opening the HTTP stream and waiting for the first SSE output delta.

Why
Live AntSeed tests showed that a route could sit for the full provider/httpx timeout before the router observed the first SSE line, so `first_token_timeout_ms` did not actually bound the attempt from its start. This change makes the timeout mean: the first output token arrives within the configured deadline, or the candidate fails and fallback can proceed.

Validation
- `.venv/bin/python -m pytest tests/test_streaming.py tests/test_codex.py tests/test_antseed_concurrency.py::test_first_token_timeout_uses_internal_streaming_for_json_calls -q` (35 passed; pytest cache write warning only because this checkout is outside the writable sandbox root)
- `git diff --check`
- `PYTHONPYCACHEPREFIX=/private/tmp/unhardcoded-pycache .venv/bin/python -m py_compile streaming.py codex_backend.py provider_adapters/openai_compatible.py tests/test_streaming.py tests/test_codex.py`
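The semantics described under Why (one deadline covering both opening the stream and waiting for the first delta, with fallback when it is missed) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `fake_provider`, `attempt`, and `route` are illustrative names, and the providers are simulated with `asyncio.sleep` rather than real HTTP or SSE calls.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Sequence, Tuple

OpenStream = Callable[[], Awaitable[AsyncIterator[str]]]


def fake_provider(connect_delay_s: float, deltas: List[str]) -> OpenStream:
    """Simulated provider: waits connect_delay_s, then streams the given deltas."""
    async def open_stream() -> AsyncIterator[str]:
        await asyncio.sleep(connect_delay_s)  # stands in for opening the HTTP stream

        async def gen() -> AsyncIterator[str]:
            for delta in deltas:
                yield delta

        return gen()

    return open_stream


async def attempt(open_stream: OpenStream, timeout_s: float) -> Tuple[str, AsyncIterator[str]]:
    """Open the stream and read the first delta under a single deadline."""
    async def open_and_read_first() -> Tuple[str, AsyncIterator[str]]:
        stream = await open_stream()      # connect time counts against the deadline
        first = await stream.__anext__()  # so does the wait for the first delta
        return first, stream

    return await asyncio.wait_for(open_and_read_first(), timeout=timeout_s)


async def route(candidates: Sequence[Tuple[str, OpenStream]], timeout_s: float) -> Tuple[str, List[str]]:
    """Try candidates in order; fall back when one misses the first-token deadline."""
    for name, open_stream in candidates:
        try:
            first, stream = await attempt(open_stream, timeout_s)
        except asyncio.TimeoutError:
            continue  # this candidate failed; fallback proceeds to the next one
        rest = [delta async for delta in stream]  # no deadline after first output
        return name, [first] + rest
    raise RuntimeError("all candidates timed out before first output")


if __name__ == "__main__":
    candidates = [
        ("slow", fake_provider(0.5, ["x"])),        # exceeds the 50 ms deadline
        ("fast", fake_provider(0.0, ["a", "b"])),   # responds immediately
    ]
    print(asyncio.run(route(candidates, timeout_s=0.05)))
```

Wrapping the open and the first read in one coroutine is what makes the deadline run from attempt start. Timing only the first read would let a slow connect phase escape the bound, which is the failure mode the Why section describes.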