[codex] Enforce first-token timeout from attempt start by MuncleUscles · Pull Request #62 · genlayerlabs/unhardcoded

MuncleUscles · 2026-07-01T09:47:06Z

Summary

Enforce first_token_timeout_ms as a wall-clock deadline from provider attempt start, including opening the HTTP stream and waiting for the first SSE output delta.
Apply the contract to OpenAI-compatible providers, Codex streaming, and the Codex aggregate call provider path.
Add regression coverage for slow stream-open and slow first-output cases.
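The contract in the summary can be sketched as one deadline fixed at attempt start, shared by the stream-open wait and the first-delta wait. This is a minimal asyncio sketch, not the code in this PR. open_stream, FirstTokenTimeout and first_delta_within_deadline are hypothetical names, and the provider stream is modelled as an async iterator of text deltas.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Tuple


class FirstTokenTimeout(Exception):
    """Hypothetical error: no output delta arrived before the deadline."""


async def _next_delta(stream: AsyncIterator[str]) -> str:
    # Wrap __anext__ in a plain coroutine so asyncio.wait_for has an
    # ordinary coroutine to cancel on timeout.
    return await stream.__anext__()


async def first_delta_within_deadline(
    open_stream: Callable[[], Awaitable[AsyncIterator[str]]],
    first_token_timeout_ms: int,
) -> Tuple[str, AsyncIterator[str]]:
    loop = asyncio.get_running_loop()
    # The deadline is fixed once, at attempt start. Both waits below share it,
    # so a slow stream-open leaves less time for the first delta.
    deadline = loop.time() + first_token_timeout_ms / 1000

    def remaining() -> float:
        return max(0.0, deadline - loop.time())

    try:
        stream = await asyncio.wait_for(open_stream(), remaining())  # open the HTTP stream
        first = await asyncio.wait_for(_next_delta(stream), remaining())  # first SSE delta
    except asyncio.TimeoutError:
        raise FirstTokenTimeout(
            f"no output within {first_token_timeout_ms} ms of attempt start"
        ) from None
    # After the first delta, the caller keeps iterating `stream` without this deadline.
    return first, stream
```

Because the first-delta wait only gets what the stream-open left over, an 80 ms open followed by an 80 ms wait for the first delta fails a 100 ms budget even though neither phase exceeds it on its own.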

Why

Live AntSeed tests showed that a route could wait for up to the full provider/httpx timeout before the router observed the first SSE line, so first_token_timeout_ms did not actually bound the attempt from its start. With this change the timeout means: the first output token arrives within the configured deadline, or the candidate fails and fallback can proceed.
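A simplified model shows why a timeout that only starts once the stream is open does not bound the attempt. It assumes (the pre-fix code is not shown here) that opening the stream was bounded only by the much longer transport timeout and that the first-token timeout covered only the wait for the first SSE line. asyncio.sleep stands in for network latency, and old_time_to_first_token is a hypothetical name.

```python
import asyncio


async def old_time_to_first_token(
    open_delay_s: float,
    first_line_delay_s: float,
    first_token_timeout_s: float,
    transport_timeout_s: float,
) -> float:
    """Simplified model of the pre-fix behaviour; returns elapsed seconds."""
    loop = asyncio.get_running_loop()
    t0 = loop.time()  # attempt start
    # Opening the stream is bounded only by the (much longer) transport timeout.
    await asyncio.wait_for(asyncio.sleep(open_delay_s), transport_timeout_s)
    # The first-token timeout effectively starts here, after the stream is open.
    await asyncio.wait_for(asyncio.sleep(first_line_delay_s), first_token_timeout_s)
    return loop.time() - t0
```

With a 0.1 s first-token budget, asyncio.run(old_time_to_first_token(0.15, 0.02, 0.1, 30.0)) completes without error after more than 0.1 s, so no failure is raised and fallback never starts. In the live case the open phase could run for the whole transport timeout.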

Validation

.venv/bin/python -m pytest tests/test_streaming.py tests/test_codex.py tests/test_antseed_concurrency.py::test_first_token_timeout_uses_internal_streaming_for_json_calls -q (35 passed; the only warning is a pytest cache write failure, because this checkout is outside the writable sandbox root)
git diff --check
PYTHONPYCACHEPREFIX=/private/tmp/unhardcoded-pycache .venv/bin/python -m py_compile streaming.py codex_backend.py provider_adapters/openai_compatible.py tests/test_streaming.py tests/test_codex.py

coderabbitai · 2026-07-01T09:47:14Z

Warning

Review limit reached

@jmlago, you've reached your PR review limit, so we couldn't start this review.

Review details

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c966d4fa-60cd-408e-9dc8-06c18e8a5028

📥 Commits

Reviewing files that changed from the base of the PR and between f0688cf and 86f143f.

📒 Files selected for processing (6)

codex_backend.py
provider_adapters/common.py
provider_adapters/openai_compatible.py
streaming.py
tests/test_codex.py
tests/test_streaming.py

…mmon.py

The first-token deadline logic (parsing first_token_timeout_ms, the before-first-output await wrapper, and the timeout error) was inlined near-verbatim in three call sites: codex_backend.call, openai_compatible.stream_openai_compatible, and streaming.stream_codex.

Extract the three helpers once, first_token_timeout_s(request), before_first_output(awaitable, timeout_s, t0, saw_output) and first_token_timeout_err(timeout_s, latency_ms), and have all three sites use them. saw_output is passed as a callable so the codex/openai sites gate on their saw_output flag and the pseudo-stream gates on emitted, with no change in semantics.

Behaviour-preserving: the full suite result (449 passed / 2 skipped against compose Postgres) is identical before and after. Net -7 lines; this removes the duplication that the native-provider streaming PR would otherwise fork a fourth time.
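The three helpers could look roughly like this. Only the helper names and parameter lists come from the commit message; the bodies, the mapping-shaped request, the use of time.monotonic() for t0 and the plain TimeoutError are assumptions for illustration.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def first_token_timeout_s(request: Mapping[str, Any]) -> Optional[float]:
    # Assumption: the request is a mapping and the option is a positive int in ms.
    ms = request.get("first_token_timeout_ms")
    if ms is None or ms <= 0:
        return None
    return ms / 1000.0


def first_token_timeout_err(timeout_s: float, latency_ms: int) -> TimeoutError:
    # Assumption: a plain TimeoutError; the real error type is not shown in the PR.
    return TimeoutError(
        f"first token not received within {timeout_s:.3f}s (waited {latency_ms} ms)"
    )


async def before_first_output(
    awaitable: Awaitable[T],
    timeout_s: Optional[float],
    t0: float,
    saw_output: Callable[[], bool],
) -> T:
    # No deadline configured, or output already seen: await normally.
    if timeout_s is None or saw_output():
        return await awaitable
    # Otherwise only the budget left since attempt start (t0) is available.
    remaining = timeout_s - (time.monotonic() - t0)
    try:
        return await asyncio.wait_for(awaitable, max(0.0, remaining))
    except asyncio.TimeoutError:
        latency_ms = int((time.monotonic() - t0) * 1000)
        raise first_token_timeout_err(timeout_s, latency_ms) from None
```

Passing saw_output as a callable lets each call site supply whichever flag it tracks (for example lambda: saw_output in a streaming loop, or lambda: emitted in the pseudo-stream). Closures read the flag at call time, so the deadline stops applying as soon as that flag flips, while every await before that draws on the same budget measured from t0.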

fix: enforce first-token timeout from attempt start

75238fd

MuncleUscles mentioned this pull request Jul 1, 2026

[codex] Stream native providers for first-token deadlines #63

Merged

jmlago marked this pull request as ready for review July 1, 2026 15:43

jmlago merged commit 0d06b88 into main Jul 1, 2026
1 check passed

jmlago merged 2 commits into main from codex/first-token-wall-clock
