[codex] Stream native providers for first-token deadlines#63

Merged: jmlago merged 2 commits into main from codex/native-provider-streaming on Jul 1, 2026
@MuncleUscles

Member

Summary

  • Add native streaming backends for Anthropic Messages, Google Gemini streamGenerateContent, and Bedrock ConverseStream.
  • Use native streaming internally for non-streaming provider calls whenever first_token_timeout_ms is present, so native providers can satisfy the same first-output deadline contract as OpenAI-compatible routes.
  • Wire native stream adapters through the provider registry and serve.py instead of returning streaming unsupported for native api kinds.
  • Add focused tests for native stream aggregation, non-stream calls switching to stream under first-token timeout, provider registry wiring, and pre-output timeout behavior.
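
The second bullet can be illustrated with a minimal sketch. It assumes a request is a plain dict and uses hypothetical stand-ins (`fake_stream_backend`, `fake_blocking_backend`, the `fake_deltas` key) in place of the real adapters. Only the name `ignore_delta` and the `first_token_timeout_ms` field come from this PR, and the signatures shown here are guesses rather than the project's actual API.

```python
from typing import Callable, Dict, List


def ignore_delta(_text: str) -> None:
    """No-op emit: the caller only wants the aggregated result."""


def fake_stream_backend(request: Dict, emit: Callable[[str], None]) -> Dict:
    # Hypothetical stand-in for a native stream adapter: forwards each
    # delta to `emit` and also aggregates the full text.
    parts: List[str] = []
    for delta in request["fake_deltas"]:
        emit(delta)
        parts.append(delta)
    return {"text": "".join(parts), "via": "stream"}


def fake_blocking_backend(request: Dict) -> Dict:
    # Hypothetical stand-in for the plain request/response path.
    return {"text": "".join(request["fake_deltas"]), "via": "blocking"}


def call(request: Dict) -> Dict:
    # A non-streaming call goes through the streaming path whenever a
    # first-token timeout is present, so the deadline can be enforced on
    # the first output instead of on the whole response.
    if request.get("first_token_timeout_ms") is not None:
        return fake_stream_backend(request, ignore_delta)
    return fake_blocking_backend(request)
```

The caller still receives one aggregated response either way; only the transport underneath differs.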

Notes

This is stacked on #62 (codex/first-token-wall-clock). #62 defines the wall-clock first-token contract; this PR extends that contract to native Anthropic, Gemini, and Bedrock paths.

For native streams, the first-token timeout stops applying as soon as the provider produces text or tool-call output. Text deltas are emitted immediately; a stream that starts with tool calls only is aggregated and returned for pseudo-streaming/final response handling if no text delta was emitted.
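
A rough sketch of that contract, assuming the provider stream has already been normalised into an async iterator of `(kind, payload)` tuples. That event shape, the `FirstTokenTimeout` exception, and the fake providers are all hypothetical and exist only for illustration; the real adapters may structure this differently.

```python
import asyncio
from typing import AsyncIterator, Callable, Tuple


class FirstTokenTimeout(Exception):
    """No text or tool-call output arrived before the deadline."""


async def consume_with_first_output_deadline(
    events: AsyncIterator[Tuple[str, object]],
    first_token_timeout_ms: int,
    emit: Callable[[str], None],
) -> dict:
    loop = asyncio.get_running_loop()
    # Wall-clock deadline fixed at the start; metadata events do not reset it.
    deadline = loop.time() + first_token_timeout_ms / 1000.0
    saw_output = False
    text_parts, tool_calls = [], []
    iterator = events.__aiter__()
    while True:
        try:
            if saw_output:
                # Deadline already satisfied: wait without a timeout.
                kind, payload = await iterator.__anext__()
            else:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    raise FirstTokenTimeout("no output before deadline")
                kind, payload = await asyncio.wait_for(
                    iterator.__anext__(), remaining
                )
        except StopAsyncIteration:
            break
        except asyncio.TimeoutError:
            raise FirstTokenTimeout("no output before deadline") from None
        if kind == "text":
            saw_output = True
            emit(payload)  # text deltas go out immediately
            text_parts.append(payload)
        elif kind == "tool_call":
            saw_output = True
            tool_calls.append(payload)  # aggregated, returned at the end
    return {"text": "".join(text_parts), "tool_calls": tool_calls}


async def fake_fast_provider():
    yield ("meta", {"id": "x"})
    yield ("text", "hi")
    await asyncio.sleep(0.15)  # slow later chunk, after first output
    yield ("text", "!")


async def fake_slow_provider():
    await asyncio.sleep(0.5)  # first output arrives too late
    yield ("text", "late")


async def fake_tool_only_provider():
    yield ("tool_call", {"name": "lookup"})
```

With a 100 ms deadline, `fake_fast_provider` completes even though its second chunk arrives later than the deadline, while `fake_slow_provider` raises `FirstTokenTimeout` with a 50 ms deadline.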

Validation

  • .venv/bin/python -m pytest tests/test_native_providers.py tests/test_providers.py tests/test_streaming.py tests/test_codex.py tests/test_antseed_concurrency.py::test_first_token_timeout_uses_internal_streaming_for_json_calls -q (54 passed; pytest cache warning only because checkout is outside writable sandbox root)
  • .venv/bin/python -m pytest tests/test_live_wiring.py -q (11 passed)
  • .venv/bin/python -m pytest tests/test_native_providers.py tests/test_providers.py -q (19 passed; after cleanup)
  • git diff --check
  • PYTHONPYCACHEPREFIX=/private/tmp/unhardcoded-pycache .venv/bin/python -m py_compile provider_adapters/common.py provider_adapters/anthropic.py provider_adapters/google.py provider_adapters/bedrock.py providers.py serve.py tests/test_native_providers.py tests/test_providers.py

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Warning

Review limit reached

@jmlago, you've reached your PR review limit, so we couldn't start this review.


Review details
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b36294c1-13c8-45aa-ac50-bd3bd381c82a

📥 Commits

Reviewing files that changed from the base of the PR and between 0d06b88 and 4e32cb2.

📒 Files selected for processing (8)
  • provider_adapters/anthropic.py
  • provider_adapters/bedrock.py
  • provider_adapters/common.py
  • provider_adapters/google.py
  • providers.py
  • serve.py
  • tests/test_native_providers.py
  • tests/test_providers.py
@jmlago jmlago force-pushed the codex/native-provider-streaming branch from a9ec9a1 to c3b3d0d Compare July 1, 2026 13:08
@jmlago jmlago marked this pull request as ready for review July 1, 2026 15:44
MuncleUscles and others added 2 commits July 1, 2026 16:45
…test dispatcher wiring

stream_anthropic and stream_google repeated the whole httpx-SSE scaffolding
(open under the first-token deadline, non-2xx classification, the data:/[DONE]/
json line loop, the stream_interrupted/network exception mapping, the empty-content
guard) around a small per-provider event parse. Extract that skeleton once as
common.drive_http_sse(...) driving a StreamAcc the provider's on_event folds into;
each adapter keeps only its wire-event parse and its token-key finalize. Two real
consumers, so this is minimality, not speculation (Axis 1/3). Bedrock is NOT a
consumer: its transport is the boto3 event stream, not httpx SSE — forcing it under
the same driver would be mechanism for a shape it does not share.
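
The shape of that extraction might look roughly like the sketch below. It is a simplified stand-in, not the real `common.drive_http_sse`: it reads from any iterable of text lines instead of an httpx response, omits the deadline, status classification, and exception mapping, and the `StreamAcc` fields, the `on_event(event, acc, emit)` signature, and `EmptyContentError` are assumptions. The two event parsers follow the publicly documented Anthropic and Gemini streaming payload shapes.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class StreamAcc:
    # Assumed fields; the real accumulator may carry more state.
    text_parts: List[str] = field(default_factory=list)
    usage: Dict = field(default_factory=dict)


class EmptyContentError(Exception):
    """Stream finished without producing any content."""


def drive_sse_lines(
    lines: Iterable[str],
    on_event: Callable[[Dict, StreamAcc, Callable[[str], None]], None],
    emit: Callable[[str], None],
) -> StreamAcc:
    acc = StreamAcc()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, "event:" fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        on_event(json.loads(data), acc, emit)  # provider-specific fold
    if not acc.text_parts:
        raise EmptyContentError("stream ended without content")
    return acc


def on_anthropic_event(event: Dict, acc: StreamAcc, emit) -> None:
    # Anthropic Messages: text arrives in content_block_delta events.
    if event.get("type") == "content_block_delta":
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            acc.text_parts.append(delta["text"])
            emit(delta["text"])
    elif event.get("type") == "message_delta":
        acc.usage.update(event.get("usage", {}))


def on_gemini_event(event: Dict, acc: StreamAcc, emit) -> None:
    # Gemini streamGenerateContent: each chunk carries candidate parts.
    for candidate in event.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                acc.text_parts.append(part["text"])
                emit(part["text"])
    acc.usage.update(event.get("usageMetadata", {}))
```

Each adapter then supplies only its `on_event` parser, which is the split the commit message describes. Bedrock stays outside because its boto3 event stream does not produce `data:` lines at all.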

Also: extract the identical no-op emit used to drive a stream as a non-stream call
into common.ignore_delta (was inlined in anthropic/google/bedrock call()), and add
the missing coverage for the serve.py wiring — native api_kinds must reach their
real stream backend, not stream_unsupported (test_providers).
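
The wiring that the new dispatcher tests guard could be pictured as follows. This is only a sketch: the api_kind strings, the dict-based registry, `resolve_stream_backend`, and `stream_bedrock` are assumptions. `stream_anthropic`, `stream_google`, and `stream_unsupported` are names from this PR, stubbed here with placeholder bodies.

```python
from typing import Callable, Dict


def stream_unsupported(request: dict, emit: Callable[[str], None]) -> dict:
    # Fallback for api kinds that have no stream backend.
    raise NotImplementedError("streaming unsupported for this api kind")


# Placeholder bodies; the real adapters live in provider_adapters/.
def stream_anthropic(request: dict, emit: Callable[[str], None]) -> dict:
    return {"backend": "anthropic"}


def stream_google(request: dict, emit: Callable[[str], None]) -> dict:
    return {"backend": "google"}


def stream_bedrock(request: dict, emit: Callable[[str], None]) -> dict:
    return {"backend": "bedrock"}


# Hypothetical registry keyed by api kind (key names are assumptions).
STREAM_BACKENDS: Dict[str, Callable] = {
    "anthropic": stream_anthropic,
    "google": stream_google,
    "bedrock": stream_bedrock,
}


def resolve_stream_backend(api_kind: str) -> Callable:
    # Unknown kinds fall back to stream_unsupported; native kinds must not.
    return STREAM_BACKENDS.get(api_kind, stream_unsupported)
```

A test in the spirit of the one described above would assert that each native kind resolves to something other than `stream_unsupported`.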

Behaviour-preserving: full suite 456 passed / 2 skipped against compose Postgres
(454 before + the 2 new dispatcher tests), identical pass set otherwise.
@jmlago jmlago force-pushed the codex/native-provider-streaming branch from c3b3d0d to 4e32cb2 Compare July 1, 2026 15:47
@jmlago jmlago changed the base branch from codex/first-token-wall-clock to main July 1, 2026 15:47
@jmlago jmlago merged commit 2730bc1 into main Jul 1, 2026
1 check passed