Skip to content

feat: resolve all gRPC Phase 2 debt — full bridge fidelity + KernelService implementation#36

Closed
bkrabach wants to merge 27 commits intomainfrom
feat/grpc-v2-debt-fix
Closed

feat: resolve all gRPC Phase 2 debt — full bridge fidelity + KernelService implementation#36
bkrabach wants to merge 27 commits intomainfrom
feat/grpc-v2-debt-fix

Conversation

@bkrabach
Copy link
Collaborator

@bkrabach bkrabach commented Mar 5, 2026

Summary

Resolves all gRPC Phase 2 technical debt — 15 code TODO(grpc-v2) markers eliminated, all 9 KernelService RPCs implemented, full data fidelity across the gRPC bridge layer. Remote cross-language orchestrators can now make callbacks to the kernel.

What's included

Proto Schema Fixes (Layer 1)

  • Added optional keyword to 5 proto fields: Usage.reasoning_tokens, Usage.cache_read_tokens, Usage.cache_creation_tokens, ApprovalRequest.timeout, HookResult.approval_timeout
  • Wire-compatible change — old readers unaffected

Bidirectional Conversions (Layer 2 — bulk of the work)

  • Role enum ↔ proto Role conversion helpers
  • Message ↔ proto Message with all 7 ContentBlock variants (text, thinking, redacted_thinking, tool_call, tool_result, image, reasoning)
  • ChatRequest ↔ proto ChatRequest (with ToolSpec, ResponseFormat, ToolChoice)
  • ChatResponse ↔ proto ChatResponse (with ToolCall, Usage, Degradation)
  • HookResult native → proto (reverse of existing proto → native)

Bridge Fixes (Layer 3)

  • GrpcContextBridge: Full message fidelity — role, name, tool_call_id, metadata, BlockContent all preserved. Provider name populated in get_messages_for_request().
  • GrpcProviderBridge::complete(): No longer a stub — uses ChatRequest/ChatResponse conversions
  • GrpcOrchestratorBridge: session_id stored at construction and transmitted in requests. 5 discarded params documented as by-design (remote orchestrators use KernelService callbacks).

Session Change (Layer 3)

  • Session now holds Arc<Coordinator> internally (was owned by value)
  • New coordinator_shared() method returns Arc<Coordinator> for sharing with KernelService
  • Existing coordinator() / coordinator_mut() APIs preserved

KernelService RPCs (Layer 4 — all 8 stubs now implemented)

  • GetCapability / RegisterCapability — capability store read/write
  • GetMountedModule — module registry query (tools + providers)
  • AddMessage / GetMessages — context manager access with full Message conversion
  • EmitHook / EmitHookAndCollect — hook registry with HookResult conversion
  • CompleteWithProvider — provider lookup + full ChatRequest/ChatResponse roundtrip
  • CompleteWithProviderStreaming — one-shot stream wrap of provider.complete() (true streaming deferred to Provider trait extension)

Cleanup

  • Zero TODO(grpc-v2) markers remain in source code
  • Updated audit design doc to mark all findings as RESOLVED

Design docs

  • docs/plans/2026-03-04-grpc-v2-debt-fix-design.md
  • docs/plans/2026-03-04-grpc-v2-debt-fix-implementation.md

Key design decisions

  1. KernelServiceImpl stays per-session (single Arc, not a session registry)
  2. session_id stored on bridge struct at construction (not passed through Orchestrator trait — would be breaking)
  3. Session stores Arc internally — minimal change, existing API preserved
  4. CompleteWithProviderStreaming wraps single complete() as one-shot stream (true streaming requires Provider trait change — separate future PR)
  5. Type-safe message parsing via serde_json::from_value::() (not hand-parsing JSON)

Test Plan

  • 391 total tests passing (366 unit + 12 E2E + 13 doc tests)
  • cargo clippy clean (0 warnings with -D warnings)
  • Zero TODO(grpc-v2) markers in source code
  • Zero Status::unimplemented stubs in KernelService
  • 5 proto optional fields verified
  • cargo check --features wasm compiles clean
  • CI workflow runs successfully

bkrabach added 27 commits March 4, 2026 22:08
Covers all 15 TODO(grpc-v2) markers and 8 stubbed KernelService RPCs.
4-layer approach: proto schema → conversions → bridge fixes → KernelService.
Single PR, layered commits.
Make the fields Usage.reasoning_tokens, Usage.cache_read_tokens,
Usage.cache_creation_tokens, ApprovalRequest.timeout, and
HookResult.approval_timeout optional in the proto schema and
generated Rust code. This fixes the Some(0)/None ambiguity where
zero values were previously indistinguishable from absent values.

- Proto schema: add `optional` keyword to 5 fields
- Generated code: fields become Option<i32>/Option<f64>
- conversions.rs: use .map() for Option mapping both directions
- grpc_approval.rs: remove map_approval_timeout, pass directly
- grpc_hook.rs: unwrap_or(300.0) for approval_timeout default
- Add usage_some_zero_roundtrips_correctly test
- Replace 2 approval timeout tests with 3 optional-semantics tests
Add Message conversion functions to conversions.rs:
- Private helpers: native_content_block_to_proto and proto_content_block_to_native
  handling all 7 ContentBlock variants (Text, Thinking, RedactedThinking,
  ToolCall, ToolResult, Image, Reasoning) with visibility mapping
- Private helpers: native_visibility_to_proto and proto_visibility_to_native
- Public: native_message_to_proto mapping role, content, name, tool_call_id, metadata_json
- Public: proto_message_to_native returning Result<Message, String> with error on
  None content, empty-string→None for name/tool_call_id, JSON parse for metadata
- 4 tests: message_text_content_roundtrip, message_block_content_text_roundtrip,
  message_with_tool_call_id_roundtrip, message_none_content_returns_error
Add native_chat_request_to_proto() and proto_chat_request_to_native()
in crates/amplifier-core/src/generated/conversions.rs.

Mappings:
- messages: Vec<Message> via native_message_to_proto / proto_message_to_native
- tools: Option<Vec<ToolSpec>> ↔ Vec<ToolSpecProto> (name, description, parameters_json)
- response_format: Option<ResponseFormat> ↔ proto ResponseFormat oneof (Text/Json/JsonSchema)
- temperature, top_p, timeout: Option<f64> ↔ f64 (0.0 sentinel = not set)
- max_output_tokens: Option<i64> ↔ i32 (0 sentinel = not set, with overflow clamp)
- stream: Option<bool> ↔ bool (false sentinel = not set)
- conversation_id, model, reasoning_effort, stop: Option ↔ string/repeated (empty = not set)
- metadata: Option<HashMap> ↔ metadata_json string
- tool_choice: Option<ToolChoice> ↔ String (JSON object strings parsed back to ToolChoice::Object)

Tests: 7 roundtrip tests covering minimal, full-fields, tools, all three
ResponseFormat variants, ToolChoice::Object, and multiple-message requests.

Task 3 of 17.
- Add native_chat_response_to_proto() mapping Vec<ContentBlock> to JSON
  string, ToolCall arguments to arguments_json, Usage/Degradation via
  existing From impls, and finish_reason/metadata with empty-string sentinels
- Add proto_chat_response_to_native() reversing all mappings
- Add 4 roundtrip tests: minimal, full fields, tool_calls, empty content
Add native_hook_result_to_proto() in generated/conversions.rs that maps
all 14 HookResult fields from native models to proto types:

- action: HookAction enum → proto i32 (all 5 variants)
- context_injection_role: ContextInjectionRole → proto i32 (all 3 variants)
- approval_default: Allow/Deny → proto Approve/Deny i32
- user_message_level: UserMessageLevel → proto i32 (all 3 variants)
- approval_timeout: f64 → Option<f64> (always Some, preserves value)
- approval_options: Option<Vec<String>> → Vec<String> (None → empty)
- All Option<String> fields → String (None → empty string)
- data: Option<HashMap> → data_json String (None → empty)
- Bool fields: ephemeral, suppress_output, append_to_last_tool_result (direct)
- extensions: dropped (proto has no extensions field)

Also expose proto_to_native_hook_result as pub(crate) on GrpcHookBridge
to enable roundtrip tests in conversions.rs.

Tests added (12 total):
- Default field mapping verification
- All HookAction variant mappings
- All ContextInjectionRole variant mappings
- All ApprovalDefault variant mappings (Allow→Approve, Deny→Deny)
- All UserMessageLevel variant mappings
- String option fields → proto string fields
- Bool fields mapping
- approval_options Some/None → vec/empty vec
- approval_timeout → Some(f64)
- data_json Some/None → JSON string/empty
- Full roundtrip via GrpcHookBridge::proto_to_native_hook_result
- Replace value_to_proto_message() hand-rolled lossy conversion with
  serde_json::from_value::<Message>() + native_message_to_proto(),
  preserving role, name, tool_call_id, metadata, and all ContentBlock
  variants. Non-parseable values fall back to text-only with a warning.

- Replace proto_message_to_value() text-only mapping (BlockContent → Null)
  with proto_message_to_native() + serde_json::to_value(), producing a
  fully-typed roundtrip for all ContentBlock variants.

- Fix get_messages_for_request() to populate provider_name from
  provider.name() instead of always sending an empty string.

- Remove all TODO(grpc-v2) markers — replaced with proper doc-comments.

- Update tests: replace old tests that documented the lossy behaviour
  (proto_message_to_value_block_content_is_null → not null;
  proto_message_to_value_text_content_roundtrip → uses typed Message path)
  and add full-fidelity tests for role/name/tool_call_id preservation and
  BlockContent round-trip.
…ChatResponse conversion

Replace the Phase-2 stub that returned 'not yet implemented' with a
working implementation that:

1. Converts the native ChatRequest to proto using native_chat_request_to_proto()
2. Calls self.client.lock().await.complete(proto_request).await via tonic
3. Maps gRPC Status errors to ProviderError::Other { message: "gRPC call failed: ..." }
4. Converts the proto ChatResponse back to native using proto_chat_response_to_native()
5. Returns Ok(native_response)

Also adds:
- GrpcProviderBridge::new_for_testing() (#[cfg(test)]) to allow constructing
  the bridge with a pre-built tonic client (lazy channel) without a live server
- complete_attempts_grpc_call_not_stub async test that verifies the method no
  longer returns the stub error message (fails RED before fix, passes GREEN after)

Task 7 of 17.
- Add session_id: String field to GrpcOrchestratorBridge struct
- Update connect() to accept session_id parameter for callback routing
- Populate session_id in OrchestratorExecuteRequest from self.session_id
- Replace TODO(grpc-v2) markers with clear doc comments explaining that
  remote orchestrators access context/providers/tools/hooks/coordinator
  via KernelService callbacks using session_id routing
- Add load_grpc_orchestrator() to transport.rs (parallel to load_grpc_tool)
- Update tests: replace old TODO-presence assertions with new tests that
  verify session_id field declaration, self.session_id usage, and absence
  of the String::new() placeholder

All 329 tests passing, clippy clean.
…rnelService

- Change Session.coordinator field from Coordinator to Arc<Coordinator>
- Add coordinator_shared() method returning Arc<Coordinator> for sharing
  with KernelServiceImpl (prerequisite for Tasks 10-15)
- Update coordinator() to deref through Arc (Arc<T>: Deref<Target=T>)
- Update coordinator_mut() to use Arc::get_mut() with lifecycle docs
  (panics if called after Arc has been shared — setup-only API)
- Add import: use std::sync::Arc in session.rs module scope
- Add 3 tests: coordinator_shared_returns_arc_to_same_coordinator,
  coordinator_and_coordinator_mut_still_work,
  coordinator_shared_reflects_mounted_modules

All 332 tests pass, clippy clean.
- Add proto_chat_request_to_native and native_chat_response_to_proto imports
- Implement complete_with_provider: look up provider by name, convert
  proto ChatRequest to native, call provider.complete(), convert native
  ChatResponse back to proto
- Return Status::not_found when provider is not mounted
- Return Status::invalid_argument when request field is missing
- Return Status::internal on ProviderError

Tests added:
- complete_with_provider_returns_response_from_mounted_provider
- complete_with_provider_not_found_returns_not_found_status
- complete_with_provider_missing_request_returns_invalid_argument
- complete_with_provider_records_call_in_provider
- Source code was already clean (all markers removed in Tasks 0-15)
- Update audit-fix-design.md: mark S-1 through S-4 as RESOLVED with fix details
- Confirmed zero TODO(grpc-v2) markers remain in amplifier-core/crates/

All 15+ TODO(grpc-v2) markers and 8 stubbed KernelService RPCs fixed:
- Context bridge: full bidirectional conversions for all message fields
- Orchestrator bridge: session_id routing, KernelService RPC callbacks
- Approval bridge: optional double timeout in proto
- Conversions: optional int32 for token counts
- All 9 KernelService RPCs implemented

366 tests passing, clippy clean.
@bkrabach
Copy link
Collaborator Author

bkrabach commented Mar 7, 2026

Closing — this work has been merged into the dev/cross-language-sdk integration branch. Will be delivered to main via a single PR when all cross-language SDK work is dogfooted and ready for release.

@bkrabach bkrabach closed this Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant