docs(rfd): mid-turn input via session/inject (queue and steer)#1261
docs(rfd): mid-turn input via session/inject (queue and steer)#1261kennethsinder wants to merge 9 commits into
Conversation
Lift discussion agentclientprotocol#1220 into the v2 RFD bucket per @benbrandt's offer. One method, two modes (queue and steer), one capability. Rides on the v2 prompt lifecycle (user_message echo, state_change, agent-owned messageId). Supersedes PR agentclientprotocol#484 with credit to @SteffenDE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SteffenDE
left a comment
There was a problem hiding this comment.
Some comments from my side. Don't see me as authoritative here though. Happy to close the promptQueueing PR, as it makes more sense to wait for v2 to land.
|
|
||
| The missing piece is not queueing alone. It is the wire shape that distinguishes "deliver soon" from "deliver later," gives each delivery a stable handle for ack and revocation, and echoes the delivery into the session history so multi-client and replay stay coherent. The v2 prompt lifecycle gives us that base. This RFD adds the call sites on top. | ||
|
|
||
| PR [#484](https://github.com/agentclientprotocol/agent-client-protocol/pull/484) (open since February 2026) takes a thinner cut: a `promptQueueing` capability plus an `end_turn` early-finish signal, with prompts queued via a parallel `session/prompt` call. It doesn't address steer, doesn't define delivery semantics, and predates the v2 prompt lifecycle's `state_change` and `user_message` notifications. The follow-up thread also asks how clients edit a queued message. This RFD answers that directly: editing is a pending-inject operation before `user_message`, not a transcript rewrite after delivery. The author also flags that the Claude Agent SDK had no public hook for "when was my queued message inserted," which is precisely what `user_message` echoing solves at the protocol level. The proposal below is intended to supersede #484, with credit to [@SteffenDE](https://github.com/SteffenDE) for raising the question first. |
There was a problem hiding this comment.
It doesn't address steer, doesn't define delivery semantics
I'd say this is not a correct representation of that PR.
promptQueueing was always meant to be steering. If you just want to queue for sending after the current turn, this is trivial to implement by ACP clients.
The early end_turn was the signal to the client when the new message was injected into the context.
There was a problem hiding this comment.
Fair pushback, that was a sloppier read than I should have given #484. Rewrote the paragraph to call your early-end_turn what it is: a steer-via-yield design where the agent yields and the client's parallel session/prompt becomes the next turn. The honest delta vs. this RFD is that yield-and-re-prompt forces a turn boundary at every steer, so a tool call mid-stream surfaces as completing a turn instead of being absorbed, and the client drives the re-prompt rather than handing the agent a payload. Both reach steer; the trade-offs differ.
| } | ||
| ``` | ||
|
|
||
| The agent responds when the inject has been accepted for pending delivery, not when the model has processed it. The response carries the agent-assigned `messageId`, matching the v2 prompt lifecycle pattern: |
There was a problem hiding this comment.
Just as a note: this won't work for the Claude Agent SDK, because it will only generate the id at the point of delivery. In practice, this means that ACP message IDs would be an adapter concern, requiring the ACP code to store its own state persistently somewhere in order to get the same IDs for later loads.
We already have similar problems for message IDs in streaming chunks, so at some point this is probably inevitable.
There was a problem hiding this comment.
Yeah, this is the part where I think the adapter has to pay the cost and we should just say so out loud. Added an FAQ explicitly addressing it: the wire ID has to come back synchronously, so any adapter bridging to an SDK that doesn't expose a pre-delivery ID needs to mint a wire ID at accept-time and remember the mapping until the underlying SDK delivers. Same shape as the streaming-chunk situation message-id already accepts. The two alternatives (client-provided ID, deferred ID-on-delivery) are worse for different reasons that the FAQ spells out.
|
|
||
| ### Semantics | ||
|
|
||
| **`queue`** is the simple one. The agent buffers the content. The agent delivers it as a `user_message` notification once `state_change: idle` fires for the current turn. FIFO across multiple queued injects. If a queue lands on an already-idle session, the agent treats it as a normal user message and starts a turn, though clients should prefer `session/prompt` in that case. |
There was a problem hiding this comment.
Maybe worth discussing if this needs to be part of ACP in the first place, as clients need to implement logic for displaying queued messages anyway, so adding the logic to buffer those client-side and do regular prompt on end_turn might be enough and keeps the protocol smaller. No strong opinions on my side there though.
There was a problem hiding this comment.
Good question and one I went back and forth on. Added a dedicated FAQ ("Does queue need to be in the protocol at all?") that tries to be honest about it: for one client driving one agent, you're right, client-side buffering covers the user-visible behavior and the protocol stays smaller. The cases I think justify the protocol-level queue are multi-client fan-out (with #533, the agent has to be the ordering authority because client-side queues can't agree on FIFO), session/load reconnect (a queue at the agent survives client restart), and headless agents serving multiple thin surfaces. Open to dropping it if the consensus is that those cases are too speculative for v2.
|
|
||
| An agent that only supports buffering at end-of-turn declares `["queue"]`. A tool-loop agent that can break safely between tool calls declares `["queue", "steer"]`. A streaming-only agent that can't do either declares the absence of the capability and clients fall back to `session/cancel`. | ||
|
|
||
| Revoke is part of the pending-inject contract. `pending.replace` is optional. Clients should only offer edit-in-place for already-sent pending messages when it is advertised; otherwise they can keep drafts client-side longer or revoke and send a new inject when losing queue position is acceptable. |
There was a problem hiding this comment.
This doesn't seem clear to be. Is revoke required or not? Can an agent omit pending and therefore signal that revoke is not supported?
There was a problem hiding this comment.
Good catch, the wording was vague. Tightened it: revoke is mandatory if you support session/inject at all. Any agent that can accept a pending inject can drop one before delivery; worst case is a tombstone flag plus skipping the user_message emit. Replace stays opt-in via pending.replace because content editing is genuinely harder than dropping (the content may already be partially serialized into a prompt envelope by the time the replace lands). The asymmetry is now spelled out instead of implied.
|
|
||
| Normal session errors still apply. If the session is unknown, closed, or no longer accepts input, return the same error the agent uses for other session requests. | ||
|
|
||
| Delivery is the point where the agent commits the inject to session history and emits, or has irrevocably queued, the matching `user_message`. From that point, the content may already be in the next model input. Revoke must return `already_delivered`, and the client should expect the `user_message` if it has not seen it yet. There is no separate "revoke a delivered message" surface; that is [session rewind (#1214)](https://github.com/agentclientprotocol/agent-client-protocol/pull/1214) territory. |
There was a problem hiding this comment.
or has irrevocably queued
I'd say an agent should always emit the user_message at the point where it belongs in the chat. If a message is only "irrevocably queued" meaning that is can't be edited any more, but belongs to a later point in history, I'd say that should be an error on revoke / replace.
There was a problem hiding this comment.
Agreed, that hedge created a state that doesn't actually exist on the wire. Tightened the definition: delivery is exactly the user_message emission. Before that, revoke succeeds. At or after that, revoke returns already_delivered. Any internal commit the agent does before emitting is purely for its own bookkeeping.
| ### Interaction with other in-flight requests | ||
|
|
||
| - **`session/cancel` during a pending steer.** Codex [#22815](https://github.com/openai/codex/issues/22815) is a real bug surface here, and we should resolve it in the spec rather than leave it implicit. Recommended behavior: cancel applies to the in-flight turn only. Pending injects (both queue and steer) survive `session/cancel` and deliver as normal once the agent reaches idle. Clients that want to clear everything can call `session/revoke_inject` for pending injects first, then `session/cancel`. Agents should document if they deviate. | ||
| - **`session/request_permission` blocking the turn.** While the agent is awaiting a permission decision, the turn is paused but not idle. Queue accumulates as normal. Steer is held until the permission resolves, then delivered at the next break-point (which may be immediately if the permission decision is the break-point). Agents that prefer to drop steers during permission waits may do so, capability-advertised. |
There was a problem hiding this comment.
Agents that prefer to drop steers during permission waits may do so, capability-advertised.
I'd drop that. If an Agent (SDK) can't handle this, the ACP adapter code should buffer rather than just dropping. But maybe I'm misunderstanding the point.
There was a problem hiding this comment.
Fair, that carve-out was a cop-out and pushed work onto agents that should sit in the adapter. Removed it. The spec is now uniform: pending steer is held during a permission wait and delivered at the next break-point after the decision resolves. If the underlying runtime can't hold it, the ACP adapter buffers. No capability flag for this.
Resolves PR agentclientprotocol#1261 inline comments from @SteffenDE: - Reframe PR agentclientprotocol#484 comparison: acknowledge that agentclientprotocol#484's early end_turn is a legitimate steer-via-yield design. Spell out the trade-off (forced turn boundary at every steer, client drives the re-prompt) rather than dismissing it as not addressing steer. - Make revoke mandatory if session/inject is supported; keep replace opt-in via pending.replace. Explain the asymmetry (drop is cheap, content edit is harder once partially serialized). - Drop the "irrevocably queued" hedge. Delivery is defined strictly as user_message emission; no phantom committed-but-not-delivered state. - Drop the agent-may-drop-steers-during-permission-wait carve-out. Adapter buffers when the underlying runtime can't. - Add FAQ on adapter burden for messageId: addresses Claude Agent SDK not having a stable ID until delivery, explains why the alternatives (client-provided ID, deferred ID) are worse. - Add FAQ on why queue lives in the protocol rather than client-side: multi-client FIFO ordering authority, replay/reconnect, headless agents. Also a prose pass: tighten em-dashes to colons where natural, drop rule-of-three flourishes, remove "not X. It is Y" constructions, match the matter-of-fact tone of the v2 prompt lifecycle RFD. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the careful review, @SteffenDE! Replied to each thread inline. Quick summary of what changed in 48a629f + c3c9980:
Open to more rounds. The biggest open design question I think is the queue itself: whether the multi-client / replay / headless cases justify protocol-level standardization at v2 stabilization time, or whether it's cleaner to ship steer alone and let queue land later once #533 firms up. |
Cover the convergence that landed in the past quarter: - Codex app-server `turn/steer` and `turn/interrupt` methods (the closest existing analogue to session/inject, by name). - Cursor 3 "Glass" keybind split (Alt+Enter queues, Cmd+Enter interrupts-and-sends). - Replit Agent 4 "Queue" with explicit queue/steer separation. - Claude Managed Agents user.message events + idle/running/etc. statuses. - Devin sessions API as the single-mode counterpoint. Replace the now-superseded Gemini CLI #17197 reference with #18782 (experimental steering hints, system-role variant), and call out the user-role vs system-role design split in the session/remind FAQ. Strengthen the adapter FAQ with the concrete bug symptom from claude-agent-sdk-typescript#67: messages yielded into the async generator get processed but never appear in the visible transcript, which is exactly the gap user_message echo + stable wire ID closes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Correct Cursor 1.4 vs Cursor 3 "Glass" conflation: the Alt/Cmd+Enter keybind split is from the 1.4 changelog, not the Glass IDE rebrand. - Fix Replit attribution: drop "Agent 4" and "explicit queue/steer framing" — neither is in the source. Update the redirected blog URL. - Add FAQ comparing to Codex's `turn/steer` / `turn/interrupt`: documents the threadId/expectedTurnId/input shape, calls out that Codex has no revoke/replace, and articulates why ACP's pending-edit surface matters for editor clients. - Collapse the 2026-05-24 revision history entry to one line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Capability shape tightened: `modes` is required and non-empty when `inject` is present. Added the `steer_in_stream` capability (`["interrupt"]`, `["finish"]`, or both) so clients can know whether a mid-stream steer truncates the assistant turn or waits for it to finish — closes the previously "agent-defined" interop gap that produced visibly different transcripts for the same client action. - Multi-client ordering claim weakened to per-controller FIFO with agent-defined cross-controller order observable via `user_message` delivery — what the protocol can actually enforce. - Replace position semantics clarified to "within that mode's pending order" now that modes are segregated. - Error code recommendation moved from `-32602 Invalid params` (request wasn't malformed) into the JSON-RPC server-error range (`-32000`–`-32099`), with the exact code left to schema-definition time and `error.data.reason` as the discriminator. - Steer on an idle session is now explicitly an error; clients use `session/prompt` for input that starts a turn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vetted against ACP's existing ErrorCode union and the patterns in request-cancellation, next-edit-suggestions, and the multi-client-attach RFDs. The "exact code TBD" placeholder was duct tape; ACP RFDs assign specific codes paired with a `data.reason` discriminator string. Concrete assignments: - `-32002 "Resource not found"` (already defined): reused for `unknown_message_id`, since the semantics match. - `-32010 "Inject precondition failed"` (new): covers `already_delivered`, `no_running_turn`, `replace_not_supported`, with `error.data.reason` discriminating. Number chosen to leave -32001 through -32009 available for future generic-protocol codes. - `-32601 "Method not found"` (standard JSON-RPC): an acceptable response when `pending.replace` was not advertised, as an alternative to returning -32010 with `reason: "replace_not_supported"`. Updated all four error-response sites and added a schema-additions entry documenting the ErrorCode extension. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts discussion #1220 (
session/injectfor mid-turn queue + steer) into the formal RFD process, per @benbrandt's offer to put it in the v2 bucket.Summary
One method (
session/inject), two modes (queue,steer), one capability. Built on the v2 prompt lifecycle: agent-ownedmessageId,user_messageecho notification, andstate_change. Specifies the protocol shape for queue/steer behavior already present across Cursor, Codex CLI, Claude Code, Windsurf Cascade, Gemini CLI, and others.Relation to existing work
session/remind(system-role context injection; same break-point machinery, different payload contract).Looking for
A champion. @benbrandt offered in the discussion thread; tagging here. I can iterate on framing, split scope, or rework around the v2 prompt lifecycle if a different factoring lands better.
Changes in this PR vs. the discussion post
messageIdownership to agent-owned, returned in the inject response, matching the v2 prompt lifecycle's resolution.$/cancel_requestrevocation withmessageId-based pending operations:session/revoke_inject, optionalsession/replace_inject, and defined sad paths for already-delivered and unknown messages.session/cancelby default; steer is held, not dropped, duringsession/request_permission./injectproposal), and the explicit cross-ecosystem "disambiguate queue/steer" discussions.Process note: drafting used AI for outline and prior-art search; cited links were checked.