feat(opencode): cache-aligned compaction to reuse prefix cache#25100
Open
lloydzhou wants to merge 7 commits into anomalyco:dev from
Conversation
Contributor

The following comment was made by an LLM; it may be inaccurate: Based on my search results, no duplicate PRs were found. The only PR matching these searches is PR #25100 itself (the current PR being analyzed). While there are related PRs that touch on cache optimization and compaction functionality (such as PR #24842 about caching messages and PR #14743 about Anthropic prompt cache hit rates), they address different aspects and are not duplicates of this specific cache-aligned compaction feature.
Author

dev mode
Issue for this PR
Closes #25120
Type of change
What does this PR do?
Compaction currently builds its own LLM request with an empty system prompt, no tools, and a filtered message history. Because the request structure differs from normal chat requests, none of the historical messages hit the provider's prompt cache — they're all charged at full input price.
This PR aligns the compaction request to share the exact same prefix as the main agent loop: same system prompt, same tool definitions, same message serialization. The compaction request becomes indistinguishable from a normal chat request up to the point where the summary instruction is appended. This lets the provider serve
[system] + [tools] + [dropped messages] from cache, cutting compaction cost by roughly 90%. See the design rationale: Cache-Aligned Summarization.
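As a minimal sketch of the idea (the types and function names below are illustrative only, not the actual prompt.ts / compaction.ts API):

```typescript
// Illustrative sketch: Message, SharedPrefix, and the build* helpers are
// hypothetical names, not the real opencode API.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface SharedPrefix {
  system: string;     // same system prompt as the main agent loop
  tools: string[];    // same serialized tool definitions
  history: Message[]; // same message serialization, no extra filtering
}

interface LLMRequest {
  system: string;
  tools: string[];
  toolChoice: "auto" | "none";
  messages: Message[];
}

// Normal chat turn: the shared prefix plus the next user message.
function buildMainRequest(prefix: SharedPrefix, next: Message): LLMRequest {
  return {
    system: prefix.system,
    tools: prefix.tools,
    toolChoice: "auto",
    messages: [...prefix.history, next],
  };
}

// Compaction turn: byte-identical prefix; only the appended summary
// instruction differs. toolChoice "none" keeps the tool definitions in the
// (cached) prompt while preventing the model from invoking them.
function buildCompactionRequest(prefix: SharedPrefix): LLMRequest {
  const req = buildMainRequest(prefix, {
    role: "user",
    content: "Summarize the conversation above.",
  });
  return { ...req, toolChoice: "none" };
}
```

Because the two requests agree on everything up to the final message, the provider's prefix cache can serve the entire shared portion.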
Cost example (Claude Sonnet 4 pricing, compacting 45K tokens of history):
~90% cost reduction per compaction. The cached portion (system + tools + dropped messages) drops from $3/MTok to $0.30/MTok — only the short summary instruction pays full price.
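A back-of-envelope check of the ~90% figure, using the prices stated above ($3/MTok uncached input, $0.30/MTok cache reads). The ~200-token size of the summary instruction is an assumed example value, not a measured one:

```typescript
// Assumed split: the 45K history tokens become cache reads; only a short
// (~200-token, assumed) summary instruction pays the uncached rate.
const UNCACHED_PER_MTOK = 3.0; // $/MTok, uncached input
const CACHED_PER_MTOK = 0.3;   // $/MTok, cache read

function inputCostUSD(cachedTokens: number, uncachedTokens: number): number {
  return (cachedTokens / 1e6) * CACHED_PER_MTOK
       + (uncachedTokens / 1e6) * UNCACHED_PER_MTOK;
}

const before = inputCostUSD(0, 45_000);   // $0.135: nothing hits the cache
const after = inputCostUSD(45_000, 200);  // $0.0141: history served from cache
const reduction = 1 - after / before;     // roughly 0.90
```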
Conditions for the optimized path (otherwise falls back to original behavior unchanged):
- Not a json_schema structured output request

Changes:
- prompt.ts: Extract resolveStreamContext (shared system+tools resolution), make processor optional in resolveTools so compaction can call it without a live message handle, and compute the resolved context before calling compaction.
- compaction.ts: Accept an optional resolved context with agent/system/tools/user. When present, skip hidden filtering and stripMedia/toolOutputMaxChars so the serialized output matches the main loop exactly. Set toolChoice: "none" to prevent tool execution.

How did you verify your code works?
- bun typecheck passes in packages/opencode
- Verified the [system] + [tools] + [messages] prefix is identical between compaction and the main loop when the resolved context is used

Screenshots / recordings
N/A — backend-only change.
Checklist