
feat(opencode): cache-aligned compaction to reuse prefix cache#25100

Open
lloydzhou wants to merge 7 commits into anomalyco:dev from lloydzhou:dev

Conversation


@lloydzhou lloydzhou commented Apr 30, 2026

Issue for this PR

Closes #25120

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Compaction currently builds its own LLM request with an empty system prompt, no tools, and a filtered message history. Because the request structure differs from normal chat requests, none of the historical messages hit the provider's prompt cache — they're all charged at full input price.

This PR aligns the compaction request to share the exact same prefix as the main agent loop: same system prompt, same tool definitions, same message serialization. The compaction request becomes indistinguishable from a normal chat request up to the point where the summary instruction is appended. This lets the provider serve [system] + [tools] + [dropped messages] from cache, cutting compaction cost by roughly 90%.

See the design rationale: Cache-Aligned Summarization

Cost example (Claude Sonnet 4 pricing, compacting 45K tokens of history):

| Segment | Tokens | Without cache alignment | With cache alignment |
|---|---|---|---|
| System prompt | ~2K | Full: $0.006 | Cached: $0.0006 |
| Tools | ~3K | Full: $0.009 | Cached: $0.0009 |
| Dropped messages | ~40K | Full: $0.120 | Cached: $0.012 |
| Summary instruction | ~200 | Full: $0.0006 | Full: $0.0006 |
| **Total** | ~45.2K | $0.136 | $0.014 |

~90% cost reduction per compaction. The cached portion (system + tools + dropped messages) drops from $3/MTok to $0.30/MTok — only the short summary instruction pays full price.
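The arithmetic behind those totals can be reproduced with a short sketch. The $3/MTok full-input and $0.30/MTok cache-read rates are the assumed Sonnet 4 prices; everything else follows from the segment sizes:

```typescript
// Sketch only: reproduces the cost table under the assumed Sonnet 4
// input rates of $3/MTok (full) and $0.30/MTok (cache read).
const FULL_PER_MTOK = 3.0;
const CACHED_PER_MTOK = 0.3;

const cost = (tokens: number, rate: number) => (tokens / 1_000_000) * rate;

const segments = [
  { name: "system prompt", tokens: 2_000, cacheable: true },
  { name: "tools", tokens: 3_000, cacheable: true },
  { name: "dropped messages", tokens: 40_000, cacheable: true },
  { name: "summary instruction", tokens: 200, cacheable: false }, // always full price
];

const withoutCache = segments.reduce((sum, s) => sum + cost(s.tokens, FULL_PER_MTOK), 0);
const withCache = segments.reduce(
  (sum, s) => sum + cost(s.tokens, s.cacheable ? CACHED_PER_MTOK : FULL_PER_MTOK),
  0,
);
console.log(withoutCache.toFixed(4), withCache.toFixed(4)); // "0.1356 0.0141"
```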

Conditions for the optimized path (otherwise falls back to original behavior unchanged):

  • Main model and compaction model share the same provider + model ID
  • The original user message is not a json_schema structured output request
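A hypothetical guard for that gate might look like the following; the `ModelRef` shape and field names are illustrative assumptions, not the PR's actual code:

```typescript
// Illustrative sketch of the optimized-path gate; field names are assumptions.
interface ModelRef {
  providerID: string;
  modelID: string;
}

function canAlignCache(main: ModelRef, compaction: ModelRef, isJsonSchemaOutput: boolean): boolean {
  // Prefix reuse only works when both requests hit the same provider's cache
  // for the same model, and the original request isn't structured output.
  return (
    main.providerID === compaction.providerID &&
    main.modelID === compaction.modelID &&
    !isJsonSchemaOutput
  );
}
```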

Changes:

  • prompt.ts: Extract resolveStreamContext (shared system+tools resolution), make processor optional in resolveTools so compaction can call it without a live message handle, compute resolved context before calling compaction
  • compaction.ts: Accept optional resolved context with agent/system/tools/user. When present, skip hidden filtering and stripMedia/toolOutputMaxChars so serialized output matches the main loop exactly. Set toolChoice: "none" to prevent tool execution.
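In spirit, the aligned request looks like the sketch below. The types and the resolved-context shape are assumptions for illustration, not the PR's exact code; the point is that everything before the summary instruction is reused byte-for-byte:

```typescript
// Hedged sketch of the cache-aligned compaction request; real types differ.
type ToolDef = { name: string; description: string; parameters: unknown };
type Message = { role: "user" | "assistant"; content: string };

interface ResolvedContext {
  system: string;      // same system prompt as the main agent loop
  tools: ToolDef[];    // same tool definitions, same order
  messages: Message[]; // same serialization, no hidden filtering or stripMedia
}

function buildCompactionRequest(ctx: ResolvedContext, summaryInstruction: string) {
  return {
    system: ctx.system, // identical prefix, so the provider serves it from cache
    tools: ctx.tools,
    messages: [...ctx.messages, { role: "user" as const, content: summaryInstruction }],
    toolChoice: "none" as const, // summarize only; never execute tools
  };
}
```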

How did you verify your code works?

  • bun typecheck passes in packages/opencode
  • Fallback path (model mismatch / json_schema) is completely unchanged from current behavior
  • Reviewed that serialized [system] + [tools] + [messages] prefix is identical between compaction and main loop when resolved context is used

Screenshots / recordings

N/A — backend-only change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search results, no duplicate PRs were found. The only PR matching these searches is PR #25100 itself (the current PR being analyzed). While there are related PRs that touch on cache optimization and compaction functionality (such as PR #24842 about caching messages and PR #14743 about Anthropic prompt cache hit rates), they address different aspects and are not duplicates of this specific cache-aligned compaction feature.

No duplicate PRs found

@lloydzhou
Author

Tested in dev mode (`bun run dev`) with mimo-v2.5-pro. Reading the sqlite database records shows cache.read = 14,656 for the last compaction, confirming a cache hit.
Cache hit rate = 14,656 / 17,267 ≈ 84.9%

sqlite3 /Users/xxxx/.local/share/opencode/opencode-local.db "SELECT json_extract(data, '$.modelID') as model, json_extract(data, '$.agent') as agent, json_extract(data, '$.role') as role, json_extract(data, '$.tokens') as tokens, datetime(json_extract(data, '$.time.created')/1000, 'unixepoch', 'localtime') as created FROM message ORDER BY json_extract(data, '$.time.created') DESC LIMIT 20;"

mimo-v2.5-pro|compaction|assistant|{"total":17267,"input":784,"output":513,"reasoning":1314,"cache":{"write":0,"read":14656}}|2026-05-01 19:27:27
|build|user||2026-05-01 19:27:27
mimo-v2.5-pro|build|assistant|{"total":14870,"input":140,"output":15,"reasoning":59,"cache":{"write":0,"read":14656}}|2026-05-01 19:27:17
|build|user||2026-05-01 19:27:17
mimo-v2.5-pro|build|assistant|{"total":14788,"input":13644,"output":13,"reasoning":107,"cache":{"write":0,"read":1024}}|2026-05-01 19:27:00
|build|user||2026-05-01 19:27:00
mimo-v2.5-pro|compaction|assistant|{"total":1478,"input":939,"output":513,"reasoning":26,"cache":{"write":0,"read":0}}|2026-05-01 19:26:14
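The quoted ~84.9% hit rate follows directly from the top compaction row (cache.read = 14,656 of 17,267 total tokens):

```typescript
// Hit rate for the final compaction request, from the sqlite row above.
const cacheRead = 14_656;
const totalTokens = 17_267;
const hitRate = cacheRead / totalTokens;
console.log(`${(hitRate * 100).toFixed(1)}%`); // "84.9%"
```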



Development

Successfully merging this pull request may close these issues.

[FEATURE]: ~90% of compaction cost is avoidable cache miss
