fix(tokens): image blocks as flat ~1500 tokens; compute real contextUsagePct#54

Closed
KillerQueen-Z wants to merge 1 commit into main from fix/image-context-token-estimation

Conversation

@KillerQueen-Z
Collaborator

Summary

Three related issues combined to make `/context` and the renderer's context-window ring wildly inaccurate on image-bearing sessions.

Reproduced empirically on the same 4-message session with one ~100KB image:

  • Before this fix: /context shows ~75k / 200k (37.8%)
  • After this fix: /context shows ~1.9k / 200k (1.0%) ← matches Anthropic's true count

A ~40x discrepancy.

Root causes

1. `estimateContentPartTokens` JSON.stringify-ed image blocks

When `tool_result.content` was a `[{text}, {image}]` array, the whole array was `JSON.stringify`-ed and the base64 `data` field counted as text. A single normalized image (~140KB base64) became ~70k phantom tokens. Anthropic actually bills `(w*h)/750` ≈ 1100-1500 per image.

Fix: walk the content array block-by-block. Text blocks count as text; image blocks count as a flat 1500 tokens (close to Anthropic's real billing: the `Read` tool caps the long edge, so normalized images land near 1024×768 ≈ 1050 tokens). Unknown block types still stringify, but with `source.data` redacted to `<bytes>` so future block kinds don't regress.
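The walk can be sketched as follows. Names mirror the PR; the chars/4 text heuristic and the loose `ContentBlock` shape are my assumptions, not the project's exact code:

```typescript
// Loose shape covering text, image, and future block kinds (assumption).
interface ContentBlock {
  type: string;
  text?: string;
  source?: { data?: string; [k: string]: unknown };
  [k: string]: unknown;
}

const IMAGE_TOKENS = 1500; // flat estimate; Read-normalized images land near (1024*768)/750 ≈ 1050

function estimateBlockTokens(block: ContentBlock): number {
  if (block.type === 'text' && typeof block.text === 'string') {
    return Math.ceil(block.text.length / 4); // chars/4 text heuristic (assumption)
  }
  if (block.type === 'image') {
    return IMAGE_TOKENS; // never tokenize the base64 payload as text
  }
  // Unknown block kinds: still stringify, but redact the base64 payload
  // first so a future block type can't reintroduce the blow-up.
  const redacted: ContentBlock = JSON.parse(JSON.stringify(block));
  if (redacted.source && typeof redacted.source.data === 'string') {
    redacted.source.data = '<bytes>';
  }
  return Math.ceil(JSON.stringify(redacted).length / 4);
}

function estimateContentTokens(content: string | ContentBlock[]): number {
  if (typeof content === 'string') return Math.ceil(content.length / 4);
  return content.reduce((sum, b) => sum + estimateBlockTokens(b), 0);
}
```

With this, a `[{text}, {image}]` tool result with ~140KB of base64 costs ~1500 tokens instead of ~70k.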

2. `getAnchoredTokenCount` returned `contextUsagePct: 0` always

Both return paths hardcoded the field. Agent loop emits this verbatim via `kind: 'usage'` events, so the desktop/extension renderer's context ring sat at 0% regardless of real fullness. `/context` was unaffected because the CLI command re-derives `pct` from `estimated` itself.

Fix: compute `(estimated / contextWindow) * 100` using the current model's window from `getContextWindow(_currentModel)`.
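A minimal sketch of that computation; in the real code `getContextWindow(_currentModel)` supplies the window, and the guard against a missing window is my own defensive assumption:

```typescript
// Derive the percentage the renderer ring needs instead of hardcoding 0.
function computeContextUsagePct(estimated: number, contextWindow: number): number {
  if (!Number.isFinite(contextWindow) || contextWindow <= 0) return 0; // defensive guard (assumption)
  return (estimated / contextWindow) * 100;
}
```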

3. `loop.ts` rounded `contextPct` to integer

A 200-message session at 0.4% rounded to 0 and froze the renderer's ring. Match `/context`'s `.toFixed(1)` fidelity by keeping one decimal place.
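The difference in one line (illustrative; only the `contextPct` field name comes from the PR):

```typescript
// Integer rounding freezes small sessions at 0%; one decimal keeps the ring moving.
function formatContextPct(pct: number): number {
  return Number(pct.toFixed(1)); // matches /context's .toFixed(1) fidelity
}
// Math.round(0.4)      === 0    → ring frozen at 0%
// formatContextPct(0.4) === 0.4 → ring updates
```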

Test plan

  • `npm run build` — passes
  • Manual: ran image-bearing session, compared `/context` before/after
  • Verified renderer ring updates correctly on short conversations

Impact

  • Wallet deductions: NOT affected — gateway has its own (working) image handling via `imagePlaceholder` and uses its own input estimate.
  • CLI `/context` display: more accurate — matches Anthropic's real `input_tokens` within ~5%.
  • Desktop/extension context ring: now functional — was stuck at 0%.
  • `/compact` decisions: less spurious — pre-fix, one image could push the perceived fullness past internal thresholds.
