feat(agent): token-aware history truncation alongside message cap #488

TatsuKo-Tsukimi wants to merge 2 commits into dataelement:main
Conversation
Replace naive `conversation[-ctx_size:]` slicing with a walker that treats assistant.tool_calls and its matching role="tool" messages as one atomic block. Naive slicing can leave an orphan role="tool" at the head when the cut lands mid-pair — OpenAI rejects this with "No tool call found for function call output" (issue dataelement#446).

New helper `services/history_window.truncate_by_message_count` walks the tail backward, identifies blocks, and includes blocks whole or not at all. Orphan tools are silently dropped regardless of budget.

Replaces the head-only pop guard in:

- app/api/websocket.py (web chat)
- app/api/feishu.py (feishu channel)

Leaves app/services/llm/caller.py:626 untouched (call_agent_llm short-reply path's hardcoded [-10:] is intentional).

Tests: 15 covering empty input, budget edges, parallel tool_calls, multi-pair, head/mid orphans, realistic 60-message invariant.

Addresses the dataelement#446 failure mode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
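For concreteness, a minimal sketch of the block-atomic backward walk this describes (behavior inferred from the commit message above; the actual `services/history_window` implementation may differ in details):

```python
def truncate_by_message_count(messages: list[dict], limit: int) -> list[dict]:
    """Keep at most `limit` most-recent messages without ever splitting an
    assistant.tool_calls message from its matching role="tool" replies."""
    kept: list[dict] = []
    i = len(messages) - 1
    while i >= 0:
        if messages[i].get("role") == "tool":
            # Walk back over consecutive tool results to the assistant
            # message that issued the calls; that whole span is one block.
            start = i
            while start >= 0 and messages[start].get("role") == "tool":
                start -= 1
            if start < 0 or not messages[start].get("tool_calls"):
                i = start  # orphan tool results: drop silently, keep walking
                continue
            block = messages[start : i + 1]
            if len(kept) + len(block) > limit:
                break  # a block is included whole or not at all
            kept[:0] = block
            i = start - 1
        else:
            if len(kept) + 1 > limit:
                break
            kept.insert(0, messages[i])
            i -= 1
    return kept
```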
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01aabbd906
```python
_truncated = truncate_by_token_budget(
    conversation, tok_budget, message_cap=ctx_size,
)
```
Keep latest user turn outside history token cap
This truncates `conversation` after appending the current user message, so the token budget is applied to the newest turn as well as history. When the incoming turn is large (for example a vision message containing `[image_data:...base64,...]`), `truncate_by_token_budget` can reject the first block and return `[]`, causing `call_llm_with_failover` to run without the user's prompt. That is a functional regression: the model can respond off-context or fail for oversized user inputs, even though this setting is described and implemented elsewhere as a history budget.
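If that reading is right, one way to address it is to budget the stored history only and append the live turn afterwards (a sketch using the variable names from the diff above; `current_user_message` is an invented name for illustration):

```python
# Apply the budget to history alone, then add the live turn, so an
# oversized user message is sent rather than silently truncated away.
history = truncate_by_token_budget(
    conversation, tok_budget, message_cap=ctx_size,
)
history.append(current_user_message)  # hypothetical name for the new turn
```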
Force-pushed 01aabbd to 433a359.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 433a359689
```python
ctx_size = agent.context_window_size or 100
logger.info(f"[WS] Agent: {agent_name}, type: {agent_type}, model_id: {agent.primary_model_id}, ctx: {ctx_size}")
from app.models.agent import DEFAULT_CONTEXT_WINDOW_TOKENS
tok_budget = getattr(agent, "context_window_tokens", None) or DEFAULT_CONTEXT_WINDOW_TOKENS
```
Re-read token budget for each websocket turn
`websocket_chat` reloads agent/model config on each incoming message, but `tok_budget` is only set once at connection setup, so active sockets keep using the old `context_window_tokens` value for all later turns. This makes the new setting effectively inert until reconnect (for example, lowering the budget during an incident will not constrain prompt size on existing sessions), which defeats the runtime tuning added in this change.
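A sketch of the suggested fix, assuming the handler's receive loop already re-reads the agent row each turn (the loop shape and the refresh call are illustrative, not the repo's actual code):

```python
while True:
    incoming = await websocket.receive_json()
    await db.refresh(agent)  # illustrative: re-read agent config each turn
    tok_budget = (
        getattr(agent, "context_window_tokens", None)
        or DEFAULT_CONTEXT_WINDOW_TOKENS
    )
    # ... build conversation, truncate with the fresh tok_budget, call LLM
```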
Add `agents.context_window_tokens` field (default 50000) and a new
`truncate_by_token_budget` helper that bounds in-context history by both
estimated token cost (primary) and message count (safety cap), preserving
assistant.tool_calls ↔ role=tool pairs intact via the same walker as
truncate_by_message_count.
Why: message-count alone is a wildly variable proxy for token cost — one
50KB tool result eats more context than 100 short user messages. Token
budget gives predictable behavior across heterogeneous traffic; message
cap remains as a safety net against pathological tiny-message floods.
Changes:
- models/agent.py: + context_window_tokens (Integer, default=50000)
+ DEFAULT_CONTEXT_WINDOW_TOKENS constant
- schemas/schemas.py: AgentOut, AgentUpdate (1000 <= tokens <= 500000)
- alembic: add_context_window_tokens.py (idempotent IF NOT EXISTS)
- services/history_window.py: + truncate_by_token_budget, refactored
common walker, JSON-serialized char->token estimate via existing
estimate_tokens_from_chars (chars/3 — overestimates safely)
- api/websocket.py: pass tok_budget to helper, raise DB load to
max(ctx_size, 500) so helper has room to choose
- api/feishu.py: same pattern at 2 sites (web chat + IM channel paths)
- frontend: AgentDetail Settings slider + i18n + types
10 new tests covering token-budget mode (huge-message dropped, both-bounds
interaction, atomic pair preservation, orphan defense). 25/25 pass.
Other channels (dingtalk/discord/slack/teams/wecom/whatsapp) still use
DB-level message-count limit only — they don't get token awareness in
this PR but won't crash. Migrating them is follow-up scope.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
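For illustration, a hedged sketch of one of the token-budget tests described above (fixture values invented; the helper name and parameter order are taken from this PR's diff):

```python
from app.services.history_window import truncate_by_token_budget

def test_huge_tool_block_dropped_whole():
    # A ~50KB tool result blows the token budget even though the message
    # cap alone would keep it; the atomic pair must vanish together.
    pair = [
        {"role": "assistant", "content": None,
         "tool_calls": [{"id": "c1", "type": "function",
                         "function": {"name": "search", "arguments": "{}"}}]},
        {"role": "tool", "tool_call_id": "c1", "content": "x" * 50_000},
    ]
    tail = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    out = truncate_by_token_budget(pair + tail, 1_000, message_cap=10)
    assert out == tail  # no orphan role="tool" may survive the cut
```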
Force-pushed 433a359 to 2a0040f.
Summary
Add `agents.context_window_tokens` field (default `50000`) and a new `truncate_by_token_budget` helper that bounds in-context history by both estimated token cost (primary) and message count (safety cap), preserving `assistant.tool_calls` ↔ `role=tool` pairs intact via the same walker as `truncate_by_message_count`.

Why: message-count alone is a wildly variable proxy for token cost — one 50KB tool result eats more context than 100 short user messages. Token budget gives predictable behavior across heterogeneous traffic; message cap remains as a safety net against pathological tiny-message floods.
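A behavioral sketch of such a dual-bound truncator (the real helper shares its walker with `truncate_by_message_count`; `iter_atomic_blocks` here is an invented stand-in for that shared walker, orphan handling is omitted for brevity, and `chars // 3` mirrors the described estimator):

```python
import json

def estimate_tokens_from_chars(chars: int) -> int:
    # chars/3 deliberately overestimates English text (~4 chars/token),
    # so estimation errors land on the safe side of the budget.
    return chars // 3

def iter_atomic_blocks(messages):
    """Yield message groups so an assistant.tool_calls message and its
    trailing role="tool" replies form one indivisible block."""
    i = 0
    while i < len(messages):
        j = i + 1
        if messages[i].get("tool_calls"):
            while j < len(messages) and messages[j].get("role") == "tool":
                j += 1
        yield messages[i:j]
        i = j

def truncate_by_token_budget(messages, token_budget, message_cap=None):
    kept, spent = [], 0
    for block in reversed(list(iter_atomic_blocks(messages))):
        cost = estimate_tokens_from_chars(len(json.dumps(block, ensure_ascii=False)))
        if spent + cost > token_budget:
            break  # blocks are kept whole or not at all
        if message_cap is not None and len(kept) + len(block) > message_cap:
            break  # message count remains a safety cap
        kept[:0] = block
        spent += cost
    return kept
```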
Changes
- `models/agent.py`: + `context_window_tokens` (Integer, default=50000) + `DEFAULT_CONTEXT_WINDOW_TOKENS` constant
- `schemas/schemas.py`: AgentOut, AgentUpdate (1000 ≤ tokens ≤ 500000)
- `alembic/versions/add_context_window_tokens.py`: idempotent `ADD COLUMN IF NOT EXISTS`, default 50000 backfills existing agents
- `services/history_window.py`: + `truncate_by_token_budget`, refactored common walker, JSON-serialized char→token estimate via existing `services/token_tracker.estimate_tokens_from_chars` (chars/3 — overestimates safely)
- `api/websocket.py`: pass tok_budget to helper, raise DB load to `max(ctx_size, 500)` so helper has room to choose
- `api/feishu.py`: same pattern at 2 sites (web chat + IM channel paths)
- 10 new tests covering token-budget mode (huge-message dropped, both-bounds interaction, atomic pair preservation, orphan defense). 26/26 history_window tests pass.
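The migration shape the alembic bullet implies might look roughly like this (a sketch assuming PostgreSQL, where `ADD COLUMN IF NOT EXISTS` is available; revision identifiers are invented):

```python
from alembic import op

revision = "add_context_window_tokens"  # illustrative identifier
down_revision = None  # the real chain depends on the repo's migration history

def upgrade():
    # Idempotent: safe to re-run; the DEFAULT backfills existing agent rows.
    op.execute(
        "ALTER TABLE agents "
        "ADD COLUMN IF NOT EXISTS context_window_tokens INTEGER NOT NULL DEFAULT 50000"
    )

def downgrade():
    op.execute("ALTER TABLE agents DROP COLUMN IF EXISTS context_window_tokens")
```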
Other channels (dingtalk/discord/slack/teams/wecom/whatsapp) still use DB-level message-count limit only — they don't get token awareness in this PR but won't crash. Migrating them is follow-up scope.
Stack
PR 2/4 of context-budget hygiene series.
Diff against `main` shows commits from #487 too (cumulative). Review the delta by looking at the latest commit only. Once #487 merges, this PR's diff naturally shrinks.

Test plan
- `pytest backend/tests/test_history_window.py` — 26/26 pass (16 from #487, "fix(history): atomic pair-aware truncation for tool_call blocks", + 10 new)
- `alembic upgrade head` SQL shape verified (idempotent IF NOT EXISTS)

Open questions:

- `chars/3` token estimator over-counts for English, under-counts for CJK — acceptable risk?
- Confirm `max(ctx_size, 500)` won't cause memory issues for tenants with very long conversation history.

🤖 Generated with Claude Code