
perf(tools): parallelize read-only tool calls and consolidate three loops#490

Open
TatsuKo-Tsukimi wants to merge 5 commits into dataelement:main from TatsuKo-Tsukimi:perf/parallel-tool-execution

Conversation

@TatsuKo-Tsukimi

Summary

Add `services/tool_loop.py` as single source of truth for tool execution. `execute_tool_calls` runs whitelisted reads concurrently via `asyncio.gather`; everything else serializes in original order. Tool result messages come back in `tool_calls[]` order regardless of completion order (provider-strict).
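A minimal sketch of that split, assuming the OpenAI tool-call message shape; `_run_one` and `ctx` follow the diff excerpts below, while the whitelist contents and dispatcher body here are illustrative stand-ins, not the merged code:

```python
import asyncio
from typing import Any

# Illustrative subset; the full whitelist is listed in the Safety section.
TOOLS_PARALLELIZABLE = {"read_file", "list_files", "web_fetch"}

async def _run_one(tc: dict, ctx: Any) -> str:
    """Stand-in for the real dispatcher that calls execute_tool."""
    return f"result of {tc['function']['name']}"

async def execute_tool_calls(tool_calls: list[dict], ctx: Any = None) -> list[dict]:
    parallel = [tc for tc in tool_calls if tc["function"]["name"] in TOOLS_PARALLELIZABLE]
    serial = [tc for tc in tool_calls if tc["function"]["name"] not in TOOLS_PARALLELIZABLE]

    # Whitelisted reads overlap; return_exceptions keeps one failure
    # from cancelling its siblings.
    results = await asyncio.gather(
        *[_run_one(tc, ctx) for tc in parallel], return_exceptions=True
    )
    outcomes = {tc["id"]: r for tc, r in zip(parallel, results)}

    # Everything else runs strictly in the order the model issued it.
    for tc in serial:
        outcomes[tc["id"]] = await _run_one(tc, ctx)

    # Reassemble role="tool" messages in tool_calls[] order (provider-strict).
    return [
        {"role": "tool", "tool_call_id": tc["id"], "content": str(outcomes[tc["id"]])}
        for tc in tool_calls
    ]
```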

Replaces three drifted loop copies:

  • llm/caller.py:_process_tool_call (call_llm)
  • llm/caller.py inner loop in call_agent_llm_with_tools
  • services/heartbeat.py tool loop

heartbeat.py:_TOOLS_REQUIRING_ARGS was a strict superset of caller.py's (included web_search/jina_search/jina_read) — now unified.

Safety: A2A serial preserved

TOOLS_PARALLELIZABLE is conservative — only stateless reads (read_file, list_files, read_document, search_jina/jina_search/jina_read, web_fetch/web_search, extract_pdf, list_triggers).

A2A tools (send_message_to_agent, send_file_to_agent), filesystem writes (write_file/delete_file/edit_file), external messaging, trigger lifecycle, plaza writes, execute_code all stay serial — preserves relationship checks, dedup invariants, max_fires guarantees. Static isdisjoint test enforces this.
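A minimal sketch of that static guard; the serial set below is a subset for illustration, and the real test lives in backend/tests/test_tool_loop.py:

```python
TOOLS_PARALLELIZABLE = {
    "read_file", "list_files", "read_document", "search_jina",
    "jina_search", "jina_read", "web_fetch", "web_search",
    "extract_pdf", "list_triggers",
}
# Tools that must stay serial (subset shown for illustration).
TOOLS_SERIAL_ONLY = {
    "send_message_to_agent", "send_file_to_agent",  # A2A relationship checks
    "write_file", "delete_file", "edit_file",       # filesystem writes
    "execute_code",                                 # arbitrary side effects
}

def test_a2a_and_writes_never_parallelized():
    # Fails at CI time if a side-effecting tool sneaks into the whitelist.
    assert TOOLS_PARALLELIZABLE.isdisjoint(TOOLS_SERIAL_ONLY)
```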

Heartbeat plaza rate limiter (1 post / 2 comments per tick) becomes a pre_execute_hook closure. Plaza tools are non-parallelizable so counters remain race-free.
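A sketch of the closure shape, assuming a hook that returns a blocking string to short-circuit the call and None to proceed (matching the short-circuit test listed below); the post-limit message is an assumption:

```python
from typing import Optional

def make_plaza_rate_limit_hook():
    plaza_posts_made = 0
    plaza_comments_made = 0

    async def pre_execute_hook(tool_name: str, args: dict) -> Optional[str]:
        nonlocal plaza_posts_made, plaza_comments_made
        if tool_name == "plaza_create_post":
            if plaza_posts_made >= 1:
                return "[BLOCKED] You have already posted this heartbeat."
            plaza_posts_made += 1
        elif tool_name == "plaza_add_comment":
            if plaza_comments_made >= 2:
                return ("[BLOCKED] You have already made 2 comments this "
                        "heartbeat. Do not comment again.")
            plaza_comments_made += 1
        return None  # not rate-limited: let the tool run

    return pre_execute_hook
```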

Performance

A 5-read research turn now completes in max(latencies) instead of sum(latencies) — typically 3-5x speedup on multi-tool rounds.
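(Illustration: five reads at ~800 ms each take ~4 s serially but finish in ~0.8 s when overlapped, bounded by the slowest call.)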

Tests

14 new tests:

  • Static A2A safety isdisjoint check
  • Reverse-latency order preservation (tools complete in reverse but output order matches input)
  • Parallel wall-clock concurrency (5 reads × 80ms ≤ 200ms; sketched after this list)
  • Mixed parallel + serial
  • Exception isolation (one tool raises, siblings unaffected)
  • Empty-args guard
  • Pre-execute hook short-circuit (heartbeat plaza rate limiter)
  • on_tool_call running+done events
  • Callback exception robustness
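A sketch of the wall-clock test referenced above, assuming pytest-asyncio; the fake tool and plumbing are stand-ins for the repo's fixtures:

```python
import asyncio
import time

import pytest

@pytest.mark.asyncio
async def test_parallel_reads_overlap():
    async def fake_read(_tc: object, _ctx: object) -> str:
        await asyncio.sleep(0.08)  # each read costs 80 ms of latency
        return "ok"

    start = time.monotonic()
    await asyncio.gather(*[fake_read(None, None) for _ in range(5)])
    elapsed = time.monotonic() - start

    # Serial execution would need ~0.4 s; overlap keeps it near one latency.
    assert elapsed <= 0.2
```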

Stack

PR 4/4 of context-budget hygiene series.

This is the last PR in the series. Cumulative diff against main shows all 4 PRs' commits; review only the latest commit's delta until #487-#489 merge.

Test plan

  • pytest backend/tests/test_tool_loop.py — 14/14 pass
  • All hygiene unit tests — 56/56 pass (16 + 26 + 16 + 14 across 4 PRs)
  • Reviewer: confirm TOOLS_PARALLELIZABLE whitelist is conservative enough — any read-shaped tool with hidden side effects we've missed?
  • Reviewer: callback re-entrancy — on_tool_call may fire from concurrent coroutines, verify frontend handles out-of-order events keyed by call_id
  • Reviewer: heartbeat plaza counter increments before execute (instead of after) — slightly more conservative; acceptable?

Net lines: caller.py -130, heartbeat.py -29. New helper file (~250 lines) + tests (~280 lines). Despite adding the helper, the change removes three drifted loop copies.

🤖 Generated with Claude Code

Replace naive `conversation[-ctx_size:]` slicing with a walker that treats
assistant.tool_calls and its matching role="tool" messages as one atomic
block. Naive slicing can leave an orphan role="tool" at the head when the
cut lands mid-pair — OpenAI rejects this with "No tool call found for
function call output" (issue dataelement#446).

New helper `services/history_window.truncate_by_message_count` walks tail
backward, identifies blocks, and includes blocks whole or not at all.
Orphan tools are silently dropped regardless of budget.
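A minimal sketch of that backward walk, assuming dict-shaped messages with OpenAI field names; the real helper may differ in detail:

```python
def truncate_by_message_count(messages: list[dict], budget: int) -> list[dict]:
    kept: list[dict] = []
    i = len(messages) - 1
    while i >= 0 and len(kept) < budget:
        if messages[i].get("role") != "tool":
            kept.insert(0, messages[i])
            i -= 1
            continue
        # Walk back over the contiguous tool-result run to the assistant
        # message that issued it; the whole block is atomic.
        j = i
        while j >= 0 and messages[j].get("role") == "tool":
            j -= 1
        if j < 0 or not messages[j].get("tool_calls"):
            i = j  # orphan tool results: drop them regardless of budget
            continue
        block = messages[j:i + 1]
        if len(kept) + len(block) > budget:
            break  # block doesn't fit whole: stop rather than split the pair
        kept[:0] = block
        i = j - 1
    return kept
```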

Replaces head-only pop guard in:
  - app/api/websocket.py (web chat)
  - app/api/feishu.py (feishu channel)

Leaves app/services/llm/caller.py:626 untouched (call_agent_llm short-reply
path's hardcoded [-10:] is intentional).

Tests: 15 covering empty input, budget edges, parallel tool_calls,
multi-pair, head/mid orphans, realistic 60-message invariant.

Addresses dataelement#446 failure mode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b49ab53947


Comment thread: backend/app/services/tool_loop.py (outdated)
Comment on lines +137 to +141

```python
    # Run parallel reads concurrently
    parallel_outcomes: list[Any] = []
    if parallel_tcs:
        parallel_outcomes = await asyncio.gather(
            *[_run_one(tc, ctx) for tc in parallel_tcs],
```

P1: Preserve tool execution order across mixed call types

Running all TOOLS_PARALLELIZABLE calls first and only then executing the serial bucket reorders side effects relative to the model-issued tool_calls sequence. In a mixed round like write_file followed by read_file, the read can now execute before the write and return stale/missing data even though the response messages are reassembled in original order. This is a behavior regression from the previous strictly sequential loop and can break tool chains that depend on in-round ordering.


Comment on lines +335 to +339

```python
            plaza_posts_made += 1
        elif tool_name == "plaza_add_comment":
            if plaza_comments_made >= 2:
                return "[BLOCKED] You have already made 2 comments this heartbeat. Do not comment again."
            plaza_comments_made += 1
```

P2: Increment plaza quotas only after successful execution

The heartbeat plaza hook increments plaza_posts_made/plaza_comments_made before execute_tool runs, so a failed plaza_create_post or plaza_add_comment attempt still consumes quota and blocks retries in the same heartbeat tick. Previously counters were incremented only after a successful tool call, so this change can turn transient tool failures into hard false rate-limit blocks.


@TatsuKo-Tsukimi force-pushed the perf/parallel-tool-execution branch from b49ab53 to 85ac090 on April 27, 2026 08:47

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85ac09002c


Comment on lines +140 to +142

```python
    outs = await asyncio.gather(
        *[_run_one(tc, ctx) for tc in pending_parallel],
        return_exceptions=True,
```

P2: Serialize callback dispatch for parallel tool batches

execute_tool_calls now runs read-only tools via asyncio.gather, so each _run_one emits on_tool_call events in completion order rather than issued order. In the current web clients (frontend/src/pages/Chat.tsx and frontend/src/pages/AgentDetail.tsx), tool_call updates are reconciled by tool name (not call_id), so two same-name calls in one round (e.g. multiple read_file) can have their results/statuses attached to the wrong pending entry when finishes are reversed. This is a regression introduced by parallel batching and will misreport tool outputs in real multi-read turns.


TatsuKo-Tsukimi and others added 3 commits April 27, 2026 17:51
Add `agents.context_window_tokens` field (default 50000) and a new
`truncate_by_token_budget` helper that bounds in-context history by both
estimated token cost (primary) and message count (safety cap), preserving
assistant.tool_calls ↔ role=tool pairs intact via the same walker as
truncate_by_message_count.

Why: message-count alone is a wildly variable proxy for token cost — one
50KB tool result eats more context than 100 short user messages. Token
budget gives predictable behavior across heterogeneous traffic; message
cap remains as a safety net against pathological tiny-message floods.
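A sketch of the dual-bound idea under those assumptions; for brevity it walks single messages, whereas the real helper reuses the pair-preserving block walker, and the defaults here just mirror the numbers in this commit message:

```python
import json

def estimate_tokens_from_chars(chars: int) -> int:
    return chars // 3  # chars/3: deliberately overestimates token cost

def truncate_by_token_budget(messages: list[dict],
                             token_budget: int = 50_000,
                             max_messages: int = 500) -> list[dict]:
    kept: list[dict] = []
    spent = 0
    # Walk the tail backward; token budget is primary, message count a cap.
    for msg in reversed(messages):
        cost = estimate_tokens_from_chars(len(json.dumps(msg, ensure_ascii=False)))
        if spent + cost > token_budget or len(kept) >= max_messages:
            break
        kept.insert(0, msg)
        spent += cost
    return kept
```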

Changes:
  - models/agent.py: + context_window_tokens (Integer, default=50000)
    + DEFAULT_CONTEXT_WINDOW_TOKENS constant
  - schemas/schemas.py: AgentOut, AgentUpdate (1000 <= tokens <= 500000)
  - alembic: add_context_window_tokens.py (idempotent IF NOT EXISTS)
  - services/history_window.py: + truncate_by_token_budget, refactored
    common walker, JSON-serialized char->token estimate via existing
    estimate_tokens_from_chars (chars/3 — overestimates safely)
  - api/websocket.py: pass tok_budget to helper, raise DB load to
    max(ctx_size, 500) so helper has room to choose
  - api/feishu.py: same pattern at 2 sites (web chat + IM channel paths)
  - frontend: AgentDetail Settings slider + i18n + types

10 new tests covering token-budget mode (huge-message dropped, both-bounds
interaction, atomic pair preservation, orphan defense). 25/25 pass.

Other channels (dingtalk/discord/slack/teams/wecom/whatsapp) still use
DB-level message-count limit only — they don't get token awareness in
this PR but won't crash. Migrating them is follow-up scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tool calls returning ~50 KB of search hits or PDF extracts used to enter
the in-context history verbatim, burning tokens for content the model
samples one slice of. New `services/tool_result_truncation.py` spills
oversized payloads to `agent_data/<agent_id>/_tool_results/<call_id>.txt`
and replaces the in-context body with a head excerpt + a marker pointing
the model at `read_file` for retrieval.
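A sketch of that spill-and-marker flow; the threshold math, paths, and marker wording here are illustrative, following the description in this commit message:

```python
from pathlib import Path

SPILL_THRESHOLD_TOKENS = 4000  # ~12 KB at chars/3

def maybe_truncate_tool_result(tool_content: str, call_id: str,
                               agent_workspace: Path) -> str:
    if len(tool_content) // 3 <= SPILL_THRESHOLD_TOKENS:
        return tool_content  # small enough: pass through verbatim
    spill_root = agent_workspace / "_tool_results"
    spill_root.mkdir(parents=True, exist_ok=True)
    spill_path = spill_root / f"{call_id}.txt"
    try:
        spill_path.write_text(tool_content, encoding="utf-8")
    except OSError:
        return tool_content[:12_000]  # write failed: fall back to a plain head-cut
    head = tool_content[:2_000]
    return (f"{head}\n[truncated: full output saved to "
            f"_tool_results/{call_id}.txt; use read_file to retrieve it]")
```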

Threshold: 4000 estimated tokens (~12 KB at chars/3) — soft-start to give
the model time to learn the read_file follow-up; can tighten to ~2000
once telemetry confirms reliable use of the marker.

Smart head:
  - JSON-shape responses with a known array key (results/items/data/
    entries/hits/documents) keep metadata + first 5 items + a synthetic
    _truncated_<key>_count field, so search/list responses stay useful
    (sketched after this list).
  - Otherwise plain head-cut (preferring a line boundary when one is
    near the cut point).
  - Vision-injected list payloads (image_data markers) pass through.
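A sketch of the JSON-shape head referenced in the list above; `json_smart_head` is a hypothetical name, and the caller is assumed to fall back to a plain head-cut when it returns None:

```python
import json
from typing import Optional

ARRAY_KEYS = ("results", "items", "data", "entries", "hits", "documents")

def json_smart_head(tool_content: str, keep: int = 5) -> Optional[str]:
    try:
        payload = json.loads(tool_content)
    except json.JSONDecodeError:
        return None  # not JSON-shaped: caller uses a plain head-cut
    if not isinstance(payload, dict):
        return None
    for key in ARRAY_KEYS:
        if isinstance(payload.get(key), list):
            dropped = len(payload[key]) - keep
            if dropped <= 0:
                return None  # already small enough to keep whole
            # Keep metadata plus the first few items, and record the cut.
            payload[key] = payload[key][:keep]
            payload[f"_truncated_{key}_count"] = dropped
            return json.dumps(payload, ensure_ascii=False)
    return None
```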

Sandbox: spill files live under each agent's existing AGENT_DATA_DIR
sandbox; cross-agent leak is prevented by the existing read_file boundary.

Applied at three sites to prevent drift:
  - llm/caller.py _process_tool_call (call_llm path)
  - llm/caller.py call_agent_llm_with_tools inner loop
  - services/heartbeat.py tool loop

agent_context.py rule 3 augmented with a paragraph teaching the model to
recognize the marker and use read_file for full content rather than
fabricating from prior knowledge.

16 new unit tests covering pass-through, spill, utf-8 round-trip,
threshold boundary, JSON-shape preservation, write-failure fallback, and
a realistic 50-result jina_search end-to-end. 41/41 total pass.

Note: 3 tool loops still exist (caller.py:432, caller.py:763,
heartbeat.py:335). Drift prevention is partial here — PR 3 consolidates
them into a single execute_tool_calls helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
perf(tools): parallelize read-only tool calls and consolidate three loops

Add `services/tool_loop.py` as single source of truth for tool execution.
`execute_tool_calls` runs whitelisted reads concurrently via asyncio.gather;
everything else serializes in original order. Tool result messages come
back in tool_calls[] order regardless of completion order (provider-strict).

Replaces three drifted loop copies:
  - llm/caller.py:_process_tool_call (call_llm)
  - llm/caller.py inner loop in call_agent_llm_with_tools
  - services/heartbeat.py tool loop

heartbeat.py:_TOOLS_REQUIRING_ARGS was a strict superset of caller.py's
(included web_search/jina_search/jina_read) — now unified.

TOOLS_PARALLELIZABLE is conservative — only stateless reads (read_file,
list_files, search_jina, web_fetch, etc.). A2A tools, write/delete,
external messaging, trigger lifecycle, plaza writes, execute_code stay
serial, preserving relationship checks, dedup, max_fires invariants.

Heartbeat plaza rate limiter (1 post / 2 comments per tick) becomes a
pre_execute_hook closure. Plaza tools are non-parallelizable so counters
remain race-free.

14 new tests: static A2A safety isdisjoint, reverse-latency order
preservation, parallel wall-clock concurrency, mixed parallel+serial,
exception isolation, empty-args guard, pre-execute hook short-circuit,
on_tool_call events, callback exception robustness. 55/55 pass.

Performance: a 5-read research turn now completes in max(latencies)
instead of sum(latencies) — typically 3-5x speedup on multi-tool rounds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@TatsuKo-Tsukimi force-pushed the perf/parallel-tool-execution branch from 85ac090 to 3ebdaab on April 27, 2026 09:51

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ebdaab0a7


Comment thread: backend/app/services/tool_loop.py (outdated)
Comment on lines +268 to +272

```python
    if ws_path is not None:
        from app.services.tool_result_truncation import maybe_truncate_tool_result
        tool_content = maybe_truncate_tool_result(
            tool_content, call_id=tc["id"], agent_workspace=ws_path,
        )
```

P2: Bypass truncation when reading spill files

This unconditionally re-applies maybe_truncate_tool_result to all tool outputs, including read_file responses. When the model follows the truncation marker and calls read_file("_tool_results/<call_id>.txt"), large files are truncated again here and replaced with a new marker, so the promised “full output” is never actually returned in-context. This is especially problematic for single-line blobs (e.g., minified JSON), where read_file’s line-based paging cannot target subsections, so repeated reads can never recover the missing content.


The spill-marker design assumes the model can recover oversized tool
output by calling read_file on the spill path. But read_file's own
output was being passed back through maybe_truncate_tool_result, so a
60KB spill would be re-truncated, the spill file overwritten with its
own marker, and the model could never get the original content back —
infinite read loop with no progress.

Bypass list (read_file, read_document) lives next to TOOLS_PARALLELIZABLE
in tool_loop. Both already support offset/limit pagination, so an
oversized retrieval is the model's responsibility to page, not ours to
silently truncate.
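A sketch of the bypass shape; `TRUNCATION_BYPASS_TOOLS` and `finalize_tool_content` are illustrative names, while the import matches the module quoted in the review thread:

```python
from pathlib import Path

from app.services.tool_result_truncation import maybe_truncate_tool_result

TRUNCATION_BYPASS_TOOLS = {"read_file", "read_document"}

def finalize_tool_content(tool_name: str, tool_content: str,
                          call_id: str, agent_workspace: Path) -> str:
    # Paginated readers must return spill files verbatim; otherwise the
    # model re-truncates its own spill and loops forever without progress.
    if tool_name in TRUNCATION_BYPASS_TOOLS:
        return tool_content
    return maybe_truncate_tool_result(tool_content, call_id=call_id,
                                      agent_workspace=agent_workspace)
```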

Adds three regression tests:
- read_file output passes through unchanged, no spill written
- read_document same
- jina_search (control) still spills and returns the marker

Reported by Codex auto-review on dataelement#490.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1979c02190


Comment on lines +299 to +301

```python
    "status": "done",
    "result": result,
    "reasoning_content": ctx.full_reasoning_content,
```

P2: Emit truncated tool payload in done callback

This callback publishes the raw result even after tool_content may have been replaced with a [truncated ... _tool_results/<call_id>.txt] marker. In our websocket flow, the done payload is what gets persisted to chat history, so oversized tool outputs are later replayed as a clipped fragment without the spill-file pointer, which diverges from what the model saw in-turn and prevents later rounds from recovering the full saved output.


Comment on lines +134 to +135

```python
    spill_root.mkdir(parents=True, exist_ok=True)
    full_path.write_text(tool_content, encoding="utf-8")
```

P2: Add retention policy for spilled tool-result files

Each oversized tool response is written to a new _tool_results/<call_id>.txt file, and call IDs are per-invocation, so this path grows monotonically. There is no corresponding cleanup in this change, which means long-lived agents with frequent large tool outputs can steadily consume workspace disk until writes/read-modify tools start failing due to exhausted storage.

