[Repo Assist] test(chat): add regression coverage for failed-tool + final-assistant sequence by github-actions[bot] · Pull Request #692 · openclaw/openclaw-windows-node

github-actions · 2026-06-05T01:41:42Z

🤖 This PR was created by Repo Assist, an automated AI assistant.

Summary

Adds 5 new ChatTimelineReducer unit tests that pin the expected state for the sequence:

ChatToolStartEvent → ChatToolErrorEvent → ChatMessageEvent(final) → ChatTurnEndEvent

Motivation

This is regression coverage for issue #672 (Windows Companion Chat shows stale failed tool event while web UI shows completed assistant response).

The bug is in the rendering layer — OpenClawChatTimeline.cs reorders entries within each turn so all ToolCall entries appear after all non-ToolCall entries. For the failure case, this puts the failed tool event at the visual bottom (where auto-scroll lands), while the final assistant response is above it and off-screen.

The reducer state is already correct — this PR locks that behavior with tests so any future fix can proceed with confidence.

Tests added (`ChatTimelineReducerTests.cs`)

Test	Verifies
`ToolError_ThenFinalAssistant_ProducesToolAndAssistantEntries`	Both ToolCall (Error) and Assistant entries exist
`ToolError_ThenFinalAssistant_AssistantHasFinalText`	Assistant entry has the expected final text
`ToolError_ThenFinalAssistant_TurnIsEnded`	Turn is ended cleanly, active IDs cleared
`ToolError_ThenFinalAssistant_ToolEntryPrecedesAssistantInState`	ToolCall entry is at index 0, Assistant at index 1 (state insertion order)
`ToolError_ThenFinalAssistant_ToolOutputContainsErrorText`	ToolCall entry carries the error text

Relationship to issue #672

The ToolEntryPrecedesAssistantInState test explicitly documents the state insertion order. This is the baseline for a future rendering fix: the rendering layer should keep the final assistant response as the last visible item after a failed tool.

Test Status

✅ dotnet test OpenClaw.Tray.Tests — 941 total, 939 passed, 2 skipped (infrastructure-related), 0 failed
✅ dotnet test OpenClaw.Shared.Tests — 8 pre-existing failures on master (unrelated to these changes), confirmed identical on both master and this branch
⚠️ ./build.ps1 — not run (Windows PowerShell build script, not available in Linux CI environment; this is an infrastructure limitation)

Generated by 🌈 Repo Assist. Learn more.

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@97143ac59cb3a13ef2a77581f929f06719c7402a

…ant sequence (#672) Add 5 new ChatTimelineReducer tests that pin the expected state for the sequence: ToolStart → ToolError → final ChatMessageEvent → ChatTurnEnd. This is regression coverage for issue #672 (Companion shows stale failed tool event while web UI shows completed response). The bug is in the rendering layer (OpenClawChatTimeline reorders ToolCall entries to appear after non-ToolCall entries within a turn, placing the error at visual bottom while the assistant reply scrolls off screen), but the reducer state is correct — these tests document and lock that. Tests added: - ToolError_ThenFinalAssistant_ProducesToolAndAssistantEntries - ToolError_ThenFinalAssistant_AssistantHasFinalText - ToolError_ThenFinalAssistant_TurnIsEnded - ToolError_ThenFinalAssistant_ToolEntryPrecedesAssistantInState - ToolError_ThenFinalAssistant_ToolOutputContainsErrorText Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

clawsweeper · 2026-06-05T01:42:42Z

Codex review: found issues before merge. Reviewed June 4, 2026, 9:52 PM ET / 01:52 UTC.

Summary
This PR adds five ChatTimelineReducer unit tests for the failed-tool followed by final-assistant sequence.

Reproducibility: yes. for the reducer baseline from source inspection: the ToolStart, ToolError, final ChatMessage, and TurnEnd path produces the state shape the PR tests. The actual Windows Companion visual bug remains represented by the linked issue rather than reproduced here in a live UI run.

Review metrics: 2 noteworthy metrics.

Changed files: 1 test file changed. The patch is test-only, so it cannot resolve the linked visible renderer bug by itself.
Added tests: 5 reducer tests added. The new coverage pins reducer state for the failed-tool/final-assistant sequence that the renderer later reorders.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🌊 off-meta tidepool
Patch quality: 🦐 gold shrimp
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] Run or collect the repository-required Windows validation: ./build.ps1, shared tests, and tray tests.
[P2] Decide whether to land this reducer baseline separately or include it with the renderer fix for the linked issue.

Risk before merge

[P1] The PR can land useful baseline coverage, but it does not repair the user-visible Companion renderer issue tracked at Windows Companion Chat shows stale failed tool event while web UI shows completed assistant response #672.
[P1] Repository-required validation is incomplete in the submitted evidence: ./build.ps1 was not run, and shared tests reportedly still had pre-existing failures.

Maintainer options:

Decide the mitigation before merge
Land this reducer baseline only if maintainers want a separate checkpoint, then fix the visible renderer ordering or scroll behavior in the linked issue with Windows Companion proof.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P2] The remaining action is maintainer judgment plus validation, not a narrow automated code repair for this PR branch.

Security
Cleared: The diff only adds unit tests under tests/ and does not change production code, dependencies, CI, secrets handling, or build scripts.

Review findings

[P2] Complete required validation before merge — tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs:263

Review details

Best possible solution:

Land this reducer baseline only if maintainers want a separate checkpoint, then fix the visible renderer ordering or scroll behavior in the linked issue with Windows Companion proof.

Do we have a high-confidence way to reproduce the issue?

Yes for the reducer baseline from source inspection: the ToolStart, ToolError, final ChatMessage, and TurnEnd path produces the state shape the PR tests. The actual Windows Companion visual bug remains represented by the linked issue rather than reproduced here in a live UI run.

Is this the best way to solve the issue?

Unclear as a complete fix: the tests are a reasonable baseline, but the visible bug lives in OpenClawChatTimeline.cs. Maintainers should decide whether to land this checkpoint separately or fold it into the renderer fix.

Full review comments:

[P2] Complete required validation before merge — tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs:263
The PR body says ./build.ps1 was not run and shared tests still had failures. AGENTS.md requires the full build, shared tests, and tray tests after changes, so this needs completed Windows validation or an explicit maintainer override before merge.
Confidence: 0.87

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 99efc50cbc22.

Label changes

Label changes:

add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The PR is authored by an automation bot and only adds reducer unit tests, so the external-contributor real behavior proof gate does not apply.
remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦐 gold shrimp, so this older rating label is no longer current.
remove status: 👀 ready for maintainer look: Current PR status label is status: ⏳ waiting on author.

Label justifications:

P3: This is low-risk test-only baseline coverage for an existing chat display issue, not the user-visible fix itself.
rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The PR is authored by an automation bot and only adds reducer unit tests, so the external-contributor real behavior proof gate does not apply.

Evidence reviewed

Acceptance criteria:

[P1] ./build.ps1.
[P1] dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore.
[P1] dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore.

What I checked:

PR diff is test-only: PR head commit adds 80 lines to ChatTimelineReducerTests.cs and changes no production files. (tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs:263, 0bad769c9271)
Reducer state supports the baseline: Current reducer marks tool errors on the tool entry, appends or reconciles final assistant messages, and clears active IDs at turn end. (src/OpenClaw.Chat/ChatTimelineReducer.cs:226, 99efc50cbc22)
Renderer still owns the visible ordering behavior: Current OpenClawChatTimeline reorders each turn segment by rendering non-ToolCall entries first and ToolCall entries afterward, matching the linked issue's renderer concern. (src/OpenClaw.Tray.WinUI/Chat/OpenClawChatTimeline.cs:2398, 99efc50cbc22)
PR is not already on current master or a release tag: No local branch or tag contains the PR head commit, so the added tests are not already implemented on current master or the latest local release tag. (0bad769c9271)
Repository validation policy applies: AGENTS.md requires ./build.ps1, shared tests, and tray tests after changes; the PR body reports the Windows build was not run and shared tests had existing failures. (AGENTS.md:1, 99efc50cbc22)
Area history: Blame for the reducer, renderer ordering block, and existing reducer tests points to commit 85445c7 in the current local history. (src/OpenClaw.Tray.WinUI/Chat/OpenClawChatTimeline.cs:2398, 85445c78066b)

Likely related people:

Christine Yan: Current blame for the reducer, renderer ordering block, and existing reducer tests points to the same recent area commit. (role: recent area contributor; confidence: medium; commits: 85445c78066b; files: src/OpenClaw.Chat/ChatTimelineReducer.cs, src/OpenClaw.Tray.WinUI/Chat/OpenClawChatTimeline.cs, tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

github-actions Bot added automation enhancement New feature or request repo-assist labels Jun 5, 2026

github-actions Bot mentioned this pull request Jun 5, 2026

[Repo Assist] Monthly Activity 2026-06 #618

Open

27 tasks

clawsweeper Bot mentioned this pull request Jun 5, 2026

[Repo Assist] fix(chat): preserve insertion order for failed-tool turns in timeline #699

Merged

shanselman marked this pull request as ready for review June 7, 2026 22:23

shanselman merged commit 559f411 into master Jun 7, 2026
7 checks passed

shanselman deleted the repo-assist/improve-tool-error-then-final-assistant-tests-29903afd9bf40cc1 branch June 7, 2026 22:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Repo Assist] test(chat): add regression coverage for failed-tool + final-assistant sequence#692

[Repo Assist] test(chat): add regression coverage for failed-tool + final-assistant sequence#692
shanselman merged 1 commit into
masterfrom
repo-assist/improve-tool-error-then-final-assistant-tests-29903afd9bf40cc1

github-actions Bot commented Jun 5, 2026

Uh oh!

clawsweeper Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

github-actions Bot commented Jun 5, 2026

Summary

Motivation

Tests added (ChatTimelineReducerTests.cs)

Relationship to issue #672

Test Status

Uh oh!

clawsweeper Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Tests added (`ChatTimelineReducerTests.cs`)

clawsweeper Bot commented Jun 5, 2026 •

edited

Loading