[Repo Assist] fix(setup): cancel wizard session before disconnect to prevent stale-session retry errors #718
…session retry errors

When wizard.next timed out (e.g. Teams channel selection hanging), EnterWizardErrorAsync called DisconnectAsync, which nulled _client, and then showed the "Start wizard again" / "Skip wizard" buttons. CancelCurrentSessionAsync only sends wizard.cancel when _client != null, so with _client already null it skipped the call and left the server-side session active. Subsequent "Start wizard again" clicks then hit a gateway "wizard already running" error.

Fix: replace await DisconnectAsync() with await CancelCurrentSessionAsync() in both EnterWizardErrorAsync and StartWizardAsync. CancelCurrentSessionAsync sends wizard.cancel (best-effort, exceptions caught and ignored) and then calls DisconnectAsync, so the disconnect still happens. The session cancel is a no-op when _client is already null or _sessionId is empty, so the first-start path is unaffected.

Closes #709

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
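The ordering problem described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the code from WizardPage.xaml.cs: `FakeGateway` and `WizardPageSketch` are hypothetical stand-ins, and the fake gateway simply assumes one wizard session at a time and rejects a second start with "wizard already running". It compares a disconnect-only recovery with a cancel-then-disconnect recovery.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Old recovery path: disconnect only, so the gateway session is never cancelled.
var buggy = new WizardPageSketch();
await buggy.TryStartAsync();
await buggy.DisconnectAsync();
bool buggyRetry = await buggy.TryStartAsync();
Console.WriteLine($"disconnect only, retry ok: {buggyRetry}");

// New recovery path: cancel while the client is still live, then disconnect.
var patched = new WizardPageSketch();
await patched.TryStartAsync();
await patched.CancelCurrentSessionAsync();
bool patchedRetry = await patched.TryStartAsync();
Console.WriteLine($"cancel then disconnect, retry ok: {patchedRetry}");

// Hypothetical gateway stand-in: allows a single active wizard session.
sealed class FakeGateway
{
    public bool SessionActive { get; private set; }

    public Task<string> StartAsync()
    {
        if (SessionActive) throw new InvalidOperationException("wizard already running");
        SessionActive = true;
        return Task.FromResult("session-1");
    }

    public Task CancelAsync(string sessionId)
    {
        SessionActive = false;
        return Task.CompletedTask;
    }
}

// Hypothetical page model that mirrors only the field names used in the PR text.
sealed class WizardPageSketch
{
    private readonly FakeGateway _gateway = new();
    private FakeGateway? _client;
    private string? _sessionId;

    public async Task<bool> TryStartAsync()
    {
        _client = _gateway; // "connect"
        try
        {
            _sessionId = await _client.StartAsync();
            return true;
        }
        catch (InvalidOperationException)
        {
            return false; // the gateway still holds the previous session
        }
    }

    public Task DisconnectAsync()
    {
        _client = null; // after this, no cancel can be sent
        return Task.CompletedTask;
    }

    public async Task CancelCurrentSessionAsync()
    {
        if (_client != null && !string.IsNullOrEmpty(_sessionId))
        {
            try { await _client.CancelAsync(_sessionId); }
            catch { /* best-effort: ignore failures */ }
        }
        _sessionId = null;
        await DisconnectAsync();
    }
}
```

In this sketch the disconnect-only retry is rejected and the cancel-then-disconnect retry succeeds, which is the behavior the fix relies on. It does not model the real gateway protocol or timeouts.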
Codex review: needs maintainer review before merge.
Reviewed June 7, 2026, 10:05 PM ET / 02:05 UTC.
Summary
Reproducibility: Not fully. Source inspection shows current master disconnects before cancellation in the reported recovery path, but I do not have a high-confidence live reproduction against a real gateway in this read-only review.
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Land a narrow cancel-before-disconnect fix only after redacted live wizard recovery proof and the full repository validation set demonstrate that retry and skip work without leaving a gateway session behind.
Do we have a high-confidence way to reproduce the issue? Not fully. Source inspection shows current master disconnects before cancellation in the reported recovery path, but I do not have a high-confidence live reproduction against a real gateway in this read-only review.
Is this the best way to solve the issue? Likely yes, but not proven. Reusing
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against d1b136347e95.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
🤖 This PR was created by Repo Assist, an automated AI assistant.
Closes #709
Root Cause
When the Teams channel setup wizard hit a timeout (EnterWizardErrorAsync), the code called DisconnectAsync() immediately, which nulls _client and tears down the connection, before giving the user any recovery UI. Then CancelCurrentSessionAsync() checks _client != null first and short-circuits, so no wizard.cancel RPC is ever sent to the gateway. The server-side wizard session stays alive.

When the user retries, StartWizardAsync sends a fresh wizard.start into an already-active session and receives "wizard already running".

A secondary path: StartWizardAsync itself also called DisconnectAsync() at the top (to clean up any pre-existing connection), which causes the same problem if the client was connected when an error payload arrived via ApplyPayloadAsync.

Fix
src/OpenClaw.SetupEngine.UI/Pages/WizardPage.xaml.cs (2 locations)
- EnterWizardErrorAsync: replace await DisconnectAsync() with await CancelCurrentSessionAsync(). CancelCurrentSessionAsync already calls DisconnectAsync() at the end, uses a 10-second timeout with catch {} for resilience, and is a no-op when _client is null or _currentSessionId is empty, so it is safe to call here and also sends wizard.cancel while the connection is still live.
- StartWizardAsync: same replacement for the "clean up before starting" call at the top of the method.

Trade-offs
- wizard.cancel was already the correct cleanup path.
- CancelCurrentSessionAsync has a 10-second internal timeout, which is acceptable for an error-recovery path where we're already in a degraded state.

Test Status
- ./build.ps1 skipped: the GitVersion MSBuild task requires the GITHUB_ENV path that is present on the CI runner, which is not available in the agent environment (pre-existing infrastructure limitation, not caused by this change).
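The best-effort cancel with an internal timeout, noted under Trade-offs, might look roughly like the sketch below. This is an illustration under assumptions, not the body of CancelCurrentSessionAsync: the `BestEffort` helper, its delegate parameters, and the shortened 200 ms timeout are hypothetical, chosen so the demo finishes quickly. The point is that a hung or failing cancel call cannot prevent the disconnect.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// A cancel RPC that never completes on its own, standing in for a hung gateway.
Func<CancellationToken, Task> hangingCancel = ct => Task.Delay(Timeout.Infinite, ct);
var disconnected = false;

await BestEffort.CancelThenDisconnectAsync(
    hangingCancel,
    disconnect: () => { disconnected = true; return Task.CompletedTask; },
    timeout: TimeSpan.FromMilliseconds(200)); // the PR text describes 10 seconds

Console.WriteLine($"disconnected after hung cancel: {disconnected}");

// Hypothetical helper: send cancel with a deadline, ignore failures, always disconnect.
static class BestEffort
{
    public static async Task CancelThenDisconnectAsync(
        Func<CancellationToken, Task> sendCancel,
        Func<Task> disconnect,
        TimeSpan timeout)
    {
        // The token is cancelled automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await sendCancel(cts.Token);
        }
        catch
        {
            // Swallowed on purpose: a timeout or transport error must not block cleanup.
        }

        await disconnect();
    }
}
```

Swallowing every exception is only reasonable here because the caller is already on an error-recovery path; in other contexts it would hide real failures.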