Fix Windows daemon restart and self-update flow#1076
Conversation
| self.shutdown_condvar.notify_all(); | ||
| } |
There was a problem hiding this comment.
🟡 Explicit Shutdown control request does not reset shutdown_action, causing unintended restart
When the update check loop or uptime check calls request_restart() or request_restart_after_update(), they store a non-Stop value in shutdown_action before calling request_shutdown(). However, an explicit ControlRequest::Shutdown (e.g. from git-ai bg shutdown) at src/daemon.rs:7389 only calls request_shutdown(), which never resets shutdown_action back to Stop. If the timing aligns (update/uptime check sets a restart intent, then user sends an explicit shutdown before the daemon finishes draining), the daemon will still attempt to restart in src/commands/daemon.rs:199-200 despite the explicit stop request. The request_shutdown method at src/daemon.rs:3760 should reset shutdown_action to DaemonExitAction::Stop so that an explicit shutdown always overrides a previously-set restart intent.
(Refers to lines 3760-3773)
Prompt for agents
The problem is in ActorDaemonCoordinator::request_shutdown (src/daemon.rs around line 3760). This method sets shutting_down to true and notifies waiters, but it does NOT reset shutdown_action to DaemonExitAction::Stop. The request_restart and request_restart_after_update methods store a non-Stop value in shutdown_action BEFORE calling request_shutdown. An explicit ControlRequest::Shutdown (processed at src/daemon.rs:7389) calls request_shutdown() which preserves the restart intent.
To fix: request_shutdown should reset shutdown_action to Stop, ensuring that any explicit shutdown overrides a previously set restart intent. The request_restart and request_restart_after_update methods already store their desired action before calling request_shutdown, so the ordering needs to be: request_restart stores the action, then request_shutdown would normally reset it — this won't work.
Alternative approach: Instead of modifying request_shutdown, add a separate request_stop method that resets shutdown_action to Stop and then calls request_shutdown, and use it from the ControlRequest::Shutdown handler at line 7389. Or, reverse the order in request_restart/request_restart_after_update to call request_shutdown first and then store the action (using a compare-and-swap to avoid overwriting a Stop that was set by an explicit shutdown). The cleanest approach is probably to make the Shutdown control handler explicitly store DaemonExitAction::Stop before calling request_shutdown.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Patched this by adding an explicit request_stop() path and using it for ControlRequest::Shutdown, so a user-initiated stop now overrides any previously queued restart or restart-after-update intent. I also reran the daemon restart/update lifecycle tests locally after the change.
d4e99ce to
19515ab
Compare
There was a problem hiding this comment.
🚩 Health check calls plain request_shutdown(), preserving a possibly stale RestartAfterUpdate action
The daemon_socket_health_check_loop at src/daemon.rs:7994 calls coordinator.request_shutdown() (the plain version) rather than coordinator.request_stop(). If the update check loop at src/daemon.rs:8033 had previously stored RestartAfterUpdate before the health check detected socket failure, the health check's spawn_self_restart() at src/daemon.rs:7983 would start a detached daemon AND handle_run at src/commands/daemon.rs:191-207 would also attempt a self-update+restart. The daemon lock file prevents two instances from running, and is_shutting_down() checks at src/daemon.rs:7956 and src/daemon.rs:7964 make the race window extremely narrow. Still, using request_stop() here (as the control listener does at src/daemon.rs:7607) would be more explicit about the health-check's intent.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
🚩 Bare request_shutdown() from error conditions preserves prior restart action
Internal error conditions in the trace ingest worker (lines 4604, 4626, 4644, 4797) and Windows pipe workers (lines 7254, 7269, 7447, 7460) call request_shutdown() rather than request_stop(). Since request_shutdown() does not modify shutdown_action, if the update check loop had previously set RestartAfterUpdate or Restart, these error-triggered shutdowns will preserve that action. This means the daemon could restart after a buffer overflow or duplicate sequence error. The window for this race is very narrow (update check sets action, then error occurs before shutdown completes), and a fresh daemon process wouldn't inherit the corrupt state. Still, if there's a concern about restart-cycling after persistent errors, the error call sites could be changed to request_stop() to force a clean stop.
Was this helpful? React with 👍 or 👎 to provide feedback.
19515ab to
d6e61e2
Compare
a4a405f to
31f33dc
Compare
Summary
$PIDloop bug and add a regression testTesting
cargo test --package git-ai --test windows_install_script -- --nocapturecargo fmt -- --checkManual validation
$PIDloop variable