
fix(graph): treat cancel_node as control flow, not fatal error #2258

Open
Zelys-DFKH wants to merge 1 commit into strands-agents:main from Zelys-DFKH:fix/graph-cancel-node-graceful-skip


@Zelys-DFKH
Contributor

Description

When cancel_node is set in a BeforeNodeCallEvent hook, the intent is to skip that node and keep the graph running. What actually happens: _execute_node raises RuntimeError after emitting MultiAgentNodeCancelEvent, and the exception propagates through _execute_nodes_parallel and kills the graph. Downstream nodes never run and the overall status becomes FAILED. In interrupt-and-resume workflows the session is left with no recovery path: the next call starts a fresh graph instead of continuing.
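The failure mode described above can be reproduced in a small toy model. This is only a sketch, not the strands code: `execute_node` and `run_graph` below are hypothetical stand-ins, and the batching via `asyncio.gather` is an assumption about how a parallel executor might look.

```python
import asyncio


async def execute_node(name, cancelled, log):
    # Hypothetical stand-in for the pre-fix behavior: a cancelled node
    # records the cancellation and then raises.
    if name in cancelled:
        log.append(f"cancel:{name}")
        raise RuntimeError(f"node {name} cancelled")
    log.append(f"run:{name}")


async def run_graph(batches, cancelled):
    # Runs each batch of nodes concurrently; any exception ends the whole run.
    log = []
    status = "COMPLETED"
    try:
        for batch in batches:
            await asyncio.gather(*(execute_node(n, cancelled, log) for n in batch))
    except RuntimeError:
        status = "FAILED"
    return status, log


status, log = asyncio.run(
    run_graph([["step_a"], ["step_b"]], cancelled={"step_a"})
)
print(status, log)
```

In this toy run the exception from `step_a` escapes the batch, so `step_b` is never scheduled and the run ends as failed.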

The fix marks the skipped node as Status.COMPLETED, records it in state.completed_nodes, state.results, and state.execution_order, yields MultiAgentNodeStopEvent, and returns instead of raising. _find_newly_ready_nodes then picks up downstream nodes from completed_batch as usual. get_agent_results() returns [] for a result that holds the RuntimeError, so downstream nodes see no output from the skipped node. No new status enum value is needed. This PR also corrects the cancel_node docstring, which still said "status FAILED".
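The bookkeeping in the fix can be illustrated with a simplified, synchronous toy model. Everything here (`ToyResult`, `ToyState`, `execute_node`, `find_newly_ready`, `run_graph`) is a hypothetical stand-in that only mirrors the names used in this description; it is not the actual strands implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ToyResult:
    status: Status
    result: object  # an output string, or an Exception for a skipped node

    def get_agent_results(self):
        # A result holding an exception contributes no agent output.
        return [] if isinstance(self.result, Exception) else [self.result]


@dataclass
class ToyState:
    completed_nodes: set = field(default_factory=set)
    results: dict = field(default_factory=dict)
    execution_order: list = field(default_factory=list)
    events: list = field(default_factory=list)


def execute_node(name, state, cancelled):
    if name in cancelled:
        # Skip path: emit the cancel event, store the error as the result,
        # and fall through to the normal completion bookkeeping.
        state.events.append(("cancel", name))
        result = ToyResult(Status.COMPLETED, RuntimeError(f"node {name} cancelled"))
    else:
        result = ToyResult(Status.COMPLETED, f"output of {name}")
    state.results[name] = result
    state.completed_nodes.add(name)
    state.execution_order.append(name)
    state.events.append(("stop", name))


def find_newly_ready(edges, batch, state):
    # A successor is ready once every one of its predecessors has completed.
    ready = []
    for done in batch:
        for nxt in edges.get(done, []):
            preds = [src for src, dsts in edges.items() if nxt in dsts]
            if (
                nxt not in state.completed_nodes
                and nxt not in ready
                and all(p in state.completed_nodes for p in preds)
            ):
                ready.append(nxt)
    return ready


def run_graph(edges, entry, cancelled):
    state = ToyState()
    batch = [entry]
    while batch:
        for name in batch:
            execute_node(name, state, cancelled)
        batch = find_newly_ready(edges, batch, state)
    return state


state = run_graph({"step_a": ["step_b"]}, "step_a", cancelled={"step_a"})
print(state.execution_order)
print(state.results["step_a"].get_agent_results())
```

Because the skipped node goes through the same completion bookkeeping as a normal node, the readiness check needs no special case for cancellation.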

Related Issues

Resolves #2240

Documentation PR

N/A

Type of Change

Bug fix

Testing

Ran hatch run prepare. Updated test_graph_cancel_node (parametrized ×2) to assert normal completion instead of expecting an exception. Added test_graph_cancel_node_downstream_executes, which cancels step_a via a hook and verifies that step_b still runs, with both nodes present in completed_nodes and results. There are no public API changes, so consuming repos are not affected.

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.



Development

Successfully merging this pull request may close these issues.

[BUG] cancel_node in BeforeNodeCallEvent raises RuntimeError that kills the entire graph on resume
