Also enable AODV route discovery on source-routed unicasts #719

Draft
TheJulianJES wants to merge 1 commit into zigpy:dev from TheJulianJES:tjj/source_routing_aps_options

Contributor

@TheJulianJES TheJulianJES commented May 4, 2026

DRAFT/TODO: Check this and tidy up AI comment.

I've been somewhat accidentally using this for years by turning on source routing in the zigpy config once, then disabling it. This still caused the coordinator to set source routes, but apparently it never refreshed the network addresses this way.

Only in this state was source routing actually worth using in my network: because zigpy believed source routing was off, it still set APS_OPTION_ENABLE_ROUTE_DISCOVERY as well.

According to some sources referencing "BUGZID 12261", this is supposedly incorrect. This should be looked at in more detail.

AI summary (check if accurate)

When CONF_SOURCE_ROUTING is enabled, bellows previously set only APS_OPTION_ENABLE_ADDRESS_DISCOVERY on outgoing NWK unicasts. The two flags are independent and address different stack subsystems:

  • ENABLE_ROUTE_DISCOVERY (NWK frame flag) — initiates AODV route discovery if and only if the stack has no known route to the destination (per sl_zigbee_types.h: "causes a route discovery to be initiated if no route to the destination is known"). Contrast with FORCE_ROUTE_DISCOVERY which initiates discovery unconditionally.
  • ENABLE_ADDRESS_DISCOVERY (APS frame flag) — sends a ZDO NWK_addr_req to resolve the destination NWK ID from its EUI64 if not already cached. It does not discover routes; it resolves addresses.

When source routing is on but a destination isn't yet in the NCP's source-route table (e.g. immediately after startup, before MTORR has propagated, or after a device rejoin), the previous configuration left the stack with neither a source route nor permission to fall back to AODV. The packet was dropped at the NWK layer instead of recovering via on-demand route discovery.

This change makes source-routed unicasts also set ENABLE_ROUTE_DISCOVERY, mirroring what zigbee-herdsman's ember adapter does (emberAdapter.ts DEFAULT_APS_OPTIONS):

DEFAULT_APS_OPTIONS = RETRY | ENABLE_ROUTE_DISCOVERY | ENABLE_ADDRESS_DISCOVERY;
// "Removing ENABLE_ROUTE_DISCOVERY leads to devices that won't reconnect/go
//  offline, and various other issues."

Why this is correct (source routing is still preferred)

ENABLE_ROUTE_DISCOVERY does not run alongside source routing on every send. The SDK header text is explicit: discovery is initiated only "if no route to the destination is known." The stack's outbound path for a concentrator is:

  1. sl_zigbee_af_override_append_source_route_cb runs first. If a source route exists for the destination, it's prepended to the NWK header and each hop along the way uses that explicit route. AODV is not consulted.
  2. If no source route exists, the NWK route table is checked. If a previously discovered AODV route is there, it's used.
  3. Only if both tables come up empty does the ENABLE_ROUTE_DISCOVERY flag actually trigger an AODV route request.

Net effect: source routing remains the primary path; AODV is engaged strictly as a fallback for destinations the source-route table doesn't yet know about.
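
The three-step order above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this summary, not code from the EmberZNet stack or from bellows: `ApsOption`, `Path`, and `choose_path` are hypothetical names, the bit values are illustrative, and the "dropped" outcome reflects the summary's claim rather than a verified stack behaviour.

```python
from enum import Enum, IntFlag


class ApsOption(IntFlag):
    # Stand-in for bellows' t.EmberApsOption; bit values are illustrative only.
    NONE = 0
    ENABLE_ROUTE_DISCOVERY = 0x0100
    ENABLE_ADDRESS_DISCOVERY = 0x1000


class Path(Enum):
    SOURCE_ROUTE = "source route"
    ROUTING_TABLE = "existing AODV route"
    AODV_DISCOVERY = "new AODV route discovery"
    DROPPED = "dropped"


def choose_path(dest, source_routes, routing_table, options):
    """Model the lookup order described above (assumed, not stack code)."""
    if dest in source_routes:
        # Step 1: a known source route always wins; AODV is not consulted.
        return Path.SOURCE_ROUTE
    if dest in routing_table:
        # Step 2: reuse a previously discovered AODV route.
        return Path.ROUTING_TABLE
    if ApsOption.ENABLE_ROUTE_DISCOVERY in options:
        # Step 3: only now does the flag trigger a route request.
        return Path.AODV_DISCOVERY
    # Neither table has an entry and discovery is not permitted.
    return Path.DROPPED
```

With only `ENABLE_ADDRESS_DISCOVERY` set and both tables empty, the model ends in `Path.DROPPED`; adding `ENABLE_ROUTE_DISCOVERY` changes that one case to `Path.AODV_DISCOVERY` and leaves the other two untouched.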

When `CONF_SOURCE_ROUTING` is enabled, bellows previously set only
`APS_OPTION_ENABLE_ADDRESS_DISCOVERY` on outgoing unicasts. The two
APS option flags are independent in the stack:

- `ENABLE_ADDRESS_DISCOVERY` triggers a network-broadcast address
  resolution if the destination NWK isn't known.
- `ENABLE_ROUTE_DISCOVERY` initiates AODV route discovery if no route
  exists.

If a packet is sent to a destination that isn't yet in the NCP's
source-route table (e.g. immediately after startup, before MTORR
has propagated), `ADDRESS_DISCOVERY` alone is not enough — the
delivery falls back to whatever default the stack chooses. Setting
both flags lets AODV act as a transparent fallback.

zigbee-herdsman's ember adapter applies both flags to all unicasts
for this reason — its `DEFAULT_APS_OPTIONS` in `emberAdapter.ts`
includes them together with the comment "Removing
`ENABLE_ROUTE_DISCOVERY` leads to devices that won't reconnect/go
offline, and various other issues."

The non-source-routing path is untouched: it already includes
`ENABLE_ROUTE_DISCOVERY`, and the explicit `FORCE_ROUTE_DISCOVERY`
caller-override path also stays as-is.
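
As a rough illustration of the resulting flag selection, here is a self-contained sketch. It assumes the change does exactly what the text above describes; `ApsOption` is a stand-in for `t.EmberApsOption` with illustrative bit values, and `routing_options` is a hypothetical helper, not a bellows function.

```python
from enum import IntFlag


class ApsOption(IntFlag):
    # Stand-in for t.EmberApsOption; bit values are illustrative only.
    NONE = 0
    RETRY = 0x0040
    ENABLE_ROUTE_DISCOVERY = 0x0100
    FORCE_ROUTE_DISCOVERY = 0x0200
    ENABLE_ADDRESS_DISCOVERY = 0x1000


def routing_options(source_routing: bool, force_route_discovery: bool) -> ApsOption:
    """Return the routing-related APS options for one outgoing unicast."""
    if source_routing:
        # Changed path: address discovery plus AODV as a fallback.
        return ApsOption.ENABLE_ADDRESS_DISCOVERY | ApsOption.ENABLE_ROUTE_DISCOVERY
    if force_route_discovery:
        # Unchanged: forcing route discovery requires retrying.
        return ApsOption.FORCE_ROUTE_DISCOVERY | ApsOption.RETRY
    # Unchanged: the default path already enables route discovery.
    return ApsOption.ENABLE_ROUTE_DISCOVERY
```

In this sketch, a caller's force request is still ignored while source routing is enabled, which matches the current branch order.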
@codecov

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.54%. Comparing base (4b97a6d) to head (4d56aed).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #719   +/-   ##
=======================================
  Coverage   99.54%   99.54%           
=======================================
  Files          61       61           
  Lines        4147     4147           
=======================================
  Hits         4128     4128           
  Misses         19       19           

@TheJulianJES
Contributor Author

And/or we should consider not ignoring the TransmitOptions.FORCE_ROUTE_DISCOVERY zigpy option when source routing is enabled, like we do now:

if self.config[zigpy.config.CONF_SOURCE_ROUTING]:
    # Source routing uses address discovery to discover routes
    aps_frame.options |= t.EmberApsOption.APS_OPTION_ENABLE_ADDRESS_DISCOVERY
elif zigpy.types.TransmitOptions.FORCE_ROUTE_DISCOVERY in packet.tx_options:
    # Forcing route discovery requires retrying
    aps_frame.options |= t.EmberApsOption.APS_OPTION_FORCE_ROUTE_DISCOVERY
    aps_frame.options |= t.EmberApsOption.APS_OPTION_RETRY
else:
    aps_frame.options |= t.EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY
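
One possible reading of that suggestion, sketched as standalone Python: the force check is pulled out of the `elif` so it also applies when source routing is on. Everything here is an assumption for discussion. `ApsOption` stands in for `t.EmberApsOption` with illustrative bit values, `options_honoring_force` is a hypothetical name, and whether the stack behaves sensibly with `FORCE_ROUTE_DISCOVERY` on a source-routed unicast is untested.

```python
from enum import IntFlag


class ApsOption(IntFlag):
    # Stand-in for t.EmberApsOption; bit values are illustrative only.
    NONE = 0
    RETRY = 0x0040
    ENABLE_ROUTE_DISCOVERY = 0x0100
    FORCE_ROUTE_DISCOVERY = 0x0200
    ENABLE_ADDRESS_DISCOVERY = 0x1000


def options_honoring_force(source_routing: bool, force_route_discovery: bool) -> ApsOption:
    """Hypothetical variant: the caller's force request is no longer ignored."""
    options = ApsOption.NONE
    if source_routing:
        # Same as today: source routing uses address discovery.
        options |= ApsOption.ENABLE_ADDRESS_DISCOVERY
    if force_route_discovery:
        # Now evaluated regardless of the source-routing setting.
        options |= ApsOption.FORCE_ROUTE_DISCOVERY | ApsOption.RETRY
    elif not source_routing:
        # Default path without source routing is unchanged.
        options |= ApsOption.ENABLE_ROUTE_DISCOVERY
    return options
```

This variant deliberately leaves the non-forced source-routing case as it is today, so it can be weighed separately from the main change in this PR.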

@TheJulianJES
Contributor Author

TODO: Check if the stack removes broken/outdated source routes, or just keeps using them until the next MTORR is sent.

@MattWestb
Contributor

A little more experience / problems with routing in ZHA / bellows.
ZHA was working great when IKEA was using group binding for remotes and motion sensors (broadcast commands).
After redoing the bindings in my production system as unicast / device bindings, I have two problems. I think one is a neighbor using a microwave oven and blocking the RF. The other is that commands from controllers stop working for a short or long time, normally after a light has rebooted and gotten a new NWK address (after the network refused to let it keep the old one, which is 99% a Silabs bug).
For example, in my WC one IKEA motion sensor and one dimmer switch send unicast bound commands to 3x GU10, a Recent spot, and one Äskvöder that controls the fan controller.
The symptom is that some commands do not reach their device. Controllers bind to devices by IEEE address, so the controller should get the updated / current NWK address from its parent and send unicasts to all NWK addresses, but the routers / mesh do not handle this well, so not all commands from the motion sensor get through.
The dimmer switch, with the same bindings as the motion sensor, shows a different pattern because it gets a different response from its parent / mesh, so you are on the right track!

Next is a Tuya 4-way switch that often does not respond to commands from automations (Zigbee transmission failed) and works OK again after power cycling it. I sniffed it last week: (Z)HA was not sending the on commands that the automation was set to send, and the same happened when sending a command from the device card. In other words, ZHA has no known route to the device and does not even try to send the command; there is only an error in the log.

The system has been running with source_routing: true for a very long time, but the problem has grown over time.

By the way: forcing manual source routes is very dangerous because it disables the self-healing functionality of the mesh network. If one router has a problem, some devices can become unreachable, and as far as I know that cannot be fixed in any easy way. So if you implement it, please put strong warnings in the instructions!

Great work done !!!!

@MattWestb
Contributor

PS: Coordinator hardware and software: connected via [IKEA Billy EZSP 6.10.7.0]
