fix: Python 3.14 compat — use inspect.iscoroutinefunction in ThreadsafeProxy#711

Closed
aautem wants to merge 11 commits into zigpy:dev from aautem:fix/none-gateway-startup-reset

@aautem commented Apr 18, 2026

Summary

  • Replace asyncio.iscoroutinefunction() with inspect.iscoroutinefunction() in ThreadsafeProxy — the asyncio version was removed in Python 3.14
  • Raise ConnectionError instead of silently returning None when the proxy's event loop is closed
  • Add a defensive null guard in _startup_reset() for when the gateway is None

Root Cause

ThreadsafeProxy.__getattr__ uses asyncio.iscoroutinefunction() to detect async methods and dispatch them cross-thread via run_coroutine_threadsafe. In Python 3.14, asyncio.iscoroutinefunction() was removed. When the check fails, async methods like Gateway.wait_for_startup_reset() fall through to the sync code path, which returns None. The caller then does await None, producing:

TypeError: 'NoneType' object can't be awaited

A secondary issue: when the proxy's event loop is closed (e.g. connection lost during startup), the proxy also silently returned None instead of raising an error.
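The dispatch logic described above can be sketched roughly as follows. This is a minimal illustration, not the bellows implementation: `ProxySketch`, `Gateway`, `start_worker_loop`, and `demo` are hypothetical names chosen for the example. It uses `inspect.iscoroutinefunction()` (the Python documentation lists the `asyncio` variant as deprecated since 3.14 in favour of the `inspect` one) and raises `ConnectionError` when the worker loop is already closed.

```python
import asyncio
import inspect
import threading


class ProxySketch:
    """Hypothetical stand-in for a thread-safe proxy; not the bellows code."""

    def __init__(self, obj, loop):
        self._obj = obj
        self._loop = loop  # event loop owned by the worker thread

    def __getattr__(self, name):
        func = getattr(self._obj, name)
        if not callable(func):
            raise TypeError(f"Can only proxy callable attributes, got {name!r}")

        def call(*args, **kwargs):
            if self._loop.is_closed():
                # Fail loudly instead of returning None, which the caller would await
                raise ConnectionError("Proxy event loop is closed")
            if inspect.iscoroutinefunction(func):
                # Run the coroutine on the worker loop, await it from the caller's loop
                future = asyncio.run_coroutine_threadsafe(
                    func(*args, **kwargs), self._loop
                )
                return asyncio.wrap_future(future)
            # Plain callables are scheduled on the worker loop; nothing to await
            self._loop.call_soon_threadsafe(lambda: func(*args, **kwargs))
            return None

        return call


class Gateway:
    """Hypothetical target object with one async method."""

    async def wait_for_startup_reset(self):
        return "reset"


def start_worker_loop():
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def demo():
    loop, thread = start_worker_loop()
    proxy = ProxySketch(Gateway(), loop)
    result = await proxy.wait_for_startup_reset()

    # Shut the worker loop down, then show that the proxy now raises
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()
    try:
        proxy.wait_for_startup_reset()
    except ConnectionError:
        closed_raises = True
    else:
        closed_raises = False
    return result, closed_raises
```

Running `asyncio.run(demo())` awaits the proxied coroutine across threads and then exercises the closed-loop path, where the caller gets an explicit `ConnectionError` rather than a `None` it would try to await.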

Changes

| File | Change |
|------|--------|
| bellows/thread.py | asyncio.iscoroutinefunction → inspect.iscoroutinefunction; raise ConnectionError on closed loop |
| bellows/ezsp/__init__.py | Null guard in _startup_reset() |
| tests/test_thread.py | Update closed-loop test to expect ConnectionError |
| tests/test_ezsp.py | Add tests for null gateway in _startup_reset() and disconnect() |

Context

Reproducible on Home Assistant 2026.4.0 with bellows 0.49.0 / Python 3.14. The integration enters a failed state and retries indefinitely, never recovering.

Related: home-assistant/core#168432

Test plan

  • All 426 tests pass
  • test_proxy_loop_closed — verifies ConnectionError is raised on closed loop
  • test_startup_reset_gw_none — verifies EzspError on null gateway
  • test_disconnect_gw_none — verifies disconnect() handles null gateway

🤖 Generated with Claude Code

When bellows connects to an EZSP coordinator over a TCP socket on
Python 3.14, the gateway's wait_for_startup_reset() can fail with
TypeError due to asyncio threading behavior changes. Add a null guard
in _startup_reset() to raise a clear EzspError instead of an opaque
TypeError, allowing the retry logic in startup_reset() to handle the
failure cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
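
A null guard of the kind this commit message describes might look like the sketch below. The class `EzspSketch`, the local `EzspError` definition, and `FakeGateway` are stand-ins written for this example so that it runs on its own; the real bellows classes are not reproduced here.

```python
import asyncio


class EzspError(Exception):
    """Stand-in for the library's error type, defined locally for the sketch."""


class EzspSketch:
    """Hypothetical holder of a gateway reference that may be None."""

    def __init__(self, gateway=None):
        self._gw = gateway

    async def _startup_reset(self):
        if self._gw is None:
            # A clear, catchable error instead of an opaque TypeError later on
            raise EzspError("Gateway is not connected; cannot wait for startup reset")
        return await self._gw.wait_for_startup_reset()


class FakeGateway:
    """Hypothetical gateway used only to exercise the happy path."""

    async def wait_for_startup_reset(self):
        return "reset"


def guard_raises():
    try:
        asyncio.run(EzspSketch()._startup_reset())
    except EzspError:
        return True
    return False
```

With no gateway, `guard_raises()` reports that `EzspError` was raised, which a retry loop can catch; with `FakeGateway()` the call completes normally.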
codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 86.36364% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.47%. Comparing base (0d5b1b7) to head (097a9bc).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| bellows/ezsp/__init__.py | 71.42% | 2 Missing ⚠️ |
| bellows/uart.py | 90.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #711      +/-   ##
==========================================
- Coverage   99.54%   99.47%   -0.08%     
==========================================
  Files          61       61              
  Lines        4147     4157      +10     
==========================================
+ Hits         4128     4135       +7     
- Misses         19       22       +3     

@aautem marked this pull request as draft on April 18, 2026, 18:27
asyncio.iscoroutinefunction() was removed in Python 3.14. The
ThreadsafeProxy was using it to detect async methods and dispatch
them cross-thread via run_coroutine_threadsafe. When the check fails,
the async method falls through to the sync path which returns None,
causing `TypeError: 'NoneType' object can't be awaited`.

Also raise ConnectionError instead of silently returning None when
the secondary event loop is closed, so callers get a meaningful
error instead of a TypeError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aautem changed the title from "fix: Guard against None gateway in _startup_reset()" to "fix: Python 3.14 compat — use inspect.iscoroutinefunction in ThreadsafeProxy" on Apr 18, 2026
aautem and others added 9 commits April 18, 2026 13:51
…ncio APIs

- EZSP.disconnect() now catches ConnectionError from the proxy when the
  secondary thread's event loop is already closed, preventing cascading
  errors during cleanup
- Replace deprecated asyncio.get_event_loop() calls with
  asyncio.get_running_loop() in thread.py and uart.py for Python 3.14
  forward compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
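
As a small illustration of the `asyncio.get_running_loop()` replacement mentioned in this commit message: the function only succeeds inside a running event loop and raises `RuntimeError` otherwise, which makes misuse visible early. The function names below are hypothetical.

```python
import asyncio


async def loop_is_running():
    # Inside a coroutine there is always a running loop to return
    loop = asyncio.get_running_loop()
    return loop.is_running()


def probe_without_loop():
    # Outside of a running loop, get_running_loop() raises RuntimeError
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return "no running event loop"
    return "loop is running"
```

Calling `probe_without_loop()` from synchronous code takes the `RuntimeError` branch, while `asyncio.run(loop_is_running())` sees the loop that `asyncio.run()` started.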
The EZSP coordinator sends a reset frame immediately after TCP
connection. Previously, _startup_reset_future was only created when
wait_for_startup_reset() was dispatched from the main thread through
the ThreadsafeProxy. In Python 3.14, different event loop scheduling
means the reset frame arrives before the future exists, causing
reset_received() to call enter_failed_state() and close the
secondary thread's event loop.

Fix by pre-creating _startup_reset_future in _connect() right after
the connection is established, so reset frames arriving during the
proxy dispatch window are captured instead of treated as unexpected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pre-created future was placed after wait_until_connected(), but
in Python 3.14 the reset frame can arrive and be processed in the
same event loop iteration that resolves the connection — before the
_connect coroutine resumes. Move the future creation to before
create_serial_connection() so it exists before any data can flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
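
The ordering problem described in the two commit messages above can be sketched in isolation. `GatewaySketch` is a hypothetical class for illustration only; its `failed` flag stands in for `enter_failed_state()`, and `prepare_startup_reset()` is an assumed helper name, not one taken from bellows.

```python
import asyncio


class GatewaySketch:
    """Hypothetical gateway that tracks one pending startup-reset future."""

    def __init__(self):
        self._startup_reset_future = None
        self.failed = False

    def prepare_startup_reset(self):
        # Must run inside the event loop, before any data can arrive
        self._startup_reset_future = asyncio.get_running_loop().create_future()

    def reset_received(self, code):
        future = self._startup_reset_future
        if future is not None and not future.done():
            future.set_result(code)  # expected reset: hand it to the waiter
        else:
            self.failed = True  # unexpected reset: stands in for enter_failed_state()

    async def wait_for_startup_reset(self):
        if self._startup_reset_future is None:
            self.prepare_startup_reset()
        try:
            return await self._startup_reset_future
        finally:
            self._startup_reset_future = None


async def demo(precreate):
    gateway = GatewaySketch()
    if precreate:
        gateway.prepare_startup_reset()
    gateway.reset_received(0x0B)  # the reset frame arrives before anyone waits
    if gateway.failed:
        return "failed"
    return await asyncio.wait_for(gateway.wait_for_startup_reset(), timeout=1)
```

With `precreate=True` the early frame resolves the waiting future; with `precreate=False` the same frame is treated as unexpected, which is the failure mode the commits aim to avoid.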
The EventLoopThread and ThreadsafeProxy exist because pyserial uses
blocking I/O that must run in a separate thread. TCP socket
connections use native asyncio I/O and don't need threading at all.

Running TCP connections on the main event loop eliminates the
cross-thread race conditions that cause startup failures on
Python 3.14: coroutine dispatch races, event loop lifecycle races,
and future cancellation cleanup errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert the use_thread=False change for TCP sockets — the main HA
event loop is too busy for serial protocol timing, causing transport
disconnects.

Instead, properly handle coroutine lifecycle in ThreadsafeProxy: if
run_coroutine_threadsafe fails because the loop closed between the
is_closed() check and the dispatch, close the coroutine to prevent
the "coroutine was never awaited" RuntimeWarning and raise a clean
ConnectionError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
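
A rough sketch of the coroutine cleanup described in this commit message follows. `dispatch` and `demo` are hypothetical helpers written for the example; the sketch relies on `asyncio.run_coroutine_threadsafe()` raising `RuntimeError` when the target loop is already closed.

```python
import asyncio
import inspect


def dispatch(coro_func, loop, *args, **kwargs):
    """Schedule coro_func on loop from another thread (hypothetical helper)."""
    coro = coro_func(*args, **kwargs)
    try:
        return asyncio.run_coroutine_threadsafe(coro, loop)
    except RuntimeError as exc:
        # The loop closed after any earlier is_closed() check. Close the
        # coroutine so Python does not warn that it was never awaited.
        coro.close()
        raise ConnectionError("Proxy event loop is closed") from exc


def demo():
    created = []

    async def ping():
        return "pong"

    def tracked_ping():
        # Keep a reference so the coroutine's final state can be inspected
        coro = ping()
        created.append(coro)
        return coro

    loop = asyncio.new_event_loop()
    loop.close()  # simulate the loop closing before the dispatch
    try:
        dispatch(tracked_ping, loop)
    except ConnectionError:
        raised = True
    else:
        raised = False
    return raised, inspect.getcoroutinestate(created[0])
```

`demo()` shows both effects: the caller sees `ConnectionError`, and the coroutine ends in the closed state instead of lingering unawaited.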
Temporarily elevate connection_lost, reset_received, and
error_received logging to WARNING level to diagnose why the
secondary event loop closes during startup on Python 3.14.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AshProtocol.eof_received() was returning None (falsy), which tells
asyncio to auto-close the transport when the remote end signals EOF.
For serial-over-TCP connections (e.g. ser2net), the remote end may
signal EOF during initialization in Python 3.14 without intending
to fully close the connection. This caused connection_lost(None) to
fire immediately, closing the secondary event loop before startup
could complete.

Return True from eof_received() to keep the transport open and let
bellows manage the connection lifecycle explicitly via disconnect().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
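
The asyncio behaviour this commit message relies on can be observed with a local socket pair: when `eof_received()` returns a false value the transport closes itself, and when it returns `True` the protocol decides. The sketch below only demonstrates that documented mechanism; `EofProbe` and `closes_on_eof` are hypothetical names, and nothing here reproduces a ser2net setup.

```python
import asyncio
import socket


class EofProbe(asyncio.Protocol):
    """Hypothetical protocol that records EOF and connection loss."""

    def __init__(self, keep_open):
        self.keep_open = keep_open
        self.eof_seen = asyncio.Event()
        self.lost = False

    def eof_received(self):
        self.eof_seen.set()
        # A false value lets asyncio close the transport; True keeps it open
        return self.keep_open

    def connection_lost(self, exc):
        self.lost = True


async def closes_on_eof(keep_open):
    local, remote = socket.socketpair()
    loop = asyncio.get_running_loop()
    probe = EofProbe(keep_open)
    transport, _ = await loop.create_connection(lambda: probe, sock=local)

    remote.shutdown(socket.SHUT_WR)  # the remote end signals EOF
    await asyncio.wait_for(probe.eof_seen.wait(), timeout=1)
    await asyncio.sleep(0.05)  # give a pending connection_lost() time to run

    lost = probe.lost
    transport.close()
    remote.close()
    return lost
```

`closes_on_eof(False)` reports that `connection_lost()` fired right after EOF, while `closes_on_eof(True)` reports that the transport was still open at that point.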
Log data_received, eof_received, and send_reset at WARNING level
to diagnose whether data is flowing between bellows and the
coordinator during startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the secondary event loop is dead and disconnect() cannot
dispatch through the ThreadsafeProxy, the TCP socket stays open.
This prevents ser2net from releasing the serial port, causing
subsequent connection attempts to fail with "Device open failure".

Force-close the underlying OS socket directly when the proxy
dispatch fails, so ser2net detects the disconnect and releases
the serial port for the next connection attempt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aautem aautem closed this Apr 19, 2026
silenthooligan added a commit to silenthooligan/bellows that referenced this pull request May 8, 2026
Per puddly's review: returning True from eof_received was carried over
from zigpy#711's commit body, which claimed an init-time spurious EOF on
serial-over-TCP that auto-close turned terminal. The asyncio docs are
explicit that EOF means the remote sent FIN; there is no
"EOF-without-intent-to-close" path. ASH/EZSP needs full duplex anyway,
so half-close keep-alive doesn't help us even when the asyncio pattern
is technically valid for other protocols.

The hunk wasn't reproducible against the actual ZBT-2 over stream_server
(stock 0.49.1 + Python 3.14.2 completes EZSP startup cleanly on first
connect, no spurious eof_received firing), so dropping it loses no
demonstrable functionality. If a concrete EOF-during-init case for
some bridge config surfaces later it can be addressed separately.

Tests: 428 pass on Python 3.14.2, no change in count.