Summary
ydb/_topic_reader/topic_reader_asyncio_test.py::TestReaderReconnector::test_reconnect_on_repeatable_error fails intermittently in CI with a timeout. This is a recurring flake that has been seen on several unrelated PRs.
Failure
ydb._topic_common.test_helpers.WaitConditionError: Bad condition in test
ydb/_topic_common/test_helpers.py:67: WaitConditionError
The test drives ReaderReconnector through one repeatable error (Overloaded) and then a healthy stream, and waits via:
await wait_for_fast(reconnector.wait_message()) # topic_reader_asyncio_test.py:1561
wait_for_fast → wait_condition allows a budget of 1s of wall-clock time / 1000 event-loop iterations (test_helpers.py:46-67). When reconnector.wait_message() does not resolve within that budget, the helper raises WaitConditionError. So this failure is a timing-sensitive timeout, not a failed assertion about reconnect behavior.
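For readers unfamiliar with this kind of helper, the budget can be sketched roughly as follows. This is a minimal illustration, not the code in test_helpers.py: the names `poll_until`, `await_within_budget` and `BudgetExceeded` are hypothetical, and the rule that polling stops only once both the wall-clock and the iteration budget are spent is an assumption made for the example.

```python
import asyncio
import time


class BudgetExceeded(Exception):
    """Hypothetical stand-in for WaitConditionError."""


async def poll_until(predicate, timeout=1.0, min_iterations=1000):
    # Assumption: keep polling while either budget still has room left.
    start = time.monotonic()
    iterations = 0
    while time.monotonic() - start < timeout or iterations < min_iterations:
        iterations += 1
        if predicate():
            return
        await asyncio.sleep(0)  # yield exactly one turn to the event loop
    raise BudgetExceeded("condition not met within budget")


async def await_within_budget(awaitable, timeout=1.0, min_iterations=1000):
    # Wrap the awaitable in a task and poll its completion flag.
    fut = asyncio.ensure_future(awaitable)
    await poll_until(fut.done, timeout, min_iterations)
    return fut.result()
```

If the real helper behaves like this sketch, any real wall-clock delay on the awaited path (for example a retry backoff) is charged directly against the 1s budget, whereas `asyncio.sleep(0)` hops mostly cost iterations.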
A teardown side-effect also shows up after the failure (likely secondary, from the aborted coroutine):
PytestUnraisableExceptionWarning: Exception ignored in: <coroutine object Queue.get ...>
RuntimeError: Event loop is closed
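One way to check whether this warning is only fallout from the timeout is to make the wait cancel and drain its task before the error propagates, so that no coroutine is left parked when the loop closes. The sketch below uses only standard asyncio calls; `wait_or_cancel` is a hypothetical name, not an existing helper in the SDK, and it uses a plain wall-clock timeout rather than the project's wait_condition budget.

```python
import asyncio


async def wait_or_cancel(awaitable, timeout=1.0):
    """Hypothetical helper: await with a timeout and leave no pending task behind."""
    task = asyncio.ensure_future(awaitable)
    # asyncio.wait() does not cancel the task on timeout, so cleanup is explicit.
    done, _pending = await asyncio.wait({task}, timeout=timeout)
    if task in done:
        return task.result()
    task.cancel()
    # Drain the cancellation while the loop is still alive.
    await asyncio.gather(task, return_exceptions=True)
    raise asyncio.TimeoutError("awaitable did not finish within the timeout")
```

If the warning disappears with this kind of cleanup while the timeout itself still occurs, that would support treating the warning as secondary.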
Environment / occurrence
- […] ydb/convert.py result-set rows).
Hypothesis / directions to investigate
- The reconnect path (error stream → recreate → first message) occasionally needs more than the 1s / 1000-iteration budget under CI load, especially with Python 3.12's event-loop scheduling. Bumping the budget would only mask it.
- Worth checking whether ReaderReconnector does extra await asyncio.sleep(0) hops on the reconnect path on 3.12, or whether a backoff/retry delay leaks into the loop and eats the budget.
- Confirm the Event loop is closed teardown warning is purely secondary (GC of the parked wait_forever/Queue.get coroutine) and not contributing to the hang.
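To tell extra scheduling hops apart from a leaked delay, it may help to record how many loop iterations and how much wall time a single wait takes on the reconnect path. The helper below is a hypothetical diagnostic, not part of the SDK; `measure_hops` and its `max_hops` cap are assumed names.

```python
import asyncio
import time


async def measure_hops(awaitable, max_hops=1_000_000):
    """Hypothetical diagnostic: return (result, hops, elapsed_seconds)."""
    task = asyncio.ensure_future(awaitable)
    start = time.monotonic()
    hops = 0
    while not task.done():
        if hops >= max_hops:
            # Give up, but do not leave the task pending.
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)
            raise asyncio.TimeoutError(f"not done after {hops} loop hops")
        hops += 1
        await asyncio.sleep(0)  # one turn of the event loop per hop
    return task.result(), hops, time.monotonic() - start
```

In a local experiment this could wrap the awaited call, for example `await measure_hops(reconnector.wait_message())`. A large elapsed time would point towards a real delay such as a backoff, while a higher hop count with negligible elapsed time would point towards additional `sleep(0)` yields.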
Workaround for now
Re-run the job; failure is intermittent and not a regression from the triggering PR.