103 await any or fail deadlocks when only later futures complete #104

Merged
EdmondDantes merged 3 commits into main from 103-await_any_or_fail-deadlocks-when-only-later-futures-complete on May 9, 2026

@EdmondDantes

No description provided.

Bug: await_any_or_fail([$f1, $f2]) deadlocked when only $f2 was
completed; it worked fine when $f1 was completed.

Root cause is in async_API.c::async_await_futures():

  * The iteration loop registers a callback per trigger:
      - if the trigger is closed already, REPLAY fires the callback
        synchronously (callback updates resolved_count, results, and
        calls ZEND_ASYNC_RESUME on the awaiting coroutine);
      - otherwise zend_async_resume_when() registers a real listener
        on the trigger.

  * After the loop, the function checks
      coroutine->waker->events.nNumOfElements > 0
    and, if that holds, calls ZEND_ASYNC_SUSPEND() without checking
    whether the waiting condition has already been satisfied.

  * If a *later* element of the iterable is already closed, the
    earlier elements have already had pending listeners installed.
    REPLAY for the closed element synchronously satisfies the
    waiting condition (resolved_count >= waiting_count), but the
    function still suspends because waker->events is non-empty —
    and now no one will ever fire those leftover triggers, so the
    awaiter is stuck. With Coroutine triggers the bug rarely
    surfaces because the producer coroutine has not had a chance to
    run before iteration starts; with Future triggers (which can be
    completed synchronously by another coroutine before iteration
    even reaches them) it is easy to hit.

Fix: before suspending, check AWAIT_ITERATOR_IS_FINISHED. If the
condition is already satisfied, skip ZEND_ASYNC_SUSPEND but still
call zend_async_waker_clean() so the leftover callbacks on the
unresolved triggers are removed and refcounts decremented.
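
The decision described above can be sketched as a toy model. This is
not the code from async_API.c: toy_trigger, toy_await and
toy_would_suspend are hypothetical names, and the model assumes the
registration loop stops as soon as the waiting condition is met, which
is what would make the "$f1 completed" case work in the old code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the real ext/async types. */
typedef struct {
    bool closed;              /* trigger completed before iteration began */
} toy_trigger;

typedef struct {
    size_t resolved_count;    /* callbacks that have already fired */
    size_t waiting_count;     /* results the awaiter needs (1 for "any") */
    size_t pending_listeners; /* models waker->events.nNumOfElements */
} toy_await;

/* Models AWAIT_ITERATOR_IS_FINISHED. */
static bool toy_is_finished(const toy_await *aw)
{
    return aw->resolved_count >= aw->waiting_count;
}

/*
 * Returns true when the awaiter would call SUSPEND.
 * with_fix == false models the old behavior, true the patched one.
 */
static bool toy_would_suspend(const toy_trigger *triggers, size_t n, bool with_fix)
{
    toy_await aw = { .resolved_count = 0, .waiting_count = 1, .pending_listeners = 0 };

    /* Assumption: iteration stops once the wait condition is met. */
    for (size_t i = 0; i < n && !toy_is_finished(&aw); i++) {
        if (triggers[i].closed) {
            aw.resolved_count++;      /* REPLAY: callback fires synchronously */
        } else {
            aw.pending_listeners++;   /* real listener installed on the trigger */
        }
    }
    assert(aw.resolved_count + aw.pending_listeners <= n);

    if (with_fix && toy_is_finished(&aw)) {
        aw.pending_listeners = 0;     /* models zend_async_waker_clean() */
        return false;                 /* skip SUSPEND */
    }
    return aw.pending_listeners > 0;  /* old check: suspend if any listener remains */
}
```

For the input {open, closed} the old path returns true even though the
waiting condition already holds, which corresponds to the reported
deadlock; the patched path returns false. For {open, open} both paths
suspend, as they should.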

Discovered by the chaos test harness (#102, fuzzy_tests/await/
await_any.feature). Adds regression test
tests/await/093-awaitAnyOrFail_with_future_triggers.phpt that
exercises every slot of a 3-Future array.

Verified: full ext/async test suite (927/927) and new regression
test pass.

Architectural complement to the previous async_API.c fix. The waker
itself was vulnerable to the same class of bug whenever a registered
event closed before the awaiter actually entered SUSPEND — most easily
reachable via Future, but conceptually possible for any event type
(timer firing during synchronous setup, I/O completing in the same
scheduler tick, etc.).

start_waker_events() is called by SUSPEND immediately before the real
context switch. Previously it just invoked event->start() on every
trigger, which is a no-op for Coroutine/Future events and only does
something useful for libuv-backed events. Closed events were therefore
ignored, even though their callbacks were already in the waker — the
coroutine would suspend with stale callbacks that never fire.

Now: for every trigger whose event is already closed, replay each
registered callback right here, in scheduler context. RESUME from
inside the callback hits the short path (in_scheduler_context &&
coroutine == current) and sets waker->status = WAKER_RESULT, which
the existing fast-return check directly below the call uses to skip
the actual suspension.

Open events still go through the normal start() path.
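
A minimal sketch of that replay step, under stated assumptions: the
toy_* names are hypothetical and do not match the real waker
structures, each event carries a single callback, and open events are
still started after a closed one has been replayed (the commit message
does not say whether the real loop continues).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_WAKER_WAITING, TOY_WAKER_RESULT } toy_status;

typedef struct toy_waker toy_waker;
typedef void (*toy_callback)(toy_waker *waker);

typedef struct {
    bool closed;            /* event closed before SUSPEND was entered */
    int started;            /* how many times the toy start() ran */
    toy_callback callback;  /* callback registered in the waker */
} toy_event;

struct toy_waker {
    toy_status status;
    toy_event *events;
    size_t count;
};

/* Models the RESUME short path: just record that a result is ready. */
static void toy_resume(toy_waker *waker)
{
    waker->status = TOY_WAKER_RESULT;
}

/* Models start_waker_events(): replay closed events, start open ones. */
static void toy_start_waker_events(toy_waker *waker)
{
    for (size_t i = 0; i < waker->count; i++) {
        toy_event *ev = &waker->events[i];
        if (ev->closed) {
            if (ev->callback != NULL) {
                ev->callback(waker);  /* replay in scheduler context */
            }
        } else {
            ev->started++;            /* normal start() path */
        }
    }
}

/* Returns true if a real context switch would happen. */
static bool toy_suspend(toy_waker *waker)
{
    assert(waker->status == TOY_WAKER_WAITING);
    toy_start_waker_events(waker);
    /* Fast-return check: a replayed callback already produced a result. */
    return waker->status != TOY_WAKER_RESULT;
}
```

With one open and one closed event, toy_suspend() returns false and the
coroutine never parks; with only open events it returns true and every
event has been started exactly once.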

Verified:
  ext/async/tests/                 928 / 928  (167 skipped — externals)
  ext/async/fuzzy_tests/            44 / 44   per scheduler
  fuzz matrix (6 schedulers × 44)  264 / 264

@EdmondDantes self-assigned this on May 9, 2026
@EdmondDantes linked an issue on May 9, 2026 that may be closed by this pull request

The cache survived across requests, while enum cases live in the
request-scoped constants table; on the next request the cached
pointers were dangling and the slots were reused (typically by
a Channel object), producing

  Cannot assign Async\Channel to property
  Async\ChannelException::$reason of type
  Async\ChannelCloseReason

and downstream SEGVs in zend_verify_property_type during
shutdown_destructors.

Resolve the enum cases on every call instead; the lookup is just a
hash fetch.
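
The hazard can be illustrated with a toy table that is rebuilt per
request. Everything here is hypothetical (toy_table, toy_lookup, the
entry names); a static array is used so that the stale pointer stays
valid C and simply ends up naming a different entry, and a linear scan
stands in for the hash fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 4

typedef struct {
    const char *name;   /* NULL marks an empty slot */
} toy_slot;

/* Stands in for a request-scoped table that is repopulated per request. */
static toy_slot toy_table[TOY_SLOTS];

/* Clear the table and fill it in the order given for this request. */
static void toy_request_startup(const char *const *names, size_t n)
{
    assert(n <= TOY_SLOTS);
    memset(toy_table, 0, sizeof(toy_table));
    for (size_t i = 0; i < n; i++) {
        toy_table[i].name = names[i];
    }
}

/* Resolve by name on every call; returns NULL when the name is absent. */
static const toy_slot *toy_lookup(const char *name)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (toy_table[i].name != NULL && strcmp(toy_table[i].name, name) == 0) {
            return &toy_table[i];
        }
    }
    return NULL;
}
```

A pointer obtained from toy_lookup() in one request and reused after the
next toy_request_startup() may point at a slot that now holds a
different entry, whereas calling toy_lookup() again returns the right
one.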

@EdmondDantes merged commit ca67726 into main on May 9, 2026
1 check passed
@EdmondDantes deleted the 103-await_any_or_fail-deadlocks-when-only-later-futures-complete branch on May 9, 2026 11:39