Commit 1dc0f35

Spawn the job-reap test's grandchild via the base interpreter
The instrumented CI runs showed the grandchild booted (its startup marker reached stderr on some legs) and then died tracelessly before it could connect: no traceback, no SDK escalation, no job-handle close. The remaining kill mechanism in the chain is the venv launcher layer: sys.executable in a uv-managed venv is a trampoline that runs the real interpreter inside its own kill-on-close Job, and that private job machinery proved fatal to grandchildren on the CI runners.

The server now spawns its child through sys._base_executable, removing the foreign launcher layer from the grandchild chain. The contract under test is unchanged: the child still inherits the SDK's Job Object at CreateProcess and must die when the job handle closes at shutdown.

Also report the child's poll() status after stdin EOF ends the server (child-rc on stderr, zero cost on the healthy path), so any remaining failure message names the child's exit status: the discriminator between "still alive but unreachable" and "killed, by this code".
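
The interpreter selection described in this message can be tried in isolation. The following is a minimal sketch, not the test's actual code: `base_interpreter` and `spawn_child` are hypothetical names, and `sys._base_executable` is a private CPython attribute that may be absent or empty, which is why the fallback to `sys.executable` mirrors the `getattr(...) or` pattern used in the diff.

```python
import subprocess
import sys


def base_interpreter() -> str:
    """Return the interpreter behind a venv launcher, if CPython exposes it.

    sys._base_executable is private and undocumented, so treat a missing or
    empty value as "no launcher layer known" and fall back to sys.executable.
    """
    return getattr(sys, "_base_executable", None) or sys.executable


def spawn_child(code: str) -> subprocess.Popen:
    """Start `code` in a fresh interpreter, bypassing any venv trampoline."""
    return subprocess.Popen([base_interpreter(), "-c", code])


if __name__ == "__main__":
    child = spawn_child("print('child booted', flush=True)")
    print("child exit status:", child.wait())
```

Outside a launcher-wrapped venv the two paths usually name the same interpreter, so the sketch behaves like a plain `Popen([sys.executable, ...])` there.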
1 parent 2223efd commit 1dc0f35

1 file changed

Lines changed: 25 additions & 6 deletions

tests/transports/stdio/test_windows.py

@@ -58,11 +58,13 @@ async def test_a_gracefully_exited_servers_child_is_reaped_when_the_job_handle_c
     `TerminateJobObject` — the two kills are indistinguishable on the socket.
 
     The server connects back too (not just the child), the child's stderr is routed
-    into the server's, and both are captured through `errlog`; the child also prints
-    a startup marker there. A timeout failure then reports how many connections
-    arrived (so which process never showed), how long the spawn took, and the
-    captured stderr verbatim — xdist swallows subprocess stderr on CI, so without
-    the capture a broken spawn chain is undiagnosable there.
+    into the server's, and both are captured through `errlog`; the child prints a
+    startup marker there, and the server reports the child's `poll()` status after
+    stdin EOF ends it. A timeout failure then reports how many connections arrived
+    (so which process never showed), how long the spawn took, and the captured
+    stderr verbatim — including the child's fate — since xdist swallows subprocess
+    stderr on CI, and without the capture a broken spawn chain is undiagnosable
+    there.
     """
     async with AsyncExitStack() as stack:
         sock, port = await open_liveness_listener()
@@ -79,16 +81,28 @@ async def test_a_gracefully_exited_servers_child_is_reaped_when_the_job_handle_c
         # synchronously after the spawn returns, while the server's interpreter is
         # still cold-starting — long before it can Popen the child (job membership
         # is inherited at CreateProcess, never acquired retroactively).
+        #
+        # The child is spawned through the base interpreter, not `sys.executable`:
+        # in launcher-wrapped venvs (uv's `python.exe` is a trampoline that runs
+        # the real interpreter inside its own Job machinery) the extra launcher
+        # layer proved fatal to grandchildren on CI runners — they booted and then
+        # died tracelessly inside the launcher's private job. The contract under
+        # test is unchanged: the child still inherits the SDK's Job at
+        # CreateProcess. After stdin EOF ends the server, it reports the child's
+        # `poll()` status — `None` means the child was alive when the server
+        # exited; an exit or NTSTATUS code names whatever killed it.
         server = (
             f"import socket, subprocess, sys\n"
+            f"exe = getattr(sys, '_base_executable', None) or sys.executable\n"
             f"try:\n"
-            f" subprocess.Popen([sys.executable, '-c', {child!r}], stderr=sys.stderr)\n"
+            f" p = subprocess.Popen([exe, '-c', {child!r}], stderr=sys.stderr)\n"
             f"except BaseException as exc:\n"
             f" print(exc, file=sys.stderr, flush=True)\n"
             f" raise\n"
             f"s = socket.create_connection(('127.0.0.1', {port}))\n"
             f"s.sendall(b'alive')\n"
             f"sys.stdin.read()\n"
+            f"print('child-rc:%s' % p.poll(), file=sys.stderr, flush=True)\n"
         )
         server_params = StdioServerParameters(command=sys.executable, args=["-c", server])
 
@@ -114,6 +128,11 @@ def server_stderr() -> str:
                 stack.push_async_callback(stream.aclose)
                 streams.append(stream)
         except TimeoutError:
+            # By the time this clause runs, `stdio_client.__aexit__` has already
+            # completed its shielded shutdown on the way out of the `async
+            # with`: stdin closed, the server printed its `child-rc` line and
+            # exited. The stderr read below therefore carries the child's fate,
+            # not a mid-flight snapshot.
             missing_leg = "the server never ran its connect line" if not streams else "the child never connected"
             spawn_split = (
                 "the context never entered"
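
The `child-rc` line that the server prints can also be read back mechanically when triaging a failed run. The following is a minimal sketch, assuming only the `child-rc:<value>` marker format shown in the diff; `describe_child_rc` is a hypothetical helper and is not part of the test. The hex rendering rests on the assumption that Windows surfaces an NTSTATUS termination as a large unsigned exit code (for example `0xC0000005`).

```python
import re

# Matches the marker the server writes after stdin EOF, e.g. "child-rc:None".
_CHILD_RC = re.compile(r"^child-rc:(None|-?\d+)\s*$", re.MULTILINE)


def describe_child_rc(stderr_text: str) -> str:
    """Summarise the child's fate from captured server stderr."""
    match = _CHILD_RC.search(stderr_text)
    if match is None:
        # The server exited (or was killed) before printing its report.
        return "no child-rc line: the server never reached its report"
    value = match.group(1)
    if value == "None":
        # poll() returned None: the child had not exited yet.
        return "child was still alive when the server exited"
    rc = int(value)
    # Show the status as 32-bit hex too, so NTSTATUS-style codes are readable.
    return f"child exited with status {rc} (0x{rc & 0xFFFFFFFF:08X})"


if __name__ == "__main__":
    print(describe_child_rc("child booted\nchild-rc:None\n"))
    print(describe_child_rc("child booted\nchild-rc:3221225477\n"))
```

A missing marker is reported separately from `None`, since the two point at different legs of the spawn chain: the first at the server, the second at a child that was alive but unreachable.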
