Feat/multishot alloc cb #7

Merged
EdmondDantes merged 39 commits into main from feat/multishot-alloc-cb
May 4, 2026

@EdmondDantes
Contributor

No description provided.

Switches reads onto the new zend_async_io_t::alloc_cb hook (added in
php-async). The reactor now invokes http_connection_alloc_cb on every
chunk to pick the next free slice of conn->read_buffer, so the libuv
multishot read stays armed across requests, pipelined tails and
partial headers without any uv_read_stop/start cycle.

What changes:
- Add http_connection_alloc_cb wired into conn->io->alloc_cb at
  connection create (with conn back-pointer in io->user_data).
- request_in_flight bit gates the read callback: while a handler is
  in flight (incl. a coroutine awaiting), the read callback only
  appends to read_buffer and skips the parser. Cleared by handler
  dispose, which then drains any pipelined tail synchronously.
- Drop the per-request event.stop()/dispose(req) on dispatch and the
  matching http_connection_read() re-arm in the dispose tail. They
  existed to avoid UV_EALREADY from a second uv_read_start; with the
  alloc hook the multishot req is reused as designed.
- EOF deferral: when the peer shuts down its write half while a
  request is in flight or a pipelined tail is buffered, flag
  keep_alive=false and let dispose tear the conn down once the chain
  drains — instead of cutting mid-flight responses.
- Reorder dispose: drain pipelined bytes BEFORE the keep_alive-based
  destroy, so an EOF-induced close still answers every request the
  peer sent.
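
The alloc hook and the request_in_flight gate described above can be sketched in isolation. This is a minimal illustration, not the extension's code: the demo_* types and functions are hypothetical stand-ins for conn->read_buffer, the reactor's alloc callback and the read callback, and "feeding the parser" is reduced to advancing a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection; the real struct is larger. */
typedef struct {
    char   data[8192];        /* plays the role of conn->read_buffer */
    size_t used;              /* bytes filled by earlier chunks */
    size_t parsed;            /* bytes already handed to the parser */
    int    request_in_flight; /* set while a handler is running */
} demo_conn_t;

typedef struct { char *base; size_t len; } demo_buf_t;

/* Invoked before every chunk: hand out the unused tail of the buffer. */
static demo_buf_t demo_alloc_cb(demo_conn_t *conn)
{
    assert(conn->used <= sizeof(conn->data));
    demo_buf_t buf = { conn->data + conn->used, sizeof(conn->data) - conn->used };
    return buf;
}

/* Invoked after nread bytes were written into the slice. */
static void demo_read_cb(demo_conn_t *conn, size_t nread)
{
    conn->used += nread;            /* always append */
    if (conn->request_in_flight) {
        return;                     /* parser is skipped until dispose */
    }
    conn->parsed = conn->used;      /* placeholder for "feed the parser" */
}

/* Handler dispose: clear the gate, then drain the buffered tail. */
static void demo_dispose(demo_conn_t *conn)
{
    conn->request_in_flight = 0;
    conn->parsed = conn->used;
}
```

The read side never stops or restarts in this sketch; only the gate decides whether newly appended bytes reach the parser immediately or at dispose time.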

Effect (single thread, GET /pipeline, no NAT):
- epoll_ctl per request: 2 (DEL+ADD) → 0.013 (only at accept/close)
- c=128 throughput: 35k → 40k req/s (+14%)
- c=128 p99 latency: 6.5ms → 3.6ms (-44%)
- Closes most of the per-core gap to Swoole (was ~2x, now ~1.3x).

Tests: 17/17 H1 phpt PASS (parse errors, smuggling, drain, chunked,
SSE, admission). Core/H2/H3/TLS not yet validated against this branch.
multi-worker.php — a self-contained ThreadPool-based example that
mirrors the HttpArena entry: N PHP threads each binding the same
TCP/UDP listeners with SO_REUSEPORT, plus optional TLS / HTTP/3
when certs are present. WORKERS env defaults to
Async\available_parallelism() (no artificial cap).

Diagnostic endpoints used during the alloc_cb investigation:
  /bare           — minimum handler (one setStatusCode + setBody)
  /pipeline       — minimal hot-path benchmark target
  /baseline       — query/body sum
  /json/{n}       — clamped to dataset size
  /json3-static   — pre-encoded JSON, isolates encode cost
  /json3-encode   — runtime json_encode of a fixed array
  /opcache        — runtime opcache + JIT stats
  /pid            — pid + per-worker static counter (REUSEPORT visibility)

The handler also logs the four CPU-source values it can read
(available_parallelism, nproc, /proc/cpuinfo, cgroup cpu.max) on
startup so cgroup v1 vs v2 mismatches surface before tuning.

smoke-test.sh + docker/ — 17-check runner used during the
investigation (curl + wrk + nc + h3client). Defaults assume the
auto-built docker image at examples/docker/Dockerfile.

PERFORMANCE_ANALYSIS.md — the document that drove this branch:
strace deltas (5 syscalls/req → 3, then to 2 with alloc_cb),
per-core baseline vs Swoole, and the libuv level-triggered epoll
constraint that bounds the remaining gap.
Side-by-side perf comparison of TrueAsync's coroutine path against
Swoole 6.2 with enable_coroutine=true. Same one-statement handler,
same runtime topology (1 reactor + 1 worker), no NAT.

Findings:
- TrueAsync per-thread is 0.79x of Swoole's (43k vs 54k req/s).
- Coroutine machinery accounts for 9.49% on TrueAsync vs 3.65% on
  Swoole — 2.6x more CPU for the same conceptual work.
- The scheduler-suspend path (2.63%) is the single largest
  contributor and runs even on synchronous handlers that never
  actually yield.
- fiber_entry boilerplate (2.51%) compares to Swoole's main_func
  at 0.30% — ~8x cheaper there.

Names four optimisation targets (A-D) with rough ROI estimates that
together would close most of the coroutine-overhead delta.
snprintf("%zu", body_len) and snprintf("%d.%d", major, minor) were
3 calls per request that all routed through glibc's vfprintf/
format_converter. perf identified format_converter at 2.23% self
time on the minimal-handler bench.

Replacements:
- Content-Length:  smart_str_append_unsigned (uses zend_print_ulong_to_buf)
- HTTP version:    direct char writes ('0'+digit), HTTP/1.x only ever has
                   single-digit major and minor

format_converter disappears from perf top-N entirely (was 2.23%, now
< 0.3%, attributed only to http_response_format itself).
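
For illustration, the two replacements can be approximated with plain C. The helpers below are hypothetical (they are not smart_str_append_unsigned or the actual response formatter); they only show why no format-string parsing is needed for these two fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the decimal form of v into out (no NUL) and return the digit
 * count. out needs room for 20 bytes, the maximum for a 64-bit value. */
static size_t demo_append_ulong(char *out, unsigned long long v)
{
    char tmp[20];
    size_t n = 0;
    do {                              /* digits come out least significant first */
        tmp[n++] = (char)('0' + (v % 10));
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++) {  /* reverse into the output */
        out[i] = tmp[n - 1 - i];
    }
    return n;
}

/* Write "HTTP/<major>.<minor>" (8 bytes, no NUL); both parts single-digit. */
static size_t demo_append_http_version(char *out, int major, int minor)
{
    assert(major >= 0 && major <= 9 && minor >= 0 && minor <= 9);
    memcpy(out, "HTTP/", 5);
    out[5] = (char)('0' + major);
    out[6] = '.';
    out[7] = (char)('0' + minor);
    return 8;
}
```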

Throughput on the minimal handler (median of 3 runs):
- c=64:  ~44k -> ~51k req/s (+15%)
- c=128: ~44k -> ~59k req/s (+34%, high variance)

H1 phpt suite: 17/17 PASS, no regressions.
A single-file server with a one-statement handler:

    $server->addHttpHandler(fn ($req, $res) => $res->setBody('ok'));

Used to compare TAS against Swoole-level minimal handlers without
the URL-parse + switch-tree overhead carried by multi-worker.php.
WORKERS=1 + this handler is the apples-to-apples shape against
Swoole's $http->on('request', fn ($req, $res) => $res->end('ok')).
Adds http_connection_send_str_owned that hands a zend_string body to
the reactor via ZEND_ASYNC_IO_WRITE_EX with a release callback. The
HTTP/1 dispose hot path uses it for plaintext sends: no suspend, no
per-request write deadline timer, no while-loop around partial-write
(libuv's uv_write is all-or-error from the caller's view, the loop
was dead code). Response zend_string is released by io_pipe_write_cb
when the kernel-level write completes.

TLS path keeps the existing copying-into-encrypt-ring semantics —
ownership-transfer applies only to plaintext.

Per-thread bench (wrk -t4 -c64 -d30s, 3 runs median):
  c=16:  TAS 62.2k vs Swoole 58.7k (+6%)
  c=64:  TAS 65.6k vs Swoole 63.5k (+3%)
  c=128: TAS 64.7k vs Swoole 64.4k (tied)
TAS p99 is consistently better. Baseline TAS @ c=64 was 44.2k → +48%.

PLAN.md tracks the broader optimization roadmap (TLS migration,
per-conn write watchdog, vector-write, coroutine pool, h2/h3 hot
paths, zero-copy for large bodies).
Decouple http_server_object lifetime from the PHP object lifetime by
introducing a thin wrapper struct and refcounting the C-state.

  struct http_server_php {
      http_server_object *server;   // pemalloc'd, refcounted
      zend_object         std;
  };

The PHP wrapper holds 1 ref; each live conn that stores a back-pointer
in conn->server holds 1 ref. http_server_free no longer needs to walk
the conn list and NULL back-pointers — late libuv shutdown drains that
fire post-wrapper-free now see a still-valid C-state because at least
one conn still holds a ref. The last release frees the C-state struct
itself (pefree).

http_server_addref/release exposed in php_http_server.h so the
connection layer can ref it. Per-worker, no atomics. Transfer_obj
adapted: TRANSFER pemallocs the wrapper via default_fn and pemallocs
the core directly; LOAD pathway runs through create_object which
emallocs both.
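
A minimal sketch of the refcounting scheme, assuming a single worker thread as stated above. The demo_* names are hypothetical; the real http_server_addref/release operate on http_server_object and use pemalloc/pefree rather than calloc/free.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned refcount;  /* plain counter: per-worker, so no atomics */
} demo_server_state_t;

/* Create the C-state with the one reference held by the PHP wrapper. */
static demo_server_state_t *demo_state_new(void)
{
    demo_server_state_t *s = calloc(1, sizeof(*s));
    if (s != NULL) {
        s->refcount = 1;
    }
    return s;
}

/* Taken by every live connection that stores a back-pointer. */
static void demo_state_addref(demo_server_state_t *s)
{
    s->refcount++;
}

/* Returns 1 when this call dropped the last reference and freed s. */
static int demo_state_release(demo_server_state_t *s)
{
    assert(s->refcount > 0);
    if (--s->refcount > 0) {
        return 0;
    }
    free(s);
    return 1;
}
```

With this shape the wrapper can be freed first: its release returns 0 while a connection still holds a reference, and the state is freed only when that connection releases.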

Unblocks step 3.1: a slab arena for http_connection_t can now share
lifetime with the C-state instead of the PHP wrapper, so conn slot
memory stays valid until the last conn returns its slot — a
prerequisite the previous wrapper-embedded layout did not satisfy.

Tested: full phpt suite — 0 new regressions. Pre-existing failures
on TLS / h2-tls / h3 / h2spec-gate reproduce on baseline; smoke
benchmark unchanged at 132k rps c=4.

PLAN.md updated with the expanded step 3 design.
Replaces per-conn ecalloc/efree (and the O(N) single-linked conn_list
walk in http_server_register/unregister_connection) with a chunked
slab allocator. Each chunk holds CONN_ARENA_CHUNK_SLOTS=256
http_connection_t slots; pemalloc'd once and chained.

Same struct fields play a dual role to keep the design intrusive:
  - while a slot is FREE, only `next_conn` is meaningful: it is the
    singly linked freelist link, and the rest of the slot is undefined.
  - while a slot is ALIVE, `next_conn` and `prev_conn` are the doubly
    linked alive-list links, and all other fields are valid.

Lifetime is tied to the refcounted C-state introduced in dc548c0:
each live conn holds one ref on http_server_object, so the arena's
chunks stay valid past PHP-wrapper-free until the last conn returns
its slot. The finalizer (http_server_state_finalize) calls
conn_arena_cleanup on the last release.
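
The dual-role links can be illustrated with a small standalone arena. This is a sketch under stated assumptions: the demo_* names are hypothetical, malloc/free stand in for pemalloc/pefree, and the chunk size is 4 instead of CONN_ARENA_CHUNK_SLOTS=256 to keep the example readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CHUNK_SLOTS 4  /* the commit uses 256 */

typedef struct demo_conn {
    struct demo_conn *next_conn; /* FREE: freelist link, ALIVE: alive-list next */
    struct demo_conn *prev_conn; /* meaningful only while ALIVE */
    int fd;
} demo_conn_t;

typedef struct demo_chunk {
    struct demo_chunk *next;
    demo_conn_t slots[DEMO_CHUNK_SLOTS];
} demo_chunk_t;

typedef struct {
    demo_chunk_t *chunks;
    demo_conn_t  *free_head;
    demo_conn_t  *alive_head;
} demo_arena_t;

/* Allocate one more chunk and push all of its slots on the freelist. */
static int demo_arena_grow(demo_arena_t *a)
{
    demo_chunk_t *c = malloc(sizeof(*c));
    if (c == NULL) return -1;
    c->next = a->chunks;
    a->chunks = c;
    for (size_t i = 0; i < DEMO_CHUNK_SLOTS; i++) {
        c->slots[i].next_conn = a->free_head;
        a->free_head = &c->slots[i];
    }
    return 0;
}

/* Pop a slot from the freelist and link it at the head of the alive list. */
static demo_conn_t *demo_arena_alloc(demo_arena_t *a)
{
    if (a->free_head == NULL && demo_arena_grow(a) != 0) return NULL;
    demo_conn_t *c = a->free_head;
    a->free_head = c->next_conn;
    c->prev_conn = NULL;
    c->next_conn = a->alive_head;
    if (a->alive_head != NULL) a->alive_head->prev_conn = c;
    a->alive_head = c;
    return c;
}

/* Unlink from the alive list in O(1) and push back on the freelist. */
static void demo_arena_free(demo_arena_t *a, demo_conn_t *c)
{
    assert(c != NULL);
    if (c->prev_conn != NULL) c->prev_conn->next_conn = c->next_conn;
    else a->alive_head = c->next_conn;
    if (c->next_conn != NULL) c->next_conn->prev_conn = c->prev_conn;
    c->next_conn = a->free_head;
    a->free_head = c;
}

/* Release every chunk; valid only once no slot is in use any more. */
static void demo_arena_cleanup(demo_arena_t *a)
{
    while (a->chunks != NULL) {
        demo_chunk_t *next = a->chunks->next;
        free(a->chunks);
        a->chunks = next;
    }
    a->free_head = NULL;
    a->alive_head = NULL;
}
```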

The alive list is now a doubly linked list with O(1) insert and
unlink, available for the upcoming periodic deadline_tick walk
(step 3.3) and for future WebSocket broadcast / admission sweep /
graceful shutdown paths.

Public API delta:
  - http_server_arena(server) → conn_arena_t*
  - http_server_bind_connection — new (replaces register_connection
    for hot-path slice binding; arena_alloc handles list insertion)
  - http_server_register/unregister_connection — REMOVED

Tested: full phpt suite — same failure list as baseline (13, all
pre-existing TLS/h2-tls/h3 plus h1/005-mid-stream-body). Smoke
bench 141k @ c=4 (vs 132k step1). c=64 median 67.1k rps p99=1.4ms
across 3 runs.
One periodic libuv timer per worker thread walks the conn arena's
alive list every tick_ms milliseconds and force-closes any conn
whose deadline_ns has elapsed. Replaces the per-conn write_timer
arm/stop dance and the per-await read-timeout creation with a
single shared timer.

  tick_ms = max(250, min(read, write, keepalive) / 2)

Worst-case lateness on idle-conn reaping is half the smallest
configured timeout — fine for HTTP-scale defaults (15 s read /
write, 60 s keepalive → tick=7.5 s, lateness ≤ 7.5 s).
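
The tick formula is easy to check as a worked example. The helper below is a hypothetical sketch (demo_tick_ms is not a function from this branch) and assumes all three timeouts are configured, that is, nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* tick_ms = max(250, min(read, write, keepalive) / 2), all in milliseconds.
 * Assumption for this sketch: every timeout is nonzero. */
static uint64_t demo_tick_ms(uint64_t read_ms, uint64_t write_ms,
                             uint64_t keepalive_ms)
{
    assert(read_ms > 0 && write_ms > 0 && keepalive_ms > 0);
    uint64_t smallest = read_ms;
    if (write_ms < smallest) smallest = write_ms;
    if (keepalive_ms < smallest) smallest = keepalive_ms;
    uint64_t half = smallest / 2;
    return half < 250 ? 250 : half;  /* 250 ms floor */
}
```

With the defaults quoted above, demo_tick_ms(15000, 15000, 60000) yields 7500, matching the 7.5 s tick; a 300 ms read timeout would hit the 250 ms floor.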

Deadline updates on the conn are pure stores at state transitions:
http_connection_spawn sets the read deadline at conn create, and
the http1 dispose tail sets either the keepalive deadline (next
request expected) or 0 (going to close). No libuv heap operations
on the hot path.

Lifecycle:
  - http_server_deadline_tick_start: armed at server start().
  - http_server_deadline_tick_stop: called from $server->stop()
    BEFORE wait_event resolution — without it the periodic timer
    keeps the libuv loop alive past stop and the script never
    exits (caught by phpt 004-server-start-stop). Also called
    from free_obj's running-cleanup branch as a safety net.

Tested: full phpt suite — same failure list as baseline (13, all
pre-existing TLS / h2-tls / h3 / h1-mid-stream-body). Bench c=64
60s: 66.3k vs step3.1 66.95k = noise; expected, hot path was
already free of write_timer ops after step 1's fire-and-forget.
The watchdog's value is architectural — single timer per worker
instead of per-conn / per-await, plus enforces keepalive timeout
which the previous code stored but never enforced.
Replaces the per-request zend_hrtime() calls in http_connection_spawn
and the dispose-end keepalive deadline with ZEND_ASYNC_NOW(), which
reads the reactor's cached loop time in milliseconds. No syscall, no
vDSO — a single load.

http_connection_t::deadline_ns → deadline_ms. The watchdog
deadline_tick callback compares against ZEND_ASYNC_NOW() too.

For our HTTP timeouts (15 s read/write, 60 s keepalive) the ms-grain
loss-of-precision is irrelevant; the watchdog tick runs at a coarser
cadence anyway (min/2 with a 250 ms floor). Telemetry samples (CoDel
sojourn / service) keep zend_hrtime() since they need sub-ms precision.
Both http_server_on_request_sample (CoDel window-now) and
http_server_drain_evaluate / should_drain_now used to take their own
zend_hrtime() inside. The handler-end stamp (req->end_ns) is already
available at every callsite — pass it through and skip the redundant
syscall.

API delta (all internal):
  - on_request_sample gains a now_ns parameter.
  - should_drain_now gains a now_ns parameter.
  - drain_evaluate gains a now_ns parameter.

H1 / H2 / H3 dispatch all updated to thread req->end_ns (or fresh
zend_hrtime where no stamped req is available).

Perf delta on the c=64 plaintext bench:
  __vdso_clock_gettime: 1.25% → 0.74%  (−0.51%)

That is about 5 ms/sec of CPU saved at 60k rps, on the bench where
2 of the 5 zend_hrtime callsites collapsed into reuses of req->end_ns.
The remaining 3 (req->enqueue_ns, req->start_ns, req->end_ns) stay
unconditional for now — they're the actual measurements; gating them
behind a "telemetry off" config knob is the next step but not done
in this commit.

Tested: phpt parity with baseline (88 pass, 14 pre-existing fails,
0 new regressions).
When CoDel is off (backpressure_target_ms == 0) and telemetry is
disabled, the three per-request stamps (enqueue_ns / start_ns /
end_ns) have no consumer — drain has its own fresh-hrtime fallback.
Compute sample_stamps_enabled once at server start as
(codel_target_ns != 0) || telemetry_enabled, expose it via
http_server_sample_stamps_enabled(), and gate every stamp + the
on_request_sample call on the H1/H2/H3 hot paths.

total_requests counting is split out into http_server_count_request
so request counts keep working when stamps are gated off. Saves three
zend_hrtime() calls per request on the minimal-config hot path
(PLAN.md step 4.1).
Matrix of CoDel × telemetry shows on_request_sample disappears from
the perf profile when both consumers are off, and __vdso_clock_gettime
drops 0.81% → 0.65% (−20%). rps run-to-run noise dominates the gap;
perf is the cleaner signal that the optimization landed.
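
A rough sketch of the gate, assuming hypothetical demo_* types in place of the real view and counters structs. The clock is injected as a function pointer so the example stays deterministic; the real hot path calls zend_hrtime() directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool sample_stamps_enabled; } demo_view_t;
typedef struct { uint64_t total_requests; } demo_counters_t;
typedef struct { uint64_t end_ns; } demo_req_t;

/* Demo clock: counts how often it is read instead of reading real time. */
static uint64_t demo_clock_calls;
static uint64_t demo_fake_now_ns(void) { return ++demo_clock_calls; }

/* Computed once at server start. */
static void demo_view_init(demo_view_t *v, uint64_t codel_target_ns,
                           bool telemetry_enabled)
{
    v->sample_stamps_enabled = (codel_target_ns != 0) || telemetry_enabled;
}

/* Counting is independent of the stamp gate. */
static inline void demo_count_request(demo_counters_t *c)
{
    c->total_requests++;
}

/* Handler-end path: always count, stamp only when a consumer exists. */
static void demo_on_request_done(const demo_view_t *v, demo_counters_t *c,
                                 demo_req_t *req, uint64_t (*now_ns)(void))
{
    assert(now_ns != NULL);
    demo_count_request(c);
    if (!v->sample_stamps_enabled) {
        return;               /* no clock read on the minimal config */
    }
    req->end_ns = now_ns();
}
```
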
Hot-path HTTP/1 dispose now branches on body size: small bodies keep
the legacy http_response_format → send_str_owned path (no regression
on hello-world / JSON-stub microbench), large bodies use the new
http_response_format_parts → send_strv_owned path that submits headers
and body as separate iovec entries via ZEND_ASYNC_IO_WRITEV. Saves
the emalloc + memcpy that the legacy concat formatter spends merging
the two strings.

Threshold is HTTP_WRITEV_THRESHOLD = 1024 bytes, picked from the A/B
matrix in docs/PERF_2026_05_02_STEP_10.md: writev is a wash on rps
between 4 and 16 KiB with a small p99 improvement, and clearly wins
at 64 KiB+ (+18% rps, −5% p99) and 256 KiB (+24% rps, −10% p99).
1 KiB cleanly excludes hello-world responses from writev where the
submit overhead would dominate the savings.
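
The size-based branch can be sketched with POSIX writev on a plain file descriptor. This is an illustration only: the real path goes through ZEND_ASYNC_IO_WRITEV and the reactor, the demo_* names are hypothetical, and whether the threshold comparison is inclusive is an assumption here. Short writes are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_WRITEV_THRESHOLD 1024  /* mirrors HTTP_WRITEV_THRESHOLD */

/* Returns bytes written or -1; *used_writev reports which path ran. */
static ssize_t demo_send_response(int fd, const char *hdr, size_t hdr_len,
                                  const char *body, size_t body_len,
                                  int *used_writev)
{
    assert(hdr != NULL && body != NULL && used_writev != NULL);
    if (body_len >= DEMO_WRITEV_THRESHOLD) {
        /* Large body: two iovec entries, no merge copy. */
        struct iovec iov[2] = {
            { .iov_base = (void *)hdr,  .iov_len = hdr_len  },
            { .iov_base = (void *)body, .iov_len = body_len },
        };
        *used_writev = 1;
        return writev(fd, iov, 2);
    }
    /* Small body: concatenate into one buffer and issue a single write. */
    char *buf = malloc(hdr_len + body_len);
    if (buf == NULL) return -1;
    memcpy(buf, hdr, hdr_len);
    memcpy(buf + hdr_len, body, body_len);
    *used_writev = 0;
    ssize_t n = write(fd, buf, hdr_len + body_len);
    free(buf);
    return n;
}
```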

TLS path keeps the single-buffer formatter — the encryption ring
needs contiguous bytes, vectored output would just force an extra
copy. http_response_format stays for legacy callers (HEAD, error
responses, etc.) and for the TLS branch.
call_user_function passes a NULL fci_cache, forcing zend_call_function
to run zend_is_callable_ex + zend_get_executed_filename_ex on every
request. The handler's fci_cache is already populated when
addHttpHandler() ran Z_PARAM_FUNC, so we just build a per-call fci on
the stack and pass &handler->fci_cache directly.

Drops zend_get_executed_filename_ex from 0.28% → 0.12% in perf and
removes zend_is_callable_ex from the hot-path top symbols entirely.
~0.16% CPU saved, no rps regression on the minimal bench.

Applied to all three dispatch paths: HTTP/1, HTTP/2, HTTP/3.
TLS-vs-Swoole bench attempt on 2026-05-02 surfaced that the server
closes every TLS connection immediately after TCP accept (TLSv1.3
alert "decode error", unexpected eof). Same failure on baseline
tas-final without any of my changes — pre-existing, matches the 10
TLS phpt + 2 H2-over-TLS already-red phpt set.

Marked Step 4.4 (TLS-write refactor) and Step 9 (kTLS / SSL_sendfile)
as blocked by Step 0 until the handshake is fixed. Step 4.4 also gets a
note that we deliberately skipped it on the perf side: it is pure
cleanup, ~0% rps impact.
http_connection_alloc_cb was registered unconditionally; libuv_io_alloc_cb
honors it before req->base.buf, so ciphertext landed in the plaintext
read_buffer while tls_commit_cipher_in advanced the BIO write head as if
the bytes were in the ring. SSL_do_handshake then read zero-init garbage
and alerted decode_error. Clear conn->io->alloc_cb after the TLS-only
CLR_MULTISHOT so the per-req buffer (BIO ring slot) is actually used.

14/15 TLS phpt + both H2-over-TLS now pass (the remaining SKIP needs
ext/openssl in the CLI build for ssl:// transport).
Migrate http_server_sample_stamps_enabled and http_server_count_request
to static inline helpers in php_http_server.h, taking the cached slice
pointer (conn->view / conn->counters) instead of the opaque server
object. Move the underlying fields out of http_server_object into the
already-public view_t and counters_t structs.

perf -F999 / wrk -t4 -c64 -d20s on minimal-server:
  http_server_count_request:           0.48% -> inlined
  http_server_sample_stamps_enabled:   0.34% -> inlined
  http_server_on_request_sample:       0.37% -> 0.20%
  http_handler_coroutine_dispose:      1.22% -> 1.25%
  Net: ~1.0% CPU saved on the hot path (gate is now const-foldable
  per call site).

RPS medians stay within WSL2 noise floor (62k vs 64k bimodal); the
win is in the CPU budget, not RPS at this load.
Add http_now_coarse_ns() inline helper (clock_gettime(CLOCK_MONOTONIC_COARSE)
on Linux, zend_hrtime() fallback elsewhere). Coarse clock costs ~5ns vs
~25ns for the high-res path — fine for any decision measured in seconds,
unsafe for CoDel sample stamps which still call zend_hrtime().

Three drain sites switched: H1 dispose fallback (when stamps gated off
in minimal config), H2 commit drain decision, http_server_trigger_drain
cooldown check. Drop an unused http_server_object pointer in
h3_handler_coroutine_dispose that was left over from the previous step.

perf -F999 / wrk -t4 -c64 -d18s:
  __vdso_clock_gettime:           2.29% -> 2.21%
  http_server_should_drain_now:   0.49% -> 0.43%
  http_handler_coroutine_dispose: 1.25% -> 1.09%
  Net: ~0.30% CPU on H1 hot path.

RPS medians stay within WSL2 noise.
Replace the per-write staging-buffer copy in tls_fsm_send_kick with a
zero-copy submission via ZEND_ASYNC_IO_WRITE_EX: peek one contiguous
span out of the BIO output ring, hand the slot pointer directly to
libuv, and consume the ring in the fire-and-forget free_cb on
completion. Wrap spans ship on the next kick, fired automatically
from the free_cb's tls_advance_state call.

Drop the NOTIFY-based write completion branch in tls_fsm_io_callback_fn
along with the cb->write_req / write_buf / write_buf_len fields it
tracked — fire-and-forget WRITE_EX never routes back through that
callback. Add conn->tls_zc_write_n: the in-flight write size, consumed
by the free_cb and used as the double-submission gate (BIO would
re-peek bytes the in-flight write still owns).

perf -F999 / h2load --h1 -c64 -n1M on a TLS hello-world handler:
  _emalloc:                       2.63% -> 2.19% (-0.44%)
  _emalloc_16:                    0.19% -> 0.06% (-0.13%)
  tls_fsm_send_kick:              0.40% -> 0.11% (-0.29%)
  Net: ~-0.85% CPU on TLS hot path.

TLS RPS (6 pairs, alternating):
  median:  15095 -> 15242 rps (+1.0%)
  mean:    14983 -> 15307 rps (+2.2%)

This closes the architectural gap with Swoole (whose socket-BIO design
skips the same memcpy). Full step 9 (socket-BIO + kTLS) deferred —
Swoole itself does not enable kTLS, the hello-world delta is small,
and the rewrite would re-litigate the just-fixed handshake regression.
Add http_connection_send_batched: amortises N back-to-back writes to
one socket into a single in-flight uv_write + a growing pending
buffer. Subsequent sends append to the buffer instead of submitting
another uv_write; the completion callback drains the pending buffer in
one more write — chain continues until pending is empty.

Pairs with the new CLOCK_MONOTONIC_COARSE reactor throttle in
ext/async scheduler (commit 870e264 in true-async/php-async). Without
the throttle the libuv loop ticks between every two completed
coroutines and clears the in_flight flag before the next coroutine
appends — batching never kicks in. With the throttle, in_flight stays
true across multiple stream commits in the same scheduler tick window;
~96% of commits hit the append path in measurement.
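
The in_flight / pending-buffer state machine can be modelled without libuv. In this sketch demo_submit is a hypothetical stand-in for the uv_write submission and only records what would be sent; the buffer growth policy and all demo_* names are assumptions, not the code of http_connection_send_batched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int      in_flight;        /* one write is currently submitted */
    char    *inflight;         /* heap buffer owned by that write, if any */
    char    *pending;          /* bytes queued behind the in-flight write */
    size_t   pending_len;
    size_t   pending_cap;
    unsigned submits;          /* demo counter: writes submitted so far */
    size_t   last_submit_len;
} demo_batch_t;

/* Stand-in for the real submission; only records the call. */
static void demo_submit(demo_batch_t *b, const char *data, size_t len)
{
    (void)data;
    b->in_flight = 1;
    b->submits++;
    b->last_submit_len = len;
}

/* First send goes out directly; later sends append while one is in flight. */
static int demo_send_batched(demo_batch_t *b, const char *data, size_t len)
{
    if (!b->in_flight) {
        demo_submit(b, data, len);
        return 0;
    }
    size_t need = b->pending_len + len;
    if (need > b->pending_cap) {
        size_t cap = b->pending_cap ? b->pending_cap : 256;
        while (cap < need) cap *= 2;
        char *p = realloc(b->pending, cap);
        if (p == NULL) return -1;
        b->pending = p;
        b->pending_cap = cap;
    }
    memcpy(b->pending + b->pending_len, data, len);
    b->pending_len = need;
    return 0;
}

/* Completion callback: flush everything queued meanwhile in one write. */
static void demo_on_write_done(demo_batch_t *b)
{
    assert(b->in_flight);
    free(b->inflight);
    b->inflight = NULL;
    b->in_flight = 0;
    if (b->pending_len == 0) return;
    size_t n = b->pending_len;
    b->inflight = b->pending;  /* the write now owns this buffer */
    b->pending = NULL;
    b->pending_len = 0;
    b->pending_cap = 0;
    demo_submit(b, b->inflight, n);
}
```

Three back-to-back sends in this model produce two submissions: the first send, then one write carrying the other two once the first completes.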

Effect (HTTP/2 hello-world h2load -c32 -m32, single worker):
  before: 117k req/s (await) / 50k req/s (naive fire-and-forget)
  after:  ~400k req/s (3.4x over await, 8x over naive FF)

H/2 commit drain (commit_stream_response) and h2_drain_to_socket on
the plaintext path now drain nghttp2 frames into a heap buffer in one
pass and submit via send_batched. TLS path keeps the await-based send
(its own BIO-pair zero-copy submit handles batching).

Connection destroy is deferred while a batched write is in flight
(parallel to the existing TLS deferral via tls_zc_write_n) — libuv
holds a pointer into the heap buf until completion.

Tested: H/1 phpt, H/2 phpt, TLS phpt — only pre-existing failures
(h1/005 chunked-413, h2/009 h2spec gate). H/1 also benefits from the
scheduler throttle: 64k -> 117k req/s.
H3 streams now come from a per-listener slab pool (64 slots/chunk,
freelist via list_next, alive list via streams_head). H2 streams
keep ecalloc/efree semantics but participate in the same embedding.

Both http2_stream_t and http3_stream_t place http_request_t as their
first field (`_request_storage`); the existing `request` pointer
becomes an alias to that storage, so every existing `s->request->X`
call site keeps working without changes. Code that takes an
`http_request_t *` (PHP HttpRequest wrapper, log path, multipart
processor) gets `&s->_request_storage` — same byte address as the
enclosing stream slot, so a callback can recover the stream via a
plain `(http_stream_t *)req` cast.

http_request_t now carries a `release` callback. When non-NULL,
http_request_destroy invokes it instead of efree at refcount=0,
letting the embedder return the slot to its pool (H3) or efree the
whole slot (H2). Two-refcount lifetime preserved: stream-side
holders track via s->refcount, request-side holders (PHP wrapper)
via the embedded request->refcount; the slot is reclaimed only when
both reach zero — the release callback fires from whichever path
hits the final decrement.
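
The offset-0 embedding and the release callback can be shown in a few lines. This sketch is deliberately simplified: it uses hypothetical demo_* types, a single refcount instead of the two-refcount scheme, and free() where H3 would return the slot to its pool.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_request demo_request_t;
struct demo_request {
    unsigned refcount;
    void (*release)(demo_request_t *req);  /* NULL means plain free() */
};

typedef struct {
    demo_request_t  _request_storage;  /* must stay the first member */
    demo_request_t *request;           /* alias to &_request_storage */
    int             stream_id;
} demo_stream_t;

static int demo_last_released_id = -1;  /* demo-only observation point */

/* Release callback: recover the enclosing slot with an offset-0 cast. */
static void demo_stream_release(demo_request_t *req)
{
    demo_stream_t *s = (demo_stream_t *)req;
    demo_last_released_id = s->stream_id;
    free(s);  /* a pooled embedder would return the slot instead */
}

static demo_stream_t *demo_stream_new(int id)
{
    demo_stream_t *s = calloc(1, sizeof(*s));
    if (s == NULL) return NULL;
    s->request = &s->_request_storage;
    s->request->refcount = 1;
    s->request->release = demo_stream_release;
    s->stream_id = id;
    return s;
}

/* Drop one reference; at zero, defer to the embedder's release hook. */
static void demo_request_destroy(demo_request_t *req)
{
    assert(req->refcount > 0);
    if (--req->refcount > 0) return;
    if (req->release != NULL) req->release(req);
    else free(req);
}
```

The cast is valid because C guarantees that a pointer to a struct and a pointer to its first member convert to each other.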

Net allocation reduction per H3 request:
  - 1 ecalloc + 1 efree for http3_stream_t  → pool_alloc + pool_free
  - 1 ecalloc + 1 efree for http_request_t  → eliminated (embedded)

H2 keeps ecalloc/efree for the slot but eliminates the separate
http_request_t alloc; that's −1 ecalloc/efree per request.

Bench (h2load c=1 m=100 n=50000, h3 over QUIC): 29.3k → 29.7k rps,
within noise — ZendMM bin allocator absorbs the 600-byte allocs at
~5ns/pair, so the architectural gain doesn't move steady-state perf.
The user-visible bottleneck remains user-space ngtcp2 transport
(~10% CPU, conn_write_pkt + recv_ack + write_vmsg + map_find),
which is fundamental to QUIC and outside this change's scope.

Tests: 89/90 PASS across H1+H2+H3+core phpt suites (the lone fail is
the pre-existing h1/005-parse-error-mid-stream-body, recorded in
PLAN.md). 25/25 H3 phpt PASS, 20/20 H2 phpt PASS.

Files:
  include/http1/http_parser.h     — release callback + struct release
  src/http1/http_parser.c         — destroy invokes release vs efree
  include/http2/http2_stream.h    — _request_storage first field
  src/http2/http2_stream.c        — alias setup, release callback
  include/http3/http3_stream.h    — _request_storage first field, pool ptr
  src/http3/http3_stream.c        — alias setup, two-phase teardown
  include/http3/http3_stream_pool.h, src/http3/http3_stream_pool.c
                                  — slab allocator (chunk=64 slots)
  src/http3/http3_listener.{c,h}  — embedded pool, accessor
  src/http3/http3_callbacks.c     — pass conn to stream_new
  src/http3/http3_dispatch.c      — comment refresh
  config.m4                       — http3_stream_pool.c in build list
Mid-stream body limits (e.g. chunked POST > max_body_size) cancel the
in-flight handler from the read tick that hit the parser error. The
multishot reader stays armed though — the kernel's already-buffered
tail of the same connection keeps arriving after the cancel, each
delivery re-feeds the (sticky-error) parser. With current_request
nulled by dispose, those follow-up ticks fell through to
emit_parse_error, double-bumping parse_errors_*_total for one logical
event. http1/005-parse-error-mid-stream-body.phpt was failing on
parse_errors_413_total=2 vs the expected 1.

Latch a per-conn parse_error_handled bit on the first tick that takes
either branch (cancel_handler_for_parse_error / emit_parse_error) and
short-circuit subsequent ticks before either runs. Both plaintext
(http_connection.c) and TLS (http_connection_tls.c) feed-error paths
gated.
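
A minimal model of the latch, with hypothetical demo_* names. Both branches are collapsed into one function and each bumps the counter once, which is a simplification of the real cancel_handler_for_parse_error / emit_parse_error split.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool     parse_error_handled;     /* latched on the first failing tick */
    bool     handler_in_flight;
    unsigned parse_errors_413_total;
    unsigned handlers_cancelled;
} demo_conn_t;

/* Called on every read tick whose feed hits the sticky parser error. */
static void demo_on_feed_error(demo_conn_t *c)
{
    assert(c != NULL);
    if (c->parse_error_handled) {
        return;                       /* follow-up ticks are ignored */
    }
    c->parse_error_handled = true;
    if (c->handler_in_flight) {
        c->handler_in_flight = false; /* cancel branch */
        c->handlers_cancelled++;
    }
    c->parse_errors_413_total++;      /* one logical event, one bump */
}
```
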
The unit-test scaffold drifted from src/: the parser/multipart/h2
TUs grew references to symbols (http_log_emitf, http_known_header_lookup,
http_request_parse_trace_context, http_server_globals_id) and a
re-typed http_server_on_request_sample whose old test stubs broke the
build. http2_strategy also picked up cross-TU calls
(http_connection_send_batched, http_connection_destroy,
http_connection_on_request_ready, http_server_view, alt-svc helpers).

- multipart_stubs.c: add no-op stubs for http_log_emitf,
  http_known_header_lookup, http_request_parse_trace_context; drop the
  obsolete http_server_on_request_shed stub (now static-inline in the
  header).
- tests/common/php_sapi_test.c: declare the http_server module
  globals and register the TSRM slot in php_test_runtime_init so
  HTTP_SERVER_G(parser_pool).max_body_size resolves under ZTS;
  realign the weak stubs with real signatures and drop the ones that
  migrated to inline.
- tests/unit/CMakeLists.txt: pull src/ into test_multipart_processor's
  include path (for log/http_log.h) and link multipart_stubs.c.
- tests/unit/http2/test_http2_strategy.c: fix the
  http_server_on_request_sample stub (now takes now_ns) and add the
  new cross-TU stubs.
- tests/e2e/HttpTestCase.php + tests/.gitignore: add the shared e2e
  helper that was missing from the import (e2e/**/*.php was
  swallowing it as a phpt artifact); whitelist via !-rule.
- tests/phpt/server/h2/009-h2-h2spec-gate.phpt: replace the hardcoded
  /home/edmond/php-http-server fallback (a stale sibling that linked
  an ABI-incompatible .so and segfaulted the spawned bench server)
  with TEST_PHP_SRCDIR + getcwd().

unit  8/8 pass; phpt+e2e 135 pass / 1 skip / 0 fail.
Drop the ecalloc fallback branches in http_connection_create and
http3_stream_new — no caller ever passes NULL server/conn; tests stub
the functions entirely. Matching NULL guards in destroy paths removed.

Remove ~30 comments that describe obvious WHAT (Pop, Push, Unlink,
forward-decl notices, changelog notes). Keep comments that explain
non-obvious WHY: offset-0 casting invariants, deferred-destroy
ordering, multishot re-entrancy guards, writev threshold rationale,
zero-copy BIO lifetime constraints.
EdmondDantes and others added 7 commits May 3, 2026 15:32
Moved src/core/tls_layer.c from the unconditional source list to the
_http_server_openssl_ok conditional block alongside http_connection_tls.c.
Removed the #ifdef HAVE_OPENSSL / #endif wrapper from the file body —
the build system is now the single gate; the wrapper was redundant and
produced an empty TU on non-TLS builds.
Parsing query parameters previously required manual strpos+parse_str in
every handler. The new methods cache the split on first access — NULL
sentinel on path/query_params fields signals "not yet parsed". Parser is
php_default_treat_data(PARSE_STRING, ...), the same path PHP uses for
$_GET, so percent-decode, '+'-as-space, and foo[bar][] array notation
all work out of the box.

Works identically for HTTP/1, HTTP/2 (:path pseudo-header), and HTTP/3
— all three protocols write the full path+query into req->uri.
A multishot read armed for the connection lifetime keeps async_io_t + the
outstanding read req alive (and through them a ref on the C-state)
until the conn is destroyed. http_server_free stopped listeners but
left live conns alone, so anything still on the arena at script exit
leaked the whole conn / io / parser / strategy / h2-session chain.

Sweep alive_head with http_connection_destroy() before releasing
scope_object: idle conns free immediately, conns with a handler in
flight set destroy_pending and finalise when the scope release
cancels their coroutine.

Fixes the debug-ZTS leak reports on
core/009-http-exception-from-handler and h2/017-h2-streaming-cancel.
Triple-dot (A...B) needs merge-base(A,B); shallow checkout (depth=200)
often doesn't reach it, so git fails with "Invalid symmetric difference
expression" and the coverage-compare step never runs. Two-dot diff
between the two tip commits gives the same result for a PR (BASE is an
ancestor of HEAD) and doesn't need merge-base.
actions/checkout for pull_request checks out the merge ref, not the
PR head — so the head SHA isn't guaranteed to live in the local
shallow clone. Previous fix only fetched BASE; now HEAD failed with
"bad object". Fetch both before diffing.
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Coverage

Total lines: 80.37% → 80.46% (+0.09 pp)

File                             Baseline   Current    Δ            Touched
src/core/conn_arena.c            0.00%      100.00%    +100.00 pp
src/core/http_connection.c       78.33%     77.59%     -0.74 pp
src/core/http_connection_tls.c   70.26%     71.63%     +1.38 pp
src/core/tls_layer.c             76.84%     76.84%     +0.00 pp
src/http1/http_parser.c          81.45%     81.86%     +0.41 pp
src/http2/http2_strategy.c       79.07%     78.64%     -0.43 pp
src/http2/http2_stream.c         94.29%     94.59%     +0.31 pp
src/http3/http3_callbacks.c      79.37%     79.37%     +0.00 pp
src/http3/http3_dispatch.c       87.61%     87.93%     +0.32 pp
src/http3/http3_listener.c       68.48%     68.76%     +0.28 pp
src/http3/http3_stream.c         95.00%     95.56%     +0.56 pp
src/http3/http3_stream_pool.c    0.00%      100.00%    +100.00 pp
src/http_request.c               94.23%     94.14%     -0.09 pp
src/http_response.c              84.41%     82.82%     -1.59 pp
src/http_server_class.c          84.03%     82.61%     -1.41 pp

❌ Regression in touched files (> 1.0 pp drop)

  • src/http_response.c dropped 1.59 pp
  • src/http_server_class.c dropped 1.41 pp

Add [coverage-drop-ok] to a commit message in this PR to override.

@EdmondDantes EdmondDantes merged commit d03b898 into main May 4, 2026
2 checks passed
@EdmondDantes EdmondDantes deleted the feat/multishot-alloc-cb branch May 4, 2026 07:15