#8: HTTP response/request compression, gzip via zlib-ng (phase 1) #10
Merged
EdmondDantes merged 9 commits into main on May 6, 2026
Adds --enable-http-compression (default yes, fail-soft) with zlib-ng preferred and zlib as fallback in both config.m4 and CMakeLists.txt. Introduces http_encoder_t vtable under include/compression/ and the gzip backend stub under src/compression/ — wiring only, real deflate streaming lands in commit 4. The registry, codec-token lookup and build-engine identifier are already callable so subsequent commits can build against stable symbols.
Five PHP-visible setters with matching getters: setCompressionEnabled, setCompressionLevel (1..9, default 6), setCompressionMinSize (default 1 KiB), setCompressionMimeTypes (replaces wholesale, nginx semantics) and setRequestMaxDecompressedSize (anti-zip-bomb cap, default 10 MiB).

Default MIME whitelist lives in src/compression/http_compression_defaults.c with the matching #defines in include/compression/http_compression_defaults.h so policy edits land as a focused diff. The whitelist is materialised into the config HashTable at init, so getCompressionMimeTypes() always returns the live policy. setCompressionMimeTypes() normalises entries (lowercase, stripped of `;` parameters, trimmed) once at setter time.

Frozen-snapshot machinery is extended: scalars are copied directly, and the whitelist is deep-copied to a persistent zend_string array so cross-thread LOAD reconstructs without touching source-side HashTables.

phpt: tests/phpt/server/compression/001-config-setters.phpt covers defaults, set/get round-trip, MIME normalisation + dedup, validation of out-of-range/empty inputs, and locked-config guard via HttpServer construction.
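The normalisation rule above (strip `;` parameters, trim, lowercase) can be sketched in plain C. This is a minimal illustration, not the PR's code: the name `mime_normalise` and its return convention (0 for empty or too-long entries) are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turns "  Text/HTML ; charset=utf-8 " into "text/html".
 * Returns the normalised length, or 0 when the entry is empty after
 * normalisation or does not fit into out (terminating NUL included). */
size_t mime_normalise(const char *in, char *out, size_t out_cap)
{
    const char *end = strchr(in, ';');            /* cut off parameters */
    if (end == NULL) end = in + strlen(in);

    while (in < end && isspace((unsigned char)*in)) in++;        /* left trim */
    while (end > in && isspace((unsigned char)end[-1])) end--;   /* right trim */

    size_t len = (size_t)(end - in);
    if (len == 0 || len + 1 > out_cap) return 0;  /* params-only or buffer too small */

    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)in[i]);
    out[len] = '\0';
    return len;
}
```

Normalising once at setter time means the stored keys are already canonical, so a later lookup only has to normalise the response's own Content-Type before probing the table.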
…liser (#8)

Pure-C state machine for RFC 9110 §12.5.3 negotiation: q-values, identity;q=0, *;q=0 with the special "*;q=0 excludes identity unless identity has its own entry" rule. Tolerant of LWS, accept-ext params after q=, malformed q (treated as q=1), and unknown codings (ignored so phase-2 backends drop in without changes here). The select() function returns HTTP_CODEC__COUNT when nothing is acceptable, and the caller emits 406.

MIME normaliser strips params, trims, lowercases at setter/match time so the per-request match is a single allocation-free zend_hash_str_exists call in the response pipeline (commit 5).

20-case cmocka suite covers default/empty/explicit/wildcard combinations, case-insensitive matching, malformed q tolerance, and MIME edge cases (buffer-too-small, params-only). Built standalone, with no Zend dependency, so the test compiles without the PHP runtime.
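A reduced sketch of the negotiation rules described above, restricted to gzip and identity. All names here (`codec_t`, `negotiate`, `parse_q`, `name_is`) are hypothetical, `CODEC_NONE` stands in for the "nothing acceptable" sentinel, and the preference rule (pick gzip whenever its q is non-zero, ignoring relative weights) is an assumption of the example rather than something the commit states.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { CODEC_IDENTITY, CODEC_GZIP, CODEC_NONE } codec_t;

/* Case-insensitive compare of p[0..n) against a lowercase literal. */
static int name_is(const char *p, size_t n, const char *lit)
{
    if (n != strlen(lit)) return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)p[i]) != lit[i]) return 0;
    return 1;
}

/* Find q= among the ';'-separated params in [p, end).
 * A missing or malformed q counts as 1.0. */
static double parse_q(const char *p, const char *end)
{
    while (p < end) {
        while (p < end && (*p == ';' || isspace((unsigned char)*p))) p++;
        if (end - p >= 2 && (p[0] == 'q' || p[0] == 'Q') && p[1] == '=') {
            char *stop;
            double q = strtod(p + 2, &stop);
            if (stop == p + 2 || !(q >= 0.0 && q <= 1.0)) return 1.0;
            return q;
        }
        while (p < end && *p != ';') p++;   /* skip an unrelated parameter */
    }
    return 1.0;
}

codec_t negotiate(const char *ae)
{
    double q_gzip = -1.0, q_identity = -1.0, q_star = -1.0;  /* -1: not listed */
    if (ae == NULL) return CODEC_IDENTITY;   /* no header: identity only */

    for (const char *p = ae; *p; ) {
        const char *end = strchr(p, ',');
        if (end == NULL) end = p + strlen(p);
        while (p < end && isspace((unsigned char)*p)) p++;
        const char *ne = p;                  /* end of the coding name */
        while (ne < end && *ne != ';' && !isspace((unsigned char)*ne)) ne++;
        size_t n = (size_t)(ne - p);
        double q = parse_q(ne, end);
        if (name_is(p, n, "gzip")) q_gzip = q;
        else if (name_is(p, n, "identity")) q_identity = q;
        else if (name_is(p, n, "*")) q_star = q;
        /* any other coding is ignored */
        p = (*end == ',') ? end + 1 : end;
    }
    /* Unlisted codings inherit the wildcard; identity is otherwise implicit. */
    if (q_gzip < 0.0) q_gzip = (q_star >= 0.0) ? q_star : 0.0;
    if (q_identity < 0.0) q_identity = (q_star >= 0.0) ? q_star : 1.0;

    if (q_gzip > 0.0) return CODEC_GZIP;
    if (q_identity > 0.0) return CODEC_IDENTITY;
    return CODEC_NONE;
}
```

With this sketch, `*;q=0` alone yields `CODEC_NONE`, while `*;q=0, identity` yields `CODEC_IDENTITY`, which is the special case the commit message calls out.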
Replaces the commit-1 stub with a working deflate pipeline. windowBits
= MAX_WBITS+16 selects the gzip wrapper (RFC 1952) — 10-byte header,
CRC32+ISIZE trailer — instead of zlib's adler32 wrap. memLevel=8 and
the caller-clamped 1..9 level keep the encoder allocation-friendly.
Streaming contract:
  write()  returns OK on progress, NEED_OUTPUT when the output buffer
           filled before all input was consumed (caller drains and
           re-calls with the same input pointer advanced by *in_consumed).
  finish() returns NEED_OUTPUT until the trailer fits, then DONE.
           Z_BUF_ERROR after Z_FINISH is folded into NEED_OUTPUT,
           zlib's normal "give me more output" signal.
zlib-ng vs zlib: the TU compiles against either, selected by
HAVE_ZLIB_NG at build time (#define maps zng_* → z* equivalents).
Round-trip cmocka suite (5 cases) feeds the production encoder and
inflates the result with the same library: short text, empty body
(footer-only path), 256 KiB mixed-entropy body crossing chunk
boundaries, 16-byte output buffer that forces NEED_OUTPUT looping
through every call, and out-of-range level clamping (0/-1/10 must
not crash). Every output stream is also checked for the gzip magic
1f 8b prefix.
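The caller side of the streaming contract can be sketched without zlib. The example below uses a toy encoder that copies bytes and then emits a fixed-size trailer; it only mimics the OK / NEED_OUTPUT / DONE status flow. Every name and signature (`toy_enc_t`, `toy_write`, `toy_finish`, `encode_all`) is invented for illustration and is not the PR's http_encoder_t vtable.

```c
#include <stddef.h>
#include <string.h>

typedef enum { ENC_OK, ENC_NEED_OUTPUT, ENC_DONE } enc_status_t;

/* Toy stand-in for an encoder: passes input through unchanged and
 * finishes with trailer_left bytes of '!' in place of a real trailer. */
typedef struct { size_t trailer_left; } toy_enc_t;

enc_status_t toy_write(toy_enc_t *e, const char *in, size_t in_len,
                       size_t *in_consumed, char *out, size_t out_cap,
                       size_t *out_produced)
{
    (void)e;
    size_t n = in_len < out_cap ? in_len : out_cap;
    memcpy(out, in, n);
    *in_consumed = n;
    *out_produced = n;
    return n < in_len ? ENC_NEED_OUTPUT : ENC_OK;  /* output filled first? */
}

enc_status_t toy_finish(toy_enc_t *e, char *out, size_t out_cap,
                        size_t *out_produced)
{
    size_t n = e->trailer_left < out_cap ? e->trailer_left : out_cap;
    memset(out, '!', n);
    e->trailer_left -= n;
    *out_produced = n;
    return e->trailer_left ? ENC_NEED_OUTPUT : ENC_DONE;
}

/* Caller loop: drain the small buffer, advance the input by *in_consumed,
 * and re-call while NEED_OUTPUT is reported. Returns the number of bytes
 * written to sink, or 0 if sink is too small. */
size_t encode_all(toy_enc_t *e, const char *in, size_t in_len,
                  char *sink, size_t sink_cap)
{
    char buf[4];                     /* deliberately tiny to force looping */
    size_t total = 0, used, made;
    enc_status_t st;

    do {
        st = toy_write(e, in, in_len, &used, buf, sizeof buf, &made);
        if (total + made > sink_cap) return 0;
        memcpy(sink + total, buf, made);
        total += made;
        in += used;
        in_len -= used;
    } while (st == ENC_NEED_OUTPUT);

    do {
        st = toy_finish(e, buf, sizeof buf, &made);
        if (total + made > sink_cap) return 0;
        memcpy(sink + total, buf, made);
        total += made;
    } while (st == ENC_NEED_OUTPUT);

    return total;
}
```

The 4-byte scratch buffer plays the same role as the 16-byte output buffer in the cmocka suite: it makes every call hit the NEED_OUTPUT path so the drain loop is exercised.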
…out (#8)

Wires the gzip encoder into HttpResponse for HTTP/1. H2 and H3 land in a follow-up commit using the same hook points.

State module (src/compression/http_compression_response.c) carries:
- attach(): stashes request + server cfg on the response, called by H1 dispatch right after install_stream_ops
- apply_buffered(): rewrites smart_str body in place via the encoder, invoked from http_response_format / format_parts so every buffered emit path benefits with no per-emitter changes
- install_stream_wrapper(): on first send(), swaps stream_ops with a compressing wrapper; wrapper feeds chunks through the encoder, drains finish() in mark_ended, and delegates get_wait_event so backpressure stays correct
- decide(): single source of truth combining Accept-Encoding (parsed via the negotiate TU), HEAD / Range / status / handler-set Content-Encoding, MIME whitelist, body-size threshold, and per-response opt-out
- HttpResponse::setNoCompression(): BREACH-mitigation opt-out usable from PHP; idempotent.

Default policy change: http_accept_encoding_init_default now resolves to identity-only when no Accept-Encoding header was sent. RFC 9110 §12.5.3 strictly permits any coding in that case, but real-world clients without AE are usually probes/scripts that may not handle gzip, and BREACH risk argues for opt-in. nginx ships the same default; rationale is in the impl comment.

Header mutation on commit: Content-Encoding: gzip, Vary appended with Accept-Encoding (or set fresh), Content-Length stripped (recomputed on buffered, absent on chunked).
phpt:
- 010-h1-buffered-gzip: golden path (gzip + Vary + gunzip(1) round-trip), identity when AE absent, identity on gzip;q=0
- 011-h1-buffered-skips: image/png skip, below-threshold skip, setNoCompression, handler-set Content-Encoding, HEAD, Range, 204
- 012-h1-streaming-gzip: chunked + gzip round-trip across four send() chunks; checks Content-Length absent and no leaks via the existing memory tracer

Public API: http_server_get_config(http_server_object*) added to include/php_http_server.h so the H1 dispatch can hand the live cfg to the compression attach helper without leaking the object layout.
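The skip rules that decide() combines can be pictured as one predicate over a flat input struct. This is only a sketch: `decide_in_t` and `should_compress` are hypothetical names, the set of bodiless statuses (1xx, 204, 304) is an assumption beyond the 204 case the phpt covers, and whether a body exactly at the threshold is compressed is also assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of everything the decision depends on. */
typedef struct {
    bool   client_accepts_gzip;          /* outcome of Accept-Encoding negotiation */
    bool   is_head;                      /* HEAD request */
    bool   has_range;                    /* request carried a Range header */
    int    status;                       /* response status code */
    bool   handler_set_content_encoding; /* handler already encoded the body */
    bool   mime_whitelisted;             /* Content-Type is in the whitelist */
    bool   body_len_known;               /* false on the streaming path */
    size_t body_len;
    bool   no_compression;               /* per-response opt-out */
} decide_in_t;

bool should_compress(const decide_in_t *in, size_t min_size)
{
    if (in->no_compression) return false;
    if (!in->client_accepts_gzip) return false;
    if (in->is_head || in->has_range) return false;
    /* Assumed list of statuses that never carry a compressible body. */
    if (in->status < 200 || in->status == 204 || in->status == 304) return false;
    if (in->handler_set_content_encoding) return false;
    if (!in->mime_whitelisted) return false;
    /* Assumption: a body of exactly min_size bytes is compressed. */
    if (in->body_len_known && in->body_len < min_size) return false;
    return true;
}
```

Keeping every rule in one function mirrors the "single source of truth" idea: the buffered and streaming paths ask the same question and cannot drift apart.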
Mirrors the H1 hooks landed in 4e3c01a:
- H2: http_compression_attach() at dispatch in http2_strategy.c right after install_stream_ops; http_compression_apply_buffered() at the top of http2_commit_stream_response, before headers flatten, so Content-Encoding/Vary/Content-Length mutations ride the HEADERS frame.
- H3: same pair: attach() in http3_dispatch.c, apply_buffered() in http3_stream_submit_response (callbacks.c) before the QPACK header emit. Streaming path on H3 is handled by the stream wrapper that HttpResponse::send() installs via the protocol-agnostic ops vtable.

H2 buffered phpt (020-h2-buffered-gzip): curl --http2-prior-knowledge with Accept-Encoding: gzip → text/html gets Content-Encoding: gzip + Vary; image/png stays identity. Verifies the per-stream MIME match fires even when several streams share one connection's config.

H3 e2e phpt deferred to follow-up: the embedded h3client harness doesn't yet support request headers (no -H flag), so we can't drive Accept-Encoding from there. The wiring itself is symmetric with H1 and H2; H1 (010/011/012) and H2 (020) cover the architecture.

Regression: 73/73 across server/core, server/h2, server/h3.
#8)

Adds gzip request body decoding (Content-Encoding: gzip from clients, common with REST APIs, gRPC-web, webhooks). Decoder runs at the top of every handler-coroutine entry (H1/H2/H3) before the user's PHP function is invoked. Failures emit a canned text/plain error response and skip the handler:
- unknown coding → 415 Unsupported Content-Encoding
- bomb-cap exceeded → 413 Payload Too Large
- malformed inflate → 400

Anti-bomb cap is enforced at the inflate layer: every Z_OK loop checks decoded size against cfg->request_max_decompressed_size before growing the output buffer, so a 1 MiB compressed body cannot expand past the configured ceiling. Default cap is 10 MiB; setter established in commit 2.

`x-gzip` (RFC 9110 historical alias) is decoded as gzip; identity is a no-op; absent header is a no-op.

Also: hot-path config access. conn->config is now cached at http_server_bind_connection alongside conn->view / conn->counters, following the same null-on-server-free discipline. The H1 and H2 dispatch hooks in commit 5 lose the http_server_get_config() call chain in favour of one direct load. H3 dispatch keeps the function call (its conn type doesn't share the same struct), since it's already off the hottest path.

http_response_set_error(): internal helper that sets status + text/plain content-type + body without going through the PHP-facing guards. Used only when dispatch rejects a request before any handler runs.

phpt 030-h1-request-gzip-in covers all four paths: gzip round-trip (handler observes decoded length), 200 KiB gzip-bomb → 413, br coding → 415, identity → no-op pass-through.

Regression: 108/109 across server/. 1 skip is a pre-existing ssl-transport gap unrelated to this change.
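The "check the cap before growing" step and the failure-to-status mapping can be sketched as two small helpers. Names (`dec_result_t`, `status_for`, `grow_capped`), the 4096-byte floor and the doubling strategy are assumptions for illustration; only the 415 / 413 / 400 mapping and the idea of a hard ceiling come from the commit message.

```c
#include <stddef.h>

typedef enum { DEC_OK, DEC_UNKNOWN_CODING, DEC_TOO_LARGE, DEC_MALFORMED } dec_result_t;

/* Map a decode outcome to the canned HTTP status; 0 means "run the handler". */
int status_for(dec_result_t r)
{
    switch (r) {
    case DEC_UNKNOWN_CODING: return 415;
    case DEC_TOO_LARGE:      return 413;
    case DEC_MALFORMED:      return 400;
    default:                 return 0;
    }
}

/* Called when the inflate loop has filled the output buffer and wants more
 * room. Returns the next capacity, never above max_decoded, or 0 when the
 * buffer is already at the ceiling (the caller then reports DEC_TOO_LARGE). */
size_t grow_capped(size_t capacity, size_t max_decoded)
{
    if (capacity >= max_decoded) return 0;
    if (capacity > max_decoded / 2) return max_decoded;  /* also avoids overflow in capacity * 2 */
    size_t want = capacity * 2;
    if (want < 4096) want = 4096;                        /* assumed minimum step */
    return want > max_decoded ? max_decoded : want;
}
```

Because the capacity can never exceed the ceiling, the amount of memory a hostile body can force the server to allocate is bounded by the configured cap regardless of its compression ratio.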
apply_buffered:
- Pre-size out smart_str to body_len + 64 so the common single-pass
case is one allocation; smart_str_alloc only kicks in on the rare
incompressible branch where deflate slightly inflates the input.
- Drop the per-iteration smart_str_alloc(4096) and the dead
"if (cap < 256)" follow-up that did nothing useful (just allocated
again immediately after a 4 KiB grow). Cap-aware grow with explicit
< 64 / < 32 thresholds now drives realloc only when needed.
- UNEXPECTED on grow + ENC_ERROR; EXPECTED on the DONE branch of
finish().
Streaming wrapper:
- Replace per-iteration char[8192] stack buffer + multi
zend_string_init / per-pass underlying append_chunk with a single
smart_str accumulator. Each user-facing send() call now produces
ONE downstream chunk, not N — relevant for chunked H1 (one size
line + CRLF on the wire instead of N) and H2 (one DATA frame
instead of N). Encoder still streams: smart_str grows on demand.
- Same accumulator pattern in mark_ended for the gzip trailer.
Header mutation:
- put_header_string / delete_header switched to zend_hash_str_update
/ zend_hash_str_del, which take a raw (name, len) pair and skip
the per-call zend_string_init for the key. Saves one alloc per
Content-Encoding / Content-Length / Vary mutation.
- merge_vary_accept_encoding builds the merged "vary, Accept-Encoding"
value in one zend_string_alloc instead of going through smart_str
+ a second copy via put_header_string. Bound on the substring
search corrected to (cl - AE_LEN + 1) so the worst case scans
exactly the legitimate window.
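A standalone sketch of the Vary merge and the corrected search bound, using malloc in place of zend_string_alloc. The names `vary_has_ae` and `merge_vary` are hypothetical, and the match is a plain case-insensitive substring search as the note above describes, not a token-aware parse.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define AE     "Accept-Encoding"
#define AE_LEN (sizeof(AE) - 1)

/* Case-insensitive substring search for AE inside cur[0..cl).
 * The loop bound cl - AE_LEN + 1 visits every start offset at which a
 * full-length match still fits, and no others. */
int vary_has_ae(const char *cur, size_t cl)
{
    if (cl < AE_LEN) return 0;
    for (size_t i = 0; i < cl - AE_LEN + 1; i++) {
        size_t j = 0;
        while (j < AE_LEN &&
               tolower((unsigned char)cur[i + j]) == tolower((unsigned char)AE[j]))
            j++;
        if (j == AE_LEN) return 1;
    }
    return 0;
}

/* Returns a freshly malloc'd Vary value (caller frees), or NULL on OOM:
 *   no current value   -> "Accept-Encoding"
 *   already mentions it -> copy of the current value
 *   otherwise           -> "<current>, Accept-Encoding" in one allocation */
char *merge_vary(const char *cur)
{
    size_t cl = cur ? strlen(cur) : 0;
    int keep = cl > 0 && vary_has_ae(cur, cl);
    size_t total = (cl == 0) ? AE_LEN : keep ? cl : cl + 2 + AE_LEN;

    char *out = malloc(total + 1);
    if (out == NULL) return NULL;

    char *p = out;
    if (cl > 0) { memcpy(p, cur, cl); p += cl; }
    if (cl > 0 && !keep) { memcpy(p, ", ", 2); p += 2; }
    if (!keep) { memcpy(p, AE, AE_LEN); p += AE_LEN; }
    *p = '\0';
    return out;
}
```

Computing the final length up front is what allows the single allocation; the smart_str route would grow incrementally and then copy once more into the header table.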
decide():
- Throwaway compound-literal out-args replaced with a dedicated
request_has_header() that goes straight through zend_hash_str_exists.
request_header_value (renamed) keeps the value-out path for
Accept-Encoding which actually needs the bytes.
- const-correctness: request struct + headers HT pointers are read-only
on this path; signatures updated.
http_response_set_error:
- zend_hash_str_update for the content-type insertion (skips the
key alloc) and one fewer paired release.
UNEXPECTED hints concentrated on:
- allocation failures (vt->create returning NULL, encoder->ERROR)
- rare-branch state checks (first_chunk_done == false on per-chunk
callback, NULL request/headers on lookup helpers)
Regression: 108/109 server suite, 20/20 + 5/5 compression unit tests
unchanged. No behavioural changes — purely allocation accounting.
- CHANGELOG.md: [Unreleased] section with the full feature surface: build flag, five HttpServerConfig setters with defaults and ranges, per-response opt-out, negotiation rules, inbound decoding contract, zlib-ng vs zlib backend selection.
- docs/COMPRESSION.md: user-facing reference covering build, knobs, default MIME whitelist, opt-out, RFC 9110 §12.5.3 negotiation (with the two pragmatic deviations: identity-only when AE absent, identity over 406 when nothing is acceptable), skip rules, request-side decoding outcomes (415 / 413 / 400), streaming framing behaviour, engine-name reporting, phase-2 scope.
- README.md: feature row + zlib-ng badge so the capability is visible in the top-level pitch.