8 http responserequest compression gzip via zlib ng phase 1 #10

Merged
EdmondDantes merged 9 commits into main from 8-http-responserequest-compression-gzip-via-zlib-ng-phase-1
May 6, 2026

@EdmondDantes
Contributor

No description provided.

Adds --enable-http-compression (default yes, fail-soft) with zlib-ng
preferred and zlib as fallback in both config.m4 and CMakeLists.txt.
Introduces http_encoder_t vtable under include/compression/ and the
gzip backend stub under src/compression/ — wiring only, real deflate
streaming lands in commit 4. The registry, codec-token lookup and
build-engine identifier are already callable so subsequent commits
can build against stable symbols.

Five PHP-visible setters with matching getters: setCompressionEnabled,
setCompressionLevel (1..9, default 6), setCompressionMinSize (default
1 KiB), setCompressionMimeTypes (replaces wholesale, nginx semantics)
and setRequestMaxDecompressedSize (anti-zip-bomb cap, default 10 MiB).

Default MIME whitelist lives in src/compression/http_compression_defaults.c
with the matching #defines in include/compression/http_compression_defaults.h
so policy edits land as a focused diff. The whitelist is materialised
into the config HashTable at init, so getCompressionMimeTypes() always
returns the live policy. setCompressionMimeTypes() normalises entries
(lowercase, stripped of `;` parameters, trimmed) once at setter time.

Frozen-snapshot machinery is extended: scalars copied directly, the
whitelist is deep-copied to a persistent zend_string array so cross-
thread LOAD reconstructs without touching source-side HashTables.

phpt: tests/phpt/server/compression/001-config-setters.phpt covers
defaults, set/get round-trip, MIME normalisation + dedup, validation
of out-of-range/empty inputs, and locked-config guard via HttpServer
construction.

…liser (#8)

Pure-C state machine for RFC 9110 §12.5.3 negotiation: q-values,
identity;q=0, *;q=0 with the special "*;q=0 excludes identity unless
identity has its own entry" rule. Tolerant of LWS, accept-ext params
after q=, malformed q (treated as q=1), and unknown codings (ignored
so phase-2 backends drop in without changes here). The select()
function returns HTTP_CODEC__COUNT when nothing is acceptable —
caller emits 406.
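
The rules above can be sketched in standalone C. This is an illustration, not the code from this PR: `select_coding`, `codec_t`, `tok_is` and `parse_q` are hypothetical names, and two behaviours are assumptions: gzip wins whenever its effective q is above zero, and a missing header resolves to identity.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { CODEC_IDENTITY, CODEC_GZIP, CODEC__COUNT } codec_t;

/* Case-insensitive compare of a length-delimited token with a lowercase literal. */
static int tok_is(const char *s, size_t n, const char *name) {
    if (n != strlen(name)) return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != name[i]) return 0;
    return 1;
}

/* Parse a q-value into thousandths; anything malformed counts as q=1. */
static int parse_q(const char *s, size_t len) {
    char buf[16], *end;
    if (len == 0 || len >= sizeof buf) return 1000;
    memcpy(buf, s, len);
    buf[len] = '\0';
    double q = strtod(buf, &end);
    if (end == buf || *end != '\0' || !(q >= 0.0 && q <= 1.0)) return 1000;
    return (int)(q * 1000.0 + 0.5);
}

codec_t select_coding(const char *ae) {
    int q_gzip = -1, q_identity = -1, q_star = -1; /* -1 means "not listed" */
    if (ae == NULL) return CODEC_IDENTITY;        /* assumption: absent header */

    for (const char *p = ae; *p != '\0';) {
        const char *elem = p, *end = p + strcspn(p, ",");
        p = (*end == ',') ? end + 1 : end;

        /* Skip leading whitespace, then measure the coding token. */
        while (elem < end && (*elem == ' ' || *elem == '\t')) elem++;
        size_t tok = 0;
        while (elem + tok < end && !strchr("; \t", elem[tok])) tok++;
        if (tok == 0) continue;

        /* Walk the parameters until the first q=. */
        int q = 1000;
        for (const char *s = elem + tok; s < end;) {
            while (s < end && strchr("; \t", *s)) s++;
            size_t plen = 0;
            while (s + plen < end && s[plen] != ';') plen++;
            size_t t = plen;
            while (t > 0 && (s[t - 1] == ' ' || s[t - 1] == '\t')) t--;
            if (t >= 2 && tolower((unsigned char)s[0]) == 'q' && s[1] == '=') {
                q = parse_q(s + 2, t - 2);
                break; /* parameters after q= are extensions: ignored */
            }
            s += plen;
        }

        if (tok_is(elem, tok, "gzip")) q_gzip = q;
        else if (tok_is(elem, tok, "identity")) q_identity = q;
        else if (tok_is(elem, tok, "*")) q_star = q;
        /* unknown codings are ignored */
    }

    /* gzip falls back to the wildcard; identity is excluded only by its own
     * q=0 or by *;q=0 when identity has no entry of its own. */
    int gzip_q = q_gzip >= 0 ? q_gzip : (q_star >= 0 ? q_star : 0);
    int identity_ok = q_identity >= 0 ? q_identity > 0 : q_star != 0;

    if (gzip_q > 0) return CODEC_GZIP;
    if (identity_ok) return CODEC_IDENTITY;
    return CODEC__COUNT; /* nothing acceptable */
}
```

With this sketch, `select_coding("identity;q=0.5, *;q=0")` yields `CODEC_IDENTITY`, while `select_coding("*;q=0")` yields the `CODEC__COUNT` sentinel.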

MIME normaliser strips params, trims, lowercases at setter/match time
so the per-request match is a single allocation-free zend_hash_str_exists
call in the response pipeline (commit 5).
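
A minimal sketch of the normalisation step (strip parameters, trim, lowercase) into a caller-supplied buffer. `mime_normalise` and its -1 error convention are assumptions for illustration; the real helper's signature is not shown in this PR text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the normalised type into out and returns its length.
 * Returns -1 when nothing is left after stripping (params-only input)
 * or when out cannot hold the result plus its NUL terminator. */
long mime_normalise(const char *in, size_t len, char *out, size_t cap) {
    /* Everything from the first ';' onwards is a parameter list. */
    const char *semi = memchr(in, ';', len);
    size_t start = 0, end = semi ? (size_t)(semi - in) : len;

    while (start < end && isspace((unsigned char)in[start])) start++;
    while (end > start && isspace((unsigned char)in[end - 1])) end--;

    size_t n = end - start;
    if (n == 0 || n >= cap) return -1;

    for (size_t i = 0; i < n; i++)
        out[i] = (char)tolower((unsigned char)in[start + i]);
    out[n] = '\0';
    return (long)n;
}
```

Because the output is already lowercase and parameter-free, the per-request lookup can compare raw bytes against the whitelist keys without further allocation.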

20-case cmocka suite covers default/empty/explicit/wildcard combinations,
case-insensitive matching, malformed q tolerance, and MIME edge cases
(buffer-too-small, params-only). Built standalone — no Zend dep — so
the test compiles without the PHP runtime.

Replaces the commit-1 stub with a working deflate pipeline. windowBits
= MAX_WBITS+16 selects the gzip wrapper (RFC 1952) — 10-byte header,
CRC32+ISIZE trailer — instead of zlib's adler32 wrap. memLevel=8 and
the caller-clamped 1..9 level keep the encoder allocation-friendly.

Streaming contract:
  write()  returns OK on progress, NEED_OUTPUT when the output buffer
           filled before all input was consumed (caller drains and
           re-calls with the same input pointer advanced by *in_consumed).
  finish() returns NEED_OUTPUT until the trailer fits, then DONE.
           Z_BUF_ERROR after Z_FINISH is folded into NEED_OUTPUT —
           zlib's normal "give me more output" signal.
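
The caller side of this contract can be sketched with a toy encoder standing in for deflate. Everything here is hypothetical (`toy_encoder`, `toy_write`, `toy_finish`, `encode_all`, the 8-byte fake trailer); only the OK / NEED_OUTPUT / DONE return codes mirror the description above.

```c
#include <stddef.h>
#include <string.h>

typedef enum { ENC_OK, ENC_NEED_OUTPUT, ENC_DONE } enc_status;

/* Toy stand-in for the gzip encoder: copies input through unchanged and
 * appends a fixed 8-byte trailer on finish. Not a compressor. */
typedef struct { size_t trailer_sent; } toy_encoder;

static const unsigned char TOY_TRAILER[8] = { 'T','R','A','I','L','E','R','!' };

static enc_status toy_write(toy_encoder *e, const unsigned char *in, size_t in_len,
                            size_t *in_consumed, unsigned char *out, size_t out_cap,
                            size_t *out_len) {
    (void)e;
    size_t n = in_len < out_cap ? in_len : out_cap;
    memcpy(out, in, n);
    *in_consumed = n;
    *out_len = n;
    return n == in_len ? ENC_OK : ENC_NEED_OUTPUT;
}

static enc_status toy_finish(toy_encoder *e, unsigned char *out, size_t out_cap,
                             size_t *out_len) {
    size_t left = sizeof TOY_TRAILER - e->trailer_sent;
    size_t n = left < out_cap ? left : out_cap;
    memcpy(out, TOY_TRAILER + e->trailer_sent, n);
    e->trailer_sent += n;
    *out_len = n;
    return e->trailer_sent == sizeof TOY_TRAILER ? ENC_DONE : ENC_NEED_OUTPUT;
}

/* Drain loop: returns the number of bytes written to dst, or 0 if dst is too small. */
size_t encode_all(const unsigned char *in, size_t in_len,
                  unsigned char *dst, size_t dst_cap) {
    toy_encoder enc = { 0 };
    unsigned char chunk[4]; /* deliberately tiny so NEED_OUTPUT is exercised */
    size_t total = 0, used, produced;
    enc_status st;

    do { /* write(): drain, then re-call with the input pointer advanced */
        st = toy_write(&enc, in, in_len, &used, chunk, sizeof chunk, &produced);
        if (total + produced > dst_cap) return 0;
        memcpy(dst + total, chunk, produced);
        total += produced;
        in += used;
        in_len -= used;
    } while (st == ENC_NEED_OUTPUT);

    do { /* finish(): repeat until the trailer has fully fitted */
        st = toy_finish(&enc, chunk, sizeof chunk, &produced);
        if (total + produced > dst_cap) return 0;
        memcpy(dst + total, chunk, produced);
        total += produced;
    } while (st == ENC_NEED_OUTPUT);

    return total;
}
```

The 4-byte chunk buffer plays the same role as the 16-byte output buffer in the round-trip suite: it forces the NEED_OUTPUT branch on nearly every call.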

zlib-ng vs zlib: the TU compiles against either, selected by
HAVE_ZLIB_NG at build time (#define maps zng_* → z* equivalents).

Round-trip cmocka suite (5 cases) feeds the production encoder and
inflates the result with the same library: short text, empty body
(footer-only path), 256 KiB mixed-entropy body crossing chunk
boundaries, 16-byte output buffer that forces NEED_OUTPUT looping
through every call, and out-of-range level clamping (0/-1/10 must
not crash). Every output stream is also checked for the gzip magic
1f 8b prefix.
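
The framing checks mentioned above (magic prefix, CRC32+ISIZE trailer) can be sketched without linking zlib. The helper names are hypothetical, and the sketch assumes the plain 10-byte header with no optional fields. Per RFC 1952 the trailer is little-endian and ISIZE is the input size modulo 2^32.

```c
#include <stddef.h>
#include <stdint.h>

#define GZIP_HEADER_LEN  10u
#define GZIP_TRAILER_LEN 8u

/* True when buf is long enough for header + trailer and starts with
 * the gzip magic 1f 8b followed by CM=8 (deflate). */
int gzip_looks_framed(const unsigned char *buf, size_t len) {
    return len >= GZIP_HEADER_LEN + GZIP_TRAILER_LEN
        && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 8;
}

/* Read a little-endian 32-bit value. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Both accessors expect a buffer that already passed gzip_looks_framed(). */
uint32_t gzip_trailer_crc32(const unsigned char *buf, size_t len) {
    return le32(buf + len - 8);
}

uint32_t gzip_trailer_isize(const unsigned char *buf, size_t len) {
    return le32(buf + len - 4);
}
```

A round-trip test could compare `gzip_trailer_isize()` against the original body length as a cheap sanity check before inflating.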
…out (#8)

Wires the gzip encoder into HttpResponse for HTTP/1. H2 and H3 land
in a follow-up commit using the same hook points.

State module (src/compression/http_compression_response.c) carries:
  attach()        — stashes request + server cfg on the response,
                    called by H1 dispatch right after install_stream_ops
  apply_buffered()— rewrites smart_str body in place via the encoder,
                    invoked from http_response_format / format_parts
                    so every buffered emit path benefits with no
                    per-emitter changes
  install_stream_wrapper() — on first send(), swaps stream_ops with a
                    compressing wrapper; wrapper feeds chunks through
                    the encoder, drains finish() in mark_ended, and
                    delegates get_wait_event so backpressure stays
                    correct
  decide()        — single source of truth combining
                    Accept-Encoding (parsed via the negotiate TU),
                    HEAD / Range / status / handler-set Content-Encoding,
                    MIME whitelist, body-size threshold, and per-response
                    opt-out
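
As a rough illustration of how decide() might combine its inputs, here is a sketch over a flattened struct. `decide_input` and `should_compress` are hypothetical, the real function reads these facts from the request, response and config objects, and the status list is an assumption: only 204 is named in this PR, 1xx and 304 are added because they carry no body.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool   compression_enabled;          /* server-level switch */
    bool   no_compression;               /* per-response opt-out */
    bool   client_accepts_gzip;          /* outcome of Accept-Encoding negotiation */
    bool   is_head;                      /* HEAD request */
    bool   has_range;                    /* request carries Range */
    int    status;                       /* response status code */
    bool   handler_set_content_encoding; /* handler already chose a coding */
    bool   mime_whitelisted;             /* normalised Content-Type is in the whitelist */
    size_t body_len;                     /* buffered body size in bytes */
    size_t min_size;                     /* configured threshold */
} decide_input;

/* Every skip rule lives in one place so all emit paths agree. */
bool should_compress(const decide_input *in) {
    if (!in->compression_enabled || in->no_compression) return false;
    if (!in->client_accepts_gzip) return false;
    if (in->is_head || in->has_range) return false;
    if (in->status < 200 || in->status == 204 || in->status == 304) return false;
    if (in->handler_set_content_encoding) return false;
    if (!in->mime_whitelisted) return false;
    return in->body_len >= in->min_size;
}
```

The checks are ordered cheapest first. A streaming response has no known body length up front, so the threshold test would need different handling there.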

HttpResponse::setNoCompression() — BREACH-mitigation opt-out usable
from PHP; idempotent.

Default policy change: http_accept_encoding_init_default now resolves
to identity-only when no Accept-Encoding header was sent. RFC 9110
§12.5.3 strictly permits any coding in that case, but real-world
clients without AE are usually probes/scripts that may not handle
gzip, and BREACH risk argues for opt-in. nginx ships the same
default; rationale is in the impl comment.

Header mutation on commit: Content-Encoding: gzip, Vary appended with
Accept-Encoding (or set fresh), Content-Length stripped (recomputed
on buffered, absent on chunked).

phpt:
  010-h1-buffered-gzip      golden path: gzip + Vary + gunzip(1) round-trip,
                            identity when AE absent, identity on gzip;q=0
  011-h1-buffered-skips     image/png skip, below-threshold skip,
                            setNoCompression, handler-set Content-Encoding,
                            HEAD, Range, 204
  012-h1-streaming-gzip     chunked + gzip round-trip across four send()
                            chunks; checks Content-Length absent and
                            no leaks via the existing memory tracer

Public API: http_server_get_config(http_server_object*) added to
include/php_http_server.h so the H1 dispatch can hand the live cfg
to the compression attach helper without leaking the object layout.

Mirrors the H1 hooks landed in 4e3c01a:

- H2: http_compression_attach() at dispatch in http2_strategy.c
  right after install_stream_ops; http_compression_apply_buffered()
  at the top of http2_commit_stream_response, before headers flatten,
  so Content-Encoding/Vary/Content-Length mutations ride the HEADERS
  frame.

- H3: same pair — attach() in http3_dispatch.c, apply_buffered() in
  http3_stream_submit_response (callbacks.c) before the QPACK header
  emit. Streaming path on H3 is handled by the stream wrapper that
  HttpResponse::send() installs via the protocol-agnostic ops vtable.

H2 buffered phpt (020-h2-buffered-gzip): curl --http2-prior-knowledge
with Accept-Encoding: gzip → text/html gets Content-Encoding: gzip
+ Vary; image/png stays identity. Verifies the per-stream MIME match
fires even when several streams share one connection's config.

H3 e2e phpt deferred to follow-up — the embedded h3client harness
doesn't yet support request headers (no -H flag), so we can't drive
Accept-Encoding from there. The wiring itself is symmetric with H1
and H2; H1 (010/011/012) and H2 (020) cover the architecture.

Regression: 73/73 across server/core, server/h2, server/h3.

#8)

Adds gzip request body decoding (Content-Encoding: gzip from clients
— common with REST APIs, gRPC-web, webhooks). Decoder runs at the top
of every handler-coroutine entry (H1/H2/H3) before the user's PHP
function is invoked. Failures emit a canned text/plain error response
and skip the handler:

  unknown coding   → 415 Unsupported Content-Encoding
  bomb-cap exceeded → 413 Payload Too Large
  malformed inflate → 400

Anti-bomb cap is enforced at the inflate layer: every Z_OK loop
checks decoded size against cfg->request_max_decompressed_size before
growing the output buffer, so a 1 MiB compressed body cannot expand
past the configured ceiling. Default cap is 10 MiB; setter
established in commit 2.
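
The "check before growing" step can be sketched as a pure helper. `next_capacity`, the 4096-byte starting size and the doubling policy are assumptions for illustration; the real loop lives around inflate() and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Called when the output buffer is full and the inflater wants more room.
 * decoded: bytes produced so far; cap: current buffer capacity;
 * limit: configured request_max_decompressed_size.
 * Returns false when the ceiling is reached (the caller would answer 413);
 * otherwise stores the next capacity, never above limit. */
bool next_capacity(size_t decoded, size_t cap, size_t limit, size_t *new_cap) {
    if (decoded >= limit) return false;

    size_t want;
    if (cap == 0) want = 4096;              /* first allocation */
    else if (cap > limit / 2) want = limit; /* doubling would overshoot */
    else want = cap * 2;

    if (want > limit) want = limit;
    *new_cap = want;
    return true;
}
```

Clamping the capacity to the limit means the buffer itself can never exceed the ceiling, regardless of how small the compressed input was.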

`x-gzip` (RFC 9110 historical alias) is decoded as gzip; identity is
a no-op; absent header is a no-op.
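
A sketch of the coding-token classification described above. `classify_content_encoding` and `req_action` are hypothetical names. The sketch also assumes a single coding per header, so a list such as `gzip, br` would be rejected, and it treats an empty value like an absent header.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { REQ_PASS_THROUGH, REQ_INFLATE_GZIP, REQ_REJECT_415 } req_action;

/* Case-insensitive compare of a length-delimited token with a lowercase literal. */
static int ci_equal(const char *s, size_t n, const char *name) {
    if (n != strlen(name)) return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != name[i]) return 0;
    return 1;
}

/* value is the raw Content-Encoding header, or NULL when it is absent. */
req_action classify_content_encoding(const char *value) {
    if (value == NULL) return REQ_PASS_THROUGH;

    const char *end = value + strlen(value);
    while (value < end && isspace((unsigned char)*value)) value++;
    while (end > value && isspace((unsigned char)end[-1])) end--;
    size_t n = (size_t)(end - value);

    if (n == 0 || ci_equal(value, n, "identity")) return REQ_PASS_THROUGH;
    if (ci_equal(value, n, "gzip") || ci_equal(value, n, "x-gzip"))
        return REQ_INFLATE_GZIP;
    return REQ_REJECT_415; /* unknown coding */
}
```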

Also: hot-path config access. conn->config is now cached at
http_server_bind_connection alongside conn->view / conn->counters,
following the same null-on-server-free discipline. The H1 and H2
dispatch hooks from commits 5 and 6 drop the http_server_get_config() call
chain in favour of one direct load. H3 dispatch keeps the function
call (its conn type doesn't share the same struct), since it's
already off the hottest path.

http_response_set_error() — internal helper that sets status +
text/plain content-type + body without going through the PHP-facing
guards. Used only when dispatch rejects a request before any handler
runs.

phpt 030-h1-request-gzip-in covers all four paths: gzip round-trip
(handler observes decoded length), 200 KiB gzip-bomb → 413,
br coding → 415, identity → no-op pass-through.

Regression: 108/109 across server/. 1 skip is a pre-existing
ssl-transport gap unrelated to this change.

apply_buffered:
  - Pre-size out smart_str to body_len + 64 so the common single-pass
    case is one allocation; smart_str_alloc only kicks in on the rare
    incompressible branch where deflate slightly inflates the input.
  - Drop the per-iteration smart_str_alloc(4096) and the dead
    "if (cap < 256)" follow-up that did nothing useful (just allocated
    again immediately after a 4 KiB grow). Cap-aware grow with explicit
    < 64 / < 32 thresholds now drives realloc only when needed.
  - UNEXPECTED on grow + ENC_ERROR; EXPECTED on the DONE branch of
    finish().

Streaming wrapper:
  - Replace per-iteration char[8192] stack buffer + multi
    zend_string_init / per-pass underlying append_chunk with a single
    smart_str accumulator. Each user-facing send() call now produces
    ONE downstream chunk, not N — relevant for chunked H1 (one size
    line + CRLF on the wire instead of N) and H2 (one DATA frame
    instead of N). Encoder still streams: smart_str grows on demand.
  - Same accumulator pattern in mark_ended for the gzip trailer.

Header mutation:
  - put_header_string / delete_header switched to zend_hash_str_update
    / zend_hash_str_del, which take a raw (name, len) pair and skip
    the per-call zend_string_init for the key. Saves one alloc per
    Content-Encoding / Content-Length / Vary mutation.
  - merge_vary_accept_encoding builds the merged "vary, Accept-Encoding"
    value in one zend_string_alloc instead of going through smart_str
    + a second copy via put_header_string. Bound on the substring
    search corrected to (cl - AE_LEN + 1) so the worst case scans
    exactly the legitimate window.
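
The Vary merge and its search bound can be sketched as follows. This version uses malloc where the PR uses zend_string_alloc, and `vary_mentions_ae` / `vary_with_accept_encoding` are hypothetical names. The match is a plain case-insensitive substring search, so it is looser than a token-aware parse.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static const char AE[] = "Accept-Encoding";
#define AE_LEN (sizeof AE - 1)

/* Case-insensitive search for "Accept-Encoding" inside v[0..cl). */
static bool vary_mentions_ae(const char *v, size_t cl) {
    if (cl < AE_LEN) return false; /* also keeps the bound from underflowing */
    /* Last valid start index is cl - AE_LEN, hence the + 1 in the bound. */
    for (size_t i = 0; i < cl - AE_LEN + 1; i++) {
        size_t j = 0;
        while (j < AE_LEN &&
               tolower((unsigned char)v[i + j]) == tolower((unsigned char)AE[j]))
            j++;
        if (j == AE_LEN) return true;
    }
    return false;
}

/* Returns a malloc'd Vary value (caller frees), or NULL on allocation failure.
 * cur is the existing Vary value, or NULL when the header is not set. */
char *vary_with_accept_encoding(const char *cur) {
    size_t cl = cur ? strlen(cur) : 0;
    bool keep = cl > 0 && vary_mentions_ae(cur, cl);
    size_t total = keep ? cl : (cl > 0 ? cl + 2 + AE_LEN : AE_LEN);

    char *out = malloc(total + 1); /* single allocation for the merged value */
    if (out == NULL) return NULL;

    char *p = out;
    if (cl > 0) { memcpy(p, cur, cl); p += cl; }
    if (!keep) {
        if (cl > 0) { memcpy(p, ", ", 2); p += 2; }
        memcpy(p, AE, AE_LEN);
        p += AE_LEN;
    }
    *p = '\0';
    return out;
}
```

Computing the final length up front is what allows the one-allocation build: there is no intermediate growable buffer and no second copy.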

decide():
  - Throwaway compound-literal out-args replaced with a dedicated
    request_has_header() that goes straight through zend_hash_str_exists.
    request_header_value (renamed) keeps the value-out path for
    Accept-Encoding which actually needs the bytes.
  - const-correctness: request struct + headers HT pointers are read-only
    on this path; signatures updated.

http_response_set_error:
  - zend_hash_str_update for the content-type insertion (skips the
    key alloc) and one fewer paired release.

UNEXPECTED hints concentrated on:
  - allocation failures (vt->create returning NULL, encoder->ERROR)
  - rare-branch state checks (first_chunk_done == false on per-chunk
    callback, NULL request/headers on lookup helpers)

Regression: 108/109 server suite, 20/20 + 5/5 compression unit tests
unchanged. No behavioural changes; purely allocation accounting.

- CHANGELOG.md: [Unreleased] section with the full feature surface:
  build flag, five HttpServerConfig setters with defaults and ranges,
  per-response opt-out, negotiation rules, inbound decoding contract,
  zlib-ng vs zlib backend selection.

- docs/COMPRESSION.md: user-facing reference covering build, knobs,
  default MIME whitelist, opt-out, RFC 9110 §12.5.3 negotiation
  (with the two pragmatic deviations: identity-only when AE absent,
  identity over 406 when nothing is acceptable), skip rules,
  request-side decoding outcomes (415 / 413 / 400), streaming
  framing behaviour, engine-name reporting, phase-2 scope.

- README.md: feature row + zlib-ng badge so the capability is
  visible in the top-level pitch.
@EdmondDantes EdmondDantes linked an issue May 6, 2026 that may be closed by this pull request
@EdmondDantes EdmondDantes merged commit d48ba6b into main May 6, 2026
3 of 5 checks passed
@EdmondDantes EdmondDantes deleted the 8-http-responserequest-compression-gzip-via-zlib-ng-phase-1 branch May 6, 2026 05:41

Development

Successfully merging this pull request may close these issues.

HTTP response/request compression (gzip via zlib-ng) — phase 1
