feat: background workers = non-HTTP workers with shared state by nicolas-grekas · Pull Request #2287 · php/frankenphp

nicolas-grekas · 2026-03-16T18:53:17Z

Note

Description updated to reflect the latest pushes. API names and semantics are final pending review; see the thread for the back-and-forth that led here.

Summary

Background workers are long-running PHP workers that run outside the HTTP cycle. They observe their environment (Redis, DB, filesystem, etc.) and publish variables that HTTP threads (workers or classic requests) read per-request, enabling real-time reconfiguration without restarts or polling.

PHP API

Four functions:

frankenphp_ensure_background_worker(string|array $name, float $timeout = 30.0): void — declares a dependency on one or more background workers. Lazy-starts them if needed, blocks until each has called set_vars() at least once or the timeout expires. Two behaviors depending on caller:
- HTTP worker bootstrap (before frankenphp_handle_request): fail-fast. Any boot failure throws immediately with the captured details instead of waiting for the backoff cycle. Use for strict dependency declaration at boot.
- Everywhere else (inside frankenphp_handle_request, classic request-per-process): tolerant lazy-start. First caller pays the startup cost; later callers see the worker already reserved. Processes only start workers they actually exercise.
frankenphp_set_vars(array $vars): void — publishes vars from a background worker script (persistent memory, cross-thread). Skips all work when data is unchanged (=== check).
frankenphp_get_vars(string $name): array — pure read. Returns the latest published vars. Throws if the worker isn't running or hasn't called set_vars() yet. Generational cache: repeated calls within a single HTTP request return the same array instance (=== is O(1)).
frankenphp_get_worker_handle(): resource — readable stream for shutdown signaling. Closed on shutdown (EOF).

In CLI mode (frankenphp php-cli), none of these functions are exposed (MINIT-level hiding via zend_hash_str_del). function_exists() returns false, so library code can degrade gracefully.

Caddyfile configuration

php_server {
    # HTTP worker (unchanged)
    worker public/index.php { num 4 }

    # Named background worker (auto-started if num >= 1)
    worker bin/worker.php {
        background
        name config-watcher
        num 1
    }

    # Catch-all for lazy-started names
    worker bin/worker.php {
        background
    }
}

background marks a worker as non-HTTP
name specifies an exact worker name; workers without name are catch-all for lazy-started names
Not declaring a catch-all forbids lazy-started ones
max_threads on catch-all sets a safety cap for lazy-started instances (defaults to 16)
max_consecutive_failures defaults to 6 (same as HTTP workers)
max_execution_time automatically disabled for background workers
Each php_server block has its own isolated scope (opaque BackgroundScope type managed by frankenphp.NextBackgroundWorkerScope())
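The lookup rules above (exact name first, then catch-all, then refusal) can be sketched in Go, the language the runtime side is written in. This is only an illustration: `workerDecl`, `scope`, and `resolve` are hypothetical names, and counting the cap per lazy-started name is an assumption, not the PR's actual bookkeeping.

```go
package main

import (
	"errors"
	"fmt"
)

// workerDecl is a hypothetical, simplified view of one `worker { background }` block.
type workerDecl struct {
	name       string // empty for the catch-all declaration
	entrypoint string
	maxThreads int // safety cap for lazy-started instances (catch-all only)
}

// scope is a hypothetical stand-in for the per-php_server background scope.
type scope struct {
	named    map[string]workerDecl
	catchAll *workerDecl
	lazy     map[string]bool // names already lazy-started through the catch-all
}

var errNoCatchAll = errors.New("no catch-all background worker declared")

// resolve returns the entrypoint that should serve the requested worker name.
func (s *scope) resolve(name string) (string, error) {
	if d, ok := s.named[name]; ok {
		return d.entrypoint, nil // exact name wins
	}
	if s.catchAll == nil {
		return "", errNoCatchAll // no catch-all: lazy-started names are forbidden
	}
	if s.lazy[name] {
		return s.catchAll.entrypoint, nil // already started earlier
	}
	if len(s.lazy) >= s.catchAll.maxThreads {
		return "", fmt.Errorf("lazy-start cap of %d reached", s.catchAll.maxThreads)
	}
	s.lazy[name] = true
	return s.catchAll.entrypoint, nil
}

func main() {
	s := &scope{
		named:    map[string]workerDecl{"config-watcher": {name: "config-watcher", entrypoint: "bin/worker.php"}},
		catchAll: &workerDecl{entrypoint: "bin/worker.php", maxThreads: 16},
		lazy:     map[string]bool{},
	}
	for _, name := range []string{"config-watcher", "feature-flags"} {
		ep, err := s.resolve(name)
		fmt.Println(name, ep, err)
	}
}
```

Keeping the name-to-script mapping in one place is what lets the Caddyfile act as the trust boundary: PHP code only ever supplies a name.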

Shutdown

On restart/shutdown, the signaling stream is closed. Workers detect this via fgets() returning false (EOF). Workers have a 5-second grace period. In-flight ensure_background_worker calls unblock on globalCtx.Done() instead of waiting out their timeout.
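A minimal Go sketch of the "unblock on shutdown instead of waiting out the timeout" behavior described above. The function and error names are hypothetical; `done` stands in for `globalCtx.Done()` and `ready` for whatever signal the runtime raises on the first `set_vars()` call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var (
	errShuttingDown = errors.New("server is shutting down")
	errTimedOut     = errors.New("timed out waiting for first set_vars")
)

// waitReady blocks until the worker is ready, the server shuts down,
// or the caller's timeout expires, whichever happens first.
func waitReady(ready, done <-chan struct{}, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-ready:
		return nil
	case <-done:
		return errShuttingDown
	case <-timer.C:
		return errTimedOut
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	ready := make(chan struct{}) // never closed: this worker never becomes ready

	// Simulate a restart 50ms into a 30s wait.
	time.AfterFunc(50*time.Millisecond, cancel)
	err := waitReady(ready, ctx.Done(), 30*time.Second)
	fmt.Println(err)
}
```

With this shape, a restart never has to wait for the slowest in-flight caller's timeout.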

After the grace period, a best-effort force-kill is attempted:

Linux ZTS: arms PHP's own max_execution_time timer cross-thread via timer_settime(EG(max_execution_timer_timer))
Windows: CancelSynchronousIo + QueueUserAPC interrupts blocking I/O and alertable waits
macOS: no per-thread mechanism available; stuck threads are abandoned

During the restart window, get_vars returns the last published data (stale but available, kept in persistent memory across restarts). A warning is logged on crash.

Boot-failure reporting

When a background worker fails before calling set_vars, ensure_background_worker throws a RuntimeException with the captured details: worker name, resolved entrypoint path, exit status, number of attempts, and the last PHP error (message, file, line) captured from PG(last_error_*).
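As a rough illustration of the details listed above, here is a hypothetical Go error type carrying the same fields. The struct name, field names, and message format are assumptions made for this sketch, not the wording the PR produces.

```go
package main

import "fmt"

// bootFailure is a hypothetical carrier for the details captured when a
// background worker dies before its first set_vars call.
type bootFailure struct {
	Worker      string
	Entrypoint  string // resolved entrypoint path
	ExitStatus  int
	Attempts    int
	LastErrMsg  string // last PHP error message, if any was captured
	LastErrFile string
	LastErrLine int
}

// Error renders all captured details in a single line.
func (f *bootFailure) Error() string {
	msg := fmt.Sprintf("background worker %q (%s) failed to boot: exit status %d after %d attempt(s)",
		f.Worker, f.Entrypoint, f.ExitStatus, f.Attempts)
	if f.LastErrMsg != "" {
		msg += fmt.Sprintf("; last PHP error: %s in %s:%d", f.LastErrMsg, f.LastErrFile, f.LastErrLine)
	}
	return msg
}

func main() {
	var err error = &bootFailure{
		Worker: "config-watcher", Entrypoint: "bin/worker.php",
		ExitStatus: 255, Attempts: 1,
		LastErrMsg: "Uncaught Error: boom", LastErrFile: "bin/worker.php", LastErrLine: 12,
	}
	fmt.Println(err)
}
```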

Forward compatibility

The signaling stream is forward-compatible with the PHP 8.6 poll API RFC. Poll::addReadable accepts stream resources directly; code written today with stream_select will work on 8.6 with Poll, no API change needed.

Architecture

Per-php_server scope isolation via opaque BackgroundScope type. Internal registry is unexported.
Dedicated backgroundWorkerThread handler implementing threadHandler interface, decoupled from HTTP worker code paths.
drain() closes the signaling stream (EOF) for clean shutdown signaling.
Persistent memory (pemalloc) with RWMutex for safe cross-thread sharing.
set_vars skip: uses PHP's === (zend_is_identical) to detect unchanged data, skips validation, persistent copy, write lock, and version bump.
Generational cache: per-thread version check skips lock + copy when data hasn't changed.
Opcache immutable array zero-copy fast path (IS_ARRAY_IMMUTABLE).
Interned string optimizations (ZSTR_IS_INTERNED): skip copy/free for shared-memory strings.
Rich type support: null, scalars, arrays (nested), enums.
Crash recovery with exponential backoff and automatic restart.
ensure_background_worker accepts a batch of names with a shared deadline; fail-fast in bootstrap mode reports the failing worker's details.
$_SERVER['FRANKENPHP_WORKER_NAME'] set for background workers.
$_SERVER['FRANKENPHP_WORKER_BACKGROUND'] set for all workers (true/false).
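The interplay of the RWMutex, the unchanged-data skip, and the generational cache can be sketched in Go. This is a simplified model under stated assumptions: `varsStore` and `reader` are hypothetical names, `reflect.DeepEqual` stands in for PHP's `===`, and a plain map copy stands in for the persistent-memory copy.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
	"sync/atomic"
)

// varsStore is a hypothetical single-writer, many-reader snapshot holder.
type varsStore struct {
	mu      sync.RWMutex
	version atomic.Uint64 // 0 means "nothing published yet"
	vars    map[string]any
}

func clone(m map[string]any) map[string]any {
	cp := make(map[string]any, len(m))
	for k, v := range m {
		cp[k] = v
	}
	return cp
}

// publish replaces the whole snapshot. It reports false when the data is
// unchanged, in which case no copy, write lock, or version bump happens.
// Reading s.vars without the lock is safe only because there is one writer.
func (s *varsStore) publish(vars map[string]any) bool {
	if s.version.Load() != 0 && reflect.DeepEqual(s.vars, vars) {
		return false
	}
	cp := clone(vars)
	s.mu.Lock()
	s.vars = cp
	s.version.Add(1)
	s.mu.Unlock()
	return true
}

// reader models one consuming thread with its own generational cache.
type reader struct {
	store  *varsStore
	seen   uint64
	cached map[string]any
}

// get returns the latest snapshot, or false if nothing was published yet.
// When the version is unchanged it returns the cached copy without locking.
func (r *reader) get() (map[string]any, bool) {
	v := r.store.version.Load()
	if v == 0 {
		return nil, false
	}
	if v == r.seen {
		return r.cached, true // fast path: no lock, no copy
	}
	r.store.mu.RLock()
	defer r.store.mu.RUnlock()
	r.seen = r.store.version.Load()
	r.cached = clone(r.store.vars)
	return r.cached, true
}

func main() {
	s := &varsStore{}
	r := &reader{store: s}
	s.publish(map[string]any{"maintenance": false})
	vars, ok := r.get()
	fmt.Println(vars, ok)
}
```

Because only one writer exists per name, the version counter alone is enough to tell readers whether their cached copy is current.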

Example

// Background worker: polls Redis every 5s
$stream = frankenphp_get_worker_handle();
$redis = new Redis();
$redis->connect('127.0.0.1');

while (true) {
    frankenphp_set_vars([
        'maintenance' => (bool) $redis->get('maintenance_mode'),
        'feature_flags' => json_decode($redis->get('features'), true),
    ]);
    $r = [$stream]; $w = $e = [];
    if (false === @stream_select($r, $w, $e, 5)) { break; }
    if ($r && false === fgets($stream)) { break; } // EOF = stop
}

// HTTP worker
frankenphp_ensure_background_worker('config-watcher'); // bootstrap: fail-fast

while (frankenphp_handle_request(function () {
    $config = frankenphp_get_vars('config-watcher'); // pure read
    if ($config['maintenance']) {
        return new Response('Down for maintenance', 503);
    }
    // ...
})) { gc_collect_cycles(); }

// Classic non-worker mode
frankenphp_ensure_background_worker('config-watcher'); // tolerant lazy-start
$config = frankenphp_get_vars('config-watcher');

Test coverage

Unit tests, integration tests, and one Caddy integration test covering: bootstrap fail-fast, runtime tolerant lazy-start, multi-name ensure, get_vars pure read, set_vars validation (types, objects, refs), CLI function hiding, enum support, binary-safe strings, multiple entrypoints, crash-restart reclassification, boot-failure rich errors, signaling stream, worker restart lifecycle, named auto-start with m# prefix, edge cases (empty name, negative timeout, timeout=0).

All tests pass on PHP 8.2, 8.3, 8.4, and 8.5 with -race. Zero memory leaks on PHP debug builds.

Documentation

Full docs at docs/background-workers.md.

AlliBalliBaba · 2026-03-16T21:32:39Z

Interesting approach to parallelism. What would be a concrete use case for only letting information flow one way, from the sidekick to the HTTP workers?

Usually the flow would be inverted: an HTTP worker offloads work to a pool of 'sidekick' workers and can optionally wait for a task to complete.

henderkes · 2026-03-17T02:46:28Z

Thank you for the contribution. Interesting idea, but I'm thinking we should merge the approach with #1883. The kind of worker is the same, how they are started is but a detail.

@nicolas-grekas the Caddyfile setting should likely be per php_server, not a global setting.

nicolas-grekas · 2026-03-17T08:31:12Z

@AlliBalliBaba The use case isn't task offloading (HTTP->worker), but out-of-band reconfigurability (environment->worker->HTTP). Sidekicks observe external systems (Redis Sentinel failover, secret rotation, feature flag changes, etc.) and publish updated configuration that HTTP workers pick up on their next request, with per-request consistency guaranteed via $_SERVER injection. No polling, no TTLs, no redeployment.

Task offloading (what you describe) is a valid and complementary pattern, but it solves a different problem. The non-HTTP worker foundation here could support both.

@henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:

Minimal FrankenPHP config: a single sidekick_entrypoint in php_server (thanks for the idea). No need to declare individual workers in the Caddyfile. The PHP app controls which sidekicks to start via frankenphp_sidekick_start(), keeping the infrastructure config simple.
Graceful degradability: apps should work correctly with or without FrankenPHP. The same codebase should work on FrankenPHP (with real-time reconfiguration) and on traditional setups (with static or always refreshed config).
Nice framework integration: the sidekick_entrypoint pointing to e.g. bin/console means sidekicks are regular framework commands, making them easy to develop.

Happy to follow up with your proposals now that this is hopefully clarified.
I'm going to continue on my own a bit also :)

dunglas · 2026-03-17T09:01:18Z

Great PR!

Couldn't we create a single API that covers both use cases?

We try to keep the number of public symbols and config options as small as possible!

henderkes · 2026-03-17T09:24:06Z

> @henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:

Yes, that's why I'd like to unify the two APIs and background implementations into one. Unfortunately the first task worker attempt didn't make it into main, but perhaps @AlliBalliBaba can use his experience with the previous PR to influence this one. I'd be more in favour of a general API than a specific sidecar one.

nicolas-grekas · 2026-03-17T09:58:26Z

The PHP-side API has been significantly reworked since the initial iteration: I replaced $_SERVER injection with an explicit get_vars/set_vars protocol.

The old design used frankenphp_set_server_var() to inject values into $_SERVER implicitly. The new design uses an explicit request/response model:

frankenphp_sidekick_set_vars(array $vars): called from the sidekick to publish a complete snapshot atomically
frankenphp_sidekick_get_vars(string|array $name, float $timeout = 30.0): array: called from HTTP workers to read the latest vars

Key improvements:

No race condition on startup: get_vars blocks until the sidekick has called set_vars. The old design had a race where HTTP requests could arrive before the sidekick had published its values.
Strict context enforcement: set_vars and should_stop throw RuntimeException if called from a non-sidekick context.
Atomic snapshots: set_vars replaces all vars at once. No partial state possible.
Parallel start: get_vars(['redis-watcher', 'feature-flags']) starts all sidekicks concurrently, waits for all, returns vars keyed by name.
Works in both worker and non-worker mode: get_vars works from any PHP script served by php_server, not just from frankenphp_handle_request() workers.

Other changes:

sidekick_entrypoint moved from global frankenphp block to per-php_server (as @henderkes suggested)
Removed the $argv parameter: the sidekick name is the command, passed as $_SERVER['argv'][1]
set_vars is restricted to sidekick context only (throws if called from HTTP workers)
get_vars accepts string|array: when given an array, all sidekicks start in parallel
Atomic snapshots: set_vars replaces all vars at once, no partial state
Binary-safe values (null bytes, UTF-8)

nicolas-grekas · 2026-03-17T11:03:41Z

Thanks @dunglas and @henderkes for the feedback. I share the goal of keeping the API surface minimal.

Thinking about it more, the current API is actually quite small and already general:

1 Caddyfile setting: sidekick_entrypoint (per php_server)
3 PHP functions: get_vars, set_vars, should_stop

The name "sidekick" works as a generic concept: a helper running alongside. The current set_vars/get_vars protocol covers the config-publishing use case. For task offloading (HTTP->worker) later, the same sidekick infrastructure could support:

frankenphp_sidekick_send_task(string $name, mixed $payload): mixed
frankenphp_sidekick_receive_task(): mixed

Same worker type, same sidekick_entrypoint, same should_stop(). Just a different communication pattern added on top. No new config, no new worker type.

So the path would be:

This PR: sidekicks with set_vars/get_vars (config publishing)
Future PR: add send_task/receive_task (task offloading), reusing the same non-HTTP worker foundation

The foundation (non-HTTP threads, cooperative shutdown, crash recovery, per-php_server scoping) is shared. Only the communication primitives differ.

WDYT?

nicolas-grekas · 2026-03-17T12:57:31Z

~~I think the failures are unrelated - a cache reset would be needed. Any help on this topic?~~

alexandre-daubois · 2026-03-17T13:34:13Z

Hmm, it seems they are on some versions, for example here: https://github.com/php/frankenphp/actions/runs/23192689128/job/67392820942?pr=2287#step:10:3614

For the cache, I'm not aware of a GitHub feature that allows clearing everything, unfortunately 🙁

alexandre-daubois · 2026-04-21T09:41:18Z

After re-reading the thread, here's my position.

The target use case feels right, and I don't have a concrete need today that would justify use cases beyond the background-to-HTTP flow, like HTTP workers publishing data between themselves. I also have no objection in principle to lazy-starting from PHP: the DX it provides is valuable, and starting a worker can remain conditional on an app-level trigger. Feels a bit like goroutines, I like it.

That said, two points tip me toward an API that cleanly separates the lifecycle from the data store:

Once frankenphp_get_worker_vars($name) ships, the semantic "$name maps to a declared worker" is frozen. I don't see how we extend that function later without either introducing a parallel function or breaking backward compatibility. Decoupling the store from the lifecycle now keeps the door open.
The concern about an opaque diagnostic chain: an unexplained timeout on get_worker_vars, traced back to a non-obvious catch-all in the Caddyfile, then back to the calling script. An explicit start_background_worker() makes the trace linear and readable, even if we keep lazy-start via catch-all as a DX convenience.

Concretely, I'd support an API where the store is exposed in a generic form, and where starting a background worker is a named, explicit operation. Nothing prevents lazy-start via Caddyfile catch-all from still kicking in on a get_vars() call with a known name: that preserves the ergonomic side on the app without freezing the public API's semantics.

On the other points, I'm aligned with what already seems to be converging in the thread.

Sorry if I forgot something, there are quite a few comments 🙂

nicolas-grekas · 2026-04-21T11:40:59Z

Thanks for taking the time @alexandre-daubois. Two things I'd like to push back on, plus one structural argument that I think hasn't been engaged with yet.

On "freezes the $name maps to declared worker semantics": I don't see what extension this actually blocks. Names are the most abstract contract shape possible: DNS hostnames, Redis keys, Symfony service IDs, HTTP endpoints all work this way. The implementation behind a name is free to change without touching the API. A path-based API (start_worker($path)) would freeze far more: the contract becomes "PHP starts this specific local script," locking the implementation. Could you share a concrete example of what extension get_worker_vars($name) would block in the future?

On the diagnostic chain concern: I offered some improvements in my previous reply, and I think we can push this further until errors from workers are surfaced better to callers, all compatible with the API I'm proposing. That's the one aspect I think we can improve further.

On the explicit start_background_worker(): I'd like to understand what blocking semantics you have in mind, because the function only makes sense under one of two shapes, and both have issues:

If it blocks until ready: it's functionally equivalent to get_worker_vars() minus the returned data. Same lazy-start, same wait, same ready signal. Adding a second function for that case is pure redundancy.
If it returns immediately after spawning: the caller has no way to tell if the worker actually booted. It could fail 1ms later and the caller would never know. You'd then need a separate wait_ready() primitive to get feedback, more API surface.

The current design sidesteps both: one frankenphp_set_worker_vars([]) call at boot (even for side-effect-only workers) gives callers a real ready-state signal via get_worker_vars, and the failure mode is uniform (rich timeout error with boot-failure details). The "workaround" is actually a feature: it makes the ready contract explicit and symmetric.

The structural argument I'd like your read on: the current model is one-writer-many-readers by construction. Only the background worker owning $name can write to it, enforced at the type level, no CAS needed, no ordering surprises, no divergent writes to reconcile. A generic store where any caller can set_vars($name, $data) puts us in many-writers-many-readers territory: you need CAS (apcu_cas), merge semantics, version vectors. CRDTs exist precisely because shared-memory concurrency without enforced single-writer semantics is hard. This is why I push back on the "generic KV store" framing: it's not a cosmetic API choice, it's a fundamental concurrency model shift. Did you land on "fine, accept the tradeoff" or did that not factor into your position yet?

henderkes · 2026-04-21T12:18:02Z

> "get_vars might be hard to debug": the realistic failure modes are: (1) timeout waiting for first set_worker_vars, or (2) the worker throws before calling set_worker_vars at all. Both already surface as RuntimeException from get_worker_vars. We can make those error messages very explicit: include the worker name, the resolved script path, and, why not, the worker's stderr if we captured it. That directly addresses the "trace from failure to root cause" concern, without changing the API shape.

get_vars gets hard to debug when it sometimes starts a worker and sometimes doesn't. That's not an issue if we stick with your worker_get_vars implementation, but it is with the more general get_vars API we all want.

> "start_worker() is more secure because paths are restricted to the project root": I'm not sure why files in the project root would be safer to execute as long-lived workers than any other file (eg files under vendor/). If you meant the public-web root, I wouldn't put worker entrypoints there (like bin/console isn't under public/). With the Caddyfile approach, there's zero attack surface around worker startup from PHP: the infra declares the trusted entrypoints once.

Not necessarily the public webroot, but a root defined by the Caddyfile for sure. The problem with your approach is that it limits us to a single background worker script, which is likely in a framework like Symfony with a central kernel and container, but not otherwise.

> "It's no worse than exec": that comparison doesn't fully hold. exec() is universally recognized as dangerous: linters, SAST tools, code reviewers, and devs all flag it and know to sanitize input. A new start_worker($path) wouldn't benefit from any of that for years; every composer package using it becomes a potential path-traversal vector that users would need to audit. This is a real ecosystem cost, and keeping input-sensitive functions to a minimum is a principle worth preserving.

It's a fair point, but again, I'm not concerned with security when it's explicitly configurable through some background-worker-directory. We're not giving anyone a gun here; they're stealing it, pointing it at their foot, and firing repeatedly when someone actually manages to run into a security issue with it. And they could do the same with a single entrypoint, too.

> "start_worker() is more explicit and therefore better": it moves the explicitness from the config file to PHP code, while adding API surface. It also introduces its own traceability hazards: what happens if code path A calls start_worker('foo', 'a.php') and path B calls start_worker('foo', 'b.php'), or both happen concurrently on different threads? Either we error out (bad UX for libraries that don't coordinate), or we silently keep the first registration (a debugging trap and a race condition that don't exist today). With entrypoints-from-Caddyfile, there's exactly one place where a name maps to a script.

It's true that this adds API surface, but it's unified API surface that we'd most likely add at some point anyway. Then it's better to have an explicit API than magic behaviour on one, but not the other.

> There's also a design-level cost: taking a path at the API surface leaks the implementation. Right now, a name is an opaque handle for "some process that publishes this data". The mechanism is up to the infra running the app: on FrankenPHP it starts a local PHP script, on a polyfill we could dispatch to an external process. The moment $path is part of the API, the contract becomes "PHP starts this specific local script", and that abstraction is locked in.

That's a very fair point that I don't have a perfect solution to. It's actually one where I'm going back and forth on whether to even use names (or just use anonymous lists) in another project I'm working on.

> "start_worker is just more explicit about what the get function does": this is probably the root of the disagreement. start_worker() assumes PHP is the process that bootstraps workers. But FrankenPHP isn't like the parallel extension: PHP isn't the master process here, FrankenPHP itself is. Workers survive HTTP requests. The Go side owns their lifecycle. Asking PHP to "start a worker" inverts that ownership model.
>
> Lazy-start from get_worker_vars fits that model naturally: it's not PHP asking Go to create a worker, it's PHP saying "I need this data, please ensure its producer is running." Go decides how and when. PHP doesn't need to care.

See point 1, because at that point it would just be confusing what triggers a lazy start and what doesn't. (And I'm honestly not even sure how useful a lazy start really is, what problem does that solve? The library will have a dependency on the Caddyfile configuration at that point and the worker existing if it's hit once, and if it is, it would never shut down again)

> "FrankenPHP should have a generic KV store": I'd argue that's outside FrankenPHP's scope. APCu, Redis, Memcached, and plenty of extensions already do this well. FrankenPHP's unique value is thread and worker management: that's what no PHP extension can replace, because it requires ownership of the process model. The API surface we should expose is the one that couldn't be done without FrankenPHP. That's how we keep it minimal.
>
> On API bloat: if we don't add start_worker(), set_vars(), get_vars(), the count stays at three: set_worker_vars, get_worker_vars, get_worker_handle. Those three cover the entire use case: publish, consume, cooperate. If APCu-like CAS primitives make sense later, they belong in a separate concern, possibly a separate PR, possibly a PHP extension, but either way not coupled to this feature.

These are all more or less the same point of disagreement, which is: this PR is locking that decision in "forever". No generic KV store, no matter if it would ever make sense. (I'd argue it would: how else would you share vars within the same application on different threads, but guard them from being accessed by other, unrelated applications? Using APCu for this is very dirty and will suffer from heavy fragmentation for a runtime concern.)

I think @alexandre-daubois essentially has the same considerations that Alex and I do too.

> If it blocks until ready: it's functionally equivalent to get_worker_vars() minus the returned data. Same lazy-start, same wait, same ready signal. Adding a second function for that case is pure redundancy.

It would obviously be blocking until started, but it would be a generic API surface that could be reused for task workers, which we're still intending to add. And it would be explicit. And it would solve the inability to reason about what a unified frankenphp_get_vars would do (i.e. never lazy start a worker).

> The structural argument I'd like your read on

I just think the actual issue with it is the same as before: worker string names lead to poor reasoning. If library A uses 'redis' and library B uses the same, but both expect different worker scripts, we have the exact same issue that the many-writers, many-readers has. If we don't have conflicting worker names, there's no issue with many-writers-many-readers either.

alexandre-daubois · 2026-04-21T12:28:52Z

Marc has already articulated most of where I land, so I'll stay short and add a few angles I don't think have come up yet.

On the DNS / Redis / Symfony service name analogy, I think the comparison doesn't hold. Those APIs deliberately separate concerns: DNS has getaddrinfo, not "getaddrinfo that spawns a server if the name doesn't resolve." Redis has GET, BLPOP, SUBSCRIBE as distinct primitives. None of them bundle "read + block + lazy-spawn + timeout" into one call. What's under discussion isn't the abstractness of names, it's the composition of side effects on a single primitive, which is where the real lock-in sits.

About testability: a global function whose call can spawn a worker process is hostile to unit tests. Libraries adopting this will either need to wrap it in their own abstraction or give up on isolation in tests. A store-shaped API is materially more mockable.

Also, about the principle of least surprise: get_worker_vars('foo') silently starting a process is a footgun that better error messages don't remove. The surprise isn't "the error was confusing," it's "a read-looking call had a lifecycle-altering side effect."

Finally, genuine question about the API: is it possible to unset a key?

dunglas · 2026-04-21T12:41:20Z

Edited: I missed the last response by @alexandre-daubois and I agree with him. API updated.

Thanks, everyone, for the depth of this one! @nicolas-grekas for the huge amount of work, and @henderkes, @AlliBalliBaba, @alexandre-daubois, @dbu for the careful pushback. I've read through the whole thread, and I think we're close to merging it. Most of the back-and-forth is really three questions tangled together: how workers start, generic vs. worker-scoped names, and single-writer vs. many-writers. Once they're separated, each side is clearly right on some of them.

Here's my opinion on this: Caddyfile and the whole Go/C runtime stay as Nicolas designed them, but we make small changes to the PHP API:

frankenphp_start_background_worker(string $name, float $timeout = 30.0): void
frankenphp_set_vars(array $vars): void
frankenphp_get_vars(string|array $name, float $timeout = 30.0): array
frankenphp_get_worker_handle(): resource

set_vars() takes no $name. The caller writes to its own scope: a background worker writes to its declared name, and that's it. We keep Nicolas' single-writer-per-scope guarantee structurally (no CRDTs, no CAS, same safety), and we drop _worker_ from the data functions so we don't freeze semantics we don't need to freeze as suggested by Alexandre (it's also my main concern).

start_background_worker() blocks until the worker has called set_vars once (clean ready signal, same at-most-once semantics, name-only so Caddyfile stays the trust boundary). get_vars() is pure read, it can still block waiting for data if callers want it to, but no lifecycle side effect. One extra line at bootstrap (start then get) in exchange for a clean trace, mockable code, and a read that behaves like a read. Good trade.

On unsetting a key: with snapshot semantics it's just set_vars a new array without the key — no dedicated primitive needed. If we ever add per-key writes we'd add a matching unset.

We can apply the same logic for #2319, drop the _worker_ part:

frankenphp_task_send(string $name, array $payload, float $timeout = 30.0): resource
frankenphp_task_read(resource $stream): ?array
frankenphp_task_receive(): ?array
frankenphp_task_update(resource $stream, array $data): void

The fact that a worker picks up the task is an implementation detail.

WDYT?

alexandre-daubois · 2026-04-21T12:49:57Z

Looks like the best of both worlds @dunglas. Dropping _worker_ from the data functions resolves the forward-compat concern I was most worried about.

Sorry if this was answered somewhere in the comments: what's the defined behavior when the caller has no worker scope, e.g. called from an HTTP request context, a CLI script, or any non-worker code path? Should it be no-op or throw a RuntimeException? I'd be in favor of the latter.

dunglas · 2026-04-21T12:55:24Z

I would throw too

henderkes · 2026-04-21T13:11:22Z

> frankenphp_start_background_worker(string $name, float $timeout = 30.0): void
> frankenphp_get_vars(string|array $name, float $timeout = 30.0): array

Why do we need a timeout for the get_vars? Shouldn't that just return immediately since the prior start_background_worker call is already blocking and guarantees it to be ready?

> frankenphp_get_worker_handle(): resource

Perhaps this should return an object on which PHP can call get_stream()? Or are we certain that a resource will fulfil our future requirements for what that handle has to do?

> Sorry if this was answered somewhere in the comments: what's the defined behavior when the caller has no worker scope, e.g. called from an HTTP request context, a CLI script, or any non-worker code path? Should it be no-op or throw a RuntimeException? I'd be in favor of the latter.

You're talking about frankenphp_set_vars? My immediate thought is to throw, but it's hard to say. What if we wanted to update a php_server-wide variable from an HTTP thread in the future? Once we guarantee throwing, we cannot change it later without potentially breaking code that expected a throw.

I'm generally happy with that direction, but I'd still want to argue the case for being able to define multiple background worker scripts. We went out of our way to support non-framework code all the way up until this point, and the gain I see from a single script would already mostly disappear with an explicit start_background_worker call. I'm just not yet convinced that lazy-starting workers from a single script is really worth it. Or, going back a step, whether we wouldn't be better off defining background workers explicitly in the Caddyfile.

nicolas-grekas · 2026-04-21T16:22:02Z

Thanks @dunglas for the proposal, I think we're very close. Let me suggest a small refinement that I think fully addresses the debuggability objection without giving up anything structural.

Proposal (noted about #2319 also)

frankenphp_require_background_worker(string $name, float $timeout = 30.0): void
frankenphp_set_vars(array $vars): void
frankenphp_get_vars(string|array $name): array
frankenphp_get_worker_handle(): resource

Four functions, same count as your proposal. Two differences:

`require` instead of `start`

The function is a dependency declaration, not a command. "Start" is slightly misleading because Go owns the worker lifecycle, PHP doesn't actually start anything. "Require" better matches the intent: "I declare that this worker must be running."

Mode-dependent semantics for `require_background_worker`

In a worker script, BEFORE frankenphp_handle_request: starts the worker if needed, blocks until first set_vars call or timeout, throws immediately on boot failure (no exponential backoff). This is the "declare my dependencies up front, fail fast if broken" pattern.

In a worker script, INSIDE the request loop: must refer to a worker that's already running (either declared with num 1 in Caddyfile, or previously required during bootstrap). Throws if the name isn't known. Blocks with timeout if the worker is currently in crash-restart. Never lazy-starts a new worker. This makes runtime calls a clean assertion: "this dependency must be available now."

In NON-worker mode: starts the worker if needed, blocks with timeout, tolerates transient boot failures via exponential backoff. Same as current lazy-start behavior, just explicit.

`get_vars` becomes pure-read everywhere

No lifecycle side effects, no timeout argument needed. Throws if the name isn't currently running. Consistent semantics across worker and non-worker modes.

Usage

// Worker mode
frankenphp_require_background_worker('config-watcher'); // bootstrap, fail-fast
while (frankenphp_handle_request(function () {
    $cfg = frankenphp_get_vars('config-watcher'); // pure read
})) { gc_collect_cycles(); }

// Non-worker mode (every request)
frankenphp_require_background_worker('config-watcher'); // tolerant
$cfg = frankenphp_get_vars('config-watcher');

Why this works

Addresses the "sometimes starts, sometimes doesn't" debuggability concern: get_vars never has lifecycle side effects. The require call is where lifecycle lives, and its name says so.

Preserves the mode asymmetry in the right place: the only mode-dependent behavior is in the lifecycle function (where mode genuinely matters: bootstrap vs. request), not in reads. The asymmetry is visible from the function name.

Keeps set_vars scope-less: the caller still writes to its own scope. Single-writer-per-scope, no CAS, no CRDTs.

Caddyfile remains the trust boundary: require takes a name, not a path. No new input-sensitive API.

Runtime discipline in worker mode: assertions instead of lazy-starts. Library code can declare "my dependency must be running" at runtime without the side-effect surprise.

Non-worker mode ergonomics: accepts that non-worker mode re-initializes everything per request. The require + get pattern is consistent with the rest of per-request setup in that mode.

Answering specific points from the thread

"Non-worker mode should throw": to be clear, background workers already work in non-worker mode today, and I want to keep it that way. Non-worker scripts can require and get_vars normally, that's one of the core use cases (classic request mode reading live config from a bg worker). The only thing that should throw in non-worker mode is set_vars, because the caller has no bg worker scope to write to. That's already the behavior, no change needed.

CLI: rather than throwing, we can simply not expose the functions in CLI mode. CLI is a standalone PHP execution with no worker pool, the functions would be meaningless. Not exposing them is cleaner than throwing at runtime.

get_worker_handle returning an object with get_stream() for future-proofing: I'd push back. PHP streams are the universal primitive for async I/O in PHP. They're not going anywhere, and the upcoming PHP 8.6 poll API RFC is built on top of them (Poll::addReadable accepts stream resources directly). Wrapping them in an object to "future-proof" adds complexity today for a future requirement that doesn't exist and likely won't. If one day we need something a stream can't express, we can add a new function then, and the old one still works.

"Allow defining multiple background worker scripts": this is already supported. You can declare as many worker { background; name X } blocks as you want in the Caddyfile, each with a different script. The catch-all (worker { background } without a name) is an additional mechanism, not the only one.

"catch-all assumes a framework with a central kernel": the catch-all is completely framework-agnostic. The dispatch is a single $_SERVER['FRANKENPHP_WORKER_NAME'] lookup, that's it. A match statement, a class_exists check, a require of a file named after the worker, whatever you prefer. The idea of having a single entrypoint that dispatches to multiple workers doesn't require a kernel or container, it's just switch ($_SERVER['FRANKENPHP_WORKER_NAME']) { ... }. Many PHP libraries already ship a bin/ script that dispatches to different subcommands based on $argv[1], this is the same pattern.

The DX win of the catch-all is that library authors can ship a worker entrypoint that handles multiple named workers without requiring their users to declare each one in the Caddyfile. Remove this and libraries have to document "add these N worker blocks to your Caddyfile" instead of "add one worker { background } block". That's a real usability regression for a debugging concern that the explicit require already addresses.

dunglas · 2026-04-22T09:06:16Z

I don't like require much because require/require_once PHP keywords are totally different (our function doesn't take a path as a parameter).

WDYT about frankenphp_ensure_background_worker(string $name, float $timeout = 30.0): void?

henderkes · 2026-04-22T09:10:18Z

We wouldn't ensure a background worker, we would ensure a background worker is running. I'm with you though, require feels wrong.

I'm still in favour of start, it's fine to me.

henderkes · 2026-04-22T09:19:04Z

"Non-worker mode should throw": to be clear, background workers already work in non-worker mode today, and I want to keep it that way. Non-worker scripts can require and get_vars normally, that's one of the core use cases (classic request mode reading live config from a bg worker). The only thing that should throw in non-worker mode is set_vars, because the caller has no bg worker scope to write to. That's already the behavior, no change needed.

Yes, I think we all meant set_vars. Of course background workers and get_vars should work without worker mode.

CLI: rather than throwing, we can simply not expose the functions in CLI mode. CLI is a standalone PHP execution with no worker pool, the functions would be meaningless. Not exposing them is cleaner than throwing at runtime.

We already don't do frankenphp sapi bootup (embed instead) in the cli version. With my proposed php-src change it would use the cli sapi, still without the frankenphp extension.

get_worker_handle returning an object with get_stream() for future-proofing: I'd push back. PHP streams are the universal primitive for async I/O in PHP. They're not going anywhere, and the upcoming PHP 8.6 poll API RFC is built on top of them (Poll::addReadable accepts stream resources directly). Wrapping them in an object to "future-proof" adds complexity today for a future requirement that doesn't exist and likely won't. If one day we need something a stream can't express, we can add a new function then, and the old one still works.

I was thinking of potential worker orchestration from php side later. But thinking about it again, we could do that with streams too, so it's fine.

"Allow defining multiple background worker scripts": this is already supported. You can declare as many worker { background; name X } blocks as you want in the Caddyfile, each with a different script. The catch-all (worker { background } without a name) is an additional mechanism, not the only one.

Sorry, I should've re-read the current version. We've been through so many iterations, at this point it's all getting a bit fuzzy, haha.

No further objections from my side then.

nicolas-grekas · 2026-04-22T10:14:26Z

Thanks @henderkes for the follow-up confirming no further objections. On your php-src change proposal: if it lands and makes FrankenPHP CLI use the cli SAPI without the frankenphp extension, our MINIT-level function hiding in this PR becomes unnecessary and can be revisited as a follow-up cleanup. Happy to track that.

I pushed a set of refinements on top of the previous round. Summary of what changed and why:

1. Rename `require_background_worker` → `ensure_background_worker`

I considered start_background_worker, but chose ensure over start because the semantic is "make sure this worker is running, start it if it isn't". On @henderkes's nit ("ensure a background worker" is imprecise, should be "ensure it's running"): in context the implied "...is running" is conventional English.

2. Tolerant lazy-start inside `frankenphp_handle_request`

My previous proposal had three modes for the require/ensure function:

HTTP worker bootstrap (before handle_request): lazy-start + fail-fast
HTTP worker runtime (inside handle_request): assert-only
Non-worker mode: lazy-start + tolerant

The runtime assert-only was meant to enforce "declare deps at bootstrap", but it leads to over-provisioning in practice. It's often easy to list which workers might be used by an app, but much harder to know which ones will actually be exercised by a given deployment's traffic. Under the 3-mode strategy, you'd have to ensure every possible worker at bootstrap, starting workers that a given deployment may never actually use.

Collapsed to two modes:

HTTP worker bootstrap: lazy-start + fail-fast (unchanged)
Everywhere else: lazy-start + tolerant (runtime + non-worker converge)

Bootstrap keeps its strict discipline: a broken dep visibly fails the HTTP worker boot rather than letting it serve degraded traffic. Everywhere else, the first caller actually using a worker pays the startup cost; subsequent callers see the worker already reserved. Workers that the running process never needs never start.

3. Multi-name `ensure`

ensure_background_worker now accepts string|array. Batch dependency declaration with a shared timeout across names. In bootstrap mode, a boot failure on any of them fails fast with that worker's captured details:

frankenphp_ensure_background_worker(['redis-watcher', 'feature-flags', 'config-watcher']);

get_vars loses the array form and becomes single-name only. Multi-name makes more sense for a declarative "ensure these are running" call than for a read (per-read cached lookup is already O(1)).

4. Context cancellation on shutdown

Added <-globalCtx.Done() to ensure_background_worker's blocking select. An in-flight ensure unblocks cleanly with a clear error during FrankenPHP shutdown, instead of waiting out its full timeout.

5. Fix: ready-state accounting across crash-restart

Found during review. markBackgroundReady() was gated by sk.readyOnce, so it only ran on the first-ever set_vars. After a crash-restart cycle:

setupScript sets isBootingScript = true on the restarted thread.
The next set_vars goes through readyOnce.Do → no-op.
isBootingScript stays true permanently.

Consequences: subsequent crashes were misclassified as StopReasonBootFailure, the readyWorkers metric gauge stayed decremented even while the worker was healthy again, and the bootFailure atomic was written on each post-ready crash so bootstrap-mode ensure could show misleading "boot failure" info.

User-visible behavior was fine (sk.ready stays closed across restarts, so ensure returns fast and get_vars returns last-known state), but the metrics and internal classification were wrong.
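The failure mode is easy to reproduce in isolation. The sketch below is a simplified single-goroutine model, not the PR's code: `buggyWorker` and `fixedWorker` are hypothetical types, and the "fixed" variant is just one plausible shape of a fix (keep the one-shot channel close behind `sync.Once`, but clear the booting flag on every boot cycle). Real code would also need synchronization around the flag.

```go
package main

import (
	"fmt"
	"sync"
)

// buggyWorker gates the whole "ready" transition behind sync.Once, so it can
// only ever run for the first set_vars call in the process lifetime.
type buggyWorker struct {
	readyOnce sync.Once
	isBooting bool
}

func (w *buggyWorker) setupScript() { w.isBooting = true }

func (w *buggyWorker) setVars() {
	w.readyOnce.Do(func() { w.isBooting = false })
}

// fixedWorker separates the one-shot part (releasing waiters) from the
// per-boot part (clearing the booting flag).
type fixedWorker struct {
	readyOnce sync.Once
	ready     chan struct{}
	isBooting bool
}

func newFixedWorker() *fixedWorker { return &fixedWorker{ready: make(chan struct{})} }

func (w *fixedWorker) setupScript() { w.isBooting = true }

func (w *fixedWorker) setVars() {
	w.readyOnce.Do(func() { close(w.ready) }) // stays closed across restarts
	w.isBooting = false                       // cleared on every boot cycle
}

func main() {
	b := &buggyWorker{}
	f := newFixedWorker()
	for boot := 1; boot <= 2; boot++ { // boot 2 models a crash-restart
		b.setupScript()
		b.setVars()
		f.setupScript()
		f.setVars()
		fmt.Printf("boot %d: buggy isBooting=%v fixed isBooting=%v\n", boot, b.isBooting, f.isBooting)
	}
}
```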

Works for y'all?

dunglas · 2026-04-22T10:30:58Z

The last version of the public API sounds good to me! Excellent work.
I'll do a full code review soon.

henderkes · 2026-04-22T10:58:54Z

Rename require_background_worker → ensure_background_worker
I considered start_background_worker, but chose ensure over start because the semantic is "make sure this worker is running, start it if it isn't". On @henderkes's nit ("ensure a background worker" is imprecise, should be "ensure it's running"): in context the implied "...is running" is conventional English.

I'm sorry, but I strongly disagree here. ensure worker is "making sure (a) worker". That's confusing at best, misleading in actuality. ensure_worker_running also doesn't imply that one will be started, I'd expect such a function to just throw if it wasn't. frankenphp_start_worker with a note that it's idempotent is really what happens here.

The runtime assert-only was meant to enforce "declare deps at bootstrap", but it leads to over-provisioning in practice. It's often easy to list which workers might be used by an app, but much harder to know which ones will actually be exercised by a given deployment's traffic. Under the 3-mode strategy, you'd have to ensure every possible worker at bootstrap, starting workers that a given deployment may never actually use.

When this is a real concern (and I'm not sure it is), I think we should shut down workers that haven't been asked for in a while. I'm all for keeping it as simple as possible: frankenphp_start_worker should behave exactly the same, no matter if called from a worker bootup, a worker run, or a regular run.

nicolas-grekas · 2026-04-22T11:23:04Z

ensure/start: I'll follow your lead. @dunglas, any stronger opinion?

frankenphp_start_worker should behave exactly the same, no matter if called from a worker bootup, a worker run, or a regular run

I agree, and that's now closer to one behavior! The only special case is failing early when a worker cannot start while HTTP workers haven't called handle_request yet. I think that's a net safety gain that will improve robustness for people who can start things early, because it makes putting FrankenPHP live safer. The backoff mechanism of HTTP workers will help recover from that automatically on startup when possible, while providing quicker feedback.

withinboredom · 2026-04-22T11:47:40Z

This PR is absolutely massive. 3k loc change ... I'd argue breaking it down by scope and merge in minimal working systems, iterating as you go and paying attention to related issues so you learn the pain points users experience.

For this PR ... There's so much going on, and some of it is not-obviously-wrong. There are at least 3 potential race conditions that jump out at me immediately, double-close issues (which can create a security vulnerability or corruption), workers potentially getting stuck in half-started states, Caddyfile ordering issues, lack of synchronization, etc. Sure, many of these problems "go away" by enforcing exactly one worker thread and assuming users only use Caddy to run FrankenPHP, but it would be a ton of work to remove that constraint if/when we want to.

I'd be happy to review the whole diff, but my personal preference is to break it down. Here's where I see some seams:

1. Force-kill infrastructure: The frankenphp_init_force_kill / save_php_timer / force_kill_thread machinery is a self-contained cross-platform primitive. Land it with its own tests (stuck-thread recovery during normal DrainWorkers), independent of background workers. It's useful on its own and gives the later graceful-shutdown work a foundation that's already been reviewed.
2. Persistent-zval helpers (bg_worker_vars.h): Validation, deep-copy into persistent memory, interned-string and immutable-array fast paths, enum serialization, all exercisable against a trivial in-process API without any thread plumbing. This is the subsystem most likely to have latent refcount or memory-lifetime bugs; reviewing it in isolation with targeted unit tests is much higher-signal than finding issues inside a 3k-line diff. Could be useful for all frankenphp extensions, possibly.
3. Thread-handler refactor: add drain() to the interface. Pure mechanical change across threadregular, threadworker, threadinactive, threadtasks_test. Lands the seam the background-worker code needs without introducing any new behaviour.
4. Minimal background worker: one named worker, one thread, one scope. No catch-all, no lazy-start, no pools, no multi-entrypoint, no ensure_background_worker batching. Just: declare a named background worker in a Caddyfile, it starts at boot, publishes via set_vars, HTTP threads read via get_vars. This is the smallest thing that proves the core design (persistent vars, per-request cache, versioning, signaling stream, boot-failure classification) works end-to-end. Most of the race conditions and double-close issues either don't exist at this scope or become trivially analysable. Graceful shutdown + restart. Stop-pipe signaling, grace period, force-kill integration, RestartWorkers behaviour. Get this right with one thread per worker before adding any concurrency.

You could stop here, or keep going. User demand (how can I add more instances?) gives a good reason to continue.

5. ensure_background_worker with lazy-start + catch-all. Now the registry, reserve/remove protocol, and bootstrap-vs-runtime fail-fast semantics are worth their complexity because there's a working system to build on. Batch-name support can be its own follow-up; the single-name path is the interesting one.
6. Per-php_server scoping. BackgroundScope, the Caddy provisioning hooks, WithRequestBackgroundScope. Cleanly separable from everything above and each block's isolation is easier to reason about when the within-block behaviour is already settled.
7. Pools (num > 1) and multi-entrypoint. Explicitly punted by the current Caddyfile validation; land it last, when the constraints around shared registries and partial-start rollback have had time to surface through real usage.

Each of these is independently useful, independently reviewable, and (importantly) independently revertible if a design choice turns out to be wrong. Step 4 alone covers probably 80% of what users will actually reach for. If steps 5–7 take another release or two while patterns emerge from issues, that's fine; the feature is still shipped.

The other thing this buys you: each slice lets the next one's API be informed by what users actually do with the previous one. Shipping 3k lines at once locks in set_vars / get_vars / ensure / batch-names / scoping / catch-all semantics before anyone has written a single real background worker against them.

This is good work, and I'm excited to see where it goes.

nicolas-grekas · 2026-04-23T06:32:08Z

Thanks @withinboredom that's really useful! I think I found and fixed all the issues you described.
Then you might have noticed #2365, #2366, #2367, following your proposal.
Would be great to have them merged quickly, I already have step 4 ready locally but it needs them all merged before.
Thanks!

Introduce background workers via WithWorkerBackground() (Go API) and the Caddyfile `background` token on workers. Background workers share the PHP runtime with HTTP threads but don't serve HTTP requests. They expose a stop pipe (frankenphp_get_worker_handle()) so PHP scripts can park on stream_select and exit gracefully when FrankenPHP drains. The handler auto-restarts the worker on crash with quadratic backoff capped at 1s. The bg worker name is global in this commit; follow-ups will add ensure(), per-php_server scoping, pools, and shared-state APIs (set_vars/get_vars).

Adds frankenphp_ensure_background_worker(string $name): void on top of the minimal background worker from the previous commit. Fire-and-forget: the function lazy-starts the named worker if it is not already running and returns once a thread has been launched, without waiting for the PHP script to reach any particular state (no readiness signal exists in this build; that arrives with the set_vars/get_vars step that follows).

Registry + lookup layer:
- backgroundWorkerRegistry tracks the template options (env, watch, maxConsecutiveFailures, requestOptions) from one declaration plus the live worker instances spawned from it. Catch-all registries carry a maxWorkers cap.
- backgroundWorkerLookup holds a name->registry map plus a single catch-all slot. resolve() falls back to catch-all when the name is not declared.

Catch-all dispatch:
- A name-less background-worker declaration matches any ensure() name at runtime. max_threads on a catch-all is the cap on how many distinct lazy-started instance names it can host (default 16). Caddyfile no longer requires "name" on background workers, and accepts max_threads > 1 on the catch-all (still rejected on named bg workers).

Named lazy path:
- A num=0 named declaration registers the worker struct at init but defers thread attach until ensure() schedules it. ensure() reuses the existing struct via workersByName instead of creating a duplicate.

calculateMaxThreads now reserves per-bg-worker thread budget separately from HTTP-worker counts and scales catch-all reservations with the declared max_threads, so lazy starts always have a slot to schedule into.

metrics.TotalWorkers is registered for bg workers so StartWorker calls in the bg-worker thread aren't silent no-ops in bg-only deployments.

$_SERVER['FRANKENPHP_WORKER_NAME'] is now populated for background workers so catch-all instances can tell which name they were started under (lets sentinel-based tests distinguish job-a from job-b).

Tests (background_worker_ensure_test.go) cover:
- ensure() on a declared num=0 named worker lazy-starts it
- ensure() on a name matched by catch-all spawns from the catch-all template; two distinct names produce two independent instances
- ensure() with no catch-all and an undeclared name returns the config error
- catch-all max_threads cap rejects the (cap+1)th distinct name
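The resolution rules described in this commit message (named registry first, catch-all fallback, cap on distinct catch-all names) can be modelled with a few lines of Go. This is a simplified sketch: the types `registry` and `lookup`, their fields, and the error values are stand-ins chosen for this example and do not mirror the real backgroundWorkerRegistry / backgroundWorkerLookup structs.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errUndeclared = errors.New("no background worker declared under that name and no catch-all")
	errCapReached = errors.New("catch-all cap reached")
)

// registry stands in for one declaration and the instances spawned from it.
type registry struct {
	maxWorkers int             // 0 means "no cap" in this sketch
	workers    map[string]bool // names of live instances
}

// lookup holds named registries plus an optional catch-all slot.
type lookup struct {
	named    map[string]*registry
	catchAll *registry
}

// resolve prefers a named declaration and falls back to the catch-all.
func (l *lookup) resolve(name string) (*registry, error) {
	if r, ok := l.named[name]; ok {
		return r, nil
	}
	if l.catchAll != nil {
		return l.catchAll, nil
	}
	return nil, errUndeclared
}

// ensure is idempotent: a name that is already running is not started twice.
func (l *lookup) ensure(name string) error {
	r, err := l.resolve(name)
	if err != nil {
		return err
	}
	if r.workers[name] {
		return nil
	}
	if r.maxWorkers > 0 && len(r.workers) >= r.maxWorkers {
		return errCapReached
	}
	r.workers[name] = true // a real implementation would launch a thread here
	return nil
}

func main() {
	l := &lookup{
		named:    map[string]*registry{"config-watcher": {workers: map[string]bool{}}},
		catchAll: &registry{maxWorkers: 2, workers: map[string]bool{}},
	}
	for _, name := range []string{"config-watcher", "job-a", "job-b", "job-a", "job-c"} {
		fmt.Printf("ensure(%q): %v\n", name, l.ensure(name))
	}
}
```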

Adds a BackgroundScope opaque type (int under the hood; obtain values via NextBackgroundWorkerScope) so each php_server block gets its own isolation boundary for background workers. Zero is the global/embed scope.
- backgroundLookups map[BackgroundScope]*backgroundWorkerLookup replaces the single global backgroundLookup. Each scope has its own named registry + catch-all so two blocks can declare bg workers with the same user-facing name without colliding.
- buildBackgroundWorkerLookups iterates declarations into their scope's lookup; each declaration still owns its own registry. registry.declared remembers the *worker for a named declaration so lazy-start (num=0) reuses it without scanning the global workersByName map (which is not scope-aware for bg workers).
- getLookup(thread) resolves the active scope from the calling thread: worker handler -> request context -> global (0). Scopes that declared their own workers stay strictly isolated; an empty scope falls through to the global lookup so embed-mode workers stay reachable.
- Go options: WithWorkerBackgroundScope tags a declaration; the new WithRequestBackgroundScope tags a request so ensure() from a regular HTTP request resolves to the right block's lookup.
- Caddy wiring: FrankenPHPModule.Provision allocates one scope per module instance (idempotent across re-provisions) and threads it into worker declarations and ServeHTTP.
- workersByName collision check now skips bg workers; they resolve via their scope's lookup, so the same PHP-visible name can appear in two scopes without tripping the duplicate guard.
- C side: go_frankenphp_ensure_background_worker now takes the calling thread index so getLookup can resolve the scope from the active handler / request context.

Tests:
- TestNextBackgroundWorkerScopeIsDistinct: counter hands out unique non-zero scopes.
- TestBackgroundWorkerSameNameDifferentScope: two named bg workers with the same user-facing name in distinct scopes both Init successfully and own distinct registries.
- TestBackgroundWorkerCatchAllPerScope: ensure() in scope A consumes scope A's catch-all only; scope B's catch-all stays empty. Verified by inspecting the per-scope lookup and the live workers slice via package-internal access.

Deferred to follow-ups: pools (num > 1 per named worker, max_threads > 1 for named workers), multiple declarations sharing one entrypoint file in one scope, FRANKENPHP_WORKER_BACKGROUND server flag, batch ensure.
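A minimal model of the scoping rule may help when reading the above. In this hedged sketch, `Scope`, `nextScope`, and `lookups` are invented stand-ins for BackgroundScope, NextBackgroundWorkerScope, and backgroundLookups, and the lookup value is reduced to a name-to-script map. It shows the two behaviours the commit message names: scopes with their own declarations are isolated, and an empty scope falls through to the global scope 0.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Scope models an opaque per-php_server identifier; 0 is the global scope.
type Scope int64

var scopeCounter int64

// nextScope hands out unique, non-zero scopes.
func nextScope() Scope { return Scope(atomic.AddInt64(&scopeCounter, 1)) }

// lookups maps each scope to its own worker-name -> script table.
type lookups map[Scope]map[string]string

// forScope returns the scope's own table when it declared workers, otherwise
// the global table, so embed-mode workers stay reachable.
func (l lookups) forScope(s Scope) map[string]string {
	if own, ok := l[s]; ok && len(own) > 0 {
		return own
	}
	return l[0]
}

func main() {
	a, b, empty := nextScope(), nextScope(), nextScope()
	all := lookups{
		0: {"embedded": "embed.php"},
		a: {"config-watcher": "a/watcher.php"},
		b: {"config-watcher": "b/watcher.php"},
	}
	fmt.Println(all.forScope(a)["config-watcher"]) // scope A's own script
	fmt.Println(all.forScope(b)["config-watcher"]) // same name, different script
	fmt.Println(all.forScope(empty)["embedded"])   // falls through to global
	_, leaked := all.forScope(a)["embedded"]
	fmt.Println("global visible from scope A:", leaked)
}
```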

Lifts the remaining constraints on background workers:
- Pools: named bg workers can now declare num > 1 (pool of threads per worker) and max_threads > 1. The Caddyfile-level rejections in unmarshalWorker are dropped.
- Per-thread stop-pipe: the write fd moved from worker to handler. Each thread in a pool gets its own stop pipe, so drain() can wake them independently. Pools no longer overwrite one another's fd through the shared worker struct.
- Multi-entrypoint: multiple named bg workers in the same scope can share the same entrypoint file. Drops the filename-uniqueness rejection in newWorker (it was already skipped via allowPathMatching, this lifts the last Caddyfile-level path check that prevented two named bg workers pointing at the same fixture).

Tests:
- TestBackgroundWorkerPool: declares num=3, asserts 3 distinct sentinel files appear (each thread tempnam()'s a unique file).
- TestBackgroundWorkerMultiEntrypoint: two named bg workers share one entrypoint file; both Init successfully and produce sentinels.

Two small, related polish steps on the bg-worker surface, landing together:
- frankenphp_ensure_background_worker now accepts string|array. The array form lazy-starts every named worker fire-and-forget, with the same semantics as the single-string call repeated N times. Input is validated up-front: empty arrays raise ValueError, non-string elements raise TypeError, empty-string and duplicate names raise ValueError. Validation happens before any worker is started so a bad input never leaves a half-spawned batch behind.
- $_SERVER['FRANKENPHP_WORKER_BACKGROUND'] = true in background worker scripts, alongside the existing FRANKENPHP_WORKER_NAME wiring. Gives scripts a single-key branch for "am I a bg worker?" without having to probe other frankenphp_* helpers. Set unconditionally for bg workers (catch-all instances with no declared name still see the flag, just no name).

## Tests
- TestEnsureBackgroundWorkerBatch: ensure(['a','b','c']) starts three catch-all-resolved instances; assert three per-name sentinels appear.
- TestEnsureBackgroundWorkerBatchEmpty: [] raises ValueError. Driven through a PHP fixture that catches the throwable since the validation lives in the Zend parameter-parsing path.
- TestEnsureBackgroundWorkerBatchNonString: ['ok-name', 42] raises TypeError, same fixture pattern.
- TestEnsureBackgroundWorkerBatchDuplicate: ['dup','dup'] raises ValueError (duplicate names rejected, not silently deduped).
- TestBackgroundWorkerBgFlag: bg worker writes var_export() of $_SERVER['FRANKENPHP_WORKER_BACKGROUND'] to a sentinel; assert the exact value is the bool true.

Adds the worker-to-HTTP shared-state surface deferred from the config+ensure split (php#2393):
- frankenphp_set_vars(array $vars): void publishes a snapshot from a background worker. Persistent (pemalloc) memory, RWMutex-protected, cross-thread safe. Skips work when data is identical (=== check).
- frankenphp_get_vars(string $name): array reads the latest snapshot. Pure read; throws if the worker is not running or has not published yet.
- ensure_background_worker now blocks until the named worker has called set_vars at least once (the readiness signal). The fire-and-forget semantics from the config-only PR become a stronger contract here with no API change visible to callers.
- Two-mode ensure: fail-fast in HTTP-worker bootstrap (before frankenphp_handle_request) so a broken dependency surfaces at boot rather than serving degraded traffic; tolerant inside requests so the restart-with-backoff cycle can recover from transient boot failures.
- Boot-failure capture: the worker's last PHP error (message, file, line, exit status) is recorded so ensure() can throw a descriptive RuntimeException on timeout.

The persistent storage path uses opcache-immutable arrays (zero-copy share), interned strings (no copy), and rich type support: null, scalars, arrays (nested), enums.

Tests cover happy-path roundtrips, type coverage, ensure() blocking on first set_vars, fail-fast vs tolerant modes, boot-failure reporting, and the catch-all + scope interactions with vars.
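To make the publish/read contract concrete, here is a small Go model of an RWMutex-protected snapshot store. It is a sketch under assumptions: `varsStore` and its methods are invented for this example, `reflect.DeepEqual` stands in for PHP's `===` comparison, and a shallow map copy stands in for the deep copy into persistent memory that the commit message describes.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"sync"
)

var errNotPublished = errors.New("background worker has not published vars yet")

// varsStore holds the latest snapshot published by one background worker.
type varsStore struct {
	mu      sync.RWMutex
	vars    map[string]any
	version uint64 // 0 means "never published"
}

// set publishes a snapshot and reports whether anything changed.
func (s *varsStore) set(vars map[string]any) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version > 0 && reflect.DeepEqual(s.vars, vars) {
		return false // identical data: skip the copy and the version bump
	}
	snapshot := make(map[string]any, len(vars))
	for k, v := range vars {
		snapshot[k] = v
	}
	s.vars = snapshot
	s.version++
	return true
}

// get is a pure read: it never starts anything and fails if nothing has
// been published yet.
func (s *varsStore) get() (map[string]any, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.version == 0 {
		return nil, errNotPublished
	}
	out := make(map[string]any, len(s.vars))
	for k, v := range s.vars {
		out[k] = v
	}
	return out, nil
}

func main() {
	s := &varsStore{}
	_, err := s.get()
	fmt.Println("before first publish:", err)
	fmt.Println("changed:", s.set(map[string]any{"maintenance": false}))
	fmt.Println("changed:", s.set(map[string]any{"maintenance": false}))
	fmt.Println("changed:", s.set(map[string]any{"maintenance": true}))
	vars, _ := s.get()
	fmt.Println(vars)
}
```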

Ninth step on top of php#2287's split. Adds a C-side per-request cache keyed on the background worker's vars version so repeated get_vars reads within one request run at O(1) and return the same HashTable pointer.

## What
- __thread HashTable *bg_vars_cache maps worker name -> { version, cached_zval }. Initialized lazily on first get_vars call per request. Destroyed before php_request_shutdown tears down request memory, so the cached zvals are torn down while their backing request-memory structures are still alive.
- go_frankenphp_get_vars grew callerVersion / outVersion out-params:
  - If callerVersion matches the live varsVersion, Go skips the deep copy entirely and only reports outVersion. The C side reuses its cached zval (with ZVAL_COPY for refcount bump).
  - If versions differ, Go runs the normal copy-under-RLock path and reports the fresh version for the caller to cache.
- PHP_FUNCTION(frankenphp_get_vars) consults the cache before calling Go, then either reuses the cached zval (hit) or stores the fresh copy (miss). Identity is preserved: $vars === $prev_vars holds across reads within one request.

## Tests
- TestGetVarsCacheIdentity: two reads in one request return the same zval (=== true).
- TestGetVarsCacheManyReads: 500 reads in one script complete without memory corruption, proving the cache tear-down at request end is correct.

All 16 existing bg worker tests still pass.
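The version handshake is the core of this step, and it can be modelled without any C or Zend types. In the sketch below everything is an assumption made for illustration: `store.read` plays the role of go_frankenphp_get_vars with its callerVersion / outVersion pair, `requestCache` plays the role of the per-request bg_vars_cache, and a `*snapshot` pointer stands in for the cached zval so that identity can be checked with `==`.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot stands in for the per-request copy handed to PHP.
type snapshot struct{ vars map[string]string }

// store stands in for the Go side holding the published vars.
type store struct {
	mu      sync.RWMutex
	current map[string]string
	version uint64
}

func (s *store) publish(vars map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.current = vars
	s.version++
}

// read returns (nil, version) when the caller's version is still current,
// otherwise a fresh copy made under the read lock plus the new version.
func (s *store) read(callerVersion uint64) (*snapshot, uint64) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if callerVersion != 0 && callerVersion == s.version {
		return nil, s.version
	}
	copied := make(map[string]string, len(s.current))
	for k, v := range s.current {
		copied[k] = v
	}
	return &snapshot{vars: copied}, s.version
}

type cacheEntry struct {
	version uint64
	snap    *snapshot
}

// requestCache lives for one request and is dropped at request end.
type requestCache map[string]cacheEntry

func (c requestCache) getVars(name string, s *store) *snapshot {
	entry := c[name] // zero value (version 0) means "nothing cached"
	fresh, version := s.read(entry.version)
	if fresh == nil {
		return entry.snap // hit: same pointer as the previous read
	}
	c[name] = cacheEntry{version: version, snap: fresh}
	return fresh
}

func main() {
	s := &store{}
	s.publish(map[string]string{"flag": "off"})

	cache := requestCache{}
	first := cache.getVars("feature-flags", s)
	second := cache.getVars("feature-flags", s)
	fmt.Println("same instance within request:", first == second)

	s.publish(map[string]string{"flag": "on"})
	third := cache.getVars("feature-flags", s)
	fmt.Println("new instance after publish:", third != second, third.vars["flag"])
}
```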

Covers the full public API landed across the preceding steps: the named/catch-all Caddyfile configuration, the two-mode frankenphp_ensure_background_worker() semantics (fail-fast at HTTP bootstrap, tolerant elsewhere) and its batch form, the pure-read frankenphp_get_vars(), frankenphp_set_vars() with its allowed value types (scalars, nested arrays, enum cases), the signaling stream via frankenphp_get_worker_handle(), and runtime behaviour (dedicated threads, $_SERVER flags, crash recovery with stale vars, 30-second grace period followed by force-kill, per-php_server scoping, and the pool / multi-entrypoint limits).

nicolas-grekas force-pushed the sidekicks branch 4 times, most recently from e1655ab to 867e9b3 Compare March 16, 2026 20:26

nicolas-grekas force-pushed the sidekicks branch 2 times, most recently from da54ab8 to a06ba36 Compare March 16, 2026 21:45

nicolas-grekas force-pushed the sidekicks branch 7 times, most recently from ad71bfe to 05e9702 Compare March 17, 2026 08:03

nicolas-grekas force-pushed the sidekicks branch from 05e9702 to 8a56d4c Compare March 17, 2026 08:34

nicolas-grekas force-pushed the sidekicks branch 3 times, most recently from cb65f46 to 4dda455 Compare March 17, 2026 10:46

nicolas-grekas force-pushed the sidekicks branch 4 times, most recently from b3734f5 to ed79f46 Compare March 17, 2026 11:48

This was referenced Apr 22, 2026

feat: cross-platform force-kill primitive for stuck PHP threads #2365

Merged

feat: persistent-zval helpers (deep-copy zval trees across threads) #2366

Merged

refactor: add drain() seam to threadHandler interface #2367

Merged

nicolas-grekas mentioned this pull request May 4, 2026

feat: background workers (config + ensure) #2393

Open

4 tasks

nicolas-grekas added 8 commits May 4, 2026 20:38

Conversation

nicolas-grekas commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

PHP API

Caddyfile configuration

Shutdown

Boot-failure reporting

Forward compatibility

Architecture

Example

Test coverage

Documentation

AlliBalliBaba commented Mar 16, 2026 • edited

henderkes commented Mar 17, 2026

nicolas-grekas commented Mar 17, 2026 • edited

dunglas commented Mar 17, 2026

henderkes commented Mar 17, 2026

nicolas-grekas commented Mar 17, 2026

nicolas-grekas commented Mar 17, 2026

nicolas-grekas commented Mar 17, 2026 • edited

alexandre-daubois commented Mar 17, 2026 • edited

alexandre-daubois commented Apr 21, 2026 • edited

nicolas-grekas commented Apr 21, 2026

henderkes commented Apr 21, 2026

alexandre-daubois commented Apr 21, 2026 • edited

dunglas commented Apr 21, 2026 • edited

alexandre-daubois commented Apr 21, 2026 • edited

dunglas commented Apr 21, 2026

henderkes commented Apr 21, 2026 • edited

nicolas-grekas commented Apr 21, 2026

Proposal (noted about #2319 also)

require instead of start

Mode-dependent semantics for require_background_worker

get_vars becomes pure-read everywhere

Usage

Why this works

Answering specific points from the thread

dunglas commented Apr 22, 2026

henderkes commented Apr 22, 2026

henderkes commented Apr 22, 2026

nicolas-grekas commented Apr 22, 2026

1. Rename require_background_worker → ensure_background_worker

2. Tolerant lazy-start inside frankenphp_handle_request

3. Multi-name ensure

4. Context cancellation on shutdown

5. Fix: ready-state accounting across crash-restart

dunglas commented Apr 22, 2026
