Implement abortIsolate() in cloudflare:workers module#6237

Open
hoodmane wants to merge 2 commits into main from
hoodmane/reload-worker

Conversation

Contributor

@hoodmane hoodmane commented Mar 4, 2026

Add abortIsolate(reason) API that terminates the current JS isolate and
creates a fresh one from scratch, resetting all module-level state. When
called, it immediately terminates all in-flight requests on the isolate.
The next request creates a new Worker.

The abort-all mechanism uses a shared ForkedPromise that each IoContext
subscribes to via onLimitsExceeded(). When abortIsolate() is called, the
promise is rejected, causing every IoContext on the isolate to abort. This
mirrors how the production 2x memory limit kill works. The reason string is
included in all error messages across all aborted requests.
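The fan-out described above can be sketched in TypeScript. The real implementation uses a KJ `ForkedPromise` in workerd's C++ code; here a single shared `Promise` plays the same role, and the class and method names are invented for illustration. Every subscriber races its own work against the same shared rejection, so one call to `abortIsolate(reason)` aborts them all with the same reason.

```typescript
// Illustrative simulation of the abort-all mechanism (names are hypothetical;
// the real code uses a kj::ForkedPromise shared across IoContexts).
class IsolateAborter {
  private reject!: (reason: Error) => void;

  // One shared promise; every "IoContext" subscribes to this same instance,
  // analogous to each IoContext adding a branch to the ForkedPromise.
  private readonly aborted = new Promise<never>((_, rej) => {
    this.reject = rej;
  });

  // An in-flight request races its own work against the shared abort signal.
  subscribe<T>(work: Promise<T>): Promise<T> {
    return Promise.race([work, this.aborted]);
  }

  // abortIsolate(reason): reject once, and every subscriber aborts with the
  // reason included in its error.
  abortIsolate(reason: string): void {
    this.reject(new Error(reason));
  }
}
```

Because all subscribers share one promise, the reason string shows up in the error seen by every aborted request, matching the behavior the description claims.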

@hoodmane hoodmane requested review from a team as code owners March 4, 2026 16:45
Contributor

ask-bonk bot commented Mar 4, 2026

ResolveMessage: Cannot find module '@opencode-ai/plugin' from '/home/runner/work/workerd/workerd/.opencode/tools/bazel-deps.ts'

github run

Contributor

ask-bonk bot commented Mar 4, 2026

@hoodmane Bonk workflow failed. Check the logs for details.

View workflow run · To retry, trigger Bonk again.

@hoodmane hoodmane force-pushed the hoodmane/reload-worker branch from ba459c8 to 1f9a0ea Compare March 4, 2026 16:47
Contributor

ask-bonk bot commented Mar 4, 2026

ResolveMessage: Cannot find module '@opencode-ai/plugin' from '/home/runner/work/workerd/workerd/.opencode/tools/bazel-deps.ts'

github run

@hoodmane hoodmane force-pushed the hoodmane/reload-worker branch 3 times, most recently from f11cf40 to 2adf211 Compare March 4, 2026 16:54
Factor out the Isolate -> Script -> Worker creation pipeline from
makeWorkerImpl() into a new Server::createWorker() method. This
separates the worker creation logic (inspector policy, module registry,
fallback service, artifact bundler, script compilation, worker
construction) from the validation checks and post-creation wiring that
are specific to makeWorkerImpl().

The experimental feature flag checks (NMR, module fallback, Python/NMR
incompatibility) remain in makeWorkerImpl() since they only need to run
once at initial config validation time.

No behavioral changes.
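The split this commit describes can be sketched loosely in TypeScript. The real code is C++ in workerd's server implementation; the names `createWorker` and `makeWorkerImpl` come from the commit message, but the config shape and the specific check below are invented for illustration.

```typescript
// Hypothetical shapes; the real types live in workerd's C++ server code.
interface WorkerConfig {
  modules: string[];
  compatibilityFlags: string[];
}

interface Worker {
  config: WorkerConfig;
}

// The factored-out creation pipeline: everything needed to go from a config
// to a live Worker (module registry, compilation, construction, ...).
// Reusable, so a fresh Worker can be created after an isolate is aborted.
function createWorker(config: WorkerConfig): Worker {
  return { config };
}

// One-time validation and post-creation wiring stay here, running only at
// initial config validation time.
function makeWorkerImpl(config: WorkerConfig): Worker {
  if (config.modules.length === 0) {
    throw new Error("worker has no modules"); // stand-in for the flag checks
  }
  return createWorker(config);
}
```

The point of the split is that `createWorker()` can be invoked again later to build a replacement Worker without re-running one-time validation.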
@hoodmane hoodmane force-pushed the hoodmane/reload-worker branch 3 times, most recently from 8769c83 to abcc698 Compare March 4, 2026 17:11
Member

kentonv commented Mar 4, 2026

We need to be a bit careful with this as overuse could lead to excessive isolate creation, especially if requests are long-running (like WebSockets).

For your use case would it be OK if we actually terminated the worker, erroring in-flight requests? I would be more comfortable with that as it can't cause a build-up of condemned isolates so easily.

If that works for you, I would call it abortIsolate(), consistent with the existing ctx.abort(). It should probably take a "reason" as a parameter, which will be thrown from all the in-flight requests.

Contributor Author

hoodmane commented Mar 5, 2026

overuse could lead to excessive isolate creation

My thought was that this should behave in exactly the same way as when the worker allocates 2x the memory limit and gets condemned. That way, it doesn't offer any additional surface area to worry about.

For your use case would it be OK if we actually terminated the worker, erroring in-flight requests?

Is that what allocating too much memory does? The current situation is that every subsequent request fails until the worker is retired, so whatever we do will be an improvement over that -- better for just the in-flight requests to error than for however many future requests to fail.

Collaborator

jasnell commented Mar 5, 2026

The way the memory limit works is that if the worker hits 1x the memory limit it is condemned, which allows it to complete the current work but will tear down the isolate once that work is complete. New requests will go to a fresh non-condemned isolate. If, while cleaning up, the condemned work goes on to hit 2x the memory limit, it gets immediately destroyed. @kentonv's concern is that if we just have this condemn the isolate, we might have a case where we have a ton of condemned isolates, each just finishing processing of a single request in the worst case, while we constantly spin up new ones to replace them, which causes a large amount of churn.

Instead, this should work exactly like hitting the 2x limit. All activity being processed by the current isolate is interrupted immediately and the isolate is destroyed.
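The two-tier behavior described above can be modeled as a small state transition. The thresholds match the 1x/2x description in the thread, but the state names and function are assumptions for this sketch, not workerd's actual types.

```typescript
// Illustrative model of the two-tier memory limit (hypothetical names).
type IsolateState = "healthy" | "condemned" | "killed";

function applyMemoryLimit(
  usedBytes: number,
  limitBytes: number,
  state: IsolateState,
): IsolateState {
  if (state === "killed") return "killed";
  if (usedBytes >= 2 * limitBytes) {
    // 2x: interrupt all in-flight work immediately and destroy the isolate.
    return "killed";
  }
  if (usedBytes >= limitBytes) {
    // 1x: condemn; finish current work, send new requests to a fresh isolate.
    return "condemned";
  }
  return state;
}
```

Under this model, `abortIsolate()` jumps straight to the `killed` behavior rather than passing through `condemned`, which is exactly the change requested in the review.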

Contributor Author

hoodmane commented Mar 6, 2026

In the Python use case the isolate won't be useful for anything anyway, so it's definitely better to immediately destroy it and return errors.

@hoodmane hoodmane force-pushed the hoodmane/reload-worker branch from abcc698 to 3f64ddb Compare March 6, 2026 13:32
Contributor Author

hoodmane commented Mar 6, 2026

Okay, updated the PR to kill all in-flight requests.

@hoodmane hoodmane changed the title Implement resetWorker() in cloudflare:workers module Implement abortIsolate() in cloudflare:workers module Mar 6, 2026