Implement abortIsolate() in cloudflare:workers module#6237
|
ResolveMessage: Cannot find module '@opencode-ai/plugin' from '/home/runner/work/workerd/workerd/.opencode/tools/bazel-deps.ts' |
|
@hoodmane Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
ba459c8 to
1f9a0ea
Compare
|
f11cf40 to
2adf211
Compare
Factor out the Isolate -> Script -> Worker creation pipeline from makeWorkerImpl() into a new Server::createWorker() method. This separates the worker creation logic (inspector policy, module registry, fallback service, artifact bundler, script compilation, worker construction) from the validation checks and post-creation wiring that are specific to makeWorkerImpl(). The experimental feature flag checks (NMR, module fallback, Python/NMR incompatibility) remain in makeWorkerImpl() since they only need to run once at initial config validation time. No behavioral changes.
8769c83 to
abcc698
Compare
|
We need to be a bit careful with this as overuse could lead to excessive isolate creation, especially if requests are long-running (like WebSockets). For your use case would it be OK if we actually terminated the worker, erroring in-flight requests? I would be more comfortable with that as it can't cause a build-up of condemned isolates so easily. If that works for you, I would call it |
My thought was that this should behave in exactly the same way as when the worker allocates 2x the memory limit and gets condemned. That way, it doesn't offer any additional surface area to worry about.
Is that what allocating too much memory does? The current situation is that every subsequent request fails until the worker is retired, so whatever we do will be an improvement over that -- better for just the in-flight requests to error than for however many future requests to fail. |
|
The way the memory limit works is that if the worker hits 1x the memory limit it is condemned, which allows it to complete the current work but will tear down the isolate once that work is complete. New requests will go to a fresh non-condemned isolate. If, while cleaning up, the condemned worker goes on to hit 2x the memory limit, it gets immediately destroyed. @kentonv's concern is that if we just have this condemn the isolate, we might, in the worst case, end up with a ton of condemned isolates each finishing a single request while we constantly spin up new ones to replace them, which causes a large amount of churn. Instead, this should work exactly like hitting the 2x limit: all activity being processed by the current isolate is interrupted immediately and the isolate is destroyed. |
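For readers following along, the 1x / 2x behaviour described above can be modelled roughly as below. This is only an illustrative sketch, not workerd code: `IsolateState`, `stateForMemory`, `acceptsNewRequests` and `inFlightWorkSurvives` are hypothetical names, and treating "hits the limit" as `>=` is an assumption.

```typescript
// Hypothetical model of the memory-limit behaviour described in this thread.
type IsolateState = "active" | "condemned" | "destroyed";

// Map memory usage to a state. Using >= for "hits the limit" is an assumption.
function stateForMemory(usedBytes: number, limitBytes: number): IsolateState {
  if (usedBytes >= 2 * limitBytes) return "destroyed"; // 2x: killed immediately
  if (usedBytes >= limitBytes) return "condemned"; // 1x: finish current work, then tear down
  return "active";
}

// New requests only go to a non-condemned isolate.
function acceptsNewRequests(state: IsolateState): boolean {
  return state === "active";
}

// In-flight work keeps running unless the isolate was destroyed outright.
function inFlightWorkSurvives(state: IsolateState): boolean {
  return state !== "destroyed";
}
```

In this model, the proposal in this thread is for the new API to behave like the `"destroyed"` case rather than the `"condemned"` one.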
|
In the Python use case the isolate won't be useful for anything anyways so it's definitely better to immediately destroy it and return errors. |
Add abortIsolate(reason) API that terminates the current JS isolate and creates a fresh one from scratch, resetting all module-level state. When called, it immediately terminates all in-flight requests on the isolate. The next request creates a new Worker. The abort-all mechanism uses a shared ForkedPromise that each IoContext subscribes to via onLimitsExceeded(). When abortIsolate() is called, the promise is rejected, causing every IoContext on the isolate to abort. This mirrors how the production 2x memory limit kill works. The reason string is included in all error messages across all aborted requests.
abcc698 to
3f64ddb
Compare
|
Okay updated the PR to kill all in-flight requests. |
Add `abortIsolate(reason)` API that terminates the current JS isolate and creates a fresh one from scratch, resetting all module-level state. When called, it immediately terminates all in-flight requests on the isolate. The next request creates a new Worker.

The abort-all mechanism uses a shared `ForkedPromise` that each `IoContext` subscribes to via `onLimitsExceeded()`. When `abortIsolate()` is called, the promise is rejected, causing every `IoContext` on the isolate to abort. This mirrors how the production 2x memory limit kill works. The reason string is included in all error messages across all aborted requests.
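
The abort-all idea can be sketched in plain TypeScript as follows. This is a simplified analogy, not the workerd implementation: a single shared `Promise` stands in for the `ForkedPromise`, `Promise.race` stands in for subscribing via `onLimitsExceeded()`, and `IsolateAbortSignal`, `RequestContext`, `messageOf` and `demo` are hypothetical names. The error message format is also an assumption.

```typescript
// Hypothetical stand-in for the shared promise that every request context watches.
class IsolateAbortSignal {
  private rejectFn!: (reason: Error) => void;
  // One promise shared by all request contexts on the isolate.
  readonly aborted: Promise<never>;

  constructor() {
    this.aborted = new Promise<never>((_resolve, reject) => {
      this.rejectFn = reject;
    });
    // Avoid an unhandled-rejection report when nothing is in flight.
    this.aborted.catch(() => {});
  }

  // Reject the shared promise; every subscriber sees the same reason.
  abort(reason: string): void {
    this.rejectFn(new Error(`Isolate aborted: ${reason}`));
  }
}

// Hypothetical stand-in for a per-request context.
class RequestContext {
  constructor(private readonly signal: IsolateAbortSignal) {}

  // Race the request's work against the shared abort promise.
  run<T>(work: Promise<T>): Promise<T> {
    return Promise.race([work, this.signal.aborted]);
  }
}

// Turn a promise's outcome into a string for inspection.
async function messageOf(p: Promise<unknown>): Promise<string> {
  try {
    await p;
    return "fulfilled";
  } catch (e) {
    return (e as Error).message;
  }
}

// Two long-running requests on the same isolate, aborted together.
async function demo(): Promise<string[]> {
  const signal = new IsolateAbortSignal();
  const longRunning = new Promise<string>(() => {}); // never settles on its own
  const a = new RequestContext(signal).run(longRunning);
  const b = new RequestContext(signal).run(longRunning);
  signal.abort("interpreter state is corrupted");
  return Promise.all([messageOf(a), messageOf(b)]);
}
```

Because both contexts race against the same promise, a single `abort()` call fails every in-flight request with the same reason string, which is the property the description above relies on.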