Skip to content

feat: log cluster module IPC via onelogger#119

Closed
killagu wants to merge 5 commits into
masterfrom
claude/unruffled-mclean
Closed

feat: log cluster module IPC via onelogger#119
killagu wants to merge 5 commits into
masterfrom
claude/unruffled-mclean

Conversation

@killagu
Copy link
Copy Markdown
Contributor

@killagu killagu commented Apr 15, 2026

Summary

Add structured logging for every master↔app worker IPC point under Node.js cluster mode (process mode only), using onelogger as the logger facade.

Covers both layers of cluster IPC:

Application layer (always on)

Point Direction Where
A master → app send AppProcessWorker#send()
A' master → app send (w/ fd) master.ts sticky-session worker.instance.send('sticky-session:connection', connection)
B master ← app recv cluster.on('fork')worker.on('message', (msg, handle) => ...)
C app → master send static AppProcessWorker.send()
D app ← master recv new process.on('message', (msg, handle) => ...) in app_worker.ts

cluster 内部层 (opt-in via EGG_CLUSTER_IPC_LOG=1; very verbose — newconn fires once per HTTP request)

Point Direction Observes
E master ← app internal online / listening / queryServer / accepted (fd ack) / close — via worker.process.on('internalMessage')
F app ← master internal newconn (fd dispatch, handle=<TCP>) / disconnect / suicide — via process.on('internalMessage')

Note: internalMessage is an undocumented Node.js event but has been stable across major versions. E/F hook on worker.process (ChildProcess) rather than cluster.Worker, since the latter does not forward internalMessage.

Agent worker (child_process.fork) and worker_threads mode are intentionally out of scope — they are not the cluster module.

Users can replace the default console sink via onelogger.setLogger() / setCustomLogger().

Sample output

With EGG_CLUSTER_IPC_LOG=1 and a real HTTP request against the egg-ready fixture:

[master<-app#60303] action=cluster:online
[app#60303->master] action=realport data={"port":17010,"protocol":"http"}
[master<-app#60303] action=realport data={"port":17010,"protocol":"http"}
[master<-app#60303] action=cluster:queryServer data={"cmd":"NODE_CLUSTER",...}
[app#60303<-master] action=cluster:ack#1 data={"cmd":"NODE_CLUSTER","ack":1,...,"sockname":{...}}
[master<-app#60303] action=cluster:listening
[app#60303->master] action=app-start data={"address":{...},"workerId":60303}
[master<-app#60303] action=app-start data={"address":{...},"workerId":60303}
[master->app#60303] action=egg-pids data=[60300]
[app#60303<-master] action=egg-pids data=[60300]
[master->app#60303] action=egg-ready data={...}
[app#60303<-master] action=egg-ready data={...}
[master<-app#60303] action=cluster:ack#1 data={"cmd":"NODE_CLUSTER","ack":1,"accepted":true,"seq":3}   ← fd ack
[app#60303<-master] action=cluster:newconn data={"cmd":"NODE_CLUSTER","act":"newconn",...} +handle=<TCP>   ← fd dispatch

formatIpcMessage has a safe replacer: net.Socket / net.Server / Buffer / circular refs are collapsed to tokens; long strings are truncated at 200 chars; whole JSON payload is capped at 1024 chars. Handles get a +handle=<Socket fd=N> / <TCP> / <Server> suffix.

Test plan

  • tshy dual-build + attw --pack all green (node10 / node16-CJS / node16-ESM / bundler)
  • test/master.test.ts — 43/43 passing (core IPC lifecycle)
  • test/https.test.ts — 3/3 passing
  • test/agent_worker.test.ts — 12/12 passing
  • test/worker_threads.test.ts — 2/2 passing
  • Manual: startCluster on egg-ready fixture with EGG_CLUSTER_IPC_LOG=1 + real HTTP request; all 7 points (A/A'/B/C/D/E/F) print as expected, no Socket serialization errors
  • test/app_worker.test.ts — blocked by a pre-existing TS18048 error (let app2; + await app2.close(); in the finally block); unrelated to this change
  • test/options.test.ts — 19 passing / 2 pre-existing failures about fixture framework packages; unchanged by this commit (verified by re-running against git stashed code)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Breaking Changes

    • Renamed package from egg-cluster to @eggjs/cluster
    • Dropped Node.js < 18.19.0 support
    • Changed startCluster() from callback-based to promise-based API
  • New Features

    • Added full ESM and CommonJS dual-package support
    • Added TypeScript definitions and type safety
  • Bug Fixes

    • Fixed require support paths in cluster configuration

fengmk2 and others added 5 commits December 28, 2024 22:27
BREAKING CHANGE: drop Node.js < 18.19.0 support

part of eggjs/egg#3644

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes for @eggjs/cluster v3.0.0-beta.4

- **New Features**
  - Migrated project to TypeScript.
  - Added support for Node.js 18.19.0, 20, 22, and 23.
  - Enhanced type safety and module exports.
  - Improved worker thread and process management.
  - Introduced new error handling classes for better debugging.

- **Breaking Changes**
  - Renamed package from `egg-cluster` to `@eggjs/cluster`.
  - Updated import/export syntax to ES modules.
  - Minimum Node.js version is now 18.19.0.

- **Performance Improvements**
  - Refactored cluster and worker management.
  - Optimized error handling and logging.

- **Bug Fixes**
  - Resolved various edge cases in worker initialization.
  - Improved graceful shutdown mechanisms.

- **Documentation**
  - Updated README with new package name and usage examples.
  - Added TypeScript and ESM import examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
[skip ci]

## [3.0.0](v2.4.0...v3.0.0) (2024-12-28)

### ⚠ BREAKING CHANGES

* drop Node.js < 18.19.0 support

part of eggjs/egg#3644

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes for @eggjs/cluster v3.0.0-beta.4

- **New Features**
  - Migrated project to TypeScript.
  - Added support for Node.js 18.19.0, 20, 22, and 23.
  - Enhanced type safety and module exports.
  - Improved worker thread and process management.
  - Introduced new error handling classes for better debugging.

- **Breaking Changes**
  - Renamed package from `egg-cluster` to `@eggjs/cluster`.
  - Updated import/export syntax to ES modules.
  - Minimum Node.js version is now 18.19.0.

- **Performance Improvements**
  - Refactored cluster and worker management.
  - Optimized error handling and logging.

- **Bug Fixes**
  - Resolved various edge cases in worker initialization.
  - Improved graceful shutdown mechanisms.

- **Documentation**
  - Updated README with new package name and usage examples.
  - Added TypeScript and ESM import examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

### Features

* support cjs and esm both by tshy ([#117](#117)) ([e15a4bf](e15a4bf))
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

- **Dependencies**
  - Updated `@eggjs/utils` package to version 4.2.1

- **Enhancements**
  - Improved module loading mechanism with new `baseDir` option
  - Added support for specifying module search paths during import

- **Testing**
  - Enhanced agent worker test suite with additional module injection
  - Added module execution verification in tests

- **Deprecation Notice**
- HTTPS configuration now recommends object-based format with `key` and
`cert`

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
[skip ci]

## [3.0.1](v3.0.0...v3.0.1) (2024-12-30)

### Bug Fixes

* require support paths ([#118](#118)) ([b74872c](b74872c))
Add structured logging on every master<->app worker IPC point in Node.js
cluster mode (process mode only). Covers both application-layer messages
and the internal NODE_CLUSTER fd dispatch / fd ack.

Application layer (always on):
- master->app send (AppProcessWorker#send)
- master->app sticky-session Socket direct send (master.ts)
- master<-app recv (cluster.on 'fork' -> worker.on 'message')
- app->master send (static AppProcessWorker.send)
- app<-master recv (process.on 'message' in app_worker.ts)

Internal cluster layer (opt-in via EGG_CLUSTER_IPC_LOG=1, very verbose):
- master<-app internal (worker.process.on 'internalMessage'):
  online / listening / queryServer / accepted (fd ack) / close
- app<-master internal (process.on 'internalMessage'):
  newconn (fd dispatch, with handle) / disconnect / suicide

Users can replace the default console sink via onelogger.setLogger().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 15, 2026 03:59
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

This pull request migrates the egg-cluster package from CommonJS/JavaScript to ESM/TypeScript, renames it to @eggjs/cluster, introduces a promise-based API, implements dual CommonJS/ESM builds via tshy, and updates all tests and configuration accordingly. The major version bumps from 2.4.0 to 3.0.1.

Changes

Cohort / File(s) Summary
Configuration & Package Management
.eslintrc, tsconfig.json, package.json, .gitignore
Updated ESLint presets to TypeScript-aware configs, added TypeScript strict compilation settings targeting ES2022, renamed package to @eggjs/cluster, bumped version to 3.0.1, introduced dual module exports (ESM dist/esm, CommonJS dist/commonjs), added tshy build tooling, updated all dependencies to @eggjs/* scope packages, and extended .gitignore with build artifacts.
CI/CD & Workflows
.github/workflows/nodejs.yml, .github/workflows/pkg.pr.new.yml, .github/PULL_REQUEST_TEMPLATE.md
Updated Node.js test matrix to include pinned Node 18.19.0 and add Node 23, added new Publish Any Commit workflow for pkg.pr.new, and removed pull request template.
Documentation
README.md, CHANGELOG.md
Updated all package references from egg-cluster to @eggjs/cluster, changed usage examples from callback-based to promise-based API, added ESM/TypeScript usage section, updated contributor links, and added v3.0.0 and v3.0.1 changelog entries documenting breaking changes (Node.js < 18.19.0 dropped) and new CJS/ESM support.
Main Entrypoint & Public API
src/index.ts
New promise-based startCluster(options) function replacing callback-style CommonJS index.js, with re-exports of Master, MasterOptions, cluster options types, and error classes.
Master Process Management
src/master.ts
Converted to TypeScript/ESM with async initialization, changed from EventEmitter to ReadyEventEmitter, replaced symbol-based internal state with private fields (#protocol, #realPort, #appAddress), updated all IPC messaging to use typed objects, added egg-pids message sending on app startup, and refactored control flow utilities with modern async/await patterns.
Worker Entrypoints
src/agent_worker.ts, src/app_worker.ts
New TypeScript entrypoint scripts (replacing lib/agent_worker.js and lib/app_worker.js) with dynamic module imports, error handling via new ClusterAgentWorkerError, graceful shutdown hooks, and support for both process and worker_threads modes via AgentWorker/AppWorker static APIs.
Error Classes
src/error/ClusterAgentWorkerError.ts, src/error/ClusterWorkerExceptionError.ts, src/error/index.ts
New TypeScript error classes extending Error for agent worker failures and cluster worker exceptions, with structured fields (id, workerId, status, count) and proper error cause chaining.
Base Worker Abstractions
src/utils/mode/base/agent.ts, src/utils/mode/base/app.ts
New abstract base classes BaseAgentWorker/BaseAppWorker and BaseAgentUtils/BaseAppUtils providing common interfaces for worker wrappers, lifecycle management, and IPC messaging, replacing deleted lib/utils/mode/base/* CommonJS modules.
Process-Based Worker Implementations
src/utils/mode/impl/process/agent.ts, src/utils/mode/impl/process/app.ts
New concrete implementations using Node child_process.fork and cluster module for agent and app workers, with proper error handling, message routing, and graceful termination (replacing deleted lib/utils/mode/impl/process/*).
Worker-Threads Implementations
src/utils/mode/impl/worker_threads/agent.ts, src/utils/mode/impl/worker_threads/app.ts
Updated worker-threads implementations to use TypeScript with explicit BaseAgentWorker<Worker> and BaseAppWorker<Worker> types, refactored message handling and lifecycle management, and renamed classes to AgentThreadWorker/AppThreadWorker.
Core Utilities
src/utils/options.ts, src/utils/terminate.ts, src/utils/messenger.ts, src/utils/worker_manager.ts, src/utils/ipc_logger.ts, src/dirname.ts
Converted CommonJS utilities to TypeScript/ESM with proper typing; parseOptions now async with dynamic module imports; terminate performs graceful SIGTERM→SIGKILL escalation; messenger uses typed MessageBody; worker_manager refactored to use Map and listWorkers(); added new ipcLogger for cluster IPC tracing and getSrcDirname() helper.
Test Suite Migration
test/*.test.ts, test/utils.ts, test/utils.js
Migrated all tests from CommonJS/JavaScript to TypeScript/ESM, replaced sleep(...) with scheduler.wait(...), switched from egg-mock to @eggjs/mock, updated worker ID assertions to use regex patterns, adjusted timing from ms to s in several delays, and created new test/utils.ts with cluster() and getFilepath() helpers (replaced deleted test/utils.js).
Test Fixtures
test/fixtures/apps/.../, test/fixtures/egg/index.js
Updated test fixtures: removed 'use strict' directives, replaced lib/utils/timer imports with node:timers/promises, updated app port access from _options.port to options.port, made Loader.loadConfig() async, added inject1.js fixture and plugin.js config, and refactored egg fixture exports to use named exports for startCluster, Application, Agent.

Sequence Diagram

sequenceDiagram
    participant Client
    participant startCluster as startCluster()
    participant Master
    participant OptionParser
    participant WorkerManager
    participant Messenger
    participant ProcessWorker as App/Agent Worker

    Client->>startCluster: startCluster(options)
    startCluster->>Master: new Master(options)
    Master->>OptionParser: parseOptions(options)
    OptionParser-->>Master: ParsedClusterOptions (with framework, baseDir, etc.)
    Master->>Master: `#start`() async initialization
    Master->>WorkerManager: create WorkerManager instance
    Master->>Messenger: create Messenger(master, workerManager)
    
    par Worker Spawning
        Master->>ProcessWorker: fork/spawn AppWorker(s)
        Master->>ProcessWorker: fork/spawn AgentWorker
    end
    
    ProcessWorker->>ProcessWorker: importModule(framework)
    ProcessWorker->>ProcessWorker: instantiate Application/Agent
    ProcessWorker->>ProcessWorker: app.ready() / agent.ready()
    ProcessWorker->>Messenger: send agent-start / app-start message
    Messenger->>Master: deliver IPC message
    Master->>WorkerManager: track worker status
    Master->>Master: emit ready()
    Master-->>startCluster: Promise resolved
    startCluster-->>Client: startup complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • feat: support cjs and esm both by tshy #117: Performs the same ESM/TypeScript migration with identical structural changes (removing lib/, adding src/, updating package.json/tsconfig), worker abstraction refactoring, and test suite rewrite.
  • fix: require support paths #118: Modifies agent/app worker startup logic and options import patterns with baseDir/paths for dynamic module loading, directly related to the worker initialization refactoring in this PR.

Suggested labels

enhancement, typescript

Poem

🐰 From callbacks to promises we hop,
TypeScript types make our code non-stop,
ESM threads and dual builds so bright,
The cluster hops forward into the light! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'feat: log cluster module IPC via onelogger' accurately describes the main objective: adding structured IPC logging to the cluster module via onelogger. The title is specific, concise, and clearly conveys the primary change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/unruffled-mclean
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch claude/unruffled-mclean

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates the egg-cluster package to TypeScript, renames it to @eggjs/cluster, and updates the minimum Node.js version requirement to 18.19.0. The refactoring transitions the codebase to ES modules and organizes cluster management logic into TypeScript classes. Review feedback highlights a regression in AppThreadWorker where the exitCode is hardcoded to 0, which could mask actual failure states. Additionally, the review suggests several technical improvements, including specifying a radix for parseInt calls to ensure consistent behavior, addressing uninitialized properties in WorkerManager to satisfy strict TypeScript rules, and optimizing digit detection logic in the stickyWorker method.

Comment on lines +37 to 40
return 0;
// return this.instance.exitCode;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The exitCode getter is hardcoded to return 0, which ignores the actual exit status of the worker thread. This is a regression from the previous implementation and will cause incorrect exit codes to be logged in Master.onAppExit.

  get exitCode() {
    return this.instance.exitCode!;
  }

agent: BaseAgentWorker | null;
workers = new Map<number, BaseAppWorker>();
exception = 0;
timer: NodeJS.Timeout;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The timer property is not initialized in the constructor, which violates TypeScript's strict property initialization rules when strict: true is enabled in tsconfig.json. It should be marked as optional.

Suggested change
timer: NodeJS.Timeout;
timer?: NodeJS.Timeout;

Comment thread src/master.ts
let s = '';
for (let i = 0; i < ip.length; i++) {
if (!isNaN(ip[i])) {
if (!isNaN(parseInt(ip[i]))) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using parseInt followed by isNaN to identify digits in a string is inefficient. A direct character range comparison is faster and more idiomatic.

Suggested change
if (!isNaN(parseInt(ip[i]))) {
if (ip[i] >= '0' && ip[i] <= '9') {

Comment thread src/master.ts
Comment on lines +603 to +604
const appTimeout = parseInt(process.env.EGG_APP_CLOSE_TIMEOUT || legacyTimeout);
const agentTimeout = parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Always specify a radix (e.g., 10) when using parseInt to avoid potential ambiguity and ensure consistent behavior across different environments.

Suggested change
const appTimeout = parseInt(process.env.EGG_APP_CLOSE_TIMEOUT || legacyTimeout);
const agentTimeout = parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout);
const appTimeout = parseInt(process.env.EGG_APP_CLOSE_TIMEOUT || legacyTimeout, 10);
const agentTimeout = parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout, 10);

Comment thread src/utils/options.ts
}

if (options.port && typeof options.port === 'string') {
options.port = parseInt(options.port);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Specify a radix of 10 for parseInt to adhere to best practices and avoid ambiguity.

Suggested change
options.port = parseInt(options.port);
options.port = parseInt(options.port, 10);

Comment thread src/utils/options.ts
}

if (options.workers && typeof options.workers === 'string') {
options.workers = parseInt(options.workers);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Specify a radix of 10 for parseInt to adhere to best practices and avoid ambiguity.

Suggested change
options.workers = parseInt(options.workers);
options.workers = parseInt(options.workers, 10);

Comment thread src/utils/terminate.ts
let childrenPids: number[] = [];
try {
const children = await pstree(pid);
childrenPids = children!.map(c => parseInt(c.PID));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Specify a radix of 10 for parseInt when mapping process IDs.

Suggested change
childrenPids = children!.map(c => parseInt(c.PID));
childrenPids = children!.map(c => parseInt(c.PID, 10));

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds structured, opt-in/always-on IPC logging for master↔worker communication in cluster process mode, using onelogger as the logging facade, and updates the codebase/tooling toward TS/ESM + refreshed tests/fixtures to support the new behavior.

Changes:

  • Introduce ipc_logger utilities and instrument master/app-worker IPC send/recv paths (plus optional internal NODE_CLUSTER traffic via EGG_CLUSTER_IPC_LOG).
  • Refactor core cluster runtime to TypeScript/ESM-oriented structure (new src/* entrypoints, base worker abstractions, updated terminate/options).
  • Migrate/adjust tests and fixtures to TS + new mocks/utilities and updated behaviors.

Reviewed changes

Copilot reviewed 62 out of 64 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tsconfig.json Adds TS config (strict, NodeNext) for the repo.
test/worker_threads.test.ts Migrates test to ESM/TS and new cluster() helper.
test/utils.ts New TS test helper for cluster + fixture paths.
test/utils.js Removes legacy CJS test helper.
test/options.test.ts Migrates options tests to TS/ESM and async parseOptions().
test/options.test.js Removes legacy CJS options tests.
test/master.test.ts Migrates master tests to TS/ESM and updates timings/requests.
test/https.test.ts Migrates HTTPS tests to TS/ESM and updated cluster helper usage.
test/fixtures/egg/index.js Adjusts fixture framework exports to satisfy new framework validation.
test/fixtures/apps/worker-close-timeout/app.js Replaces custom sleep with timers/promises wait.
test/fixtures/apps/worker-close-timeout/agent.js Replaces custom sleep with timers/promises wait.
test/fixtures/apps/script-start/start-server.js Replaces custom sleep with timers/promises wait.
test/fixtures/apps/options/config/plugin.js Adds fixture plugin disables for deterministic tests.
test/fixtures/apps/options/app.js Removes strict mode boilerplate; keeps fixture behavior.
test/fixtures/apps/options/agent.js Removes strict mode boilerplate; keeps fixture behavior.
test/fixtures/apps/framework/lib/framework.js Updates fixture loader to async loadConfig().
test/fixtures/apps/framework/lib/agent.js Simplifies fixture agent import pattern.
test/fixtures/apps/framework/index.js Updates fixture framework exports.
test/fixtures/apps/before-close/app.js Replaces custom sleep with timers/promises wait.
test/fixtures/apps/before-close/agent.js Replaces custom sleep with timers/promises wait.
test/fixtures/apps/app-listen-without-port/config/config.default.js Removes strict mode boilerplate in fixture config.
test/fixtures/apps/app-listen-port/config/config.default.js Removes strict mode boilerplate in fixture config.
test/fixtures/apps/app-listen-port/app/router.js Updates fixture to read ctx.app.options.port.
test/fixtures/apps/agent-debug-port/inject1.js Adds injection fixture used by updated tests.
test/fixtures/apps/agent-debug-port/agent.js Removes strict mode boilerplate.
test/app_worker.test.ts Migrates app worker tests to TS/ESM and updates expectations/timing.
test/agent_worker.test.ts Migrates agent worker tests to TS/ESM and updates expectations.
src/utils/worker_manager.ts Refactors manager to typed WorkerManager with worker listing helpers.
src/utils/terminate.ts New TS terminate helper using @fengmk2/ps-tree.
src/utils/options.ts New async TS parseOptions() + typings + validation.
src/utils/mode/impl/worker_threads/app.ts TS/ESM refactor of worker_threads app worker utils.
src/utils/mode/impl/worker_threads/agent.ts TS/ESM refactor of worker_threads agent worker utils.
src/utils/mode/impl/process/app.ts New process-mode app worker utils + IPC logging hooks.
src/utils/mode/impl/process/agent.ts New process-mode agent worker utils.
src/utils/mode/base/app.ts New base abstractions for app workers/utils.
src/utils/mode/base/agent.ts New base abstractions for agent workers/utils.
src/utils/messenger.ts TS refactor + message typing + routing tweaks.
src/utils/ipc_logger.ts Implements onelogger-backed IPC formatter + safe stringification.
src/master.ts Major TS/ESM refactor; integrates new worker utils and IPC logging for sticky dispatch.
src/index.ts New TS entrypoint exporting startCluster and public types.
src/error/index.ts Central export for new error types.
src/error/ClusterWorkerExceptionError.ts Introduces typed exception error for unhealthy cluster state.
src/error/ClusterAgentWorkerError.ts Introduces typed wrapper for agent worker errors.
src/dirname.ts Helper to resolve src dirname under ESM/CJS contexts.
src/app_worker.ts New TS app worker runtime + process-mode IPC recv logging and internalMessage logging.
src/agent_worker.ts New TS agent worker runtime.
package.json Renames package, updates engines/exports, adds onelogger, updates scripts/build.
lib/utils/timer.js Removes legacy CJS sleep helper.
lib/utils/terminate.js Removes legacy CJS terminate implementation.
lib/utils/options.js Removes legacy CJS options parser.
lib/utils/mode/impl/process/app.js Removes legacy CJS process app utils.
lib/utils/mode/impl/process/agent.js Removes legacy CJS process agent utils.
lib/utils/mode/base/app.js Removes legacy CJS base app abstractions.
lib/utils/mode/base/agent.js Removes legacy CJS base agent abstractions.
lib/app_worker.js Removes legacy CJS app worker runtime.
lib/agent_worker.js Removes legacy CJS agent worker runtime.
index.js Removes legacy CJS entrypoint.
README.md Updates docs for new package name and async/ESM usage.
CHANGELOG.md Adds v3.x release notes content.
.gitignore Ignores new build artifacts/cache output.
.github/workflows/pkg.pr.new.yml Adds pkg-pr-new workflow to publish preview builds.
.github/workflows/nodejs.yml Updates Node test matrix to 18.19+ and newer versions.
.github/PULL_REQUEST_TEMPLATE.md Removes old PR template.
.eslintrc Switches to TS-focused eslint config + node: prefix rule.
Comments suppressed due to low confidence (6)

src/master.ts:421

  • In onAppExit(), the log/error message uses worker.exitCode instead of the code provided in the exit event payload (data.code). For process workers, worker.instance.process.exitCode can be null or differ from the actual code argument, which makes the reported exit code unreliable. Use data.code (and consider falling back to worker.exitCode only if needed).
    src/utils/mode/impl/worker_threads/agent.ts:85
  • worker_threads Worker emits exit with only the exit code; it does not provide a signal argument. The current listener signature (code, signal) and forwarding signal to master will produce undefined at runtime and misleads the type system. Update the listener to accept only code and either omit signal in the message or set it explicitly (e.g., null / empty string) and adjust the master-side types accordingly.
    src/utils/mode/impl/worker_threads/app.ts:129
  • The worker_threads exit event only provides an exit code; signal is not available. The app-exit message emitted here omits signal, but master currently types signal as required and uses it for debug logic/log formatting. Consider including signal: null/'' explicitly (or updating the master handler/types to make signal optional) to avoid undefined values and inconsistent behavior between modes.
    src/utils/mode/impl/worker_threads/app.ts:75
  • gracefulExit() hooks process.on('exit', async ...) and awaits beforeExit(). Node's exit event does not allow asynchronous work to complete (the event loop is already shutting down), so beforeExit() promises will not reliably run to completion. Consider using beforeExit/signal handlers (or graceful-process) instead, and avoid calling process.exit() from inside an exit handler.
    src/utils/mode/impl/worker_threads/agent.ts:35
  • gracefulExit() uses process.on('exit', async ...) and awaits beforeExit(), but the exit event cannot perform async cleanup reliably. This can cause beforeExit() (e.g. agent.close()) to be skipped/truncated in worker_threads mode. Prefer beforeExit/signal handlers (or reuse graceful-process if supported) and avoid calling process.exit() from inside an exit handler.
    src/utils/mode/impl/worker_threads/app.ts:39
  • AppThreadWorker.exitCode is currently hard-coded to 0, which means master logs and any exit handling will always report a successful exit even when the thread exits with a non-zero code. Persist the actual exit code from the Worker exit event (e.g., store it on the wrapper) and return that here.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/utils/options.ts
Comment on lines +139 to +144
if (options.port && typeof options.port === 'string') {
options.port = parseInt(options.port);
}
if (options.port === null) {
options.port = undefined;
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseOptions() parses string ports with parseInt(), but does not handle NaN results. If a non-numeric string is provided, options.port becomes NaN (type number), which will bypass typeof port !== 'number' checks later and can cause server.listen(NaN) failures. After parsing, normalize NaN to undefined and/or validate that the port is a finite integer.

Copilot uses AI. Check for mistakes.
Comment thread src/utils/terminate.ts
Comment on lines +84 to +94
function getUnterminatedProcesses(pids: number[]) {
return pids.filter(pid => {
try {
// success means it's still alive
process.kill(pid, 0);
return true;
} catch (err) {
// error means it's dead
debug('kill %s error: %s, it still alive', pid, err);
return false;
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In getUnterminatedProcesses(), the comment says an error from process.kill(pid, 0) means the process is dead, but the debug log message says "it still alive". Also, kill(pid, 0) can throw EPERM even when the process is alive; treating all errors as "dead" can leave processes unterminated. Fix the debug message and consider handling EPERM as "still alive" (and other errors like ESRCH as "dead").

Copilot uses AI. Check for mistakes.
Comment thread package.json
Comment on lines 69 to +73
"engines": {
"node": ">= 14.0.0"
}
"node": ">= 18.19.0"
},
"type": "module",
"tshy": {
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/test plan mentions running builds/tests on Node 10/16, but this change sets engines.node to >= 18.19.0 and CI matrix to 18+/23. Please update the PR description/test plan to match the supported Node versions (or adjust engines/CI if Node 10/16 support is still intended).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
test/master.test.ts (1)

318-324: ⚠️ Potential issue | 🟡 Minor

Type mismatch: alive() expects number but receives string | undefined.

The alive function signature at line 984 expects a number, but it's called with values extracted from regex matches which are string | undefined. This will cause the function to always return false (or throw) because process.kill() coerces its first argument.

🔧 Proposed fix
-function alive(pid: number) {
+function alive(pid: string | number | undefined) {
+  if (pid === undefined) return false;
   try {
     // success means it's still alive
-    process.kill(pid, 0);
+    process.kill(Number(pid), 0);
     return true;
   } catch (err) {
     // error means it's dead
     return false;
   }
 }

Also applies to: 352-357, 984-993

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/master.test.ts` around lines 318 - 324, The calls to alive() are passing
values extracted from regex matches (pid1, pid2 from app.stdout.match) which are
string | undefined; convert/validate these to numbers before calling alive().
Update the test to parse the captured PID strings (e.g. Number(pid1) or
parseInt(pid1, 10)) and guard against undefined (skip assertion or fail the test
with a clear message) so alive() receives a number; reference the regex match
usage for worker1/worker2 and the alive() function to locate where to change the
calls.
src/utils/messenger.ts (1)

100-109: ⚠️ Potential issue | 🟡 Minor

Potential null dereference when agent is not available.

Line 104 uses non-null assertion getAgent()!.workerId, but getAgent() can return null if no agent worker has been set yet. This could throw during early message routing.

🛡️ Proposed defensive fix
     const receiverWorkerId = data.receiverWorkerId ?? data.receiverPid;
     if (receiverWorkerId) {
       if (receiverWorkerId === String(process.pid)) {
         data.to = 'master';
-      } else if (receiverWorkerId === String(this.#workerManager.getAgent()!.workerId)) {
+      } else if (this.#workerManager.getAgent() &&
+                 receiverWorkerId === String(this.#workerManager.getAgent()!.workerId)) {
         data.to = 'agent';
       } else {
         data.to = 'app';
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/messenger.ts` around lines 100 - 109, The message routing code
risks a null dereference by using getAgent()!; modify the block that computes
receiverWorkerId and sets data.to to first retrieve const agent =
this.#workerManager.getAgent(); then compare receiverWorkerId to
String(process.pid) first, then to String(agent?.workerId) only if agent is
non-null (or use agent?.workerId and ensure the equality check handles
undefined), and otherwise fall back to 'app'; ensure you do not use the non-null
assertion getAgent()! and handle the case where agent is null so no exception is
thrown during early routing.
src/utils/mode/impl/worker_threads/agent.ts (2)

27-35: ⚠️ Potential issue | 🟠 Major

Same async beforeExit issue as in app.ts.

The process.on('exit') callback cannot perform async operations - the await has no effect and the process will exit immediately. See the fix suggested for src/utils/mode/impl/worker_threads/app.ts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mode/impl/worker_threads/agent.ts` around lines 27 - 35, The
current gracefulExit in gracefulExit(options) incorrectly registers an async
handler on process.on('exit') where await has no effect; change it to listen for
the 'beforeExit' event instead and perform the async call there (i.e.,
process.on('beforeExit', async code => { if (typeof beforeExit === 'function')
await beforeExit(); })), and remove the call to process.exit from that handler
to avoid re-entrancy; keep references to the gracefulExit method and the
beforeExit option when making this change.

75-85: ⚠️ Potential issue | 🟡 Minor

Remove incorrect signal parameter from Worker exit event handler.

The exit event on Node.js worker_threads.Worker only provides exitCode, not a signal parameter. The signal parameter in the handler will always be undefined, causing confusion when this undefined value is sent to the master process.

🔧 Proposed fix
    worker.once('exit', (code: number, signal: string) => {
+    worker.once('exit', (code: number) => {
       this.messenger.send({
         action: 'agent-exit',
         data: {
           code,
-          signal,
+          signal: null,
         },
         to: 'master',
         from: 'agent',
       });
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mode/impl/worker_threads/agent.ts` around lines 75 - 85, The worker
exit handler is using a non-existent `signal` parameter; update the
`worker.once('exit', ...)` callback in agent.ts to accept the correct `exitCode`
parameter (e.g., `(exitCode: number)`) and remove `signal` from the message
payload—send only the exit code (use `exitCode` in the data object) when calling
`this.messenger.send` for the 'agent-exit' action.
🧹 Nitpick comments (13)
.gitignore (1)

15-15: Remove duplicate ignore entry for coverage artifacts.

coverage is already listed at Line 2, so Line 15 is redundant and can be removed to keep the ignore file clean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 15, The .gitignore contains a duplicate entry for
"coverage" (already present earlier); remove the redundant "coverage" line shown
in the diff to keep the file clean and avoid duplicate ignore entries.
test/utils.ts (1)

14-20: Preserve default execArgv when options.opt is provided.

...options at Line 19 can overwrite the opt object from Lines 14-17 entirely. Merge opt explicitly to keep defaults stable.

♻️ Proposed refactor
 export function cluster(baseDir: string, options: MockClusterOptions = {}) {
+  const { opt, ...rest } = options;
   return mm.cluster({
     baseDir,
     framework: path.join(__dirname, 'fixtures/egg'),
     // eggPath: path.join(__dirname, '../node_modules/egg'),
     cache: false,
     opt: {
       // clear execArgv from egg-bin
-      execArgv: [],
+      execArgv: [],
+      ...opt,
     },
     // override `@eggjs/mock` default port 17001
-    ...options,
+    ...rest,
   });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/utils.ts` around lines 14 - 20, The current spread of ...options can
overwrite the default opt object (which sets execArgv: [])—to preserve defaults
merge opt explicitly: build the call so opt is constructed as opt: { execArgv:
[], ...(options?.opt || {}) } and then spread the rest of ...options without
allowing options.opt to replace it; ensure you remove opt from the top-level
options spread (or spread options after extracting opt) so execArgv default
remains unless explicitly overridden.
src/utils/terminate.ts (1)

91-93: Debug message text is inverted.

In this catch path, the process is considered dead, but Line 92 logs “it still alive”.

📝 Suggested tweak
-      debug('kill %s error: %s, it still alive', pid, err);
+      debug('kill %s error: %s, process is already dead', pid, err);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/terminate.ts` around lines 91 - 93, The debug log in the catch path
is inverted: update the debug call that currently reads debug('kill %s error:
%s, it still alive', pid, err) to accurately reflect that the process is dead
(e.g., debug('kill %s error: %s, it is not alive', pid, err) or 'it is dead'),
keeping the same parameters (pid, err) so the message and error details remain
logged; locate the debug invocation in terminate.ts and replace the text
accordingly.
src/app_worker.ts (1)

155-157: Falsy check on stickyWorkerPort may reject port 0.

Using options.stickyWorkerPort in a boolean context will treat port 0 as falsy. While port 0 is rarely used intentionally (it means "pick any available port"), this could cause unexpected behavior.

Consider using explicit undefined check if port 0 should be valid:

if (options.sticky && options.stickyWorkerPort !== undefined) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app_worker.ts` around lines 155 - 157, The current conditional uses a
falsy check on options.stickyWorkerPort which will reject port 0; change the
condition around the server.listen invocation to explicitly check for undefined
(e.g. use options.sticky && options.stickyWorkerPort !== undefined) so port 0 is
allowed, and then call server.listen(options.stickyWorkerPort, '127.0.0.1') as
before; update the check near the server.listen call that references
options.sticky and options.stickyWorkerPort to use an explicit undefined
comparison.
test/master.test.ts (1)

467-494: Commented-out test should be removed or documented.

This entire test block is commented out without explanation. If it's intentionally disabled due to a known issue, add a TODO or FIXME comment explaining why. Otherwise, consider removing dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/master.test.ts` around lines 467 - 494, The commented-out test block
labelled "egg-script exit" in test/master.test.ts is dead code; either remove
the entire commented block or re-enable it with a clear rationale by adding a
TODO/FIXME explaining why it's disabled and linking to an issue/PR; if you
choose to document it, add a concise comment above the block mentioning the
reason (e.g., failing on CI, platform-specific behavior) and reference the test
components like appDir, start-server.js, errLogPath and the asserted stderr
check so future readers know the context.
test/options.test.ts (1)

221-233: Remove malformed commented-out test block.

This commented-out code has a nested it() inside another it() which is invalid Mocha syntax. This appears to be leftover from incomplete refactoring and should be removed.

♻️ Proposed fix
-  // it('should exist when specify baseDir', () => {
-  //   it('should get egg by default but not exist', () => {
-  //     const baseDir = path.join(__dirname, 'noexist');
-  //     try {
-  //       parseOptions({
-  //         baseDir,
-  //       });
-  //       throw new Error('should not run');
-  //     } catch (err) {
-  //       assert(err.message === `${path.join(baseDir, 'package.json')} should exist`);
-  //     }
-  //   });
-  // });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/options.test.ts` around lines 221 - 233, Remove the malformed
commented-out Mocha block that nests an `it()` inside another `it()` in the
tests around `parseOptions`; specifically delete the entire commented section
containing the nested `it()`s (the block that references `baseDir`,
`parseOptions`, and the `${path.join(baseDir, 'package.json')} should exist`
assertion) so the test file no longer contains invalid commented test syntax, or
replace it with a single well-formed test that calls `parseOptions({ baseDir })`
and asserts the expected error.
src/utils/messenger.ts (1)

112-122: Consider using else-if chain for mutually exclusive routing.

The default routing logic uses separate if statements. While this works because only one from value matches at a time, an else if chain would make the mutual exclusivity explicit and slightly more efficient.

♻️ Suggested refactor
     // default from -> to rules
     if (!data.to) {
       if (data.from === 'agent') {
         data.to = 'app';
-      }
-      if (data.from === 'app') {
+      } else if (data.from === 'app') {
         data.to = 'agent';
-      }
-      if (data.from === 'parent') {
+      } else if (data.from === 'parent') {
         data.to = 'master';
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/messenger.ts` around lines 112 - 122, The routing block that sets
data.to based on data.from uses three separate if statements which are mutually
exclusive; update that block to an if / else if / else if chain so only one
branch executes (e.g., if (data.from === 'agent') { data.to = 'app'; } else if
(data.from === 'app') { data.to = 'agent'; } else if (data.from === 'parent') {
data.to = 'master'; }) to make the intent explicit and slightly more
efficient—modify the code around the data.to assignment in
src/utils/messenger.ts accordingly.
src/utils/mode/impl/process/agent.ts (1)

46-51: Consider adding IPC logging for agent worker messages.

The PR objective mentions adding IPC logging for master↔app worker communication via ipcLogger. For consistency, agent worker IPC (AgentProcessWorker#send, process.send, and message handlers at lines 77-86) could also benefit from IPC logging. Currently, src/utils/mode/impl/process/app.ts imports and uses ipcLogger for app workers but agent workers don't have equivalent logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mode/impl/process/agent.ts` around lines 46 - 51, Add IPC logging
for agent worker messages by importing and using the existing ipcLogger in
src/utils/mode/impl/process/agent.ts; instrument AgentProcessWorker#send (the
method that wraps child.send) to log outgoing messages and instrument the agent
process message handlers (the process.on('message') callbacks around lines
77-86) and any direct process.send calls to log incoming messages, mirroring the
pattern used in app.ts (use the same log format/context) so master↔agent IPC is
recorded consistently.
src/utils/options.ts (2)

115-122: Deprecation warning may be suppressed by later code.

The deprecation warning at line 117 is helpful, but lines 161-163 set process.env.NO_DEPRECATION = '*' in production, which would suppress this warning anyway. Consider whether this warning should be logged via the logger instead to ensure it's visible in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/options.ts` around lines 115 - 122, The console.warn in options
handling (the options.https === true branch) can be suppressed by
process.env.NO_DEPRECATION set later; replace the console.warn with a logger
warning so it always appears in production logs: locate the options.https block
in src/utils/options.ts and change the deprecation message to use the existing
application logger (or export/accept a processLogger) to emit a warning (e.g.,
logger.warn or processLogger.warn) instead of console.warn, preserving the same
message and leave the options.https assignment (options.key, options.cert)
intact.

159-163: Setting NO_DEPRECATION='*' globally may have unintended side effects.

Setting this environment variable globally suppresses all deprecation warnings from Node.js and all npm packages in the process. This could hide important deprecation notices from dependencies. Consider limiting the scope or documenting this behavior.

💡 Alternative: use a more targeted approach

If the intent is only to suppress egg-cluster's own deprecation messages, consider using a package-specific value:

  // don't print deprecated message in production env.
  // it will print to stderr.
  if (process.env.NODE_ENV === 'production') {
-   process.env.NO_DEPRECATION = '*';
+   process.env.NO_DEPRECATION = process.env.NO_DEPRECATION || '@eggjs/cluster';
  }

Or document clearly that this affects all packages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/options.ts` around lines 159 - 163, The current code in options.ts
sets process.env.NO_DEPRECATION = '*' under the NODE_ENV === 'production' check
which suppresses all deprecation warnings globally; change this to a targeted
approach by either removing the global assignment or replacing '*' with a
package-specific value (e.g., the egg-cluster identifier) and/or gating it
behind a more specific flag so only egg-cluster deprecations are suppressed;
update the logic around the NODE_ENV check and the assignment to
process.env.NO_DEPRECATION and add a brief comment documenting the scope and
reason for the suppression.
src/utils/mode/impl/process/app.ts (1)

157-163: Non-null assertion on cluster.workers and worker lookup.

The cluster.workers! and cluster.workers![id]! non-null assertions assume workers exist. While this is likely safe during shutdown (workers should exist), consider adding a guard for defensive coding.

💡 Defensive check
  async kill(timeout: number) {
-   await Promise.all(Object.keys(cluster.workers!).map(id => {
-     const worker = cluster.workers![id]!;
+   const workers = cluster.workers ?? {};
+   await Promise.all(Object.keys(workers).map(id => {
+     const worker = workers[id];
+     if (!worker) return;
      Reflect.set(worker, 'disableRefork', true);
      return terminate(worker.process, timeout);
    }));
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mode/impl/process/app.ts` around lines 157 - 163, The kill method
assumes cluster.workers and each worker entry are non-null; change it to
defensively handle missing workers by first checking if cluster.workers is
truthy and returning early (or using an empty object) and then verifying each
worker is defined before calling Reflect.set and terminate; reference the kill
function, cluster.workers, the per-worker lookup (worker), Reflect.set(...,
'disableRefork', true), and terminate(worker.process, timeout) so you only call
these on confirmed non-null worker objects.
src/master.ts (2)

602-604: Consider adding 10 radix to parseInt calls.

While modern JavaScript defaults to base 10 for decimal strings, explicitly passing the radix is a best practice for clarity and to avoid edge cases with leading zeros.

💡 Add explicit radix
-    const appTimeout = parseInt(process.env.EGG_APP_CLOSE_TIMEOUT || legacyTimeout);
-    const agentTimeout = parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout);
+    const appTimeout = parseInt(process.env.EGG_APP_CLOSE_TIMEOUT || legacyTimeout, 10);
+    const agentTimeout = parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout, 10);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/master.ts` around lines 602 - 604, The parseInt calls for appTimeout and
agentTimeout should explicitly pass a radix to avoid ambiguity: update the
parseInt invocations that set appTimeout (currently
parseInt(process.env.EGG_APP_CLOSE_TIMEOUT || legacyTimeout)) and agentTimeout
(currently parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout)) to
include 10 as the second argument (e.g., parseInt(..., 10)); ensure both
occurrences are changed and leave legacyTimeout assignment unchanged.

279-291: Sticky worker selection logic is correct but fragile.

The algorithm extracts digits from the IP address to determine worker assignment. The non-null assertion on getWorker() at line 290 assumes the calculated pid always exists in the worker map. If workers are being killed/restarted, this could fail.

💡 Add fallback for missing worker
   stickyWorker(ip: string) {
     const workerNumbers = this.options.workers;
     const ws = this.workerManager.listWorkerIds();
+    if (ws.length === 0) {
+      return null;
+    }

     let s = '';
     for (let i = 0; i < ip.length; i++) {
       if (!isNaN(parseInt(ip[i]))) {
         s += ip[i];
       }
     }
-    const pid = ws[Number(s) % workerNumbers];
-    return this.workerManager.getWorker(pid)!;
+    const index = ws.length > 0 ? Number(s) % ws.length : 0;
+    const pid = ws[index];
+    return this.workerManager.getWorker(pid);
   }

Note: The caller at line 268 would also need to handle null case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/master.ts` around lines 279 - 291, The stickyWorker method currently
computes a pid from digits in the IP and force-asserts a worker with
this.workerManager.getWorker(pid)! which can throw if that pid was removed;
change stickyWorker to check the result of getWorker(pid) and return null (or a
defined fallback worker from this.workerManager.listWorkerIds(), e.g.,
round-robin or first available) when getWorker returns undefined, and remove the
non-null assertion; update callers (the caller that invokes stickyWorker) to
handle a null return. Ensure references: stickyWorker, this.options.workers,
this.workerManager.listWorkerIds(), this.workerManager.getWorker(pid) are used
to locate and implement the safe fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 21-48: Remove the auto-generated "## Summary by CodeRabbit" block
and the verbose beta-oriented release notes beneath it (the section starting
with "## Release Notes for `@eggjs/cluster` v3.0.0-beta.4" through the closing
auto-generated comment), and replace it with a concise, final-release changelog
entry that omits beta wording and redundant bullets; locate the block by the
header "## Summary by CodeRabbit" and the subsection title "Release Notes for
`@eggjs/cluster` v3.0.0-beta.4" and trim or rewrite those lines to a short,
trustable summary.

In `@README.md`:
- Around line 40-41: The README’s startup section uses “callback” and awkward
grammar (“when application has started”, “will exit when catch an error”);
change it to explicit Promise-based wording that is grammatically correct and
clear. Replace references to a “callback” with a Promise-based description such
as: the startup function returns a Promise that resolves when the application
has started and rejects if startup fails; update the sentence “master process
will exit when catch an error” to something like “the master process will exit
if the Promise rejects due to a startup error,” and ensure examples and verbs
are present-tense and grammatically correct.

In `@src/agent_worker.ts`:
- Around line 54-71: The agent.ready callback currently swallows an error;
modify it to route startup errors through the existing startErrorHandler: inside
the agent.ready((err?: Error) => { ... }) callback, if err is truthy call
startErrorHandler(err) and return (so the same cleanup and exit path is taken),
otherwise proceed to removeListener('error', startErrorHandler) and call
AgentWorker.send({ action: 'agent-start', to: 'master' }); this ensures errors
reported via agent.ready are handled the same as agent.once('error',
startErrorHandler').

In `@src/app_worker.ts`:
- Around line 117-124: Move the port validation to occur before sending the
"realport" message: ensure the code that validates the port variable (the
numeric check currently around the port validation logic) runs before calling
AppWorker.send({ action: 'realport', data: { port, protocol } }). Replace or
reuse the existing validation logic (the block that checks port is a
number/valid range) to guard the send; if validation fails, do not call
AppWorker.send and handle the error/early return accordingly so master never
receives invalid port data.

In `@src/index.ts`:
- Around line 17-18: The public entrypoint startCluster should accept an
optional options parameter to match Master; change the function signature of
startCluster to take options?: ClusterOptions (instead of required) and pass
that value into new Master(options).ready(), ensuring callers can call
startCluster() with no args; update any related exported types if necessary to
reflect the optional parameter.

In `@src/master.ts`:
- Around line 462-467: onAppStart currently dereferences the result of
this.workerManager.getWorker(data.workerId) without checking for undefined (same
as onAppExit); update onAppStart to guard against a missing worker: call const
worker = this.workerManager.getWorker(data.workerId); if (!worker) { debug or
this.logger.warn(...) and return; } then use worker safely (or use optional
chaining for non-critical logs). Ensure you reference the getWorker call and the
onAppStart method so the guard covers all subsequent uses of worker.
- Around line 407-414: The onAppExit handler currently uses a non-null assertion
when fetching the worker with this.workerManager.getWorker(data.workerId) which
can return undefined; update onAppExit to guard against a missing worker: call
const worker = this.workerManager.getWorker(data.workerId) and if (!worker)
return (or log and return) before any use of worker so you avoid a null
dereference in onAppExit and related logic that assumes the worker exists.

In `@src/utils/ipc_logger.ts`:
- Around line 12-16: Replace the current truthy check for EGG_CLUSTER_IPC_LOG
used by internalIpcLogEnabled with an explicit boolean parse: implement a small
parser (e.g., normalize with toLowerCase().trim()) that treats "1", "true",
"yes" as true and "0", "false", "no", "" as false, and set internalIpcLogEnabled
from that parser instead of using !!process.env.EGG_CLUSTER_IPC_LOG; reference
the internalIpcLogEnabled export and any new helper (e.g., parseEnvBool) so the
behavior is deterministic and avoids treating "0" or "false" strings as true.

In `@src/utils/mode/base/app.ts`:
- Around line 66-78: Update the error messages in the BaseAppWorker static
methods so they reference the correct class name: change the thrown Error
messages in static send, static kill, and static gracefulExit to say
"BaseAppWorker should implement ..." instead of "BaseAgentWorker should
implement ..."; locate the three methods (send, kill, gracefulExit) in the
BaseAppWorker class and correct the string constants passed to new Error(...)
accordingly.

In `@src/utils/mode/impl/worker_threads/app.ts`:
- Around line 141-144: The code uses a non-null assertion on this.options.port
when adding to ports; instead, check for undefined and handle it explicitly:
after const ports = this.options.ports ?? []; if (!ports.length) { if
(this.options.port !== undefined) { ports.push(this.options.port); } else {
throw new Error("No port specified: options.port is undefined (parseOptions may
have returned undefined)"); } } — reference symbols: ports, this.options.ports,
this.options.port, parseOptions.
- Around line 68-75: The current gracefulExit implementation registers a
process.on('exit') handler in gracefulExit(gracefulExitOptions) which attempts
to await options.beforeExit() but cannot because 'exit' handlers run
synchronously; change the implementation to listen for the 'beforeExit' event
(or explicitly handle termination signals like SIGINT/SIGTERM and SIGQUIT) so
the async options.beforeExit() can be awaited before calling process.exit(code).
Update the listener registration in gracefulExit to use process.on('beforeExit',
async (code) => { if (typeof options.beforeExit === 'function') await
options.beforeExit(); process.exit(code); }) or add signal handlers that await
options.beforeExit() and then call process.exit, referencing the gracefulExit
function and the options.beforeExit callback.
- Around line 36-39: The exitCode getter in app.ts currently returns 0; change
it to return a stored worker exit code by adding a private field (e.g.,
this._exitCode) and updating that field inside the worker 'exit' event handler
(the handler already receives the code). Then make get exitCode() return
this._exitCode (falling back to 0 or undefined if unset). Update any references
to this.instance if needed, but ensure the 'exit' event handler for the worker
sets this._exitCode so master.ts reading worker.exitCode gets the real value.

In `@src/utils/terminate.ts`:
- Around line 49-58: The code can skip collecting live child PIDs when timeout
<= checkInterval because unterminated is initialized empty and the loop may
never run; before calling kill(...) ensure you compute the current list of live
children by invoking getUnterminatedProcesses(childrenPids) one final time (or
initialize unterminated with that result before the loop) so that
kill(unterminated, 'SIGKILL') is only called with the actual remaining PIDs;
update the logic around unterminated, the while loop using
start/timeout/checkInterval, and the final kill call to use that fresh result.
- Around line 25-36: Replace the unreliable check of subProcess.killed with an
explicit "didExit" flag set from the 'exit' event promise: attach a listener or
await once(subProcess, 'exit') to set didExit = true before sending SIGKILL,
then after the Promise.race check didExit to decide whether to escalate to
(subProcess.process ?? subProcess).kill('SIGKILL');; reference the existing
subProcess, once(subProcess, 'exit'), sleep(timeout), kill, and killed symbols
and ensure the 'exit' event is the authoritative source for termination rather
than subProcess.killed.

In `@test/worker_threads.test.ts`:
- Around line 18-19: The test "should exit when emit error during agent worker
boot" no longer exercises the worker_threads path because the cluster invocation
omits the startMode override; update the cluster call in the test (the line
assigning app = cluster('apps/agent-worker-threads-error')) to include the
option { startMode: 'worker_threads' } so the agent runs in worker_threads mode
(or switch to the correct fixture that explicitly uses worker_threads), ensuring
the test actually triggers the worker_threads code path.

---

Outside diff comments:
In `@src/utils/messenger.ts`:
- Around line 100-109: The message routing code risks a null dereference by
using getAgent()!; modify the block that computes receiverWorkerId and sets
data.to to first retrieve const agent = this.#workerManager.getAgent(); then
compare receiverWorkerId to String(process.pid) first, then to
String(agent?.workerId) only if agent is non-null (or use agent?.workerId and
ensure the equality check handles undefined), and otherwise fall back to 'app';
ensure you do not use the non-null assertion getAgent()! and handle the case
where agent is null so no exception is thrown during early routing.

In `@src/utils/mode/impl/worker_threads/agent.ts`:
- Around line 27-35: The current gracefulExit in gracefulExit(options)
incorrectly registers an async handler on process.on('exit') where await has no
effect; change it to listen for the 'beforeExit' event instead and perform the
async call there (i.e., process.on('beforeExit', async code => { if (typeof
beforeExit === 'function') await beforeExit(); })), and remove the call to
process.exit from that handler to avoid re-entrancy; keep references to the
gracefulExit method and the beforeExit option when making this change.
- Around line 75-85: The worker exit handler is using a non-existent `signal`
parameter; update the `worker.once('exit', ...)` callback in agent.ts to accept
the correct `exitCode` parameter (e.g., `(exitCode: number)`) and remove
`signal` from the message payload—send only the exit code (use `exitCode` in the
data object) when calling `this.messenger.send` for the 'agent-exit' action.

In `@test/master.test.ts`:
- Around line 318-324: The calls to alive() are passing values extracted from
regex matches (pid1, pid2 from app.stdout.match) which are string | undefined;
convert/validate these to numbers before calling alive(). Update the test to
parse the captured PID strings (e.g. Number(pid1) or parseInt(pid1, 10)) and
guard against undefined (skip assertion or fail the test with a clear message)
so alive() receives a number; reference the regex match usage for
worker1/worker2 and the alive() function to locate where to change the calls.

---

Nitpick comments:
In @.gitignore:
- Line 15: The .gitignore contains a duplicate entry for "coverage" (already
present earlier); remove the redundant "coverage" line shown in the diff to keep
the file clean and avoid duplicate ignore entries.

In `@src/app_worker.ts`:
- Around line 155-157: The current conditional uses a falsy check on
options.stickyWorkerPort which will reject port 0; change the condition around
the server.listen invocation to explicitly check for undefined (e.g. use
options.sticky && options.stickyWorkerPort !== undefined) so port 0 is allowed,
and then call server.listen(options.stickyWorkerPort, '127.0.0.1') as before;
update the check near the server.listen call that references options.sticky and
options.stickyWorkerPort to use an explicit undefined comparison.

In `@src/master.ts`:
- Around line 602-604: The parseInt calls for appTimeout and agentTimeout should
explicitly pass a radix to avoid ambiguity: update the parseInt invocations that
set appTimeout (currently parseInt(process.env.EGG_APP_CLOSE_TIMEOUT ||
legacyTimeout)) and agentTimeout (currently
parseInt(process.env.EGG_AGENT_CLOSE_TIMEOUT || legacyTimeout)) to include 10 as
the second argument (e.g., parseInt(..., 10)); ensure both occurrences are
changed and leave legacyTimeout assignment unchanged.
- Around line 279-291: The stickyWorker method currently computes a pid from
digits in the IP and force-asserts a worker with
this.workerManager.getWorker(pid)! which can throw if that pid was removed;
change stickyWorker to check the result of getWorker(pid) and return null (or a
defined fallback worker from this.workerManager.listWorkerIds(), e.g.,
round-robin or first available) when getWorker returns undefined, and remove the
non-null assertion; update callers (the caller that invokes stickyWorker) to
handle a null return. Ensure references: stickyWorker, this.options.workers,
this.workerManager.listWorkerIds(), this.workerManager.getWorker(pid) are used
to locate and implement the safe fallback.

In `@src/utils/messenger.ts`:
- Around line 112-122: The routing block that sets data.to based on data.from
uses three separate if statements which are mutually exclusive; update that
block to an if / else if / else if chain so only one branch executes (e.g., if
(data.from === 'agent') { data.to = 'app'; } else if (data.from === 'app') {
data.to = 'agent'; } else if (data.from === 'parent') { data.to = 'master'; })
to make the intent explicit and slightly more efficient—modify the code around
the data.to assignment in src/utils/messenger.ts accordingly.

In `@src/utils/mode/impl/process/agent.ts`:
- Around line 46-51: Add IPC logging for agent worker messages by importing and
using the existing ipcLogger in src/utils/mode/impl/process/agent.ts; instrument
AgentProcessWorker#send (the method that wraps child.send) to log outgoing
messages and instrument the agent process message handlers (the
process.on('message') callbacks around lines 77-86) and any direct process.send
calls to log incoming messages, mirroring the pattern used in app.ts (use the
same log format/context) so master↔agent IPC is recorded consistently.

In `@src/utils/mode/impl/process/app.ts`:
- Around line 157-163: The kill method assumes cluster.workers and each worker
entry are non-null; change it to defensively handle missing workers by first
checking if cluster.workers is truthy and returning early (or using an empty
object) and then verifying each worker is defined before calling Reflect.set and
terminate; reference the kill function, cluster.workers, the per-worker lookup
(worker), Reflect.set(..., 'disableRefork', true), and terminate(worker.process,
timeout) so you only call these on confirmed non-null worker objects.

In `@src/utils/options.ts`:
- Around line 115-122: The console.warn in options handling (the options.https
=== true branch) can be suppressed by process.env.NO_DEPRECATION set later;
replace the console.warn with a logger warning so it always appears in
production logs: locate the options.https block in src/utils/options.ts and
change the deprecation message to use the existing application logger (or
export/accept a processLogger) to emit a warning (e.g., logger.warn or
processLogger.warn) instead of console.warn, preserving the same message and
leave the options.https assignment (options.key, options.cert) intact.
- Around line 159-163: The current code in options.ts sets
process.env.NO_DEPRECATION = '*' under the NODE_ENV === 'production' check which
suppresses all deprecation warnings globally; change this to a targeted approach
by either removing the global assignment or replacing '*' with a
package-specific value (e.g., the egg-cluster identifier) and/or gating it
behind a more specific flag so only egg-cluster deprecations are suppressed;
update the logic around the NODE_ENV check and the assignment to
process.env.NO_DEPRECATION and add a brief comment documenting the scope and
reason for the suppression.

In `@src/utils/terminate.ts`:
- Around line 91-93: The debug log in the catch path is inverted: update the
debug call that currently reads debug('kill %s error: %s, it still alive', pid,
err) to accurately reflect that the process is dead (e.g., debug('kill %s error:
%s, it is not alive', pid, err) or 'it is dead'), keeping the same parameters
(pid, err) so the message and error details remain logged; locate the debug
invocation in terminate.ts and replace the text accordingly.

In `@test/master.test.ts`:
- Around line 467-494: The commented-out test block labelled "egg-script exit"
in test/master.test.ts is dead code; either remove the entire commented block or
re-enable it with a clear rationale by adding a TODO/FIXME explaining why it's
disabled and linking to an issue/PR; if you choose to document it, add a concise
comment above the block mentioning the reason (e.g., failing on CI,
platform-specific behavior) and reference the test components like appDir,
start-server.js, errLogPath and the asserted stderr check so future readers know
the context.

In `@test/options.test.ts`:
- Around line 221-233: Remove the malformed commented-out Mocha block that nests
an `it()` inside another `it()` in the tests around `parseOptions`; specifically
delete the entire commented section containing the nested `it()`s (the block
that references `baseDir`, `parseOptions`, and the `${path.join(baseDir,
'package.json')} should exist` assertion) so the test file no longer contains
invalid commented test syntax, or replace it with a single well-formed test that
calls `parseOptions({ baseDir })` and asserts the expected error.

In `@test/utils.ts`:
- Around line 14-20: The current spread of ...options can overwrite the default
opt object (which sets execArgv: [])—to preserve defaults merge opt explicitly:
build the call so opt is constructed as opt: { execArgv: [], ...(options?.opt ||
{}) } and then spread the rest of ...options without allowing options.opt to
replace it; ensure you remove opt from the top-level options spread (or spread
options after extracting opt) so execArgv default remains unless explicitly
overridden.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 720a8da4-1985-411a-a004-b718cb9ed576

📥 Commits

Reviewing files that changed from the base of the PR and between 103df69 and 4d1c26f.

📒 Files selected for processing (64)
  • .eslintrc
  • .github/PULL_REQUEST_TEMPLATE.md
  • .github/workflows/nodejs.yml
  • .github/workflows/pkg.pr.new.yml
  • .gitignore
  • CHANGELOG.md
  • README.md
  • index.js
  • lib/agent_worker.js
  • lib/app_worker.js
  • lib/utils/mode/base/agent.js
  • lib/utils/mode/base/app.js
  • lib/utils/mode/impl/process/agent.js
  • lib/utils/mode/impl/process/app.js
  • lib/utils/options.js
  • lib/utils/terminate.js
  • lib/utils/timer.js
  • package.json
  • src/agent_worker.ts
  • src/app_worker.ts
  • src/dirname.ts
  • src/error/ClusterAgentWorkerError.ts
  • src/error/ClusterWorkerExceptionError.ts
  • src/error/index.ts
  • src/index.ts
  • src/master.ts
  • src/utils/ipc_logger.ts
  • src/utils/messenger.ts
  • src/utils/mode/base/agent.ts
  • src/utils/mode/base/app.ts
  • src/utils/mode/impl/process/agent.ts
  • src/utils/mode/impl/process/app.ts
  • src/utils/mode/impl/worker_threads/agent.ts
  • src/utils/mode/impl/worker_threads/app.ts
  • src/utils/options.ts
  • src/utils/terminate.ts
  • src/utils/worker_manager.ts
  • test/agent_worker.test.ts
  • test/app_worker.test.ts
  • test/fixtures/apps/agent-debug-port/agent.js
  • test/fixtures/apps/agent-debug-port/inject1.js
  • test/fixtures/apps/app-listen-port/app/router.js
  • test/fixtures/apps/app-listen-port/config/config.default.js
  • test/fixtures/apps/app-listen-without-port/config/config.default.js
  • test/fixtures/apps/before-close/agent.js
  • test/fixtures/apps/before-close/app.js
  • test/fixtures/apps/framework/index.js
  • test/fixtures/apps/framework/lib/agent.js
  • test/fixtures/apps/framework/lib/framework.js
  • test/fixtures/apps/options/agent.js
  • test/fixtures/apps/options/app.js
  • test/fixtures/apps/options/config/plugin.js
  • test/fixtures/apps/script-start/start-server.js
  • test/fixtures/apps/worker-close-timeout/agent.js
  • test/fixtures/apps/worker-close-timeout/app.js
  • test/fixtures/egg/index.js
  • test/https.test.ts
  • test/master.test.ts
  • test/options.test.js
  • test/options.test.ts
  • test/utils.js
  • test/utils.ts
  • test/worker_threads.test.ts
  • tsconfig.json
💤 Files with no reviewable changes (18)
  • test/fixtures/apps/options/agent.js
  • .github/PULL_REQUEST_TEMPLATE.md
  • test/fixtures/apps/app-listen-without-port/config/config.default.js
  • test/fixtures/apps/agent-debug-port/agent.js
  • test/fixtures/apps/options/app.js
  • test/fixtures/apps/app-listen-port/config/config.default.js
  • lib/utils/timer.js
  • index.js
  • test/utils.js
  • lib/utils/options.js
  • lib/utils/terminate.js
  • test/options.test.js
  • lib/agent_worker.js
  • lib/utils/mode/impl/process/app.js
  • lib/utils/mode/impl/process/agent.js
  • lib/utils/mode/base/app.js
  • lib/utils/mode/base/agent.js
  • lib/app_worker.js

Comment thread CHANGELOG.md
Comment on lines +21 to +48
## Summary by CodeRabbit

## Release Notes for @eggjs/cluster v3.0.0-beta.4

- **New Features**
- Migrated project to TypeScript.
- Added support for Node.js 18.19.0, 20, 22, and 23.
- Enhanced type safety and module exports.
- Improved worker thread and process management.
- Introduced new error handling classes for better debugging.

- **Breaking Changes**
- Renamed package from `egg-cluster` to `@eggjs/cluster`.
- Updated import/export syntax to ES modules.
- Minimum Node.js version is now 18.19.0.

- **Performance Improvements**
- Refactored cluster and worker management.
- Optimized error handling and logging.

- **Bug Fixes**
- Resolved various edge cases in worker initialization.
- Improved graceful shutdown mechanisms.

- **Documentation**
- Updated README with new package name and usage examples.
- Added TypeScript and ESM import examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Trim the auto-generated summary block from the changelog.

This section is very verbose and includes beta-oriented wording inside final release notes, which can make the changelog harder to trust/scan.

Suggested cleanup
-## Summary by CodeRabbit
-
-## Release Notes for `@eggjs/cluster` v3.0.0-beta.4
-
-- **New Features**
-  - Migrated project to TypeScript.
-  - Added support for Node.js 18.19.0, 20, 22, and 23.
-  - Enhanced type safety and module exports.
-  - Improved worker thread and process management.
-  - Introduced new error handling classes for better debugging.
-
-- **Breaking Changes**
-  - Renamed package from `egg-cluster` to `@eggjs/cluster`.
-  - Updated import/export syntax to ES modules.
-  - Minimum Node.js version is now 18.19.0.
-
-- **Performance Improvements**
-  - Refactored cluster and worker management.
-  - Optimized error handling and logging.
-
-- **Bug Fixes**
-  - Resolved various edge cases in worker initialization.
-  - Improved graceful shutdown mechanisms.
-
-- **Documentation**
-  - Updated README with new package name and usage examples.
-  - Added TypeScript and ESM import examples.
🧰 Tools
🪛 LanguageTool

[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ess management. - Introduced new error handling classes for better debugging. ...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 21 - 48, Remove the auto-generated "## Summary by
CodeRabbit" block and the verbose beta-oriented release notes beneath it (the
section starting with "## Release Notes for `@eggjs/cluster` v3.0.0-beta.4"
through the closing auto-generated comment), and replace it with a concise,
final-release changelog entry that omits beta wording and redundant bullets;
locate the block by the header "## Summary by CodeRabbit" and the subsection
title "Release Notes for `@eggjs/cluster` v3.0.0-beta.4" and trim or rewrite those
lines to a short, trustable summary.

Comment thread README.md
Comment on lines +40 to +41
You can specify a callback that will be invoked when application has started.
However, master process will exit when catch an error.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify Promise wording in startup description.

Current text says “callback” and has awkward grammar; this section reads clearer if it explicitly describes Promise-based completion/error handling.

Suggested wording
-You can specify a callback that will be invoked when application has started.
-However, master process will exit when catch an error.
+You can handle startup completion from the returned Promise.
+The master process exits if startup throws an error.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
You can specify a callback that will be invoked when application has started.
However, master process will exit when catch an error.
You can handle startup completion from the returned Promise.
The master process exits if startup throws an error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 40 - 41, The README’s startup section uses “callback”
and awkward grammar (“when application has started”, “will exit when catch an
error”); change it to explicit Promise-based wording that is grammatically
correct and clear. Replace references to a “callback” with a Promise-based
description such as: the startup function returns a Promise that resolves when
the application has started and rejects if startup fails; update the sentence
“master process will exit when catch an error” to something like “the master
process will exit if the Promise rejects due to a startup error,” and ensure
examples and verbs are present-tense and grammatically correct.

Comment thread src/agent_worker.ts
Comment on lines +54 to +71
function startErrorHandler(err: Error) {
consoleLogger.error(err);
consoleLogger.error('[agent_worker] start error, exiting with code:1');
AgentWorker.kill();
}

agent.ready((err?: Error) => {
// don't send started message to master when start error
if (err) {
return;
}

agent.removeListener('error', startErrorHandler);
AgentWorker.send({ action: 'agent-start', to: 'master' });
});

// exit if agent start error
agent.once('error', startErrorHandler);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Route agent.ready(err) through the same failure path.

The callback on Lines 60-64 drops err on the floor. If a framework reports startup failure via ready(err) without also emitting 'error', this worker never sends agent-start and never exits, so startup hangs until the master timeout.

Suggested fix
+  let startupFailed = false;
+
   function startErrorHandler(err: Error) {
+    if (startupFailed) return;
+    startupFailed = true;
     consoleLogger.error(err);
     consoleLogger.error('[agent_worker] start error, exiting with code:1');
     AgentWorker.kill();
   }

   agent.ready((err?: Error) => {
     // don't send started message to master when start error
     if (err) {
-      return;
+      startErrorHandler(err);
+      return;
     }

     agent.removeListener('error', startErrorHandler);
     AgentWorker.send({ action: 'agent-start', to: 'master' });
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent_worker.ts` around lines 54 - 71, The agent.ready callback currently
swallows an error; modify it to route startup errors through the existing
startErrorHandler: inside the agent.ready((err?: Error) => { ... }) callback, if
err is truthy call startErrorHandler(err) and return (so the same cleanup and
exit path is taken), otherwise proceed to removeListener('error',
startErrorHandler) and call AgentWorker.send({ action: 'agent-start', to:
'master' }); this ensures errors reported via agent.ready are handled the same
as agent.once('error', startErrorHandler').

Comment thread src/app_worker.ts
Comment on lines +117 to +124
AppWorker.send({
to: 'master',
action: 'realport',
data: {
port,
protocol,
},
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Port validation occurs after realport message is sent.

The worker sends the realport message to master (lines 117-124) before validating that port is a number (lines 172-177). If port validation fails, master has already received potentially invalid port information.

♻️ Proposed fix - validate port before sending realport message
     const port = app.options.port = options.port || listenConfig.port;
     const debugPort = options.debugPort;
+
+    // Validate port early before notifying master
+    if (!listenConfig.path && !options.sticky && typeof port !== 'number') {
+      consoleLogger.error('[app_worker:%s] port should be number, but got %s(%s)',
+        process.pid, port, typeof port);
+      exitProcess();
+      return;
+    }
+
     const protocol = (httpsOptions.key && httpsOptions.cert) ? 'https' : 'http';
     debug('[app_worker:%s] listenConfig: %j, real port: %o, protocol: %o, debugPort: %o',
       process.pid, listenConfig, port, protocol, debugPort);

     AppWorker.send({

Also applies to: 172-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app_worker.ts` around lines 117 - 124, Move the port validation to occur
before sending the "realport" message: ensure the code that validates the port
variable (the numeric check currently around the port validation logic) runs
before calling AppWorker.send({ action: 'realport', data: { port, protocol } }).
Replace or reuse the existing validation logic (the block that checks port is a
number/valid range) to guard the send; if validation fails, do not call
AppWorker.send and handle the error/early return accordingly so master never
receives invalid port data.

Comment thread src/index.ts
Comment on lines +17 to +18
export async function startCluster(options: ClusterOptions) {
await new Master(options).ready();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep options optional in the public entrypoint.

Master already accepts options?: ClusterOptions, but startCluster now requires an argument. That unnecessarily narrows the root API and will break existing TypeScript callers that rely on default startup behavior.

Suggested fix
-export async function startCluster(options: ClusterOptions) {
+export async function startCluster(options?: ClusterOptions) {
   await new Master(options).ready();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 17 - 18, The public entrypoint startCluster should
accept an optional options parameter to match Master; change the function
signature of startCluster to take options?: ClusterOptions (instead of required)
and pass that value into new Master(options).ready(), ensuring callers can call
startCluster() with no args; update any related exported types if necessary to
reflect the optional parameter.

Comment on lines +68 to 75
static gracefulExit(options: gracefulExitOptions) {
process.on('exit', async code => {
if (typeof beforeExit === 'function') {
await beforeExit();
if (typeof options.beforeExit === 'function') {
await options.beforeExit();
}
process.exit(code);
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Async beforeExit callback not properly awaited before process.exit.

The gracefulExit handler calls await options.beforeExit() but then immediately calls process.exit(code). The process.on('exit') callback runs synchronously, so the await has no effect - the process will exit before beforeExit completes.

🐛 Proposed fix
   static gracefulExit(options: gracefulExitOptions) {
-    process.on('exit', async code => {
-      if (typeof options.beforeExit === 'function') {
-        await options.beforeExit();
-      }
-      process.exit(code);
-    });
+    // Use SIGTERM/SIGINT handlers for async cleanup, not 'exit' event
+    const cleanup = async (signal: string) => {
+      if (typeof options.beforeExit === 'function') {
+        await options.beforeExit();
+      }
+      process.exit(0);
+    };
+    process.on('SIGTERM', () => cleanup('SIGTERM'));
+    process.on('SIGINT', () => cleanup('SIGINT'));
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static gracefulExit(options: gracefulExitOptions) {
process.on('exit', async code => {
if (typeof beforeExit === 'function') {
await beforeExit();
if (typeof options.beforeExit === 'function') {
await options.beforeExit();
}
process.exit(code);
});
}
static gracefulExit(options: gracefulExitOptions) {
// Use SIGTERM/SIGINT handlers for async cleanup, not 'exit' event
const cleanup = async (signal: string) => {
if (typeof options.beforeExit === 'function') {
await options.beforeExit();
}
process.exit(0);
};
process.on('SIGTERM', () => cleanup('SIGTERM'));
process.on('SIGINT', () => cleanup('SIGINT'));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mode/impl/worker_threads/app.ts` around lines 68 - 75, The current
gracefulExit implementation registers a process.on('exit') handler in
gracefulExit(gracefulExitOptions) which attempts to await options.beforeExit()
but cannot because 'exit' handlers run synchronously; change the implementation
to listen for the 'beforeExit' event (or explicitly handle termination signals
like SIGINT/SIGTERM and SIGQUIT) so the async options.beforeExit() can be
awaited before calling process.exit(code). Update the listener registration in
gracefulExit to use process.on('beforeExit', async (code) => { if (typeof
options.beforeExit === 'function') await options.beforeExit();
process.exit(code); }) or add signal handlers that await options.beforeExit()
and then call process.exit, referencing the gracefulExit function and the
options.beforeExit callback.

Comment on lines +141 to 144
const ports = this.options.ports ?? [];
if (!ports.length) {
ports.push(this.options.port);
ports.push(this.options.port!);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Non-null assertion on potentially undefined options.port.

Line 143 uses this.options.port! but the port may be undefined based on parseOptions behavior (which returns undefined when no port is specified and HTTPS is not enabled).

🛡️ Proposed fix
     const ports = this.options.ports ?? [];
     if (!ports.length) {
-      ports.push(this.options.port!);
+      if (this.options.port !== undefined) {
+        ports.push(this.options.port);
+      } else {
+        throw new Error('Either ports or port must be specified for worker_threads mode');
+      }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const ports = this.options.ports ?? [];
if (!ports.length) {
ports.push(this.options.port);
ports.push(this.options.port!);
}
const ports = this.options.ports ?? [];
if (!ports.length) {
if (this.options.port !== undefined) {
ports.push(this.options.port);
} else {
throw new Error('Either ports or port must be specified for worker_threads mode');
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mode/impl/worker_threads/app.ts` around lines 141 - 144, The code
uses a non-null assertion on this.options.port when adding to ports; instead,
check for undefined and handle it explicitly: after const ports =
this.options.ports ?? []; if (!ports.length) { if (this.options.port !==
undefined) { ports.push(this.options.port); } else { throw new Error("No port
specified: options.port is undefined (parseOptions may have returned
undefined)"); } } — reference symbols: ports, this.options.ports,
this.options.port, parseOptions.

Comment thread src/utils/terminate.ts
Comment on lines +25 to +36
(subProcess.process ?? subProcess).kill('SIGTERM');
await Promise.race([
once(subProcess, 'exit'),
sleep(timeout),
]);
if (subProcess.killed) {
return;
}
// SIGKILL: http://man7.org/linux/man-pages/man7/signal.7.html
// worker: https://github.com/nodejs/node/blob/master/lib/internal/cluster/worker.js#L22
// subProcess.kill is wrapped to subProcess.destroy, it will wait to disconnected.
(subProcess.process ?? subProcess).kill('SIGKILL');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In Node.js child_process, does ChildProcess.killed mean the process has exited, or only that a signal was successfully sent?

💡 Result:

ChildProcess.killed means that a signal was successfully sent via subprocess.kill, not that the process has exited.

Citations:


🏁 Script executed:

cat -n src/utils/terminate.ts | sed -n '20,40p'

Repository: eggjs/cluster

Length of output: 983


subProcess.killed cannot determine if the process actually exited; it only indicates a signal was sent.

At line 30, the code incorrectly uses killed to skip SIGKILL escalation. After the Promise.race() times out, the process may still be alive even though killed is true. The 'exit' event must be explicitly checked to confirm termination; otherwise, SIGKILL may never be sent.

🐛 Proposed fix
 async function killProcess(subProcess: SubProcess, timeout: number) {
-  // https://github.com/nodejs/node/pull/34312
-  (subProcess.process ?? subProcess).kill('SIGTERM');
-  await Promise.race([
-    once(subProcess, 'exit'),
-    sleep(timeout),
-  ]);
-  if (subProcess.killed) {
-    return;
-  }
+  const target = subProcess.process ?? subProcess;
+  const targetPid = target.pid;
+  if (!targetPid) return;
+
+  try {
+    target.kill('SIGTERM');
+  } catch {
+    return;
+  }
+
+  const exited = await Promise.race([
+    once(subProcess, 'exit').then(() => true),
+    sleep(timeout).then(() => false),
+  ]);
+  if (exited) return;
+
+  try {
+    process.kill(targetPid, 0);
+  } catch {
+    return;
+  }
   // SIGKILL: http://man7.org/linux/man-pages/man7/signal.7.html
   // worker: https://github.com/nodejs/node/blob/master/lib/internal/cluster/worker.js#L22
   // subProcess.kill is wrapped to subProcess.destroy, it will wait to disconnected.
-  (subProcess.process ?? subProcess).kill('SIGKILL');
+  target.kill('SIGKILL');
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/terminate.ts` around lines 25 - 36, Replace the unreliable check of
subProcess.killed with an explicit "didExit" flag set from the 'exit' event
promise: attach a listener or await once(subProcess, 'exit') to set didExit =
true before sending SIGKILL, then after the Promise.race check didExit to decide
whether to escalate to (subProcess.process ?? subProcess).kill('SIGKILL');;
reference the existing subProcess, once(subProcess, 'exit'), sleep(timeout),
kill, and killed symbols and ensure the 'exit' event is the authoritative source
for termination rather than subProcess.killed.

Comment thread src/utils/terminate.ts
Comment on lines +49 to +58
let unterminated: number[] = [];

while (Date.now() - start < timeout - checkInterval) {
await sleep(checkInterval);
unterminated = getUnterminatedProcesses(childrenPids);
if (unterminated.length === 0) {
return;
}
}
kill(unterminated, 'SIGKILL');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Short timeout path can skip child SIGKILL entirely.

unterminated starts empty, so if the loop never runs (e.g., timeout <= check interval), Line 58 sends SIGKILL to an empty list even when children are still alive.

🛠️ Proposed fix
-  let unterminated: number[] = [];
+  let unterminated: number[] = childrenPids;
 
-  while (Date.now() - start < timeout - checkInterval) {
+  while (unterminated.length > 0 && Date.now() - start < timeout - checkInterval) {
     await sleep(checkInterval);
     unterminated = getUnterminatedProcesses(childrenPids);
     if (unterminated.length === 0) {
       return;
     }
   }
-  kill(unterminated, 'SIGKILL');
+  if (unterminated.length > 0) {
+    kill(unterminated, 'SIGKILL');
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/terminate.ts` around lines 49 - 58, The code can skip collecting
live child PIDs when timeout <= checkInterval because unterminated is
initialized empty and the loop may never run; before calling kill(...) ensure
you compute the current list of live children by invoking
getUnterminatedProcesses(childrenPids) one final time (or initialize
unterminated with that result before the loop) so that kill(unterminated,
'SIGKILL') is only called with the actual remaining PIDs; update the logic
around unterminated, the while loop using start/timeout/checkInterval, and the
final kill call to use that fresh result.

Comment on lines 18 to +19
it('should exit when emit error during agent worker boot', () => {
app = utils.cluster('apps/agent-worker-threads-error');
app = cluster('apps/agent-worker-threads-error');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This case no longer exercises the worker_threads path.

Unlike the previous test, this cluster boot omits { startMode: 'worker_threads' }, so the suite can pass while the default process-mode agent path is what actually ran.

Suggested fix
-      app = cluster('apps/agent-worker-threads-error');
+      app = cluster('apps/agent-worker-threads-error', { startMode: 'worker_threads' } as any);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it('should exit when emit error during agent worker boot', () => {
app = utils.cluster('apps/agent-worker-threads-error');
app = cluster('apps/agent-worker-threads-error');
it('should exit when emit error during agent worker boot', () => {
app = cluster('apps/agent-worker-threads-error', { startMode: 'worker_threads' } as any);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/worker_threads.test.ts` around lines 18 - 19, The test "should exit when
emit error during agent worker boot" no longer exercises the worker_threads path
because the cluster invocation omits the startMode override; update the cluster
call in the test (the line assigning app =
cluster('apps/agent-worker-threads-error')) to include the option { startMode:
'worker_threads' } so the agent runs in worker_threads mode (or switch to the
correct fixture that explicitly uses worker_threads), ensuring the test actually
triggers the worker_threads code path.

@killagu
Copy link
Copy Markdown
Contributor Author

killagu commented Apr 15, 2026

Wrong base branch — this was created against master (v3+ TypeScript), but the active branch is 1.x. Re-creating against 1.x.

@killagu killagu closed this Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants