
feat: [ENG-2240] AutoHarness V2 SandboxService.loadHarness + harness.* injection #499

Merged
danhdoan merged 2 commits into proj/autoharness-v2 from feat/ENG-2240
Apr 21, 2026

Conversation

@danhdoan
Collaborator

Summary

  • Problem: the module builder (ENG-2239) and store (ENG-2228) shipped standalone — nothing yet connects them to the sandbox. User code running in the sandbox has no harness.* surface to call, and the per-session version id tracked by the outcome recorder stays empty.
  • Why it matters: completes the Phase 3 vertical. After this PR, the full pipeline — store → builder → sandbox injection — is wired end to end behind SandboxService.loadHarness. Phase 5 (mode selector) is the first consumer; Phase 3 Task 3.5's isolation integration test also depends on this wiring.
  • What changed: new loadHarness(sessionId, projectId, commandType): Promise<HarnessLoadResult> on SandboxService. Two new setters (setHarnessModuleBuilder, setHarnessStore). service-initializer.ts constructs the builder and injects both. executeCode now injects harness.* into the sandbox context on creation when a module is loaded for the session. 6 unit tests covering every branch.
  • What did NOT change (scope boundary): No consumer calls loadHarness yet — Phase 5 will be the first. No mode selection (hardcoded to Phase 3 assisted baseline). No attack-vector integration tests (Task 3.5). No template content (Phase 4). Sandbox.execute semantics unchanged when no harness is loaded.

Type of change

  • New feature
  • Bug fix
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

  • Closes ENG-2240
  • Depends on (merged): ENG-2238 (Phase 3 Task 3.1 — HarnessContext types), ENG-2239 (Phase 3 Task 3.2 — HarnessModuleBuilder), ENG-2228 (Phase 1 Task 1.4 — HarnessStore service-initializer wiring)
  • Unblocks Phase 3 Task 3.5 (isolation integration test — exercises this wiring end-to-end against attack fixtures)
  • First production consumer: Phase 5 (mode selector + AgentLLMService session-start hook)

Root cause (bug fixes only, otherwise write N/A)

  • Root cause: N/A
  • Why this was not caught earlier: N/A

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s): test/unit/infra/sandbox/sandbox-service-harness-load.test.ts
  • Key scenario(s) covered (6 tests):
    • Config-disabled early return: harness.enabled === false → {loaded: false, reason: 'no-version'}, store.getLatest never called
    • No version in store: store.getLatest(...) returns undefined → {loaded: false, reason: 'no-version'}, call target verified
    • Builder failure propagation: builder returns {loaded: false, reason: 'meta-threw'} → propagates as-is, no session state populated
    • Successful load: {loaded: true} returned with the stored HarnessVersion reference
    • Capability-driven injection: curate-only template end-to-end through a real HarnessModuleBuilder → harness.curate present, harness.query absent
    • harnessVersionIdBySession population: successful load writes the version id so Phase 2 recorder can tag outcomes

User-visible changes

None. No consumer calls loadHarness yet, and harness.enabled = false remains the public default. Sandbox execution without a loaded harness is byte-identical to before this PR.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording

Before this PR, loadHarness didn't exist — no test in the suite referenced it. After: all 6 pass. Full suite: 6697 passing / 0 failing.

$ perl -e 'alarm 60; exec @ARGV' npx mocha test/unit/infra/sandbox/sandbox-service-harness-load.test.ts --timeout 10000
  SandboxService.loadHarness
    ✔ returns {loaded:false,reason:no-version} when harness.enabled is false — no store call
    ✔ returns {loaded:false,reason:no-version} when store has no version for the pair
    ✔ propagates builder {loaded:false} result without injecting harness namespace
    ✔ returns {loaded:true} with the stored version on success
    ✔ curate-only module injects harness.curate and NOT harness.query
    ✔ populates harnessVersionIdBySession on successful load for Phase 2 recorder
  6 passing (30ms)

Checklist

  • Tests added or updated and passing (npm test) — 6 new tests; full suite 6697 passing / 0 failing
  • Lint passes (npm run lint) — 0 errors, 226 pre-existing warnings
  • Type check passes (npm run typecheck) — exit=0
  • Build succeeds (npm run build) — exit=0
  • Commits follow Conventional Commits format — feat: [ENG-2240] ...
  • Documentation updated (if applicable) — task doc at features/autoharness-v2/tasks/phase_3/task_03-load-harness-and-injection.md (research repo) drove the scope; the ctx.abort placeholder + mode-hardcoded-to-assisted decisions flagged below for post-merge task doc tightening
  • No breaking changes (or clearly documented above) — purely additive; no existing call site exercises the new path
  • Branch is up to date with main — targets proj/autoharness-v2, not main

Risks and mitigations

  • Risk: ctx.abort is a placeholder — a fresh AbortController().signal per call. If a harness function checks ctx.abort.aborted, it will always see false, even when user code has requested cancellation.

    • Mitigation: Phase 4 pass-through templates don't check ctx.abort (they forward to ctx.tools.curate which has its own cancellation plumbing). Phase 5's AgentLLMService hook threads the real session abort signal through. Placeholder is documented in-source at buildHarnessNamespace with a pointer to Phase 5. If a Phase 6 refined harness genuinely needs ctx.abort before Phase 5 lands, the fix is plumbing the sandbox-level abort signal through loadHarness — additive.
  • Risk: Mode is hardcoded to "assisted" (all exported capabilities visible to user code). A session that should be in Mode B (filter) or Mode C (policy) would still see harness.* without the mode-specific safety caps from v1-design-decisions.md §2.5.

    • Mitigation: No consumer calls loadHarness yet — Phase 5 is the first. Phase 5's mode selector runs BEFORE loadHarness and refuses to load below the Mode A threshold (H ≥ 0.30). So the "wrong mode injected" scenario can't materialize until a mode selector exists, and the mode selector arrives in the same phase that will swap the hardcoded mode. Zero production risk.
  • Risk: buildHarnessNamespace binds ctx.tools.curate / ctx.tools.readFile to the session's real curate service + file system. If those services aren't wired (e.g., early tests), the bound tools throw instead of silently no-op'ing.

    • Mitigation: Explicit error messages ("harness.ctx.tools.curate: no curate service wired") make the failure easy to diagnose. Existing SandboxService tests that don't wire those services also don't call loadHarness, so no test breakage. Future regression: flagged in the buildHarnessTools JSDoc.
  • Risk: loadHarness uses session-scoped state on two maps (sessionHarnessStates, harnessVersionIdBySession). clearSession and cleanup both clear them, but any future bypass of those lifecycle hooks would leak state across sessions.

    • Mitigation: The cleanup pattern mirrors sandboxes and pendingVariables — all maps follow the same lifecycle hooks. Any future field that forgets to clear is caught by the established test pattern ("sessions don't share state"). Acceptable v1.0 hygiene.
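The lifecycle pattern behind the last risk can be sketched as follows. This is a simplified stand-in rather than the real SandboxService: the class name `SessionStateRegistry`, the `recordLoad` and `hasState` helpers, and the value types are assumptions made for illustration. Only the two map names and the `clearSession` / `cleanup` hooks come from this PR.

```typescript
// Hypothetical stand-in for the session-scoped state described above.
type HarnessState = { readonly commandType: string };

class SessionStateRegistry {
  // Map names mirror the PR; the value types are simplified assumptions.
  private readonly sessionHarnessStates = new Map<string, HarnessState>();
  private readonly harnessVersionIdBySession = new Map<string, string>();

  recordLoad(sessionId: string, state: HarnessState, versionId: string): void {
    this.sessionHarnessStates.set(sessionId, state);
    this.harnessVersionIdBySession.set(sessionId, versionId);
  }

  hasState(sessionId: string): boolean {
    return (
      this.sessionHarnessStates.has(sessionId) ||
      this.harnessVersionIdBySession.has(sessionId)
    );
  }

  // Per-session hook: drops every map entry keyed by this session.
  clearSession(sessionId: string): void {
    this.sessionHarnessStates.delete(sessionId);
    this.harnessVersionIdBySession.delete(sessionId);
  }

  // Service-wide hook: drops all sessions at once.
  cleanup(): void {
    this.sessionHarnessStates.clear();
    this.harnessVersionIdBySession.clear();
  }
}

const registry = new SessionStateRegistry();
registry.recordLoad("s1", { commandType: "curate" }, "v1");
registry.recordLoad("s2", { commandType: "query" }, "v2");
registry.clearSession("s1"); // s2 is untouched
```

A "sessions don't share state" style test would then assert that clearing one session leaves the other intact, and that `cleanup()` empties both maps.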

Notes for reviewers

loadHarness is API-complete but production-dormant. Shipping this PR doesn't change any user-visible behavior because nothing calls loadHarness yet. Phase 5's mode selector + AgentLLMService session-start hook is the first production caller. Task 3.5 (isolation integration test) is the first test consumer. This shape — ship the capability, defer the caller — matches Phase 1.4's pattern and de-risks Phase 5 by letting the wiring soak in proj/autoharness-v2 first.

The harness.* injection happens in two places, both calling buildHarnessNamespace(sessionId):

  1. loadHarness itself, when called for a session whose sandbox already exists (updateContext({harness: ns})).
  2. executeCode's sandbox-creation branch (new sandbox for this session), before new LocalSandbox(...).

This handles both orderings: loadHarness before or after the first executeCode. If Phase 5 ever calls them in a weird interleaving, the namespace lands correctly either way.
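The two orderings can be modeled in a few lines. The sketch below is a hypothetical reduction, not the actual implementation: `FakeSandbox`, `TwoOrderingService`, and the shape of the namespace are assumptions, and only the method names `loadHarness`, `executeCode`, `updateContext`, and `buildHarnessNamespace` are taken from the description above.

```typescript
// Hypothetical, simplified model of the two injection points.
type HarnessNamespace = { meta: () => { version: string } };

class FakeSandbox {
  context: Record<string, unknown>;
  constructor(context: Record<string, unknown>) {
    this.context = context;
  }
  updateContext(patch: Record<string, unknown>): void {
    this.context = { ...this.context, ...patch };
  }
}

class TwoOrderingService {
  private readonly sandboxes = new Map<string, FakeSandbox>();
  private readonly loaded = new Map<string, HarnessNamespace>();

  private buildHarnessNamespace(sessionId: string): HarnessNamespace | undefined {
    return this.loaded.get(sessionId);
  }

  loadHarness(sessionId: string): void {
    this.loaded.set(sessionId, { meta: () => ({ version: "v1" }) });
    // Injection point 1: the sandbox already exists, so patch its context.
    const sandbox = this.sandboxes.get(sessionId);
    if (sandbox !== undefined) {
      sandbox.updateContext({ harness: this.buildHarnessNamespace(sessionId) });
    }
  }

  executeCode(sessionId: string): FakeSandbox {
    let sandbox = this.sandboxes.get(sessionId);
    if (sandbox === undefined) {
      // Injection point 2: sandbox creation picks up a previously loaded harness.
      const harness = this.buildHarnessNamespace(sessionId);
      sandbox = new FakeSandbox(harness === undefined ? {} : { harness });
      this.sandboxes.set(sessionId, sandbox);
    }
    return sandbox;
  }
}

const orderingService = new TwoOrderingService();

// Ordering A: load first, then execute.
orderingService.loadHarness("a");
const sandboxA = orderingService.executeCode("a");

// Ordering B: execute first, then load.
const sandboxB = orderingService.executeCode("b");
const hadHarnessBeforeLoad = "harness" in sandboxB.context;
orderingService.loadHarness("b");
```

In both orderings the session's sandbox context ends up with a `harness` key, which is the property the description relies on.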

SessionHarnessState is readonly at the type level. The stored HarnessMeta is the cached result of module.meta() from Task 3.2, which is pure and invariant. readonly prevents accidental mutation of state used across multiple executeCode calls.

Capability-driven injection is export-based, not meta-capabilities-based. If a template declares meta().capabilities: ['curate', 'query'] but only exports curate, the injected namespace has just harness.curate. Strictly, this diverges from the task doc's wording ("capabilities declared in meta() are injected"), but in practice well-formed templates export exactly what they declare, and HarnessMetaSchema's validation runs at save time in Phase 1's store. The module-builder already uses the same export-based logic. Flagging in case reviewers prefer an additional cross-check.
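To make the export-based rule concrete, here is a minimal sketch under stated assumptions. The types `Meta`, `LoadedModule`, and `Namespace`, and both function names, are hypothetical; the real HarnessMeta and module types are defined elsewhere in the codebase. `missingDeclaredCapabilities` illustrates what an optional cross-check could look like and is not part of this PR.

```typescript
// Hypothetical shapes; the real HarnessMeta and module types differ.
type Meta = { capabilities: string[] };
type Handler = (input: string) => string;
type LoadedModule = { meta: () => Meta; curate?: Handler; query?: Handler };
type Namespace = { meta: () => Meta; curate?: Handler; query?: Handler };

function buildNamespaceFromExports(mod: LoadedModule): Namespace {
  const cachedMeta = mod.meta(); // captured once; later meta() calls reuse it
  const ns: Namespace = { meta: () => cachedMeta };
  // Presence is decided by what the module exports, not by meta().capabilities.
  if (typeof mod.curate === "function") ns.curate = mod.curate;
  if (typeof mod.query === "function") ns.query = mod.query;
  return ns;
}

// Possible cross-check: capabilities that are declared but not exported.
function missingDeclaredCapabilities(mod: LoadedModule, ns: Namespace): string[] {
  return mod.meta().capabilities.filter((capability) => !(capability in ns));
}

// A template that declares two capabilities but exports only curate.
const mismatchedModule: LoadedModule = {
  meta: () => ({ capabilities: ["curate", "query"] }),
  curate: (input) => `curated:${input}`,
};

const injected = buildNamespaceFromExports(mismatchedModule);
const missing = missingDeclaredCapabilities(mismatchedModule, injected);
```

Here `injected` exposes `meta` and `curate` only, and `missing` lists `query`, which is the divergence from the task doc wording flagged above.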

Scope call: loadHarness is idempotent but not cheap on repeat. Each call re-evaluates the template through the VM. If Phase 5 calls loadHarness once per session (the intended pattern), this is a non-issue. If a future consumer calls it per-request, add an "already loaded?" check at the top.
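If such a guard is ever needed, it might look like the sketch below. Everything here is an assumption made for illustration: `CachedLoader`, `evaluateTemplate`, the `evaluations` counter, and the trimmed `LoadResult` type do not exist in this PR.

```typescript
// Hypothetical result type; the PR's HarnessLoadResult carries more fields.
type LoadResult =
  | { loaded: true; versionId: string }
  | { loaded: false; reason: string };

class CachedLoader {
  private readonly results = new Map<string, LoadResult>();
  evaluations = 0;

  // Stand-in for the expensive step (template evaluation through the VM).
  private async evaluateTemplate(sessionId: string): Promise<LoadResult> {
    this.evaluations += 1;
    return { loaded: true, versionId: `v-for-${sessionId}` };
  }

  async loadHarness(sessionId: string): Promise<LoadResult> {
    // "Already loaded?" check: reuse a previous successful result.
    const previous = this.results.get(sessionId);
    if (previous !== undefined && previous.loaded) return previous;

    const result = await this.evaluateTemplate(sessionId);
    // Only successes are cached, so a failed load can be retried later.
    if (result.loaded) this.results.set(sessionId, result);
    return result;
  }
}
```

A real guard would probably also have to decide what happens when the store gains a newer version mid-session; the sketch ignores that case.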

Related

  • Types this method consumes: src/agent/core/domain/harness/types.ts (ENG-2238 — HarnessContext, HarnessLoadResult)
  • Module builder: src/agent/infra/harness/harness-module-builder.ts (ENG-2239)
  • Store: src/agent/infra/harness/harness-store.ts (ENG-2227 + ENG-2228)
  • Handoff contract: features/autoharness-v2/tasks/phase_3_4_handoff.md §C2 §C3 §C4
  • Task doc: features/autoharness-v2/tasks/phase_3/task_03-load-harness-and-injection.md (research repo)
  • Previous in stream: Phase 3 Task 3.2 (ENG-2239 — module builder)
  • Next in stream: Phase 3 Task 3.4 (graceful degradation tests — exercises the 7 A4 cases end-to-end against this pipeline)
  • First production consumer: Phase 5 — HarnessModeSelector + AgentLLMService session-start hook

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Claude finished @danhdoan's task in 3m 22s


PR Review: feat: [ENG-2240] AutoHarness V2 SandboxService.loadHarness + harness.* injection

  • Gather context and understand the request
  • Read changed files
  • Read supporting types/interfaces for deeper context
  • Analyze for correctness, security, and code quality
  • Post comprehensive review with inline comments and overall score

Overall Score: 4 / 5

This is a well-scoped, additive PR that cleanly wires up the Phase 3 pipeline. The documentation and JSDoc are unusually thorough, the two-ordering injection design (load before or after first executeCode) is correct, and the harness namespace is properly export-driven rather than declaration-driven. No functional regressions found. The issues below are either blocking-if-unaddressed or minor polish items.


Critical

loadHarness (and the two new setters) are missing from ISandboxService

CipherAgentServices.sandboxService is typed as ISandboxService (cipher-services.ts:79). Phase 5's AgentLLMService hook receives sharedServices.sandboxService through that type. When Phase 5 tries to call loadHarness, it will fail at compile time because the method doesn't exist on the interface.

The three methods that need to be added to src/agent/core/interfaces/i-sandbox-service.ts:

  • loadHarness(sessionId, projectId, commandType): Promise<HarnessLoadResult> — Phase 5's caller
  • setHarnessModuleBuilder(builder: HarnessModuleBuilder): void — wired during init; fine to stay optional like the other setters
  • setHarnessStore(store: IHarnessStore): void — same

service-initializer.ts uses the concrete SandboxService type directly, so the wiring calls won't fail today, but loadHarness is the consumer-facing surface and it must be on the interface before Phase 5 lands.


Significant

Untested injection path: "sandbox already exists when loadHarness is called"

loadHarness lines 295–301 handle the case where executeCode ran before loadHarness. This path (sandbox.updateContext({harness: harnessNs})) has no test. All 6 existing tests call loadHarness before any executeCode, so they only exercise the other path (namespace picked up at sandbox-creation time in executeCode). The symmetric interleaving isn't covered.

A minimal addition:

it('injects harness into an already-existing sandbox when loadHarness is called after executeCode', async () => {
  // ... setup builder/store/config ...
  await service.executeCode('1+1', 's1')   // create sandbox first
  await service.loadHarness('s1', 'p1', 'curate')
  // verify harness is now present in sandbox context
})

Minor (inline comments posted)

| Location | Finding |
| --- | --- |
| sandbox-service.ts:265 | 'no-version' used for both "disabled" and "no version trained"; conflates two distinct states |
| sandbox-service.ts:297 | if (harnessNs !== undefined) is always true here; redundant guard after sessionHarnessStates.set |
| sandbox-service.ts:481 | harness.curate exposes no way for the caller to inject custom env overrides; fine for v1.0 but worth a note for Phase 5 API review |
| test:109 | Missing assertion that sessionHarnessStates stays empty on builder failure |
| test:163 | Behavioral test preferred over private method cast |

What's Well Done

  • Two-ordering correctness: The design where loadHarness injects immediately if a sandbox exists, or defers to executeCode's creation branch, handles both call orderings cleanly without a polling loop or flag.
  • readonly on SessionHarnessState: Prevents accidental mutation of cross-call state; good hygiene.
  • buildHarnessTools() captures services dynamically via this: Each harness.curate() call gets a fresh binding to the current curateService — consistent with how toolsSDK handles service updates.
  • Never-throws contract on loadHarness: Every failure mode returns a typed result. Combined with the HarnessLoadResult discriminated union, downstream code can narrow safely without try/catch.
  • resolveProjectType() uses ProjectTypeSchema.safeParse: New project types added to the schema are automatically accepted without a code change — a quiet forward-compatibility win.
  • cleanup() and clearSession() both clear the new maps: Lifecycle hygiene matches the existing patterns for sandboxes and pendingVariables.

injection

Phase 3 Task 3.3 — wires the module builder (ENG-2239) and store
(ENG-2228) into `SandboxService` as a user-facing capability.
`loadHarness(sessionId, projectId, commandType)` reads the latest
`HarnessVersion` via the store, evaluates it through the module
builder, and exposes `harness.curate`, `harness.query`,
`harness.meta` inside the session's sandbox context.

- `SandboxService` gets two new optional fields: `harnessModuleBuilder`
  (class) + `harnessStore` (interface). Setters wire them
  alphabetically.
- `service-initializer.ts` constructs `HarnessModuleBuilder` alongside
  the existing `HarnessStore` + `HarnessOutcomeRecorder` block, and
  calls both setters. No new service on `CipherAgentServices` — the
  builder is internal to the sandbox.
- `src/agent/infra/harness/index.ts` barrel re-exports
  `HarnessModuleBuilder`.

Never throws — encodes every failure in the returned
`HarnessLoadResult`.

  - config disabled / deps missing → `{loaded:false,
    reason:'no-version'}`
  - store returns `undefined` → `{loaded:false, reason:'no-version'}`
  - builder returns `{loaded:false,
    reason:'syntax'|'meta-threw'|'meta-invalid'}`
    → propagates unchanged, sandbox untouched
  - builder returns `{loaded:true}`:
      1. `sessionHarnessStates` Map gets {commandType, meta, module,
         projectType} keyed by sessionId
      2. `harnessVersionIdBySession` populated so Phase 2 recorder
         can attribute outcomes to the loaded version
      3. If sandbox already exists → `updateContext({harness: ns})`;
         otherwise `executeCode` injects on creation via
         `buildHarnessNamespace(sessionId)`
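
The branches listed above can be sketched as a single function. This is a hedged reduction, not the shipped method: the `Store` and `Builder` shapes, the `getLatest` parameters, the `versionId` field, and the names `loadHarnessSketch` and `summarize` are assumptions. The sketch covers only the listed branches and omits the session-state writes.

```typescript
// Hypothetical, trimmed-down types; reason strings follow the list above.
type FailureReason = "no-version" | "syntax" | "meta-threw" | "meta-invalid";
type HarnessLoadResult =
  | { loaded: true; versionId: string }
  | { loaded: false; reason: FailureReason };

type Version = { id: string; code: string };
type Store = {
  getLatest(projectId: string, commandType: string): Promise<Version | undefined>;
};
type Builder = { build(version: Version): HarnessLoadResult };

async function loadHarnessSketch(
  deps: { enabled: boolean; store?: Store; builder?: Builder },
  projectId: string,
  commandType: string,
): Promise<HarnessLoadResult> {
  // Disabled config and missing dependencies collapse into 'no-version'.
  if (!deps.enabled || deps.store === undefined || deps.builder === undefined) {
    return { loaded: false, reason: "no-version" };
  }
  const version = await deps.store.getLatest(projectId, commandType);
  if (version === undefined) return { loaded: false, reason: "no-version" };

  // Builder failures propagate unchanged; a success would also record session state.
  return deps.builder.build(version);
}

// Callers narrow on `loaded` instead of wrapping the call in try/catch.
function summarize(result: HarnessLoadResult): string {
  return result.loaded ? `loaded ${result.versionId}` : `skipped: ${result.reason}`;
}
```

Because the result is a discriminated union, `result.reason` is only accessible after checking `loaded === false`, which lets downstream code branch without exception handling.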

`buildHarnessNamespace(sessionId)` returns `{meta, curate?, query?}`
per what the loaded module exports:

  - `meta()` returns the captured `HarnessMeta` (no VM re-invocation)
  - `curate()` / `query()` build a fresh `HarnessContext` per call
    and invoke `module.curate(ctx)` / `module.query(ctx)` through the
    module builder's timeout-wrapped, deep-freeze-enforced path
  - `ctx.tools.curate` / `ctx.tools.readFile` bind to the session's
    real curate service + file system; explicit error when unwired
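
The "explicit error when unwired" behavior might look like the sketch below. `ToolsHost` and the `CurateService` shape are hypothetical; only the error message text and the `buildHarnessTools` name come from this PR, and the real method builds a full `HarnessContext` rather than a single tool.

```typescript
// Hypothetical service shape; only the error message text is taken from the PR.
type CurateService = { curate(input: string): string };

class ToolsHost {
  curateService: CurateService | undefined;

  // The tool reads `this.curateService` at call time, so a service wired later is picked up.
  buildHarnessTools(): { curate(input: string): string } {
    return {
      curate: (input: string): string => {
        if (this.curateService === undefined) {
          throw new Error("harness.ctx.tools.curate: no curate service wired");
        }
        return this.curateService.curate(input);
      },
    };
  }
}

const host = new ToolsHost();
const tools = host.buildHarnessTools();

// Unwired: the call fails loudly instead of silently doing nothing.
let unwiredMessage = "";
try {
  tools.curate("note");
} catch (error) {
  unwiredMessage = error instanceof Error ? error.message : String(error);
}

// Wiring the service afterwards makes the same tools object work.
host.curateService = { curate: (input) => `curated:${input}` };
const wiredResult = tools.curate("note");
```

Throwing rather than no-op'ing trades silent test passes for an immediate, named failure, which matches the mitigation described in the PR.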

Hardcoded to the Phase 3 assisted baseline. Phase 5's
`HarnessModeSelector` will layer mode gating on top — mode-specific
prompt contributors + Mode C safety caps.

Currently a fresh `AbortController().signal` per call — placeholder.
Phase 5's `AgentLLMService` hook will thread the session's real
abort signal through so user cancellation propagates into harness
code. Documented in-source.
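
The consequence of the placeholder can be shown with the standard `AbortController` API. The function name `placeholderAbortSignal` and the variable names are illustrative assumptions; the snippet only demonstrates why a per-call controller never reports cancellation.

```typescript
// A placeholder like the one described: a fresh, never-aborted signal per call.
function placeholderAbortSignal(): AbortSignal {
  return new AbortController().signal;
}

// A session-level controller, as Phase 5 is expected to thread through.
const sessionController = new AbortController();
sessionController.abort(); // the user requests cancellation

// Nothing ever calls abort() on the placeholder's controller, so this stays false.
const placeholderAborted = placeholderAbortSignal().aborted;

// The shared session signal reflects the cancellation request.
const sessionAborted = sessionController.signal.aborted;
```

Harness code that checks `ctx.abort.aborted` would therefore observe `false` under the placeholder even after cancellation, and `true` once the real session signal is passed instead.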

6 unit tests in `sandbox-service-harness-load.test.ts` covering:

  - config-disabled early return (no store call)
  - no-version store return
  - builder-failure propagation (no state populated)
  - successful load returns {loaded:true} with stored version
  - curate-only template → `harness.curate` present, `harness.query`
    absent (end-to-end via real `HarnessModuleBuilder`)
  - `harnessVersionIdBySession` populated on success for Phase 2
    recorder attribution

- `rg "loadHarness" src/` → 1 definition + 0 production call sites.
  Phase 5's mode selector will be the first real consumer. Comments +
  JSDoc references present but no code calls.
- `rg "new HarnessModuleBuilder" src/` → 1 construction site in
  service-initializer.ts.

Full suite: 6697 passing / 0 failing.
@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Claude finished @danhdoan's task in 1m 50s


PR Re-Review: feat: [ENG-2240] AutoHarness V2 SandboxService.loadHarness + harness.* injection

  • Gather context and understand the request
  • Read changed files and diff since last review
  • Check if critical issue (ISandboxService) was addressed
  • Check if significant issue (missing test for alternate injection path) was addressed
  • Check inline comment issues
  • Post comprehensive updated review

Overall Score: 4 / 5

The implementation remains well-scoped and the core logic is correct. This re-review was triggered by the synchronize event but the diff is identical to the first review — none of the previously raised issues have been addressed yet. Summary of outstanding items follows.


Critical — Still Open

loadHarness, setHarnessModuleBuilder, setHarnessStore are missing from ISandboxService

src/agent/core/interfaces/i-sandbox-service.ts has not been updated. The interface ends at line 131 without any of the three new methods. CipherAgentServices.sandboxService is typed as ISandboxService, so Phase 5's AgentLLMService hook will fail at compile time when it tries to call loadHarness.

Required additions to ISandboxService:

// non-optional — Phase 5's production caller
loadHarness(sessionId: string, projectId: string, commandType: 'chat' | 'curate' | 'query'): Promise<HarnessLoadResult>

// optional setters (matching the existing set* pattern)
setHarnessModuleBuilder?(builder: HarnessModuleBuilder): void
setHarnessStore?(store: IHarnessStore): void



Significant — Still Open

Untested injection path: "sandbox already exists when loadHarness is called"

All 6 tests call loadHarness before any executeCode, covering only the path where executeCode picks up the namespace at sandbox-creation time. The branch at sandbox-service.ts:295–301 (sandbox.updateContext({harness: harnessNs})) has zero test coverage. The PR description explicitly calls out that both orderings are handled — but only one is verified.

Minimal test to add:

it('injects harness into an already-existing sandbox when loadHarness is called after executeCode', async () => {
  service.setHarnessConfig(makeEnabledConfig())
  service.setHarnessStore(store as unknown as IHarnessStore)
  const {HarnessModuleBuilder: RealBuilder} = await import('../../../../src/agent/infra/harness/harness-module-builder.js')
  const {NoOpLogger} = await import('../../../../src/agent/core/interfaces/i-logger.js')
  service.setHarnessModuleBuilder(new RealBuilder(new NoOpLogger()))
  store.getLatest.resolves(makeVersion())

  // Create sandbox FIRST
  await service.executeCode('1+1', 's1')

  // THEN load harness — should inject via updateContext branch
  const result = await service.loadHarness('s1', 'p1', 'curate')
  expect(result.loaded).to.equal(true)

  // Verify harness is now visible inside the sandbox
  const execResult = await service.executeCode(
    '__result = typeof harness !== "undefined" && typeof harness.curate === "function"',
    's1',
  )
  expect(execResult.locals.__result).to.equal(true)
})



Minor — Still Open

| Location | Finding | Status |
| --- | --- | --- |
| sandbox-service.ts:260–265 | 'no-version' used for disabled config AND unconfigured wiring; conflates distinct failure modes | unchanged |
| sandbox-service.ts:297–300 | if (harnessNs !== undefined) is always true here; sessionHarnessStates.set was called 13 lines above | unchanged |
| test:108–109 | Missing assertion that sessionHarnessStates is NOT populated on builder failure; comment says it but doesn't assert it | unchanged |
| test:160–163 | Private method cast tests implementation detail; behavioral executeCode assertion preferred | unchanged |

For the redundant guard at line 297–300, the simplest fix is a comment acknowledging the guarantee, or just dropping the inner if:

// sessionHarnessStates was just set above — buildHarnessNamespace is guaranteed non-undefined here
sandbox.updateContext({harness: this.buildHarnessNamespace(sessionId)!})

(or keep the outer sandbox !== undefined guard and inline the update.)


What's Well Done (unchanged from prior review)

  • Two-ordering correctness: Load before or after first executeCode both inject cleanly.
  • readonly on SessionHarnessState: Prevents cross-call mutation.
  • Never-throws contract: Every failure mode is a typed HarnessLoadResult.
  • cleanup() / clearSession() both clear the new maps: Lifecycle hygiene is consistent.
  • resolveProjectType() uses safeParse: New project types accepted without code changes.
  • buildHarnessTools() captures services via this: Fresh bindings per call, consistent with toolsSDK pattern.

Action items before merge:

  1. Add loadHarness, setHarnessModuleBuilder, setHarnessStore to ISandboxService (blocking)
  2. Add test for the reverse-order injection path (executeCode → loadHarness) (blocking per TDD standard)
  3. (Nice-to-have) Add assertion to builder-failure test; address redundant guard

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Claude finished @danhdoan's task in 1m 44s


PR Re-Review: feat: [ENG-2240] AutoHarness V2 SandboxService.loadHarness + harness.* injection

  • Gather context and understand the request
  • Read changed files to check status of previously raised issues
  • Verify ISandboxService update (critical issue from prior reviews)
  • Verify new test for reverse-order injection path (significant issue)
  • Check minor issues
  • Post comprehensive updated review

Overall Score: 5 / 5

All items raised in the two prior reviews have been addressed. The PR is now fully ready to merge into proj/autoharness-v2.


Previously Critical — Now Resolved ✅

ISandboxService updated with the three new methods

src/agent/core/interfaces/i-sandbox-service.ts now includes:

  • loadHarness(sessionId, projectId, commandType): Promise<HarnessLoadResult> — non-optional, Phase 5's production caller surface
  • setHarnessModuleBuilder?(builder: HarnessModuleBuilder): void — optional, matching the existing setter pattern
  • setHarnessStore?(store: IHarnessStore): void — optional, matching the existing setter pattern

Phase 5's AgentLLMService hook can now call sharedServices.sandboxService.loadHarness(...) through the typed interface without a compile-time failure.
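
A caller-side view of that interface shape, as a rough sketch: the types below are trimmed, hypothetical stand-ins (`ISandboxServiceSketch`, the one-method `IHarnessStore`, and the simplified `HarnessLoadResult`), and only the method names and the `commandType` union come from the review above.

```typescript
// Trimmed, hypothetical versions of the types named in the review.
type HarnessLoadResult = { loaded: true } | { loaded: false; reason: string };
interface IHarnessStore {
  getLatest(): Promise<unknown>;
}

interface ISandboxServiceSketch {
  // Non-optional: every implementation must provide it.
  loadHarness(
    sessionId: string,
    projectId: string,
    commandType: "chat" | "curate" | "query",
  ): Promise<HarnessLoadResult>;
  // Optional setter, matching the existing set* pattern.
  setHarnessStore?(store: IHarnessStore): void;
}

// An implementation that omits the optional setter still satisfies the interface.
const minimalService: ISandboxServiceSketch = {
  loadHarness: async () => ({ loaded: false, reason: "no-version" }),
};

let setterCalls = 0;
const fullService: ISandboxServiceSketch = {
  loadHarness: async () => ({ loaded: true }),
  setHarnessStore: () => {
    setterCalls += 1;
  },
};

const sketchStore: IHarnessStore = { getLatest: async () => undefined };
// Optional call syntax: a no-op when the setter is absent.
minimalService.setHarnessStore?.(sketchStore);
fullService.setHarnessStore?.(sketchStore);
```

Keeping `loadHarness` non-optional means consumers typed against the interface can call it directly, while the optional setters need the `?.()` call form shown here.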


Previously Significant — Now Resolved ✅

Reverse-order injection test added

test/unit/infra/sandbox/sandbox-service-harness-load.test.ts:186–220 covers the sandbox.updateContext({harness: ...}) branch in loadHarness. The test:

  1. Calls executeCode first (creates the sandbox with no harness — asserts typeof harness === 'undefined')
  2. Calls loadHarness on the already-existing sandbox
  3. Calls executeCode again and verifies harness.meta and harness.curate are now present

Both injection orderings are now test-verified.


Previously Minor — All Resolved ✅

| Finding | Resolution |
| --- | --- |
| 'no-version' used for disabled/not-configured/missing; conflation of distinct states | Explicit intentional comment added at sandbox-service.ts:260–267: "Deliberate: three distinct conditions…collapse into the same 'no-version' result for v1.0. Keep this conflation intentional, not accidental." |
| if (harnessNs !== undefined) redundant guard after unconditional sessionHarnessStates.set | Guard removed; direct sandbox.updateContext({harness: this.buildHarnessNamespace(sessionId)}) with a comment: "buildHarnessNamespace is guaranteed non-undefined here: state was just set above." |
| Missing assertion on sessionHarnessStates in builder-failure test | Test at line 116 now asserts expect(internal.sessionHarnessStates.has('s1')).to.equal(false) and expect(internal.harnessVersionIdBySession.has('s1')).to.equal(false) |
| Capability-driven injection test used private method cast (unknown cast to call buildHarnessNamespace) | Test now exercises executeCode and inspects harness.* availability as user code would see it, which is behaviorally correct. Comment explicitly notes this approach. |

What's Well Done (carried from prior reviews)

  • Two-ordering correctness: Both load-before-execute and execute-before-load paths are now tested and work correctly.
  • readonly on SessionHarnessState: Prevents cross-call mutation.
  • Never-throws contract: Every failure mode is a typed HarnessLoadResult.
  • cleanup() / clearSession() both clear the new maps: Lifecycle hygiene consistent with sandboxes and pendingVariables.
  • resolveProjectType() uses safeParse: New project types accepted without code changes.
  • buildHarnessTools() captures services via this: Fresh bindings per call, consistent with toolsSDK pattern.
  • JSDoc on loadHarness in ISandboxService: Thorough, references Phase 5 as the first production consumer — useful for whoever writes that callsite.

No further action required. Ready to merge.

@danhdoan danhdoan merged commit 4834c54 into proj/autoharness-v2 Apr 21, 2026
6 checks passed
@danhdoan danhdoan deleted the feat/ENG-2240 branch April 22, 2026 00:23