Improve workflow runtime error logging #1812

Closed
pranaygp wants to merge 2 commits into main from pranaygp/workflow-replay-error-logging

@pranaygp
Contributor

Summary

  • Log the error stack as the message for user-code errors and WorkflowRuntimeError during setup, so stacks actually appear in logs
  • Replay-timeout path: warn when the queue will retry, error only when max retries is exhausted; log the reason when we fail to mark the run as failed
  • Include workflowRunId in the suspension debug log, and drop the now-redundant [Workflows] "<runId>" - prefix from the suspension message
  • Standardize the console prefix in logger.ts to [workflow-sdk]

Test plan

  • pnpm test in packages/core
  • Trigger a failing workflow locally and confirm the stack is visible in the runtime error log
  • Trigger a replay timeout and confirm retry attempts log as warn and the final attempt logs as error
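
The replay-timeout behavior described in the summary amounts to choosing a log level from whether the queue will retry. A minimal sketch of that decision follows; `replayTimeoutLevel`, `attempt`, and `maxAttempts` are hypothetical names chosen for illustration and are not taken from the SDK.

```typescript
type LogLevel = "warn" | "error";

// Hypothetical helper: pick the level for a replay-timeout log line.
// `attempt` is 1-based; `maxAttempts` is the assumed delivery budget.
function replayTimeoutLevel(attempt: number, maxAttempts: number): LogLevel {
  // While the queue will still redeliver, the timeout is recoverable.
  const willRetry = attempt < maxAttempts;
  return willRetry ? "warn" : "error";
}
```

With a budget of three attempts, the first two timeouts would log as `warn` and the third as `error`.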

🤖 Generated with Claude Code

@changeset-bot

changeset-bot Bot commented Apr 20, 2026

🦋 Changeset detected

Latest commit: e12b1c1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
Name Type
@workflow/core Patch
@workflow/builders Patch
@workflow/cli Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/vitest Patch
@workflow/web-shared Patch
@workflow/web Patch
workflow Patch
@workflow/world-testing Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/nuxt Patch
@workflow/ai Patch

@vercel
Contributor

vercel Bot commented Apr 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
example-nextjs-workflow-turbopack Error Error Apr 20, 2026 3:06pm
example-nextjs-workflow-webpack Error Error Apr 20, 2026 3:06pm
example-workflow Error Error Apr 20, 2026 3:06pm
workbench-astro-workflow Error Error Apr 20, 2026 3:06pm
workbench-express-workflow Error Error Apr 20, 2026 3:06pm
workbench-fastify-workflow Error Error Apr 20, 2026 3:06pm
workbench-hono-workflow Error Error Apr 20, 2026 3:06pm
workbench-nitro-workflow Error Error Apr 20, 2026 3:06pm
workbench-nuxt-workflow Error Error Apr 20, 2026 3:06pm
workbench-sveltekit-workflow Error Error Apr 20, 2026 3:06pm
workbench-vite-workflow Error Error Apr 20, 2026 3:06pm
workflow-docs Error Error Apr 20, 2026 3:06pm
workflow-swc-playground Error Error Apr 20, 2026 3:06pm
workflow-web Error Error Apr 20, 2026 3:06pm

@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

No benchmark result files found in benchmark-results

@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

🧪 E2E Test Results

No test result files found.


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: failure
  • Local Prod: failure
  • Local Postgres: failure
  • Windows: failure

Check the workflow run for details.

Contributor

@vercel vercel Bot left a comment

Additional Suggestion:

Unused runId parameter in buildWorkflowSuspensionMessage causes TypeScript build failure (TS6133) because noUnusedParameters: true is set in tsconfig.
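
For context, a small sketch of how TS6133 arises and one common way to silence it. The function below is a hypothetical stand-in, since the real signature of `buildWorkflowSuspensionMessage` is not shown in this thread. TypeScript exempts parameters whose names start with an underscore from the `noUnusedParameters` check.

```typescript
// With "noUnusedParameters": true, this declaration would fail with TS6133
// because `runId` is never read:
//
//   function buildMessage(runId: string, reason: string): string { ... }

// Option A: drop the parameter entirely and update the callers.
// Option B (shown): keep the positional slot but prefix the name with an
// underscore, which the compiler treats as intentionally unused.
function buildMessage(_runId: string, reason: string): string {
  return `Workflow suspended: ${reason}`;
}
```

Dropping the parameter is usually the cleaner choice when no caller depends on the position.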

@pranaygp
Contributor Author

Superseded by #1832, which folds all of this PR's changes into the broader structured-logger work (part of the friendlier-errors PR stack #1831, #1832). Closing in favor of that.

pranaygp added a commit that referenced this pull request Apr 24, 2026
Adds a `.child()` and `.forRun(runId, workflowName)` child-logger API to
the structured logger so runtime/step code doesn't have to repeat
`workflowRunId`/`workflowName`/`stepId` on every call. Normalizes error
metadata to structured `errorName` / `errorMessage` / `errorStack` fields
instead of ad-hoc `error: err.message` strings, and adds comments to
silent catches that swallow expected idempotency conflicts.
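
The child-logger idea above can be sketched as a logger that carries bound metadata and merges it into every call. This is an illustration under assumptions, not the SDK's implementation: `ScopedLogger`, `Sink`, and `Meta` are hypothetical names, and only the `.child()` / `.forRun(runId, workflowName)` method names come from the commit message.

```typescript
type Meta = Record<string, unknown>;
type Level = "warn" | "error";
type Sink = (level: Level, message: string, meta: Meta) => void;

class ScopedLogger {
  private readonly sink: Sink;
  private readonly bound: Meta;

  constructor(sink: Sink, bound: Meta = {}) {
    this.sink = sink;
    this.bound = bound;
  }

  // Returns a new logger whose bound metadata is extended with `extra`.
  child(extra: Meta): ScopedLogger {
    return new ScopedLogger(this.sink, { ...this.bound, ...extra });
  }

  // Convenience wrapper that binds the run context once.
  forRun(workflowRunId: string, workflowName: string): ScopedLogger {
    return this.child({ workflowRunId, workflowName });
  }

  warn(message: string, meta: Meta = {}): void {
    this.sink("warn", message, { ...this.bound, ...meta });
  }

  error(message: string, meta: Meta = {}): void {
    this.sink("error", message, { ...this.bound, ...meta });
  }
}
```

A call site would then bind context once, for example `root.forRun(runId, name).child({ stepId })`, and log with only the call-specific fields.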

Also folds in the pending changes from #1812 so that PR can be closed:

- Standardize the console prefix to `[workflow-sdk]`.
- Split the replay-timeout log into a warn-while-retrying vs.
  error-when-giving-up, and surface the underlying error when we can't
  mark a timed-out run as failed.
- Include the error stack in the "Fatal runtime error during workflow
  setup" log and in the top-level user-code workflow error log so the
  stack surfaces in flattened log drains.
- Drop the `[Workflows] "<runId>" - ` prefix from
  `buildWorkflowSuspensionMessage` — the structured logger now attaches
  run context.

Supersedes #1812.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pranaygp added a commit that referenced this pull request May 4, 2026
* Introduce structured context-violation errors + Ansi renderer

Phase 1: Add Ansi rendering helpers (frame, hint, note, help, code, inline)
to @workflow/errors, and a chalk mock for readable snapshot tests.

Phase 2: Add four context-violation error classes to @workflow/core
(NotInWorkflowContextError, NotInStepContextError,
NotInWorkflowOrStepContextError, UnavailableInWorkflowContextError)
and apply them to all twelve user-facing throw sites so errors now
include docs links and a structured "what/why/fix" frame.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address review: tighten changeset, implement ansifyName, harden Ansi

- Tighten phase 1 changeset to a single sentence (per pranaygp review) and switch to double-quoted frontmatter (per Copilot + repo convention).
- Implement `ansifyName` to actually apply dim styling to workflow/ / step/ prefixes; add an `Ansi.dim` helper to `@workflow/errors` so callers don't need to import chalk directly.
- Remove the `void getWorkflowMetadata;` workaround in context-errors.ts by dropping the unused value import (we only needed the type and symbol).
- Render the plain-Error throw in `workflow/get-workflow-metadata.ts` with `Ansi.frame` + docs link so the VM path matches the structured-class styling from the sibling step path (still uses a plain Error to avoid the module-init cycle).
- Guard `buildUnderline` against zero-length markers so a stray empty token can't produce a negative `String.repeat` count.
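
The last bullet refers to a real hazard: `String.prototype.repeat` throws a `RangeError` when given a negative count. A minimal sketch of such a guard follows; `underline` and its parameters are hypothetical and do not reproduce the actual `buildUnderline` signature.

```typescript
// Hypothetical underline builder: marks `length` columns starting at `start`.
function underline(start: number, length: number): string {
  // A zero-length marker would make `length - 1` negative below,
  // and "~".repeat(-1) throws a RangeError, so bail out early.
  if (length <= 0) return "";
  const pad = " ".repeat(Math.max(0, start));
  return pad + "^" + "~".repeat(length - 1);
}
```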

* Structured runtime logger metadata + fold in replay-timeout logging

Adds a `.child()` and `.forRun(runId, workflowName)` child-logger API to
the structured logger so runtime/step code doesn't have to repeat
`workflowRunId`/`workflowName`/`stepId` on every call. Normalizes error
metadata to structured `errorName` / `errorMessage` / `errorStack` fields
instead of ad-hoc `error: err.message` strings, and adds comments to
silent catches that swallow expected idempotency conflicts.
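
The error-metadata normalization mentioned above could look roughly like the following. The field names `errorName`, `errorMessage`, and `errorStack` come from the commit message; the helper `toErrorMeta` and the `"NonError"` fallback label are assumptions made for this sketch.

```typescript
interface ErrorMeta {
  errorName: string;
  errorMessage: string;
  errorStack: string | undefined;
}

// Hypothetical helper: turn any thrown value into structured log fields.
function toErrorMeta(err: unknown): ErrorMeta {
  if (err instanceof Error) {
    return { errorName: err.name, errorMessage: err.message, errorStack: err.stack };
  }
  // Non-Error throws (strings, numbers, plain objects) have no stack.
  return { errorName: "NonError", errorMessage: String(err), errorStack: undefined };
}
```

Keeping the three fields separate lets a log drain index on `errorName` without parsing a free-form string.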

Also folds in the pending changes from #1812 so that PR can be closed:

- Standardize the console prefix to `[workflow-sdk]`.
- Split the replay-timeout log into a warn-while-retrying vs.
  error-when-giving-up, and surface the underlying error when we can't
  mark a timed-out run as failed.
- Include the error stack in the "Fatal runtime error during workflow
  setup" log and in the top-level user-code workflow error log so the
  stack surfaces in flattened log drains.
- Drop the `[Workflows] "<runId>" - ` prefix from
  `buildWorkflowSuspensionMessage` — the structured logger now attaches
  run context.

Supersedes #1812.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use double-quoted changeset frontmatter per repo convention

* Add SerializationError + apply to user-facing serialization sites

Phase 4 of friendlier errors: introduce a `SerializationError` class with
an optional `hint` and a docs link (workflow-sdk.dev/err/serialization-failed),
and adopt it at every user-facing serialization boundary in @workflow/core:

- Locked ReadableStream at a workflow boundary
- Unregistered class / missing `classId` / missing `WORKFLOW_DESERIALIZE`
- Attempting to return step functions to clients or call workflow functions
  directly
- Webhook `respondWith()` called outside a step
- `dehydrate*` / `getSerializeStream` failures (workflow args/return, step
  args/return, stream chunks)

Internal invariants (format prefix length checks, unknown format bytes,
missing `STREAM_NAME_SYMBOL`, encryption key/size guards, etc.) now throw
`WorkflowRuntimeError` instead of plain `Error` so the classifier and logger
treat them consistently.

`formatSerializationError` now returns `{ message, hint }` so the hint
fragment can be rendered with the standard SerializationError framing
instead of being baked into the message string.
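
To illustrate why returning `{ message, hint }` helps, here is a small sketch that keeps the hint separate until render time. The names `formatFailure` and `render` are hypothetical; the `╰▶ hint:` framing mirrors the layout quoted elsewhere in this commit log, not a confirmed API.

```typescript
interface Formatted {
  message: string;
  hint: string | undefined;
}

// Hypothetical formatter: the hint stays a separate field.
function formatFailure(context: string, hint: string | undefined): Formatted {
  return { message: `Failed to serialize ${context}`, hint };
}

// Hypothetical renderer: framing is applied once, in one place.
function render({ message, hint }: Formatted): string {
  return hint === undefined ? message : `${message}\n╰▶ hint: ${hint}`;
}
```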

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use double-quoted changeset frontmatter per repo convention

* Presentation-only user vs SDK error attribution

Add describeError() that derives attribution and class-aware hints from
existing error classes + RUN_ERROR_CODES — no event data changes. Wire into
step failures, max-delivery exhaustion, run failures, and fatal setup errors
so terminal logs include errorAttribution and a hint for known error types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address review: describeError accepts precomputed errorCode + instanceof

- `describeError(err, errorCode?)` now accepts an optional precomputed
  `RunErrorCode`. `classifyRunError(err)` only narrows to USER_ERROR /
  RUNTIME_ERROR, so the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED branches
  were previously unreachable from the step / run failure log sites.
  Callers that know the failure category (runtime.ts for replay timeout and
  max-deliveries exhaustion) now pass the code in.
- Context-violation checks use `instanceof` against the actual classes from
  context-errors.ts instead of a name-string set. Type-safe + survives
  class renames.
- Wire the new hints through to the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED
  log sites so those branches actually render a hint now.
- 3 new tests cover the reachable code paths + precomputed-code override.
- Changeset frontmatter switched to double quotes per repo convention.
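
The precomputed-code override in the first bullet can be sketched as below. The code names come from the commit message; the class body, the classifier logic, and the reduced return shape (only `errorCode`, without attribution or hint) are simplifying assumptions.

```typescript
type RunErrorCode =
  | "USER_ERROR"
  | "RUNTIME_ERROR"
  | "REPLAY_TIMEOUT"
  | "MAX_DELIVERIES_EXCEEDED";

// Stand-in for the SDK's runtime error class.
class WorkflowRuntimeError extends Error {}

// The classifier can only distinguish user errors from runtime errors.
function classifyRunError(err: unknown): RunErrorCode {
  return err instanceof WorkflowRuntimeError ? "RUNTIME_ERROR" : "USER_ERROR";
}

// Callers that already know the category pass it in; otherwise classify.
function describeError(err: unknown, errorCode?: RunErrorCode): { errorCode: RunErrorCode } {
  return { errorCode: errorCode ?? classifyRunError(err) };
}
```

Without the optional argument, the `REPLAY_TIMEOUT` and `MAX_DELIVERIES_EXCEEDED` codes could never be produced from the error instance alone, which matches the unreachable-branch problem the bullet describes.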

* Cosmetic consistency pass on remaining bare throws

Internal invariants now use WorkflowRuntimeError so describeError attributes
them to the SDK: missing startedAt, VM generateKey, closure-vars outside
step context, ENOTSUP. defineHook().resume() formats schema validation
failures as a readable list instead of a JSON blob.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use double-quoted changeset frontmatter per repo convention

* Data-driven describeRunError + expose via @workflow/core/describe-error

Observability renderers read persisted run_failed / step_failed event data,
not live Error instances. describeRunError takes { errorCode, errorName }
and returns the same { attribution, hint } shape as describeError, so the
CLI and web UI can derive user-vs-SDK framing from the event log directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Friendlier build-time errors: WorkflowBuildError class + applications

Add `WorkflowBuildError` class in `@workflow/errors` with optional `hint`
for an actionable next step, and apply it in `@workflow/builders` at
user-facing sites: failed esbuild phases, unresolved built-in steps, and
empty esbuild output now throw `WorkflowBuildError` with a hint pointing
at the likely fix. Runtime invariants remain plain `Error`.

* Polish friendlier-errors rendering: drop functionName leak, simplify docs link, redirect stack

- Drop the readonly `functionName` param-property on context-error classes so
  util.inspect no longer prints a trailing `{ functionName: 'foo()' }` block.
- Replace the `DocLink` ("label: https://…") shape with a plain `DocsUrl`
  template-literal type. Error output now renders a single clean line:
  `docs: https://…` (new `Ansi.docs` helper) instead of the noisier
  "note: Read more about foo(): https://…".
- Add throw helpers (`throwNotInWorkflowContext`, etc.) that call
  `Error.captureStackTrace(err, stackStartFn)` on V8 engines so the top frame
  of the thrown error points at the user's call site instead of at the gate
  function inside the framework. Callers pass themselves as the boundary.
- Refactor `defineHook()` (both root and `/workflow`) to use named function
  closures rather than `this.create`/`this.resume`, since the stack redirect
  relies on a stable function identity that survives destructuring.
- Update context-errors.test.ts to snapshot the new `docs:` framing and to
  add a regression test asserting the top stack frame is the user call site.
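
The stack-redirect bullet relies on the V8-specific `Error.captureStackTrace(target, stackStartFn)`, which omits `stackStartFn` and every frame above it from the captured stack. A minimal sketch follows; `throwAtCaller`, `gate`, and `userCode` are hypothetical names, and the cast is there only so the block compiles without Node typings.

```typescript
type CaptureFn = (target: object, stackStartFn?: Function) => void;

// Hypothetical throw helper: re-captures the stack so it starts at the
// caller of `stackStartFn` instead of inside the framework.
function throwAtCaller(message: string, stackStartFn: Function): never {
  const err = new Error(message);
  const capture = (Error as unknown as { captureStackTrace?: CaptureFn }).captureStackTrace;
  if (typeof capture === "function") {
    // Available on V8 engines only; other engines keep the default stack.
    capture.call(Error, err, stackStartFn);
  }
  throw err;
}

// A framework gate function passes itself as the boundary.
function gate(): void {
  throwAtCaller("gate() must be called from a workflow", gate);
}

// User code that calls the gate in the wrong context.
function userCode(): void {
  gate();
}
```

Because the boundary is a function reference, it has to stay stable when methods are destructured, which is the reason the commit gives for switching `defineHook()` to named function closures.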

* Consolidate friendlier-errors stack: fix ANSI leak + non-retry semantics

Addresses PR review feedback across the 8-phase friendlier-errors stack and
fixes issues surfaced by manual testing (createHook() inside a step):

- ANSI no longer leaks into .message / .stack. Context-violation errors
  now store plain text on .message and render the colored framed form
  lazily via [util.inspect.custom] / toString(). Structured logs, log
  drains, CBOR-serialized events, and JSON payloads no longer contain
  raw \x1B[...m bytes.

- Context violations are now fatal. ContextViolationError sets
  fatal = true; FatalError.is(err) recognizes any error with a
  fatal: true own property. Calling createHook() from a step no longer
  burns three retry attempts on a guaranteed-to-fail context violation.

- Ansi helpers moved to @workflow/errors/ansi subpath so imports from
  @workflow/errors no longer pull chalk into consumers that only want
  error classes (addresses reviewer VaguelySerious).

- Shared redirectStackToCaller helper in packages/core/src/capture-stack.ts,
  used by both context-errors.ts and workflow/get-workflow-metadata.ts
  (addresses Copilot review on #1849).

- Structured framed content: ContextViolationError now takes a structured
  FramedContent (title segments + detail branches) and renders plain/pretty
  from the same source of truth.
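
The fatal-error bullet above describes a duck-typed check on an own `fatal: true` property. A sketch of that check and its effect on retry decisions follows; `isFatal`, `ContextViolation`, and `shouldRetry` are hypothetical stand-ins and do not reproduce `FatalError.is()` itself.

```typescript
// Hypothetical check: any object with an own `fatal: true` property is fatal.
function isFatal(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    Object.prototype.hasOwnProperty.call(err, "fatal") &&
    (err as { fatal?: unknown }).fatal === true
  );
}

// Stand-in error class; the class field becomes an own property per instance.
class ContextViolation extends Error {
  readonly fatal = true;
}

// Fatal errors skip the retry budget entirely.
function shouldRetry(err: unknown, attempt: number, maxAttempts: number): boolean {
  return !isFatal(err) && attempt < maxAttempts;
}
```

Under this sketch a context violation fails on the first attempt, while an ordinary `Error` keeps retrying until the budget runs out.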

Tightens the eight existing phase changesets to 1-2 sentences each and adds
four new scoped changesets (errors-ansi-subpath, context-errors-plain-message,
context-errors-fatal, capture-stack-shared) for the followup fixes, so the
final changelog history stays readable.

* test: update step-handler mocks for scoped forRun() logger

The runtime logger now uses .forRun(runId, name, {stepId, stepName})
to attach scope context, so 409-handling log calls no longer repeat
{workflowRunId, stepId} in every metadata bag — those live on the
scoped logger instance. Update the mock to return itself from forRun()
and tighten assertions to check both the log args (errorName/errorMessage)
and the forRun() scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Mark SerializationError fatal + route dehydration through step-failure path

SerializationError now carries readonly fatal = true. Step-return
dehydration is wrapped inside the user-code try/catch so that the
resulting error flows through userCodeFailed → step_failed →
FatalError.is() short-circuit instead of bubbling up as HTTP 500 and
triggering a queue retry loop. Retrying a step that returned a non-POJO
is guaranteed to fail the same way, so this saves ~20s and 3 near-
identical error blocks per serialization failure.

* Add logging snapshot tests + manual-test artifacts

Snapshot tests lock in the exact shape of:
- describeError() payloads (attribution, errorCode, hint) for every
  classification — plain Error, SerializationError, context-violation,
  WorkflowRuntimeError, REPLAY_TIMEOUT, MAX_DELIVERIES_EXCEEDED.
- The scoped-logger call signature for the two canonical runtime
  failure paths (fatal-bubble and hit-max-retries), so refactors of
  forRun() / child() metadata merging can't silently change what users
  see in their log drains.

SerializationError now also has a direct test for readonly fatal=true
+ FatalError.is() recognition.

pr-artifacts/ contains real log-output snapshots from running the
nextjs-turbopack workbench against five error scenarios. These are
reference material for reviewers and are flagged to be removed before
merge.

* Readable step-fatal logs: inline stack + friendly step/workflow names

The step-level fatal-error log used to embed the full stack trace inside
an `errorStack` string field in the metadata object, so util.inspect
rendered it as a quote-escaped, line-continuation blob when the log
hit the terminal — unreadable in practice. Move framing + stack into
the log *message* (matching the workflow-level log in runtime.ts) and
keep the metadata object compact with only the indexable structured
fields (`errorAttribution`, `errorName`, `errorMessage`, `hint`,
IDs). Log drains still get the same keys; humans now see a readable
stack trace.

Also introduce `formatStepName` / `formatWorkflowName` in
`@workflow/utils` that render machine names
(`step//./workflows/1_simple//add`) as `add (./workflows/1_simple)` in
log framings, using the existing `parseStepName` / `parseWorkflowName`
parsers. Applied to step-fatal, hit-max-retries, exceeded-max-retries,
and workflow-threw log sites.
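
As a rough illustration of the friendly-name rendering, the sketch below splits a machine name on `//`. This is an assumption about the format based only on the two examples in this commit log; the real helpers use `parseStepName` / `parseWorkflowName`, whose behavior is not shown here, and `formatMachineName` is a hypothetical name.

```typescript
// Hypothetical formatter for names shaped like "<kind>//<path>//<name>".
function formatMachineName(machineName: string): string {
  const parts = machineName.split("//");
  // Anything that does not match the three-part shape is returned unchanged.
  if (parts.length !== 3) return machineName;
  const [, path, name] = parts;
  return `${name} (${path})`;
}
```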

Artifacts in pr-artifacts/ updated to show the new output shape, and
renamed .log → .md since they're Markdown and IDE previews are nicer
that way.

* Opinionated pretty formatter for runtime structured-log metadata

Replace util.inspect's default object dump (which quote-escapes
multi-line stacks and paragraph hints into a single-line JSON-y blob)
with a workflow-aware formatter that composes the entire log line
into a single string passed to console.error / console.warn.

Highlights of the new output:
- Per-run / per-step IDs render with their parsed friendly names so
  users see `wrun_… · simple (./workflows/1_simple)` instead of just
  the raw `workflowName: 'workflow//./workflows/1_simple//simple'`.
- Color-coded attribution badge (user error red / sdk error magenta)
  paired with the error class in bold.
- Hints render as a paragraph under `hint:` rather than a backslash-
  `\n`-escaped string.
- Drops redundant fields (errorStack always; errorMessage when it's
  already in the parent message) to avoid double-printing.
- Unknown fields fall through as a sorted `key  value` tail so we
  never silently drop log information.

@workflow/errors/ansi gains bold/red/magenta helpers used by the
formatter. The web / web-shared packages don't consume stderr — they
read structured event payloads from the World event log — so this is
presentation-only at the runtime layer.

* ci(benchmarks): disable pnpm cache for getCommunityWorldsMatrix

The job never runs `pnpm install` (it just calls `node` against a
checked-in script), so the pnpm store path never exists. The post-job
`actions/setup-node@v4` cache-save then fails with `Path Validation
Error: Path(s) specified in the action for caching do(es) not exist`
and red-X's the entire job even though the matrix step succeeded.

The setup-workflow-dev composite already has a `cache-pnpm` opt-out
input for this exact case — wire it through here.

* Address PR review comments: inspect dedup, cause leak, retry-loop tests

- ContextViolationError: util.inspect(err) duplicated every framed detail
  line because the stack-tail strip only sliced the first message line.
  V8's Error.stack reads `Name: messageLine1\n  messageLine2\n  at ...`,
  so for our multi-line `title\n╰▶ docs: …` messages every detail line
  was getting prepended twice (once in the pretty form, once via the
  unsliced message tail). Count the actual message lines and slice past
  all of them. Repro test asserts `╰▶ docs:` appears exactly once.

- WorkflowError: stop assigning `cause: undefined` as an enumerable own
  property when no cause is provided. Subclasses (every error in this PR)
  inherit the parent constructor; the unconditional assignment polluted
  `util.inspect(err)` output with `{ cause: undefined, … }` on every
  no-cause instance. The `super(...)` call already conditionally sets
  `.cause` non-enumerably when `options.cause` is provided.

- step-handler.test.ts: add a regression-gate suite that exercises the
  fatal-vs-retryable retry-loop wiring directly. Asserts that an error
  with `fatal: true` produces exactly one `step_failed` event with no
  `step_retrying`, and that a non-fatal `Error` retries via
  `step_retrying` on early attempts and emits `step_failed` once the
  retry budget is exhausted. Catches the silent-regression case where
  `fatal = true` is removed from a context-violation error class but
  the `FatalError.is()` unit tests stay green.
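
The first bullet's fix, counting message lines before slicing the stack, can be sketched as follows. `stackFrames` is a hypothetical helper, and the header check assumes the V8 layout of `Name: message` followed by the frames.

```typescript
// Hypothetical helper: return only the frame lines of an error's stack.
function stackFrames(err: Error): string {
  const stack = err.stack ?? "";
  const header = `${err.name}: ${err.message}`;
  // Engines that do not prefix the stack with the message need no slicing.
  if (!stack.startsWith(header)) return stack;
  // A multi-line message occupies several lines at the top of the stack,
  // so slice past all of them rather than just the first.
  const messageLineCount = header.split("\n").length;
  return stack.split("\n").slice(messageLineCount).join("\n");
}
```

Slicing only one line would leave the `╰▶ docs:` detail line in the tail, which is the duplication the bullet describes.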

* Consolidate changesets + remove pr-artifacts

Address review feedback to drastically shorten the changesets — fold
the 15 file-by-file entries into a single user-facing changeset for
@workflow/core / errors / builders / utils. Also drop the pr-artifacts/
folder (reviewer-only log captures, no longer needed).

* Polish runtime error logging: layout, stack trim, hint consolidation

Five user-driven fixes from manual smoke-testing of #1849:

1. Logger layout. composeLogLine() now puts the structured-fields block
   (attribution badge, run/step IDs, error code) **between** the framing
   line and the stack body, instead of after it where 30+ lines of stack
   buried the most useful information. The framing stays at the top,
   stack at the bottom, structured info readable at a glance.

2. Stack trim. Drops framework-internal frames (`node_modules/.pnpm/`,
   `node:internal/`, Turbopack-bundled `node_modules__pnpm_*` chunks,
   `_next_dist_*` chunks) and caps the surviving frame count at 6
   so the stack stays compact even on heavy async wrappers. Suppressed
   runs emit one summary line so users know the trim happened.

3. Wrapper-route noise. The nextjs-turbopack workbench's start route
   was catching `WorkflowRunFailedError` rejection on
   `Promise.race([readLoop(), run.returnValue])` and re-logging it via
   `console.error('Error in workflow stream:', error)` plus
   `controller.error(error)` — which then triggered Next.js's
   `⨯ failed to pipe response` overlay. The SDK already logs the
   failure cleanly upstream and the runId is on the response header, so
   the wrapper now closes the SSE stream cleanly on
   WorkflowRunFailedError.

4. Consistent framed `╰▶ hint:` / `╰▶ docs:` layout for all errors
   that carry a hint or docs slug. WorkflowError, SerializationError,
   and WorkflowBuildError now share one `appendFramedDetails` helper
   matching the box-drawing structure that ContextViolationError
   already used. Was: blank-line-separated `Learn more: <url>`. Now:
   one tree, indistinguishable from context-violation rendering.

5. Drop the duplicate logger-side `hint` field. Hints now live on the
   error message only — actionable hints get serialized into the event
   log, rehydrated on the workflow side, and shown in observability
   automatically. The previous logger-only hint duplicated stderr but
   never made it past the step boundary.

   Updated SerializationError hint to point at the foundations doc
   ("Ensure you're returning workflow serializable types. Check the
   serialization docs to see what's serializable:
   https://workflow-sdk.dev/docs/foundations/serialization") instead
   of the hardcoded `(plain objects, arrays, primitives, …)` list,
   which drifted out of sync as the supported types grew. The same hint
   is reused for step args, workflow args/return, stream messages, and
   any other site that goes through `formatSerializationError`.
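
The stack trim in item 2 can be approximated with a filter plus a cap. The sketch below is a simplification under stated assumptions: `trimStack` is a hypothetical name, only two of the listed path markers are checked, and a single summary line is emitted at the end rather than one per suppressed run.

```typescript
// Subset of the internal-frame markers named in item 2.
const INTERNAL_MARKERS = ["node_modules/.pnpm/", "node:internal/"];

// Hypothetical trimmer: drop internal frames and cap the rest at `maxFrames`.
function trimStack(frames: string[], maxFrames = 6): string[] {
  const kept: string[] = [];
  let suppressed = 0;
  for (const frame of frames) {
    const internal = INTERNAL_MARKERS.some((marker) => frame.includes(marker));
    if (internal || kept.length >= maxFrames) {
      suppressed++;
    } else {
      kept.push(frame);
    }
  }
  // Tell the reader that frames were hidden instead of dropping them silently.
  if (suppressed > 0) kept.push(`    ... ${suppressed} frame(s) hidden`);
  return kept;
}
```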

Also retitled the retry summary `3 retries` → `3 max retries` since
"3 retries" next to "4 attempts" was ambiguous (already-happened vs.
budget).

* Trim error-card title + drop machine step name from persisted error

- ErrorStackBlock (web observability): show just the first non-empty
  trimmed line of the error message in the card title with single-line
  truncation. Multi-line messages (`Failed to serialize step return
  value\n╰▶ hint: …`) were rendering the entire framed body in the
  title, pushing the copy button off-screen and burying the
  scannability of the headline. Full message stays in the body via
  the stack (V8 prepends `Name: message` to `Error.stack`), so no
  information is lost; hover-tooltip exposes the full title text.

- Persisted error message: drop the `Step "step//./.../foo"` machine
  name from `Step failed after N retries: …` and `Step exceeded max
  retries (…)` strings. Observability already attributes the event
  to a specific step via the UI tree, and the CLI logger emits the
  friendly `Step foo (./...) hit max retries` framing on its own
  line. Embedding the raw `step//./...` machine name in the persisted
  message text was duplicate noise.
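
The card-title rule in the first bullet (first non-empty trimmed line) is small enough to sketch directly. `titleLine` is a hypothetical helper name; truncation and the hover tooltip are left to the UI layer.

```typescript
// Hypothetical helper: pick the headline for an error card.
function titleLine(message: string): string {
  for (const line of message.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length > 0) return trimmed;
  }
  // Empty or whitespace-only messages yield an empty title.
  return "";
}
```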

* Update .changeset/friendlier-errors.md

Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>

* Update .changeset/pretty-log-format.md

Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>

* Update SerializationError snapshot tests for slug-less message

The class no longer attaches a slug-based `╰▶ docs:` line — the
foundations URL is embedded directly in the hint via the
`formatSerializationError` helper in @workflow/core. Update the test
expectations accordingly:

- bare-title case is now a single line (no docs link)
- hint case renders one `╰▶ hint: …` branch (no second branch)

* Update serialization.test.ts hint assertions for foundations URL

Four `should throw error for an unsupported type` cases were still
asserting on the old hardcoded type list. Update to the new hint
phrasing that points at the foundations doc, matching the change in
`formatSerializationError` (`packages/core/src/serialization/errors.ts`).

---------

Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Peter Wielander <mittgfu@gmail.com>
ziyak97 pushed a commit to ziyak97/workflow that referenced this pull request May 4, 2026
* Introduce structured context-violation errors + Ansi renderer

Phase 1: Add Ansi rendering helpers (frame, hint, note, help, code, inline)
to @workflow/errors, and a chalk mock for readable snapshot tests.

Phase 2: Add four context-violation error classes to @workflow/core
(NotInWorkflowContextError, NotInStepContextError,
NotInWorkflowOrStepContextError, UnavailableInWorkflowContextError)
and apply them to all twelve user-facing throw sites so errors now
include docs links and a structured "what/why/fix" frame.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address review: tighten changeset, implement ansifyName, harden Ansi

- Tighten phase 1 changeset to a single sentence (per pranaygp review) and switch to double-quoted frontmatter (per Copilot + repo convention).
- Implement `ansifyName` to actually apply dim styling to workflow/ / step/ prefixes; add an `Ansi.dim` helper to `@workflow/errors` so callers don't need to import chalk directly.
- Remove the `void getWorkflowMetadata;` workaround in context-errors.ts by dropping the unused value import (we only needed the type and symbol).
- Render the plain-Error throw in `workflow/get-workflow-metadata.ts` with `Ansi.frame` + docs link so the VM path matches the structured-class styling from the sibling step path (still uses a plain Error to avoid the module-init cycle).
- Guard `buildUnderline` against zero-length markers so a stray empty token can't produce a negative `String.repeat` count.

* Structured runtime logger metadata + fold in replay-timeout logging

Adds a `.child()` and `.forRun(runId, workflowName)` child-logger API to
the structured logger so runtime/step code doesn't have to repeat
`workflowRunId`/`workflowName`/`stepId` on every call. Normalizes error
metadata to structured `errorName` / `errorMessage` / `errorStack` fields
instead of ad-hoc `error: err.message` strings, and adds comments to
silent catches that swallow expected idempotency conflicts.

Also folds in the pending changes from vercel#1812 so that PR can be closed:

- Standardize the console prefix to `[workflow-sdk]`.
- Split the replay-timeout log into a warn-while-retrying vs.
  error-when-giving-up, and surface the underlying error when we can't
  mark a timed-out run as failed.
- Include the error stack in the "Fatal runtime error during workflow
  setup" log and in the top-level user-code workflow error log so the
  stack surfaces in flattened log drains.
- Drop the `[Workflows] "<runId>" - ` prefix from
  `buildWorkflowSuspensionMessage` — the structured logger now attaches
  run context.

Supersedes vercel#1812.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use double-quoted changeset frontmatter per repo convention

* Add SerializationError + apply to user-facing serialization sites

Phase 4 of friendlier errors: introduce a `SerializationError` class with
an optional `hint` and a docs link (workflow-sdk.dev/err/serialization-failed),
and adopt it at every user-facing serialization boundary in @workflow/core:

- Locked ReadableStream at a workflow boundary
- Unregistered class / missing `classId` / missing `WORKFLOW_DESERIALIZE`
- Attempting to return step functions to clients or call workflow functions
  directly
- Webhook `respondWith()` called outside a step
- `dehydrate*` / `getSerializeStream` failures (workflow args/return, step
  args/return, stream chunks)

Internal invariants (format prefix length checks, unknown format bytes,
missing `STREAM_NAME_SYMBOL`, encryption key/size guards, etc.) now throw
`WorkflowRuntimeError` instead of plain `Error` so the classifier and logger
treat them consistently.

`formatSerializationError` now returns `{ message, hint }` so the hint
fragment can be rendered with the standard SerializationError framing
instead of being baked into the message string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use double-quoted changeset frontmatter per repo convention

* Presentation-only user vs SDK error attribution

Add describeError() that derives attribution and class-aware hints from
existing error classes + RUN_ERROR_CODES — no event data changes. Wire into
step failures, max-delivery exhaustion, run failures, and fatal setup errors
so terminal logs include errorAttribution and a hint for known error types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address review: describeError accepts precomputed errorCode + instanceof

- `describeError(err, errorCode?)` now accepts an optional precomputed
  `RunErrorCode`. `classifyRunError(err)` only narrows to USER_ERROR /
  RUNTIME_ERROR, so the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED branches
  were previously unreachable from the step / run failure log sites.
  Callers that know the failure category (runtime.ts for replay timeout and
  max-deliveries exhaustion) now pass the code in.
- Context-violation checks use `instanceof` against the actual classes from
  context-errors.ts instead of a name-string set. Type-safe + survives
  class renames.
- Wire the new hints through to the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED
  log sites so those branches actually render a hint now.
- 3 new tests cover the reachable code paths + precomputed-code override.
- Changeset frontmatter switched to double quotes per repo convention.

* Cosmetic consistency pass on remaining bare throws

Internal invariants now use WorkflowRuntimeError so describeError attributes
them to the SDK: missing startedAt, VM generateKey, closure-vars outside
step context, ENOTSUP. defineHook().resume() formats schema validation
failures as a readable list instead of a JSON blob.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use double-quoted changeset frontmatter per repo convention

* Data-driven describeRunError + expose via @workflow/core/describe-error

Observability renderers read persisted run_failed / step_failed event data,
not live Error instances. describeRunError takes { errorCode, errorName }
and returns the same { attribution, hint } shape as describeError, so the
CLI and web UI can derive user-vs-SDK framing from the event log directly.
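
A rough sketch of the data-driven variant, assuming the persisted event payload is plain JSON. The `{ errorCode, errorName }` input and `{ attribution, hint }` output shapes come from the description above. The lookup tables and the decision of which codes count as SDK-attributed are assumptions.

```ts
// Sketch only: which codes and names map to "sdk" is an assumption.
interface PersistedErrorData {
  errorCode?: string;
  errorName?: string;
}

interface RunErrorDescription {
  attribution: "user" | "sdk";
  hint?: string;
}

const SDK_ERROR_CODES = new Set(["RUNTIME_ERROR", "REPLAY_TIMEOUT", "MAX_DELIVERIES_EXCEEDED"]);
const SDK_ERROR_NAMES = new Set(["WorkflowRuntimeError"]);

// Works on serialized event data, so no live Error instance (and no
// instanceof check) is needed on the rendering side.
function describeRunError(data: PersistedErrorData): RunErrorDescription {
  const byCode = data.errorCode !== undefined && SDK_ERROR_CODES.has(data.errorCode);
  const byName = data.errorName !== undefined && SDK_ERROR_NAMES.has(data.errorName);
  return { attribution: byCode || byName ? "sdk" : "user" };
}
```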

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Friendlier build-time errors: WorkflowBuildError class + applications

Add `WorkflowBuildError` class in `@workflow/errors` with optional `hint`
for an actionable next step, and apply it in `@workflow/builders` at
user-facing sites: failed esbuild phases, unresolved built-in steps, and
empty esbuild output now throw `WorkflowBuildError` with a hint pointing
at the likely fix. Runtime invariants remain plain `Error`.
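
For illustration, an error class with an optional `hint` could look roughly like this. The class name and the `hint` option come from the description above. The constructor signature, the `assertNonEmptyOutput` helper, and the hint text are hypothetical.

```ts
// Sketch only: constructor shape and helper are assumptions.
class WorkflowBuildError extends Error {
  readonly hint?: string;

  constructor(message: string, options?: { hint?: string }) {
    super(message);
    this.name = "WorkflowBuildError";
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    // Assign only when provided so no `hint: undefined` own property appears.
    if (options?.hint !== undefined) {
      this.hint = options.hint;
    }
  }
}

// Hypothetical user-facing build check: empty bundler output gets a hint
// that points at a likely fix instead of a bare Error.
function assertNonEmptyOutput(outputFiles: readonly unknown[]): void {
  if (outputFiles.length === 0) {
    throw new WorkflowBuildError("Bundler produced no output files", {
      hint: "Check that the configured entry points exist and export at least one workflow.",
    });
  }
}
```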

* Polish friendlier-errors rendering: drop functionName leak, simplify docs link, redirect stack

- Drop the readonly `functionName` param-property on context-error classes so
  util.inspect no longer prints a trailing `{ functionName: 'foo()' }` block.
- Replace the `DocLink` ("label: https://…") shape with a plain `DocsUrl`
  template-literal type. Error output now renders a single clean line:
  `docs: https://…` (new `Ansi.docs` helper) instead of the noisier
  "note: Read more about foo(): https://…".
- Add throw helpers (`throwNotInWorkflowContext`, etc.) that call
  `Error.captureStackTrace(err, stackStartFn)` on V8 engines so the top frame
  of the thrown error points at the user's call site instead of at the gate
  function inside the framework. Callers pass themselves as the boundary.
- Refactor `defineHook()` (both root and `/workflow`) to use named function
  closures rather than `this.create`/`this.resume`, since the stack redirect
  relies on a stable function identity that survives destructuring.
- Update context-errors.test.ts to snapshot the new `docs:` framing and to
  add a regression test asserting the top stack frame is the user call site.
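
The stack redirect can be sketched as follows. `Error.captureStackTrace` is a V8 API, so the helper checks for it before calling. `throwNotInWorkflowContext` is named in the bullet above, but its signature here, the error class, and the `createHook` / `userCode` functions are assumptions for the example.

```ts
// Sketch only: signatures and error class are assumptions.
type CaptureStackTrace = (target: object, stackStartFn?: Function) => void;

// Re-captures err.stack so that frames at and above stackStartFn are omitted.
function redirectStackToCaller(err: Error, stackStartFn: Function): void {
  const capture = (Error as unknown as { captureStackTrace?: CaptureStackTrace })
    .captureStackTrace;
  if (typeof capture === "function") {
    capture.call(Error, err, stackStartFn);
  }
}

class NotInWorkflowContextError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotInWorkflowContextError";
  }
}

function throwNotInWorkflowContext(apiName: string, stackStartFn: Function): never {
  const err = new NotInWorkflowContextError(`${apiName} must be called inside a workflow`);
  redirectStackToCaller(err, stackStartFn);
  throw err;
}

// The gate function passes itself as the boundary, so its own frame and the
// helper frames are dropped from the trace.
function createHook(): void {
  throwNotInWorkflowContext("createHook()", createHook);
}

// Stand-in for user code: on V8 this becomes the top frame of the error.
function userCode(): void {
  createHook();
}
```

A named function is used for `createHook` because the redirect needs a stable function reference to pass as the boundary.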

* Consolidate friendlier-errors stack: fix ANSI leak + non-retry semantics

Addresses PR review feedback across the 8-phase friendlier-errors stack and
fixes issues surfaced by manual testing (createHook() inside a step):

- ANSI no longer leaks into .message / .stack. Context-violation errors
  now store plain text on .message and render the colored framed form
  lazily via [util.inspect.custom] / toString(). Structured logs, log
  drains, CBOR-serialized events, and JSON payloads no longer contain
  raw \x1B[...m bytes.

- Context violations are now fatal. ContextViolationError sets
  fatal = true; FatalError.is(err) recognizes any error with a
  fatal: true own property. Calling createHook() from a step no longer
  burns three retry attempts on a guaranteed-to-fail context violation.

- Ansi helpers moved to @workflow/errors/ansi subpath so imports from
  @workflow/errors no longer pull chalk into consumers that only want
  error classes (addresses reviewer VaguelySerious).

- Shared redirectStackToCaller helper in packages/core/src/capture-stack.ts,
  used by both context-errors.ts and workflow/get-workflow-metadata.ts
  (addresses Copilot review on vercel#1849).

- Structured framed content: ContextViolationError now takes a structured
  FramedContent (title segments + detail branches) and renders plain/pretty
  from the same source of truth.

Tightens the eight existing phase changesets to 1-2 sentences each and adds
four new scoped changesets (errors-ansi-subpath, context-errors-plain-message,
context-errors-fatal, capture-stack-shared) for the followup fixes, so the
final changelog history stays readable.
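
One way to keep ANSI codes out of `.message` while still rendering a colored form in the terminal is Node's `util.inspect.custom` hook. The sketch below assumes a simplified two-argument constructor and a local `bold` helper, neither of which is the SDK's actual `FramedContent` API.

```ts
import { inspect } from "node:util";

// Local helper for the example: ANSI bold on, then bold off.
const bold = (text: string): string => `\x1b[1m${text}\x1b[22m`;

// Sketch only: the real class takes structured framed content.
class ContextViolationError extends Error {
  constructor(title: string, docsUrl: string) {
    // .message stays plain text, safe for JSON payloads and log drains.
    super(`${title}\n╰▶ docs: ${docsUrl}`);
    this.name = "ContextViolationError";
    // Keeps the prototype method below reachable if compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }

  // Color is applied lazily, only when the error is inspected
  // (for example by console.error in Node).
  [inspect.custom](): string {
    const [title, ...details] = this.message.split("\n");
    return [bold(`${this.name}: ${title}`), ...details].join("\n");
  }
}
```

Because the colored string is computed on demand, anything that reads `.message` or serializes the error sees only plain text.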

* test: update step-handler mocks for scoped forRun() logger

The runtime logger now uses .forRun(runId, name, {stepId, stepName})
to attach scope context, so 409-handling log calls no longer repeat
{workflowRunId, stepId} in every metadata bag — those live on the
scoped logger instance. Update the mock to return itself from forRun()
and tighten assertions to check both the log args (errorName/errorMessage)
and the forRun() scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Mark SerializationError fatal + route dehydration through step-failure path

SerializationError now carries readonly fatal = true. Step-return
dehydration is wrapped inside the user-code try/catch so that the
resulting error flows through userCodeFailed → step_failed →
FatalError.is() short-circuit instead of bubbling up as HTTP 500 and
triggering a queue retry loop. Retrying a step that returned a non-POJO
is guaranteed to fail the same way, so this saves ~20s and 3 near-
identical error blocks per serialization failure.
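
The short-circuit can be illustrated with a toy retry loop. The event names `step_retrying` / `step_failed` and the `fatal = true` marker come from this PR. `isFatal` is a stand-in for `FatalError.is()`, and `dehydrate` and `runStep` are hypothetical helpers, not the SDK's step handler.

```ts
// Sketch only: a simplified, synchronous model of the step retry loop.
type StepEvent = { type: "step_retrying" | "step_failed"; attempt: number };

class SerializationError extends Error {
  readonly fatal = true;
  constructor(message: string) {
    super(message);
    this.name = "SerializationError";
  }
}

// Stand-in for FatalError.is(): any error with a `fatal: true` own property.
function isFatal(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    Object.prototype.hasOwnProperty.call(err, "fatal") &&
    (err as { fatal?: unknown }).fatal === true
  );
}

// Toy serializer: rejects functions, which can never be serialized.
function dehydrate(value: unknown): string {
  if (typeof value === "function") {
    throw new SerializationError("Failed to serialize step return value");
  }
  return JSON.stringify(value) ?? "null";
}

function runStep(step: () => unknown, maxAttempts: number): StepEvent[] {
  const events: StepEvent[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Dehydration sits inside the same try/catch as the user code.
      dehydrate(step());
      return events;
    } catch (err) {
      if (isFatal(err) || attempt === maxAttempts) {
        events.push({ type: "step_failed", attempt });
        return events;
      }
      events.push({ type: "step_retrying", attempt });
    }
  }
  return events;
}
```

In this model a step that returns a function fails once on the first attempt, while a step that throws a plain `Error` is retried until the attempt budget runs out.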

* Add logging snapshot tests + manual-test artifacts

Snapshot tests lock in the exact shape of:
- describeError() payloads (attribution, errorCode, hint) for every
  classification — plain Error, SerializationError, context-violation,
  WorkflowRuntimeError, REPLAY_TIMEOUT, MAX_DELIVERIES_EXCEEDED.
- The scoped-logger call signature for the two canonical runtime
  failure paths (fatal-bubble and hit-max-retries), so refactors of
  forRun() / child() metadata merging can't silently change what users
  see in their log drains.

SerializationError now also has a direct test for readonly fatal=true
+ FatalError.is() recognition.

pr-artifacts/ contains real log-output snapshots from running the
nextjs-turbopack workbench against five error scenarios. These are
reference material for reviewers and are flagged to be removed before
merge.

* Readable step-fatal logs: inline stack + friendly step/workflow names

The step-level fatal-error log used to embed the full stack trace inside
an `errorStack` string field in the metadata object, so util.inspect
rendered it as a quote-escaped, line-continuation blob when the log
hit the terminal — unreadable in practice. Move framing + stack into
the log *message* (matching the workflow-level log in runtime.ts) and
keep the metadata object compact with only the indexable structured
fields (`errorAttribution`, `errorName`, `errorMessage`, `hint`,
IDs). Log drains still get the same keys; humans now see a readable
stack trace.

Also introduce `formatStepName` / `formatWorkflowName` in
`@workflow/utils` that render machine names
(`step//./workflows/1_simple//add`) as `add (./workflows/1_simple)` in
log framings, using the existing `parseStepName` / `parseWorkflowName`
parsers. Applied to step-fatal, hit-max-retries, exceeded-max-retries,
and workflow-threw log sites.
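
As a rough illustration of the name rendering, the sketch below splits a machine name on its first and last `//` separators. The function name `formatMachineName` is hypothetical and does not use the real `parseStepName` / `parseWorkflowName` parsers, whose signatures are not shown here.

```ts
// Sketch only: assumes machine names have the shape `<kind>//<path>//<name>`.
function formatMachineName(machineName: string): string {
  const first = machineName.indexOf("//");
  const last = machineName.lastIndexOf("//");
  // Fall back to the raw name when the expected shape is not present.
  if (first === -1 || first === last) return machineName;

  const path = machineName.slice(first + 2, last);
  const name = machineName.slice(last + 2);
  if (path === "" || name === "") return machineName;

  return `${name} (${path})`;
}

console.log(formatMachineName("step//./workflows/1_simple//add"));
// prints "add (./workflows/1_simple)"
```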

Artifacts in pr-artifacts/ updated to show the new output shape, and
renamed .log → .md since they're Markdown and IDE previews are nicer
that way.

* Opinionated pretty formatter for runtime structured-log metadata

Replace util.inspect's default object dump (which quote-escapes
multi-line stacks and paragraph hints into a single-line JSON-y blob)
with a workflow-aware formatter that composes the entire log line
into a single string passed to console.error / console.warn.

Highlights of the new output:
- Per-run / per-step IDs render with their parsed friendly names so
  users see `wrun_… · simple (./workflows/1_simple)` instead of just
  the raw `workflowName: 'workflow//./workflows/1_simple//simple'`.
- Color-coded attribution badge (user error red / sdk error magenta)
  paired with the error class in bold.
- Hints render as a paragraph under `hint:` rather than a backslash-
  `\n`-escaped string.
- Drops redundant fields (errorStack always; errorMessage when it's
  already in the parent message) to avoid double-printing.
- Unknown fields fall through as a sorted `key  value` tail so we
  never silently drop log information.

@workflow/errors/ansi gains bold/red/magenta helpers used by the
formatter. The web / web-shared packages don't consume stderr — they
read structured event payloads from the World event log — so this is
presentation-only at the runtime layer.
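
A stripped-down sketch of composing the whole log line into one string, following the highlights above. The function name, the exact layout, and the indentation are assumptions, and color is omitted to keep the example short.

```ts
// Sketch only: layout and field handling are assumptions.
interface LogMetadata {
  errorAttribution?: string;
  errorName?: string;
  errorMessage?: string;
  errorStack?: string;
  hint?: string;
  [key: string]: unknown;
}

const KNOWN_FIELDS = new Set([
  "errorAttribution",
  "errorName",
  "errorMessage",
  "errorStack",
  "hint",
]);

function composeLogLineSketch(message: string, meta: LogMetadata): string {
  const lines: string[] = [message];

  // Attribution badge paired with the error class.
  const badge = [meta.errorAttribution, meta.errorName].filter((v) => v !== undefined);
  if (badge.length > 0) lines.push(`  ${badge.join(" ")}`);

  // errorMessage is skipped when the parent message already contains it;
  // errorStack is always dropped from the metadata block.
  if (meta.errorMessage !== undefined && !message.includes(meta.errorMessage)) {
    lines.push(`  ${meta.errorMessage}`);
  }

  // Hints render as a paragraph rather than an escaped single-line string.
  if (meta.hint !== undefined) {
    lines.push("  hint:");
    for (const hintLine of meta.hint.split("\n")) lines.push(`    ${hintLine}`);
  }

  // Unknown fields fall through as a sorted tail so nothing is silently lost.
  const rest = Object.keys(meta).filter((key) => !KNOWN_FIELDS.has(key)).sort();
  for (const key of rest) lines.push(`  ${key}  ${String(meta[key])}`);

  return lines.join("\n");
}
```

The resulting string would be passed as a single argument to `console.error` or `console.warn`, so `util.inspect` never gets a chance to quote-escape the multi-line parts.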

* ci(benchmarks): disable pnpm cache for getCommunityWorldsMatrix

The job never runs `pnpm install` (it just calls `node` against a
checked-in script), so the pnpm store path never exists. The post-job
`actions/setup-node@v4` cache-save then fails with `Path Validation
Error: Path(s) specified in the action for caching do(es) not exist`
and red-X's the entire job even though the matrix step succeeded.

The setup-workflow-dev composite already has a `cache-pnpm` opt-out
input for this exact case — wire it through here.

* Address PR review comments: inspect dedup, cause leak, retry-loop tests

- ContextViolationError: util.inspect(err) duplicated every framed detail
  line because the stack-tail strip only sliced the first message line.
  V8's Error.stack reads `Name: messageLine1\n  messageLine2\n  at ...`,
  so for our multi-line `title\n╰▶ docs: …` messages every detail line
  was getting prepended twice (once in the pretty form, once via the
  unsliced message tail). Count the actual message lines and slice past
  all of them. Repro test asserts `╰▶ docs:` appears exactly once.

- WorkflowError: stop assigning `cause: undefined` as an enumerable own
  property when no cause is provided. Subclasses (every error in this PR)
  inherit the parent constructor; the unconditional assignment polluted
  `util.inspect(err)` output with `{ cause: undefined, … }` on every
  no-cause instance. The `super(...)` call already conditionally sets
  `.cause` non-enumerably when `options.cause` is provided.

- step-handler.test.ts: add a regression-gate suite that exercises the
  fatal-vs-retryable retry-loop wiring directly. Asserts that an error
  with `fatal: true` produces exactly one `step_failed` event with no
  `step_retrying`, and that a non-fatal `Error` retries via
  `step_retrying` on early attempts and emits `step_failed` once the
  retry budget is exhausted. Catches the silent-regression case where
  `fatal = true` is removed from a context-violation error class but
  the `FatalError.is()` unit tests stay green.
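
The stack-tail fix in the first bullet can be sketched like this, assuming V8's `Name: message` header format. `stackFrames` is a hypothetical helper name.

```ts
// Sketch only: assumes a V8-style stack whose header repeats the message.
function stackFrames(err: Error): string {
  const stack = err.stack ?? "";
  // The header spans as many lines as the message does, so a multi-line
  // message needs more than the first line sliced off.
  const messageLineCount = err.message.split("\n").length;
  return stack.split("\n").slice(messageLineCount).join("\n");
}
```

Slicing only one line would leave the `╰▶ docs: …` detail line at the top of the frame list, which is how it ended up printed twice.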

* Consolidate changesets + remove pr-artifacts

Address review feedback to drastically shorten the changesets — fold
the 15 file-by-file entries into a single user-facing changeset for
@workflow/core / errors / builders / utils. Also drop the pr-artifacts/
folder (reviewer-only log captures, no longer needed).

* Polish runtime error logging: layout, stack trim, hint consolidation

Five user-driven fixes from manual smoke-testing of vercel#1849:

1. Logger layout. composeLogLine() now puts the structured-fields block
   (attribution badge, run/step IDs, error code) **between** the framing
   line and the stack body, instead of after it where 30+ lines of stack
   buried the most useful information. The framing stays at the top,
   stack at the bottom, structured info readable at a glance.

2. Stack trim. Drops framework-internal frames (`node_modules/.pnpm/`,
   `node:internal/`, Turbopack-bundled `node_modules__pnpm_*` chunks,
   `_next_dist_*` chunks) and caps the surviving frame count at 6
   so the stack stays compact even on heavy async wrappers. Suppressed
   runs emit one summary line so users know the trim happened.

3. Wrapper-route noise. The nextjs-turbopack workbench's start route
   was catching `WorkflowRunFailedError` rejection on
   `Promise.race([readLoop(), run.returnValue])` and re-logging it via
   `console.error('Error in workflow stream:', error)` plus
   `controller.error(error)` — which then triggered Next.js's
   `⨯ failed to pipe response` overlay. The SDK already logs the
   failure cleanly upstream and the runId is on the response header, so
   the wrapper now closes the SSE stream cleanly on
   WorkflowRunFailedError.

4. Consistent framed `╰▶ hint:` / `╰▶ docs:` layout for all errors
   that carry a hint or docs slug. WorkflowError, SerializationError,
   and WorkflowBuildError now share one `appendFramedDetails` helper
   matching the box-drawing structure that ContextViolationError
   already used. Was: blank-line-separated `Learn more: <url>`. Now:
   one tree, indistinguishable from context-violation rendering.

5. Drop the duplicate logger-side `hint` field. Hints now live on the
   error message only — actionable hints get serialized into the event
   log, rehydrated on the workflow side, and shown in observability
   automatically. The previous logger-only hint duplicated stderr but
   never made it past the step boundary.

   Updated SerializationError hint to point at the foundations doc
   ("Ensure you're returning workflow serializable types. Check the
   serialization docs to see what's serializable:
   https://workflow-sdk.dev/docs/foundations/serialization") instead
   of the hardcoded `(plain objects, arrays, primitives, …)` list,
   which drifted out of sync as the supported types grew. The same hint
   is reused for step args, workflow args/return, stream messages, and
   any other site that goes through `formatSerializationError`.

Also retitled the retry summary `3 retries` → `3 max retries` since
"3 retries" next to "4 attempts" was ambiguous (already-happened vs.
budget).
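
The stack trim from item 2 could be approximated as below. The four path patterns and the cap of 6 come from the description above. The function name and the wording of the summary line are assumptions.

```ts
// Sketch only: summary wording and function name are assumptions.
const INTERNAL_PATTERNS = [
  "node_modules/.pnpm/",
  "node:internal/",
  "node_modules__pnpm_",
  "_next_dist_",
];
const MAX_FRAMES = 6;

function trimStack(stack: string): string {
  const out: string[] = [];
  let kept = 0;
  let suppressed = 0;

  // Emits one summary line per consecutive run of hidden frames.
  const flush = (): void => {
    if (suppressed > 0) {
      out.push(`    ... ${suppressed} framework frame(s) hidden`);
      suppressed = 0;
    }
  };

  for (const line of stack.split("\n")) {
    const isFrame = line.trimStart().startsWith("at ");
    if (!isFrame) {
      // Header and message lines pass through untouched.
      flush();
      out.push(line);
      continue;
    }
    const isInternal = INTERNAL_PATTERNS.some((pattern) => line.includes(pattern));
    if (isInternal || kept >= MAX_FRAMES) {
      suppressed++;
      continue;
    }
    flush();
    out.push(line);
    kept++;
  }
  flush();
  return out.join("\n");
}
```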

* Trim error-card title + drop machine step name from persisted error

- ErrorStackBlock (web observability): show just the first non-empty
  trimmed line of the error message in the card title with single-line
  truncation. Multi-line messages (`Failed to serialize step return
  value\n╰▶ hint: …`) were rendering the entire framed body in the
  title, pushing the copy button off-screen and making the headline
  hard to scan. The full message stays in the body via
  the stack (V8 prepends `Name: message` to `Error.stack`), so no
  information is lost; hover-tooltip exposes the full title text.

- Persisted error message: drop the `Step "step//./.../foo"` machine
  name from `Step failed after N retries: …` and `Step exceeded max
  retries (…)` strings. Observability already attributes the event
  to a specific step via the UI tree, and the CLI logger emits the
  friendly `Step foo (./...) hit max retries` framing on its own
  line. Embedding the raw `step//./...` machine name in the persisted
  message text was duplicate noise.
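
The title extraction from the first bullet amounts to picking the first non-empty trimmed line. A minimal sketch, with `errorTitle` as a hypothetical helper name (the real component may differ):

```ts
// Sketch only: returns the first non-empty trimmed line of a message.
function errorTitle(message: string): string {
  for (const line of message.split("\n")) {
    const trimmed = line.trim();
    if (trimmed !== "") return trimmed;
  }
  return "";
}

console.log(errorTitle("Failed to serialize step return value\n╰▶ hint: see docs"));
// prints "Failed to serialize step return value"
```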

* Update .changeset/friendlier-errors.md

Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>

* Update .changeset/pretty-log-format.md

Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>

* Update SerializationError snapshot tests for slug-less message

The class no longer attaches a slug-based `╰▶ docs:` line — the
foundations URL is embedded directly in the hint via the
`formatSerializationError` helper in @workflow/core. Update the test
expectations accordingly:

- bare-title case is now a single line (no docs link)
- hint case renders one `╰▶ hint: …` branch (no second branch)

* Update serialization.test.ts hint assertions for foundations URL

Four `should throw error for an unsupported type` cases were still
asserting on the old hardcoded type list. Update to the new hint
phrasing that points at the foundations doc, matching the change in
`formatSerializationError` (`packages/core/src/serialization/errors.ts`).

---------

Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Peter Wielander <mittgfu@gmail.com>