
Enable Cloudflare tracing + bind rayId/traceparent/instanceId onto logs #111

Merged
neekolas merged 8 commits into xmtplabs:main from xmtp-coder-agent:fix/issue-108 on Apr 20, 2026


@xmtp-coder-agent (Collaborator) commented Apr 19, 2026

Resolves #108

Summary

Enable Cloudflare Workers tracing and extend the structured JSON logger so every log line carries the identifiers needed to join a Worker request to its TaskRunnerWorkflow instance and to any upstream W3C trace context.

Changes

  • wrangler.toml — added [observability.logs] invocation_logs = true and [observability.traces] enabled = true, and documented head_sampling_rate = 1 explicitly on both blocks so sampling is reviewable in future diffs.
  • src/utils/logger.ts — added a pure parseTraceparent(header) helper that returns { traceId, spanId } | null after validating W3C traceparent format (segment count, version != "ff", lengths 2/32/16/2, lowercase hex, non-zero trace-id).
  • src/main.ts — per-request logger now binds rayId (from cf-ray) and traceId / spanId (from traceparent) via conditional spread. Absent or malformed headers are omitted rather than set to "unknown" so log queries are unambiguous. Existing deliveryId + eventName bindings preserved.
  • src/workflows/task-runner-workflow.ts: TaskRunnerWorkflow.run() now chains .child({ instanceId: event.instanceId }) onto the logger so every service / step emission carries the workflow instance ID. A "Workflow run started" breadcrumb is emitted at the top of run() so Workers Logs always has an instanceId-tagged anchor for the run, even when all step results are cached.
  • Tests — unit tests for parseTraceparent (19 cases covering happy path + edge cases), Worker-level tests asserting JSON log shape for present / absent / malformed headers, and a workflow introspection test asserting instanceId appears on at least one emitted line.
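The validation rules listed for parseTraceparent can be sketched as follows. This is an illustrative reconstruction from the bullet above, not the code merged in the PR; it additionally rejects an all-zero span-id, which the W3C Trace Context spec treats as invalid but which the bullet does not mention.

```typescript
// Illustrative sketch of a W3C traceparent parser (not the PR's exact code).
// Format: version-traceId-spanId-flags, all lowercase hex.
const HEX_RE = /^[0-9a-f]+$/;
const ZERO_TRACE_ID = "0".repeat(32);
const ZERO_SPAN_ID = "0".repeat(16); // extra check from the W3C spec

export interface TraceIds {
  traceId: string;
  spanId: string;
}

// True when value is exactly `length` lowercase hex characters.
function isHex(value: string, length: number): boolean {
  return value.length === length && HEX_RE.test(value);
}

export function parseTraceparent(
  header: string | null | undefined,
): TraceIds | null {
  if (!header) return null;
  const parts = header.split("-");
  if (parts.length < 4) return null;
  const [version, traceId, spanId, flags] = parts;
  // "ff" is reserved as an invalid version.
  if (!isHex(version, 2) || version === "ff") return null;
  // Version 00 defines exactly four fields; later versions may append more.
  if (version === "00" && parts.length !== 4) return null;
  if (!isHex(traceId, 32) || traceId === ZERO_TRACE_ID) return null;
  if (!isHex(spanId, 16) || spanId === ZERO_SPAN_ID) return null;
  if (!isHex(flags, 2)) return null;
  return { traceId, spanId };
}
```

For versions above 00 only the first four fields are read, since the spec lets future versions append fields after the flags; a trailing hyphen on a version-00 header is rejected because it produces a fifth (empty) field.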

Log-query impact

Operators can now join logs on a single field:

  • instanceId = "task_requested-<repo>-<n>-<delivery>" surfaces every step log plus the Worker's "Webhook processed" summary.
  • rayId = "8f...-SJC" / traceId = "<hex>" / spanId = "<hex>" filter to the Worker request path.
  • deliveryId = "<hex>" continues to work exactly as before.
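These join keys only appear on a log line when the underlying header was present and valid. A minimal sketch of the conditional-spread binding described under Changes; requestBindings and TraceIds are hypothetical names, and the traceparent is assumed to be already parsed (null when absent or malformed).

```typescript
interface TraceIds {
  traceId: string;
  spanId: string;
}

// Hypothetical helper: build the per-request logger bindings.
// Missing or invalid headers contribute no key at all (never "unknown"),
// so a query like `traceId = X` cannot match a placeholder value.
function requestBindings(
  deliveryId: string,
  eventName: string,
  cfRay: string | null,
  trace: TraceIds | null,
): Record<string, string> {
  return {
    deliveryId,
    eventName,
    ...(cfRay ? { rayId: cfRay } : {}),
    ...(trace ? { traceId: trace.traceId, spanId: trace.spanId } : {}),
  };
}
```

In a fetch handler, request.headers.get("cf-ray") already returns null for a missing header, so it can be passed straight through.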

Test plan

  • npm run check — 294/294 tests pass, biome clean, typecheck clean.
  • New unit tests for parseTraceparent covering empty / whitespace / lone dashes / trailing hyphen / uppercase / too-long / too-short / all-zero / malformed-flags / forward-compat versions.
  • New integration-style test that fires a signed webhook with and without cf-ray / traceparent and parses the emitted JSON log line.
  • New workflow test asserting instanceId is on captured console.log emissions.
  • Staging: wrangler tail, fire a webhook, confirm the new fields appear in Workers Logs.
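The Worker-level and workflow tests above assert on JSON lines emitted through console.log. A minimal sketch of such a capture helper, assuming the logger writes one JSON object per console.log call; captureJsonLogs is hypothetical, and the PR's tests may instead use a test-runner spy such as vitest's vi.spyOn.

```typescript
// Hypothetical test helper: swap out console.log, run fn, always restore,
// and return every emitted line that parses as a JSON object.
async function captureJsonLogs(
  fn: () => unknown | Promise<unknown>,
): Promise<Record<string, unknown>[]> {
  const original = console.log;
  const captured: string[] = [];
  console.log = (...args: unknown[]) => {
    captured.push(args.map(String).join(" "));
  };
  try {
    await fn();
  } finally {
    console.log = original;
  }
  return captured.flatMap((line) => {
    try {
      const value: unknown = JSON.parse(line);
      const isObject = value !== null && typeof value === "object" && !Array.isArray(value);
      return isObject ? [value as Record<string, unknown>] : [];
    } catch {
      return []; // skip non-JSON lines
    }
  });
}
```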

Needs Human Input

Two assumptions were made while writing the spec and are called out in the issue comments:

  1. Trace head sampling rate = 1 (100%). If cost is a concern, drop to 0.1 in wrangler.toml.
  2. Tracing is a paid feature (billable since 2026-03-01). Because the issue asked for tracing to be enabled, I assumed the plan covering this Worker is fine paying for it.

Update the wrangler.toml block if either assumption needs changing.

🤖 Generated with Claude Code

Note

Enable Cloudflare tracing and bind rayId/traceparent/instanceId onto logs

  • Adds parseTraceparent() to logger.ts to validate and parse W3C traceparent headers into traceId/spanId.
  • The GitHub webhook handler in main.ts now extracts cf-ray and traceparent headers and binds rayId, traceId, and spanId onto the request-scoped logger.
  • When dispatching to a workflow, a source.trace object is attached to the event payload if any tracing fields are present.
  • The TaskRunnerWorkflow in task-runner-workflow.ts binds instanceId and any propagated trace fields onto its logger and emits a "Workflow run started" breadcrumb.
  • Enables Cloudflare Workers observability (invocation logs and traces) in wrangler.toml, with head sampling set explicitly.

Macroscope summarized e579334.

xmtp-coder-agent and others added 6 commits April 19, 2026 03:15
Chain `.child({ instanceId: event.instanceId })` on the workflow logger
so every log line emitted from any step carries the instanceId, enabling
Workers Logs correlation across replays. Services (GitHubClient,
CoderService) inherit the binding via closure — no service-signature
changes needed.

Also adds a replay-safe `logger.info("Workflow run started", { type })`
breadcrumb at the top of `run()`. This guarantees at least one
instanceId-tagged line is emitted even when all downstream side-effects
are cached in `step.do` results (Option B from the task plan — needed
because mocked-step tests never let services emit logs themselves).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
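The closure-based inheritance described in this commit message can be illustrated with a minimal child logger. JsonLogger, ExampleService and runWorkflow are hypothetical stand-ins, not the repo's logger.ts or services; the sketch assumes services are constructed inside run() from the already-bound logger, so their signatures never mention instanceId.

```typescript
type Fields = Record<string, unknown>;

// Hypothetical structured logger: each line is one JSON object.
class JsonLogger {
  private readonly bindings: Fields;
  private readonly sink: (line: string) => void;

  constructor(bindings: Fields = {}, sink: (line: string) => void = (line) => console.log(line)) {
    this.bindings = bindings;
    this.sink = sink;
  }

  // New logger with extra bindings merged on top; the parent is unchanged.
  child(extra: Fields): JsonLogger {
    return new JsonLogger({ ...this.bindings, ...extra }, this.sink);
  }

  info(msg: string, fields: Fields = {}): void {
    this.sink(JSON.stringify({ level: "info", msg, ...this.bindings, ...fields }));
  }
}

// A service that only holds a logger reference.
class ExampleService {
  private readonly logger: JsonLogger;
  constructor(logger: JsonLogger) {
    this.logger = logger;
  }
  doWork(): void {
    this.logger.info("service step ran");
  }
}

// Stand-in for run(): bind once, then every downstream line inherits it.
function runWorkflow(instanceId: string, base: JsonLogger): void {
  const logger = base.child({ instanceId });
  logger.info("Workflow run started", { type: "task_requested" });
  new ExampleService(logger).doWork();
}
```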
Comment thread src/utils/logger.ts
const HEX_RE = /^[0-9a-f]+$/;
const ZERO_TRACE_ID = "00000000000000000000000000000000";

export function parseTraceparent(
Contributor

Doesn't Cloudflare have an open-source library to help with this? Feels like something we shouldn't have to implement from scratch.

Collaborator Author

Looked into this. Summary of what's on the shelf today:

  • Cloudflare first-party — nothing for request-path code. When [observability.traces] is on, the runtime auto-propagates traceparent on outbound fetch, but the parsed context is only exposed to Tail Workers (SpanContext in workers-types is Tail-only). To put traceId/spanId into our structured logs in the fetch handler, we have to pull it off the incoming header ourselves.
  • @opentelemetry/core — exports W3CTraceContextPropagator publicly, but to use it you build a Context + a TextMapGetter adapter just to read two hex strings. The parseTraceParent(header) helper that would be a clean fit is an internal export, not a public API. Plus @opentelemetry/api peer dep + bundle weight for ~15 lines of work.
  • tctx (maraisr) — cleanest API match (traceparent.parse(header)), MIT, zero deps. But ~2.5k weekly downloads, single maintainer — supply-chain trade for 20 LOC feels lopsided.
  • traceparent (elastic) — inactive, no release in the last year.
  • @microlabs/otel-cf-workers / cloudflare/workers-honeycomb-logger — the Workers-community references I looked at either hand-roll extraction or go through the full OTel stack. No one's pulling a small dedicated parser.

I left the 20-line inline parser in place and added a comment at src/utils/logger.ts:60 noting the options I evaluated and why, plus a link to W3C Trace Context §3.2 so the validation rules are traceable. Happy to swap to tctx if you'd rather lean on a package — just want to flag the DL/maintainer profile first. If/when Cloudflare exposes the parsed context in the request path, the right move is to delete the helper, not to swap in a library.

Pushed in 4e37842.

Answers a reviewer question about using a library. Cloudflare does not
expose parsed traceparent to request-path user code, @opentelemetry/core
is overkill and its parse helper is internal, tctx has <3k weekly DL
with one maintainer, and elastic/traceparent is inactive. Comment links
the W3C spec so future readers can verify the validation rules.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@neekolas (Contributor)

Will this change pass the tracing context from the main worker that receives the webhook to the workflow run that executes the work?

Answers a reviewer question: before this, only the Worker log line
carried trace context — the Workflow logger only had `instanceId`, so
querying by `traceId` in Workers Logs missed every step log. Now the
Worker attaches the parsed `cf-ray` + `traceparent` fields onto
`payload.source.trace` before calling `WORKFLOW.create`, and
`TaskRunnerWorkflow.run()` spreads them onto its bound logger. A query
by `traceId` now surfaces the webhook line plus every step log in the
dispatched workflow run.

`EventSource.trace` is optional, so existing fixtures and event
construction sites that omit it continue to type-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@xmtp-coder-agent (Collaborator, Author)

Good catch — it didn't before this commit. The Worker was parsing cf-ray / traceparent and binding them onto reqLogger, but the Workflow side only had instanceId, so a log query by traceId would miss every step log.

Fixed in e579334:

  • src/events/types.ts — added optional trace?: { rayId?, traceId?, spanId? } to EventSource. Optional so existing fixtures and router construction sites still type-check.
  • src/main.ts — after the router returns an Event, merges the parsed trace fields onto result.source.trace before calling WORKFLOW.create. Only sets the object when at least one field exists, so absence stays absent rather than {}.
  • src/workflows/task-runner-workflow.ts: run() now reads payload.source.trace and conditionally spreads rayId / traceId / spanId onto the same .child() call that already binds instanceId. Every service + step emission inherits all four fields via closure.
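The Worker-to-Workflow hand-off described above can be sketched as two small functions. The types are simplified, hypothetical stand-ins for EventSource and the Workflow event; only the optional source.trace shape and the rule that absence stays absent (never {}) come from the comment.

```typescript
interface TraceFields {
  rayId?: string;
  traceId?: string;
  spanId?: string;
}
// Simplified stand-ins; the real EventSource / payload carry more fields.
interface EventSource {
  trace?: TraceFields;
}
interface Payload {
  source: EventSource;
}
interface RunEvent {
  instanceId: string;
  payload: Payload;
}

// Worker side: attach source.trace only when at least one field exists.
function withTrace<P extends Payload>(payload: P, fields: TraceFields): P {
  const trace: TraceFields = {
    ...(fields.rayId ? { rayId: fields.rayId } : {}),
    ...(fields.traceId ? { traceId: fields.traceId } : {}),
    ...(fields.spanId ? { spanId: fields.spanId } : {}),
  };
  if (Object.keys(trace).length === 0) return payload; // absent stays absent
  return { ...payload, source: { ...payload.source, trace } };
}

// Workflow side: the bindings passed to the single .child() call in run().
function workflowBindings(event: RunEvent): Record<string, string> {
  const trace = event.payload.source.trace ?? {};
  return {
    instanceId: event.instanceId,
    ...(trace.rayId ? { rayId: trace.rayId } : {}),
    ...(trace.traceId ? { traceId: trace.traceId } : {}),
    ...(trace.spanId ? { spanId: trace.spanId } : {}),
  };
}
```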

Tests added in the same commit:

  • src/main.test.ts — 3 tests that stub TASK_RUNNER_WORKFLOW.create to capture the params and assert source.trace is set (or not) based on which headers the incoming webhook carried. Uses the issues-assigned fixture so the flow actually dispatches (the existing tracing tests use workflow-run-success which skips).
  • src/workflows/task-runner-workflow.test.ts — 2 introspection tests: one constructs a TaskRequestedEvent with source.trace populated and asserts the captured JSON log lines carry rayId / traceId / spanId; the other confirms the fields are absent from every emitted line when source.trace is not set.

Net result: traceId = X in Workers Logs now surfaces the Worker's "Webhook received" / "Webhook processed" lines plus every downstream step log for the dispatched workflow run. npm run check green — 299/299 tests pass.

@neekolas marked this pull request as ready for review April 20, 2026 05:14
@neekolas merged commit 80e899d into xmtplabs:main Apr 20, 2026
3 checks passed


Linked issue: Enable tracing
