feat: migrate to OTel GenAI semconv + new /agents/otel/v1/traces endpoint #51

Merged
rgao-coreweave merged 15 commits into main from otel-genai-migration
May 19, 2026
Conversation

@rgao-coreweave
Contributor

@rgao-coreweave rgao-coreweave commented May 13, 2026

Summary

  • Replace the Weave JS SDK (call/start + call/end API) with the OTel JS SDK, emitting OTLP/HTTP-protobuf to /agents/otel/v1/traces so spans land in the Weave Agents observability surface.
  • Map Claude Code's session/turn/tool/subagent/permission tree onto the OTel GenAI semantic conventions (invoke_agent, chat, execute_tool).
  • Drop the weave npm dependency; add the standard @opentelemetry/* packages.

Span mapping

| Before (Weave SDK) | After (OTel GenAI semconv) |
| --- | --- |
| `claude_code.session` | — (no root session span; turns stitched via `gen_ai.conversation.id`) |
| `claude_code.turn` | `invoke_agent claude-code` (root — one trace per user prompt) |
| — (usage aggregated on turn) | `chat <model>` — one per LLM API call, emitted at Stop from parsed transcript with backdated timestamps and per-call usage |
| `claude_code.tool.<name>` | `execute_tool <tool_name>` |
| `claude_code.subagent.<type>` (nested inside Agent tool span) | `invoke_agent <subagent_type>` — child of the turn, sibling of regular `execute_tool` spans. The spawning Agent tool call does NOT emit an `execute_tool` span; the inner `invoke_agent` is the agent invocation. Back-pointer to the spawning tool_use_id via `weave.claude_code.subagent.spawning_tool_call_id`. |
| `claude_code.permission_request` (child span) | `weave.permission_request` + `weave.permission_resolved` span events on the parent `execute_tool` span (split so each is stamped at the time it actually happened) |
| | `weave.compaction.*` attributes on the turn span open at compaction time (or the next turn, if compaction fires between turns) |

Span tree

invoke_agent claude-code                  (turn — root, one trace per user prompt)
├─ chat <model>                           (each LLM API call)
├─ execute_tool <tool_name>               (Read, Bash, Grep, ...)
└─ invoke_agent <subagent_type>           (subagent dispatched via the `Agent` tool)
   ├─ chat <model>                        (subagent LLM calls)
   └─ execute_tool <tool_name>            (tools the subagent ran)

This matches the Weave Agents chat view's reference structure (weave/trace_server/agents/chat_view.py and tests/trace_server/test_genai_chat_view.py::test_subagent_spans_render_inline_with_agent_label_inheritance), where nested invoke_agent spans render as their own agent_start lifecycle marker with the inner agent's identity, distinct from a tool-call event.

Attribute namespace

  • gen_ai.* — everything in the OTel GenAI semconv catalog: gen_ai.operation.name, gen_ai.provider.name (chat spans only, derived from the model id via providerFromModel()), gen_ai.agent.name, gen_ai.agent.id, gen_ai.agent.version, gen_ai.conversation.id, gen_ai.request.model, gen_ai.response.model, gen_ai.response.id, gen_ai.response.finish_reasons, gen_ai.usage.*, gen_ai.tool.*, gen_ai.input.messages, gen_ai.output.messages, gen_ai.output.type, error.type.
  • weave.claude_code.* — Claude-Code-specific extensions with no semconv equivalent: session.id (current process's session id, debug breadcrumb), cwd, source, plugin.version, turn.number, turn.tool_count, orphan_reason, display_name, subagent.spawning_tool_call_id (back-pointer from the inner invoke_agent <subagent_type> span to the parent's Agent tool_use_id).

gen_ai.request.model and gen_ai.agent.version are stamped on every invoke_agent span so the Weave Agents UI's agent_start lifecycle card and Version column populate at every agent level.
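
As a rough illustration of the two namespaces, the sketch below assembles the attribute bag for a turn-level invoke_agent span. The attribute keys are the ones listed above; the helper name `turnSpanAttributes`, the `TurnInfo` shape, and the choice of "claude-code" as the `gen_ai.agent.name` value are assumptions made for this example, not code from the PR.

```typescript
// Hypothetical helper: only the attribute keys come from the PR description.
type Attrs = Record<string, string | number>;

interface TurnInfo {
  conversationId: string; // stitching key (root ancestor session id)
  sessionId: string;      // current process's session id (debug breadcrumb)
  pluginVersion: string;
  turnNumber: number;
  model?: string;         // may be unknown until the transcript is parsed
}

function turnSpanAttributes(info: TurnInfo): Attrs {
  const attrs: Attrs = {
    // gen_ai.*: attributes defined by the OTel GenAI semconv
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.agent.name": "claude-code", // assumed to mirror the span name
    "gen_ai.agent.version": info.pluginVersion,
    "gen_ai.conversation.id": info.conversationId,
    // weave.claude_code.*: extensions with no semconv equivalent
    "weave.claude_code.session.id": info.sessionId,
    "weave.claude_code.plugin.version": info.pluginVersion,
    "weave.claude_code.turn.number": info.turnNumber,
  };
  // Omit the model rather than emitting an empty string when it is unknown.
  if (info.model !== undefined) {
    attrs["gen_ai.request.model"] = info.model;
  }
  return attrs;
}
```

Keeping the model optional reflects the description's note that the turn's model is only known once the transcript has been parsed at Stop.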

Wire format

  • POST ${WANDB_BASE_URL}/agents/otel/v1/traces
  • Content-Type: application/x-protobuf
  • wandb-api-key: <WANDB_API_KEY> header
  • Resource attributes: wandb.entity, wandb.project, service.name=claude-code, service.version=<plugin version>
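
The wire format above can be sketched as two small config builders. This is a minimal sketch that assumes the exporter is constructed from an options object with `url` and `headers` fields and that the protobuf exporter sets the Content-Type itself; the function names `buildWeaveOtlpConfig` and `buildResourceAttributes` are hypothetical.

```typescript
// Hypothetical config builders; only the path, header name, and resource
// attribute keys are taken from the PR description.
interface OtlpHttpConfig {
  url: string;
  headers: Record<string, string>;
}

function buildWeaveOtlpConfig(baseUrl: string, apiKey: string): OtlpHttpConfig {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const trimmed = baseUrl.replace(/\/+$/, "");
  return {
    url: `${trimmed}/agents/otel/v1/traces`,
    headers: { "wandb-api-key": apiKey },
  };
}

function buildResourceAttributes(
  entity: string,
  project: string,
  pluginVersion: string,
): Record<string, string> {
  return {
    "wandb.entity": entity,
    "wandb.project": project,
    "service.name": "claude-code",
    "service.version": pluginVersion,
  };
}
```

In the daemon, objects like these would presumably be handed to the OTLP trace exporter and the tracer provider's resource; that wiring is omitted here because it depends on the installed @opentelemetry/* versions.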

Trace continuity (resume)

Each user prompt produces its own root trace. Multi-turn conversations are stitched together server-side via gen_ai.conversation.id.

For resumed sessions (claude --continue, claude --resume <id>, daemon restart mid-session), Claude Code generates a new process-level session_id but stamps every transcript line with forkedFrom.sessionId pointing at the immediate parent session. At SessionStart the daemon walks the forkedFrom.sessionId chain across sibling transcript files to the root ancestor and uses that root id as gen_ai.conversation.id on every span the resumed session produces. The current process's session id is still stamped on each span as weave.claude_code.session.id so the resume is visible in the trace.

The walk has a hard depth cap and a cycle guard. If a parent transcript is missing on disk (e.g., resumed across machines) the walk stops at the highest recorded parent — still a better stitching key than the current process's session id. The first hop retries up to 4×100 ms while the transcript first line is flushing; ancestor reads are not retried because ancestor transcripts are static.

Fresh (non-forked) sessions are unaffected — the chain walk returns the current session id, and conversation.id equals session.id.
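
The chain walk described above can be sketched as a pure function. This is an illustration under assumptions, not the PR's resolveConversationId(): the file I/O and the first-hop retry are replaced by an injected `parentOf` lookup that returns the `forkedFrom.sessionId` recorded for a session (or `undefined` when there is none or the transcript is missing), and the depth cap of 32 is an arbitrary placeholder.

```typescript
// Returns the parent session id recorded in a session's transcript, or
// undefined when the session is not forked or its transcript is unavailable.
type ParentLookup = (sessionId: string) => string | undefined;

const MAX_CHAIN_DEPTH = 32; // placeholder value; the real cap is not stated

function resolveConversationIdSketch(
  sessionId: string,
  parentOf: ParentLookup,
  maxDepth: number = MAX_CHAIN_DEPTH,
): string {
  const seen = new Set<string>([sessionId]);
  let current = sessionId;
  for (let depth = 0; depth < maxDepth; depth++) {
    const parent = parentOf(current);
    // No recorded parent: `current` is the highest ancestor we know of.
    if (parent === undefined) return current;
    // Cycle guard: never revisit a session id.
    if (seen.has(parent)) return current;
    seen.add(parent);
    current = parent;
  }
  // Depth cap reached: use the highest ancestor found so far.
  return current;
}
```

With a lookup backed by a plain map such as `{ c: "b", b: "a" }`, session `c` resolves to `a`, while `a` resolves to itself, which matches the fresh-session behaviour described above.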

Subagent correlation

SubagentStart carries no pointer back to the spawning Agent tool's tool_use_id. The daemon correlates the subagent's runtime agent_id to the open subagent tracker by content: at PreToolUse(Agent + subagent_type) it records the sha256 of the firing prompt; at SubagentStart it reads the subagent transcript's first user-message line (byte-identical to the parent's tool_input.prompt) and matches by (promptHash, subagent_type). No temporal window; deterministic across loaded CI and back-to-back identical Agent calls.

If correlation fails (parent's PreToolUse never fired, or the firing prompt couldn't be read from the subagent transcript), an orphan invoke_agent span is created as a direct child of the current turn span and closed at SubagentStop. The weave.claude_code.orphan_reason attribute records why correlation didn't match.
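
A minimal sketch of the content-based correlation follows. The class and method names are hypothetical, and serving identical (promptHash, subagent_type) pairs in first-in, first-out order is an assumption about how back-to-back identical Agent calls might be disambiguated; the PR text does not spell that out.

```typescript
import { createHash } from "node:crypto";

interface PendingAgentCall {
  toolUseId: string;
  subagentType: string;
}

// Key on (subagent_type, sha256(prompt)) as described above.
function correlationKey(subagentType: string, prompt: string): string {
  const promptHash = createHash("sha256").update(prompt, "utf8").digest("hex");
  return `${subagentType}:${promptHash}`;
}

class PromptHashCorrelator {
  private readonly pending = new Map<string, PendingAgentCall[]>();

  // Called at PreToolUse for an Agent tool call carrying a subagent_type.
  recordAgentToolUse(toolUseId: string, subagentType: string, prompt: string): void {
    const key = correlationKey(subagentType, prompt);
    const queue = this.pending.get(key) ?? [];
    queue.push({ toolUseId, subagentType });
    this.pending.set(key, queue);
  }

  // Called at SubagentStart with the subagent transcript's first user message.
  // Returns undefined when nothing matches (the orphan path).
  matchSubagentStart(subagentType: string, firstUserMessage: string): PendingAgentCall | undefined {
    const key = correlationKey(subagentType, firstUserMessage);
    const queue = this.pending.get(key);
    const match = queue?.shift(); // assumed FIFO for identical prompts
    if (queue !== undefined && queue.length === 0) this.pending.delete(key);
    return match;
  }
}
```

Because the key depends only on content, no timing window is involved: a SubagentStart either finds a pending call with the same prompt and type or falls through to the orphan path.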

Files changed

  • New: src/genaiSpans.ts — attribute key constants, span builders (startTurnSpan, startInvokeAgentSpan, startToolSpan, emitChatSpan), span event emitters, transcript-to-chat-span conversion
  • Rewritten: src/daemon.ts — NodeTracerProvider + BatchSpanProcessor + OTLPTraceExporter replace WeaveClient / saveOp / saveCallStart / saveCallEnd; per-session state stores Span objects instead of UUIDv7 call IDs; new PreCompact handler; resolveConversationId() walks forkedFrom.sessionId to the root ancestor for resume stitching; subagent dispatch emits a nested invoke_agent <subagent_type> span (not an execute_tool Agent wrapper) and correlates by content-based prompt hash
  • Updated: src/parser.ts — adds per-LLM-call detail (assistantCalls()) carrying per-call timestamps, model, usage, content blocks, response id, and finish reason so chat spans can be emitted with backdated start/end times
  • Updated: src/transcriptFile.ts — adds readFirstTranscriptLine() for safe (O_RDONLY|O_NOFOLLOW, regular-file-only) first-line reads of ancestor and subagent transcripts
  • Removed: src/traceRegistry.ts — no longer needed; cross-process continuity is now derived from forkedFrom.sessionId in the transcript itself rather than a separate disk-side mapping
  • Updated: package.json / package-lock.json — drops weave; adds @opentelemetry/api, @opentelemetry/sdk-trace-node, @opentelemetry/sdk-trace-base, @opentelemetry/resources, @opentelemetry/semantic-conventions, @opentelemetry/exporter-trace-otlp-proto (pinned to ^0.218.0 — pulls in protobufjs ≥ 8.0.2 to clear the Socket Security CVEs flagged on earlier revisions of this PR)
  • Updated: README.md — refreshed "What Gets Traced" hierarchy

Test plan

  • npx tsc --noEmit — clean
  • npm audit — 0 vulnerabilities post protobufjs bump
  • In-process span smoke test (InMemorySpanExporter) — turn/tool/subagent/chat spans emit with correct kinds, attributes, parent linkage, and shared traceId; permission request + resolved events land on tool span; compaction attributes land on turn span
  • End-to-end daemon smoke test — booted daemon child process pointed at a local mock OTLP HTTP server, fired full hook sequence (SessionStart, UserPromptSubmit, PreToolUse, PermissionRequest, PostToolUse, PreCompact, Stop, SessionEnd), confirmed POST /agents/otel/v1/traces with Content-Type: application/x-protobuf and wandb-api-key header; raw-protobuf scan of the body found gen_ai.operation.name, invoke_agent, execute_tool, chat, weave.permission_request, weave.compaction, wandb.entity, wandb.project
  • Run against a live Weave trace server
  • Verify subagent span tree matches the canonical Weave Agents reference (test_subagent_spans_render_inline_with_agent_label_inheritance) via scripts/smoke/verify-subagent-shape.mjs: turn → execute_tool sibling → nested invoke_agent <subagent_type> sibling → chat under subagent; gen_ai.agent.id, weave.claude_code.subagent.spawning_tool_call_id, gen_ai.input.messages, gen_ai.output.messages all set on the inner invoke_agent
  • Verify resume conversation stitching — resolveConversationId() exercised against three real on-disk transcripts: a resumed session resolves to its root ancestor, an already-root session resolves to itself, and a fresh non-forked session resolves to itself

Breaking changes

  • Existing dashboards / saved views that filter on claude_code.* op names will not see traces from this version.
  • This is a different ingest path (/agents/otel/v1/traces) and a different backend (Agents observability vs. the Calls API), so traces produced by this version appear in a different surface in the Weave UI.

🤖 Generated with Claude Code

Replace the Weave JS SDK (call/start + call/end API) with the OTel JS SDK
emitting protobuf OTLP to /agents/otel/v1/traces. Spans follow the OTel
GenAI semantic conventions so they appear in the Weave Agents observability
surface.

Span mapping:
- session → invoke_agent claude-code (root)
- turn    → invoke_agent claude-code (one per user prompt)
- chat    → chat <model> (one per LLM API call, emitted at Stop from parsed
            transcript with backdated timestamps)
- tool    → execute_tool <tool_name>
- subagent → invoke_agent <subagent_type> (flat under turn; back-pointer
             attribute to spawning Agent tool)

Permission requests appear as `weave.permission_request` span events on the
parent tool span; PreCompact appears as a `weave.compaction` event on the
session span.

Attribute namespace: gen_ai.* for everything in the OTel GenAI semconv,
weave.* for Claude-Code-specific extensions (session.id, cwd, source,
turn.number, tool.use_id, subagent.spawning_tool_call_id, ...).

Trace continuity on session resume is preserved by forcing the new session
span's traceId via a synthetic remote parent context.

Drops the `weave` npm dependency. Adds @opentelemetry/{api, sdk-trace-node,
sdk-trace-base, resources, semantic-conventions, exporter-trace-otlp-proto}.

Design spec: docs/superpowers/specs/2026-05-12-otel-semconv-migration-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 13, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@socket-security

socket-security Bot commented May 13, 2026

All alerts resolved. Learn more about Socket for GitHub.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

@w-b-hivemind

w-b-hivemind Bot commented May 13, 2026

HiveMind Sessions

21 sessions · 5h 4m · $171

| Session | Agent | Duration | Tokens | Cost | Lines |
| --- | --- | --- | --- | --- | --- |
| Hello World From Sf<br>`80124e7a-03c1-4226-b050-a7a8eec58c90` | claude | 58m | 212.1K | $33 | +1282 -221 |
| Subagent Correlation Without Proximity Window<br>`a69ce146-6f5a-41d5-99cd-faaa3fad2179` | claude | 33m | 98.6K | $8.00 | +106 -24 |
| Fix PR Issues and Trace Hierarchy<br>`e5224b1a-0d08-442c-a810-1c1270f38d19` | claude | 26m | 108.7K | $13 | +412 -82 |
| Weave Claude Plugin Setup and Configuration<br>`2198b476-314b-46b1-85bc-2b38269210be` | claude | 5m | 14.8K | $1.61 | +0 -0 |
| Debugging Claude Plugin Daemon Socket Issue<br>`121d80f6-6705-4151-b585-6648e93778ac` | claude | 6m | 26.9K | $2.60 | +0 -0 |
| Investigating Missing Sessions in Weave Plugin<br>`37e17685-9953-44b2-a376-649e4295c6e9` | claude | 7m | 33.3K | $2.15 | +479 -0 |
| PR Review Fixes and Investigation<br>`2ecc5fbf-9442-4bde-a258-3adcea573566` | claude | 39m | 162.5K | $33 | +844 -390 |
| [/clear &lt;c](https://hivemind.wandb.tools/sessions/3ff2a089-739b-5a11-aeca-9b0ab7dee268)<br>`7e58dd0e-7110-47e7-a7b4-9417be64014b` | claude | 5m | 24.1K | $2.07 | +0 -0 |
| Weave Claude Plugin Repo Context Inquiry<br>`916162c0-c186-4ee4-8cc6-ac4fc9532c55` | claude | 45s | 2.0K | $0.33 | +0 -0 |
| Review and Push OTel GenAI Migration Branch<br>`ed112fd2-d275-444e-8802-5b1905d03926` | claude | 1m | 2.7K | $0.63 | +0 -0 |
| Claude Opus Effort Levels Research<br>`3973c2ac-47d4-4209-9be4-48960d52ae48` | claude | 28s | 2.7K | $0.34 | +0 -0 |
| Version Bump and Semver Marker Wiring<br>`5a67b0b6-2cda-4d9a-b624-2e4d1284ad22` | claude | 11m | 44.8K | $5.74 | +33 -14 |
| Bump Package Version for Hard Cutover<br>`ab7836ea-722f-4b9e-bc61-43eed14bed8b` | claude | 1m | 4.6K | $0.69 | +8 -8 |
| Write PR Description for OTel Migration<br>`b15d169f-0613-4030-a0e4-d4f418da4063` | claude | 3m | 16.1K | $1.06 | +0 -0 |
| Debugging Talk-to-Tim Skill Spawn<br>`5c4e9656-45d7-4b96-b8f9-9d7503089787` | claude | 34s | 2.4K | $0.28 | +0 -0 |
| Talk to Tim Skill Implementation and Refinement<br>`569f2d63-579a-4b98-a2a0-51096bccf6ee` | claude | 19m | 92.9K | $10 | +151 -91 |
| Multi-Agent Talk-to-Tim Architecture Implementation<br>`23ea869a-6a52-42ab-99df-4a89b54ce7cf` | claude | 35m | 161.0K | $25 | +194 -146 |
| Clean Up PR Move File to rgao<br>`fdb97bcd-d25c-4d43-b405-285a4f0c9e2b` | claude | 1m | 2.5K | $0.34 | +0 -0 |
| Hello World Test Session<br>`eda2f840-5d37-424d-b66d-772f610f56f8` | claude | 9s | 51 | $0.10 | +0 -0 |
| Debugging Missing Plugin Version in OTel Spans<br>`bbae2b9f-a5b2-4b3d-9bc7-cb75eea066c9` | claude | 5m | 19.2K | $2.97 | +8 -0 |
| OTel GenAI Migration for Claude Plugin Tracing<br>`a28cbeca-6100-4130-b4d5-b0bcf2addf13` | claude | 40m | 193.1K | $28 | +1886 -759 |
| Total | | 5h 4m | 1.2M | $171 | +5403 -1735 |

View all sessions in HiveMind →

Run claude --resume 80124e7a-03c1-4226-b050-a7a8eec58c90 to pick up where you left off.

rgao-coreweave and others added 2 commits May 12, 2026 23:24
The Weave Agents UI populates its Version column from
`gen_ai.agent.version` (alias `weave.agent.version`). Without it, the
Version column shows "(no version)" for every agent. Set it to the
plugin VERSION on session, turn, and subagent invoke_agent spans.

The plugin already exposed VERSION via `service.version` (resource) and
`weave.claude_code.plugin.version` (custom span attr); neither is what
the Agents UI keys off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rgao-coreweave
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 13, 2026
@rgao-coreweave rgao-coreweave marked this pull request as ready for review May 13, 2026 18:51
@rgao-coreweave rgao-coreweave requested review from a team and chance-wnb May 13, 2026 18:51
rgao-coreweave added a commit that referenced this pull request May 13, 2026
Pulls in protobufjs >= 8.0.2, which fixes the 7 CVEs flagged by Socket
Security on PR #51 (high-severity code injection and prototype pollution
plus several DoS vectors in 8.0.0-8.0.1). Other @opentelemetry/* deps
already satisfy 0.218.0's peer constraints (core/resources/sdk-trace-base
all at 2.7.1). npm audit reports 0 vulnerabilities post-bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .claude-plugin/marketplace.json Outdated
rgao-coreweave and others added 5 commits May 13, 2026 13:07
- gen_ai.provider.name only on chat spans, derived from model id via
  providerFromModel() — not stamped as a constant on tool/turn/session
- replace Math.random() span-id generator with crypto.randomBytes(8)
- isValidTraceId / isValidSpanId in genaiSpans.ts; remove duplicated
  hex regexes in daemon.ts and traceRegistry.ts
- split weave.permission_request into request + resolved events stamped
  at the time each actually happens; drop the deferred-write workaround
- collapse subagentTrackers + subagentByAgentId into a single
  SubagentTracking class with byAgentId / findUnmatchedByProximity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The rgao/ directory is excluded by the global core.excludesFile but the
spec was committed before that rule existed; untrack it so it stops
appearing in the PR diff. File preserved locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The orphan path (SubagentStart with no matching PreToolUse) was stamping
toolUseId = agentId and spawningToolCallId = '' to satisfy required
fields. Make those fields optional on SubagentTracker so orphans simply
don't carry them, and omit the spawning_tool_call_id span attribute when
no spawning tool exists rather than emitting an empty string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- drop dead permissionSuggestions field from PendingToolCall; pass
  payload value directly to addPermissionRequestEvent
- rename permissionStartedAt: Date -> permissionRequested: boolean
  (the field served only as a sentinel after the event-split refactor)
- extract resolvePermissionIfPending() helper — three near-identical
  call sites at PostToolUse, PostToolUseFailure, SessionEnd cleanup
- disambiguate duplicate "invoke_agent claude-code" span name: session
  uses :session suffix, turn uses :turn suffix
- drop weave.claude_code.tool.use_id from tool spans (was identical
  to gen_ai.tool.call.id)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in protobufjs >= 8.0.2, which fixes the 7 CVEs flagged by Socket
Security on PR #51 (high-severity code injection and prototype pollution
plus several DoS vectors in 8.0.0-8.0.1). Other @opentelemetry/* deps
already satisfy 0.218.0's peer constraints (core/resources/sdk-trace-base
all at 2.7.1). npm audit reports 0 vulnerabilities post-bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rgao-coreweave rgao-coreweave force-pushed the otel-genai-migration branch from 0920e3a to 5f4fb5c Compare May 13, 2026 20:11
Weave's chat_view consumer reads gen_ai.request.model directly off
invoke_agent spans to populate the agent_start lifecycle card
(services/weave-python/.../agents/chat_view.py:_walk_invoke_agent,
_has_agent_start_payload). Previously only chat spans carried the model,
so the session / turn / subagent cards rendered without a model badge.

- Session span: model from the SessionStart hook payload (only when present).
- Turn span: model from the parsed transcript's primaryModel() at Stop.
- Subagent span: model from the parsed subagent transcript at SubagentStop
  (set alongside the existing response_model so both are populated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rgao-coreweave rgao-coreweave requested review from a team and chance-wnb May 13, 2026 21:04
@chance-wnb
Collaborator

on

Not yet run against a live Weave trace server

Is this true? I seem to have seen a screenshot of the UI. That probably means the spans are injected.

Without a proper test, this might break users if they adopt the new version and run it against production.

@rgao-coreweave
Contributor Author

on

Not yet run against a live Weave trace server

Is this true? I seem to have seen a screenshot of the UI. That probably means the spans are injected.

Without a proper test, this might break users if they adopt the new version and run it against production.

Updated the description; this has been tested against live prod Weave.

Collaborator

@chance-wnb chance-wnb left a comment

Fantastic work Rick! I left many comments, but I am mostly concerned about the need for the root session span. According to your API design doc and your Python implementation, there is no session span in the picture: turns are stitched together by conversation_id.

If my understanding is wrong, please correct me; that would mean I need to change my other implementation. But I highly suspect that your AI usage was heavily influenced by my weave-op based implementation.

Comment thread README.md Outdated
Comment thread src/daemon.ts Outdated
Comment thread src/parser.ts Outdated
Comment thread src/parser.ts
Comment thread src/parser.ts Outdated
Comment thread src/genaiSpans.ts Outdated
Comment thread src/parser.ts Outdated
Comment thread src/daemon.ts Outdated
Comment thread src/daemon.ts Outdated
Comment thread src/daemon.ts Outdated
- daemon.ts: replace the for-loop in `SubagentTracking.byAgentId` with
  `Array.find` (review #51, daemon.ts:171).
- genaiSpans.ts: `ctxWithParent` now builds on `context.active()` instead of
  `ROOT_CONTEXT`, so baggage on the active context propagates to children
  (review #51, genaiSpans.ts:138).
- parser.ts:
  - Destructure `line: m` directly in the `buildTurn` callback.
  - Trust the transcript schema and drop the `typeof === 'string'` guards
    around `timestamp` — it's an ISO string or missing per the spec.
  - Rename single-letter locals (`l`/`u`/`c`/`b`) to descriptive names.
  - Extract `extractAssistantTextBlocks` and reuse it from `genaiSpans`'s
    `assistantBlocksToText`, eliminating the duplicated block-walk.

Typecheck (`npx tsc --noEmit`) is clean.
Per review feedback from chance-wnb on PR #51, align with the
Python implementation and the Weave Agents backend's design where
one trace_id = one turn and conversations are stitched server-side
by `gen_ai.conversation.id`. Source: weave/trace_server/agents in
core (AgentConversationChatRes: "Each entry in `turns` corresponds
to one trace_id, which Weave treats as one conversation turn").

Changes:
- Turn spans are now root spans. Each user prompt produces one OTel
  trace; multi-turn conversations are reassembled by the backend via
  `gen_ai.conversation.id = session_id`. Session-level metadata
  (cwd, source, plugin.version) is stamped on every turn span.
- Drop the `invoke_agent claude-code` session root. No more double
  `invoke_agent` cards in the chat view; no session-level usage
  aggregation; no `weave.claude_code.session.end_reason` etc.
- Subagents no longer get a wrapper `invoke_agent <subagent_type>`
  span. The spawning `execute_tool Agent` span IS the subagent
  invocation in the chat view, and the subagent's per-LLM `chat`
  spans attach directly under that tool span. Nested tool calls
  from inside the subagent (PreToolUse with `agent_id`) likewise
  parent under the Agent tool span.
- PreCompact stamps `weave.compaction.{summary,items_before,
  items_after}` as span attributes on the open turn span (or
  buffers them onto the next turn if no turn is active). The
  backend extracts these into dedicated columns and renders a
  `context_compacted` chat-view card — span events on a session
  span weren't being picked up the same way.
- Drop the trace registry and the `weave_trace_id` /
  `weave_parent_call_id` env-parent path entirely. Without a session
  span there's nothing to force-attach a prior trace ID to;
  conversation_id stitching covers the resume case.

Files:
- `src/genaiSpans.ts`: remove `startSessionSpan`,
  `startSubagentSpan`, `ctxFromSpanContext`, `isValidTraceId`,
  `isValidSpanId`, `addCompactionEvent`. Expand `startTurnSpan` to
  carry session-level attrs and produce a root span. Add
  `setCompactionAttrs`. Trim `WEAVE_SESSION_*` attribute keys that
  no longer have a host span.
- `src/daemon.ts`: rework `handleSessionStart`,
  `handleUserPromptSubmit`, `handlePreToolUse`,
  `handleSubagentStart`, `handleSubagentStop`, `handlePreCompact`,
  `handleStop`, `handleSessionEnd`. Drop the trace registry field
  and its load/upsert/resolve helpers. `SubagentTracker.span`
  becomes `spawningToolSpan` (the Agent tool span reference, kept
  for parenting chat spans even after the tool has closed).
- `src/traceRegistry.ts`: deleted.
- `README.md`: updated "What Gets Traced" hierarchy.

Build and `tsc --noEmit` are clean.
Empirical scan of 702 real subagent transcripts on disk:
- 96.9% are exactly 1 turn
- 2.3% are 2 turns
- 0 transcripts have 3+ turns

The 2-turn case happens when the parent agent's prior assistant message
is carried in as pre-context on line 0 of the subagent's sidechain
transcript, with the user prompt that actually fires the subagent on
line 1. The parser splits this into turn 1 (pre-context) and turn 2
(real subagent work).

My earlier refactor (a13e43d) iterated `parsed.turns` and emitted chat
spans for every turn. For the 2.6% of subagent runs with a pre-context
line, that produced a spurious `chat` span attributed to the spawning
Agent tool span — even though the underlying LLM call had happened in
the *parent* agent, not the subagent. Misattribution.

Restore the original last-turn-only behavior (matches the pre-refactor
`primaryModel()` call site).
Collaborator

@chance-wnb chance-wnb left a comment

The remaining comments are trivial. Approving first for efficiency.

rgao-coreweave and others added 3 commits May 18, 2026 17:59
Two follow-ups on the post-approval unresolved threads:

- contentBlocks: add a comment explaining why the `: []` branch exists.
  `message.content` is either an array of blocks (common), a bare string
  (legacy single-text format), or missing — the empty-array fallback keeps
  downstream code seeing a well-typed list instead of `undefined`.
- Rename the destructured `line: m` → `line` in `buildTurn`'s map callback.
  The aliasing was vestigial; `line` matches the AssistantLine interface
  field name and avoids the single-letter local Chance flagged.
When the user resumes a Claude Code session (--continue / --resume / mid-
session daemon restart), Claude Code generates a new process-level
session_id and stamps every transcript line with forkedFrom.sessionId
pointing at the immediate parent session. Each transcript file is named
after its session id and lives in the same project directory, so the
fork chain can be walked by reading sibling files' first lines.

Previously (before a13e43d) cross-process trace continuity was preserved
by forcing the new session span's trace_id to match the prior one via a
synthetic remote parent context. When the session span was dropped that
mechanism went with it, leaving resumed turns stranded under a new
conversation id and disconnected from their pre-resume turns in the
agents UI.

This restores continuity at the conversation level rather than the
trace-id level: at SessionStart we walk forkedFrom.sessionId to the root
ancestor and use that as gen_ai.conversation.id on every span the
session produces. The current process's session id stays on
weave.claude_code.session.id as a debug breadcrumb. Fresh (non-forked)
sessions are unaffected — chain walk returns the current session_id and
the two ids are equal.

Concretely:

- transcriptFile.ts: add readFirstTranscriptLine() — opens a transcript
  path safely (O_RDONLY|O_NOFOLLOW, regular-file stat) and returns the
  parsed first JSON line, or undefined on any failure. Used for both
  the current session's transcript (with retry, since SessionStart can
  fire before the first line has flushed) and ancestor transcripts (no
  retry — they are static by the time we walk them).

- daemon.ts: SessionState gains conversationId; resolveConversationId()
  walks the chain with a hard depth cap and cycle guard, retrying the
  first read up to 4×100ms while the transcript flushes. If a parent
  transcript is missing on disk (e.g., resumed across machines) the
  walk stops at the highest recorded parent — still a better stitching
  key than the current process's id. Logs at INFO when a resume is
  detected so the chosen conversation id is visible in the daemon log.

- genaiSpans.ts: TurnSpanArgs and ChatSpanArgs split sessionId (debug
  stamp) from conversationId (stitching key). The two were conflated
  before; the rename forces every call site to be explicit about which
  it means. Subagent chat spans build their conversation id from
  conversationId + ':' + agentId so subagent stitching also follows
  the chain root rather than the current process's session id.

Verified the chain walk against three on-disk transcripts:
- 569f2d63 (resumed) → resolves to 23ea869a (root, on disk)
- 23ea869a (root) → resolves to itself
- 121d80f6 (fresh, no forkedFrom) → resolves to itself

tsc --noEmit clean. npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the Weave Agents chat-view reference in `weave-python/weave-public`,
subagents should be a nested `invoke_agent` span — child of the parent
turn, sibling of any regular `execute_tool` calls. The chat view's
`_walk_invoke_agent` then renders the subagent as its own `agent_start`
lifecycle marker, distinct from a tool-call event. The canonical fixture
is `test_subagent_spans_render_inline_with_agent_label_inheritance` in
`tests/trace_server/test_genai_chat_view.py`.

This PR previously did the opposite: it emitted an `execute_tool Agent`
span and attached the subagent's chat spans as its children, on the
theory that "the spawning tool span IS the agent invocation." That
worked mechanically but mis-rendered subagent dispatch as a generic tool
call in the chat view, lost the `agent_start` lifecycle marker for the
subagent's identity, and didn't match the reference's structural model
used by every other GenAI integration.

The corrected span tree:

  invoke_agent claude-code                  (turn — root)
  ├─ chat <model>                           (LLM calls)
  ├─ execute_tool <tool_name>               (Read, Bash, Grep, ...)
  └─ invoke_agent <subagent_type>           (subagent dispatch)
     ├─ chat <model>                        (subagent LLM calls)
     └─ execute_tool <tool_name>            (tools the subagent ran)

Concretely:

- `src/genaiSpans.ts`: add `startInvokeAgentSpan` builder for nested
  invoke_agent spans; add `weave.claude_code.subagent.spawning_tool_call_id`
  attribute key as a back-pointer from the inner invoke_agent to the
  parent's Agent tool_use_id.

- `src/daemon.ts`:
  * `handlePreToolUse`: when `toolName === 'Agent' && subagent_type`,
    emit `invoke_agent <subagent_type>` (NOT `execute_tool Agent`).
    Carry the firing prompt as `gen_ai.input.messages`.
  * `handlePostToolUse` / `handlePostToolUseFailure`: when the
    `tool_use_id` resolves to a subagent tracker, close the
    `invoke_agent` span with the canonical `tool_response` as
    `gen_ai.output.messages` (or ERROR on failure). Idempotent via the
    `ended` flag so SubagentStop/PostToolUse ordering doesn't matter.
  * `handleSubagentStart`: stamp `gen_ai.agent.id` on the existing
    invoke_agent span (matched path) OR create an orphan invoke_agent
    span under the current turn (orphan path).
  * `handleSubagentStop`: emit subagent chat spans as children of the
    inner invoke_agent span; remove the tracker only on the orphan
    path (matched trackers wait for PostToolUse).
  * Drop the `:${agentId}` suffix on subagent chats'
    `gen_ai.conversation.id`; subagent chats inherit the parent's
    conversation id, matching how the rest of the GenAI semconv
    treats nested invocations.
  * `handleSessionEnd`: close any leftover subagent invoke_agent spans
    so they export instead of leaking.
  * Rename `SubagentTracker.spawningToolSpan` → `invokeAgentSpan`,
    add `byToolUseId` lookup on `SubagentTracking`.

- `README.md`: updated "What Gets Traced" hierarchy.

Verified in-process via `scripts/smoke/verify-subagent-shape.mjs`
(InMemorySpanExporter): the resulting tree matches the canonical
reference fixture exactly (turn → execute_tool sibling → invoke_agent
sibling → chat under subagent), with agent_id/spawning_tool_call_id/
input.messages/output.messages all set as expected. 11/11 assertions
pass; tsc --noEmit and npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rgao-coreweave
Contributor Author

The remaining comments are trivial. Approving first for efficiency.

Fantastic work Rick! I left many comments, but I am mostly concerned about the need for the root session span. According to your API design doc and your Python implementation, there is no session span in the picture: turns are stitched together by conversation_id.

If my understanding is wrong, please correct me; that would mean I need to change my other implementation. But I highly suspect that your AI usage was heavily influenced by my weave-op based implementation.

You are absolutely right about this! The architecture was influenced by the weave-op based implementation; the Weave SDK design has the right shape:

invoke_agent claude-code                  (root — one trace per user prompt)
├─ chat <model>                           (each LLM API call within the turn)
├─ execute_tool <tool_name>               (each tool call: Read, Bash, Grep, ...)
└─ invoke_agent <subagent_type>           (subagent dispatched via the `Agent` tool)
   ├─ chat <model>                        (subagent LLM calls)
   └─ execute_tool <tool_name>            (tools the subagent ran)

This is fixed in the PR and will get checked in.

@rgao-coreweave
Contributor Author

The remaining comments are trivial. Approving first for efficiency.

Hey @chance-wnb, thank you so much for the thoughtful review! I really appreciate the care you put into it. I filed the follow-ups as tickets.

Landing this first; any issues should be fixed in the follow-up PRs.

@rgao-coreweave rgao-coreweave merged commit 7c24eac into main May 19, 2026
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 19, 2026

2 participants