feat: migrate to OTel GenAI semconv + new /agents/otel/v1/traces endpoint #51
Replace the Weave JS SDK (call/start + call/end API) with the OTel JS SDK
emitting protobuf OTLP to /agents/otel/v1/traces. Spans follow the OTel
GenAI semantic conventions so they appear in the Weave Agents observability
surface.
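A minimal sketch of the request shape, assuming the values given in the PR description's Wire format section (path `/agents/otel/v1/traces`, `Content-Type: application/x-protobuf`, a `wandb-api-key` header). The function name `otlpTraceRequest` and its return shape are hypothetical; in the plugin the OTLP exporter performs the actual POST.

```typescript
// Hypothetical helper: computes where and how a protobuf OTLP batch would be sent.
// baseUrl corresponds to WANDB_BASE_URL; apiKey to WANDB_API_KEY.
interface OtlpRequest {
  url: string;
  headers: Record<string, string>;
}

function otlpTraceRequest(baseUrl: string, apiKey: string): OtlpRequest {
  // Strip trailing slashes so the joined URL never contains "//agents".
  const trimmed = baseUrl.replace(/\/+$/, "");
  return {
    url: `${trimmed}/agents/otel/v1/traces`,
    headers: {
      "Content-Type": "application/x-protobuf",
      "wandb-api-key": apiKey,
    },
  };
}
```

The resulting URL and headers are what one would pass to an exporter's configuration; nothing here sends a request.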
Span mapping:
- session → invoke_agent claude-code (root)
- turn → invoke_agent claude-code (one per user prompt)
- chat → chat <model> (one per LLM API call, emitted at Stop from parsed
transcript with backdated timestamps)
- tool → execute_tool <tool_name>
- subagent → invoke_agent <subagent_type> (flat under turn; back-pointer
attribute to spawning Agent tool)
Permission requests appear as `weave.permission_request` span events on the
parent tool span; PreCompact appears as a `weave.compaction` event on the
session span.
Attribute namespace: gen_ai.* for everything in the OTel GenAI semconv,
weave.* for Claude-Code-specific extensions (session.id, cwd, source,
turn.number, tool.use_id, subagent.spawning_tool_call_id, ...).
Trace continuity on session resume is preserved by forcing the new session
span's traceId via a synthetic remote parent context.
Drops the `weave` npm dependency. Adds @opentelemetry/{api, sdk-trace-node,
sdk-trace-base, resources, semantic-conventions, exporter-trace-otlp-proto}.
Design spec: docs/superpowers/specs/2026-05-12-otel-semconv-migration-design.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Weave Agents UI populates its Version column from `gen_ai.agent.version` (alias `weave.agent.version`). Without it, the Version column shows "(no version)" for every agent. Set it to the plugin VERSION on session, turn, and subagent invoke_agent spans. The plugin already exposed VERSION via `service.version` (resource) and `weave.claude_code.plugin.version` (custom span attr); neither is what the Agents UI keys off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in protobufjs >= 8.0.2, which fixes the 7 CVEs flagged by Socket Security on PR #51 (high-severity code injection and prototype pollution plus several DoS vectors in 8.0.0-8.0.1). Other @opentelemetry/* deps already satisfy 0.218.0's peer constraints (core/resources/sdk-trace-base all at 2.7.1). npm audit reports 0 vulnerabilities post-bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- gen_ai.provider.name only on chat spans, derived from the model id via providerFromModel(); not stamped as a constant on tool/turn/session
- replace the Math.random() span-id generator with crypto.randomBytes(8)
- isValidTraceId / isValidSpanId in genaiSpans.ts; remove duplicated hex regexes in daemon.ts and traceRegistry.ts
- split weave.permission_request into request + resolved events stamped at the time each actually happens; drop the deferred-write workaround
- collapse subagentTrackers + subagentByAgentId into a single SubagentTracking class with byAgentId / findUnmatchedByProximity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
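The id-related bullets above can be sketched as follows. This is an illustration under assumptions, not the code in genaiSpans.ts: `newSpanId` is a hypothetical name, and the `providerFromModel` body shows only one assumed prefix rule. The id lengths (8-byte span ids, 16-byte trace ids, all-zero ids invalid) follow the W3C Trace Context format that OTel uses.

```typescript
import { randomBytes } from "node:crypto";

// Span ids are 16 lowercase hex chars; trace ids are 32. All-zero ids are invalid.
const SPAN_ID_RE = /^[0-9a-f]{16}$/;
const TRACE_ID_RE = /^[0-9a-f]{32}$/;

function isValidSpanId(id: string): boolean {
  return SPAN_ID_RE.test(id) && id !== "0".repeat(16);
}

function isValidTraceId(id: string): boolean {
  return TRACE_ID_RE.test(id) && id !== "0".repeat(32);
}

// Hypothetical helper: draws ids from the OS CSPRNG instead of Math.random().
function newSpanId(): string {
  let id = randomBytes(8).toString("hex");
  // Re-draw in the astronomically unlikely all-zero case.
  while (!isValidSpanId(id)) {
    id = randomBytes(8).toString("hex");
  }
  return id;
}

// Assumed mapping: the real providerFromModel() may recognize more prefixes.
function providerFromModel(model: string): string | undefined {
  if (model.startsWith("claude-")) return "anthropic";
  return undefined; // unknown model: leave gen_ai.provider.name unset
}
```

Keeping both validators next to the generator is what lets the duplicated hex regexes elsewhere be removed.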
The rgao/ directory is excluded by the global core.excludesFile but the spec was committed before that rule existed; untrack it so it stops appearing in the PR diff. File preserved locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The orphan path (SubagentStart with no matching PreToolUse) was stamping toolUseId = agentId and spawningToolCallId = '' to satisfy required fields. Make those fields optional on SubagentTracker so orphans simply don't carry them, and omit the spawning_tool_call_id span attribute when no spawning tool exists rather than emitting an empty string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
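A minimal sketch of the optional-field approach, assuming a simplified tracker shape. The interface fields beyond those named in the commit message and the helper name `subagentAttributes` are hypothetical; the attribute keys are the ones used elsewhere in this PR.

```typescript
// Simplified, assumed tracker shape: the two spawning-tool fields are optional
// so that orphans (SubagentStart with no matching PreToolUse) simply omit them.
interface SubagentTracker {
  agentId: string;
  toolUseId?: string;
  spawningToolCallId?: string;
}

const SPAWNING_TOOL_CALL_ID = "weave.claude_code.subagent.spawning_tool_call_id";

// Hypothetical helper: builds span attributes, leaving the back-pointer out
// entirely when there is no spawning tool, rather than writing "".
function subagentAttributes(tracker: SubagentTracker): Record<string, string> {
  const attrs: Record<string, string> = { "gen_ai.agent.id": tracker.agentId };
  if (tracker.spawningToolCallId !== undefined) {
    attrs[SPAWNING_TOOL_CALL_ID] = tracker.spawningToolCallId;
  }
  return attrs;
}
```

Omitting the key lets a consumer distinguish "no spawning tool" from "spawning tool with an empty id", which an empty string cannot express.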
- drop dead permissionSuggestions field from PendingToolCall; pass payload value directly to addPermissionRequestEvent
- rename permissionStartedAt: Date -> permissionRequested: boolean (the field served only as a sentinel after the event-split refactor)
- extract resolvePermissionIfPending() helper, replacing three near-identical call sites at PostToolUse, PostToolUseFailure, and SessionEnd cleanup
- disambiguate the duplicate "invoke_agent claude-code" span name: session uses a :session suffix, turn uses a :turn suffix
- drop weave.claude_code.tool.use_id from tool spans (was identical to gen_ai.tool.call.id)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
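The extracted helper might look roughly like the sketch below. Only the names `PendingToolCall`, `permissionRequested`, `resolvePermissionIfPending`, and the event name `weave.permission_resolved` come from this PR; the event record shape and the `outcome` attribute are assumptions made for illustration.

```typescript
// Assumed, simplified stand-in for a span event.
interface SpanEvent {
  name: string;
  time: Date;
  attributes: Record<string, string>;
}

interface PendingToolCall {
  permissionRequested: boolean; // sentinel: a request event was emitted and not yet resolved
  events: SpanEvent[];
}

// Emits the resolved event at most once per outstanding request.
// Returns true when an event was actually added.
function resolvePermissionIfPending(
  call: PendingToolCall,
  outcome: string,
  now: Date = new Date(),
): boolean {
  if (!call.permissionRequested) return false;
  call.events.push({ name: "weave.permission_resolved", time: now, attributes: { outcome } });
  call.permissionRequested = false; // clear the sentinel so later call sites are no-ops
  return true;
}
```

Because the sentinel is cleared on first use, the PostToolUse, PostToolUseFailure, and SessionEnd paths can all call the helper unconditionally.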
Weave's chat_view consumer reads gen_ai.request.model directly off invoke_agent spans to populate the agent_start lifecycle card (services/weave-python/.../agents/chat_view.py:_walk_invoke_agent, _has_agent_start_payload). Previously only chat spans carried the model, so the session / turn / subagent cards rendered without a model badge.
- Session span: model from the SessionStart hook payload (only when present).
- Turn span: model from the parsed transcript's primaryModel() at Stop.
- Subagent span: model from the parsed subagent transcript at SubagentStop (set alongside the existing response_model so both are populated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Is this true? I seem to have seen a screenshot of the UI, which probably means the spans are injected. Without a proper test, this might break users if they adopt the new version and run it against production.
Updated the description. This is tested on live prod Weave.
chance-wnb
left a comment
Fantastic work Rick! I left many comments, but I am mostly concerned about the need for the root session span. According to your API design doc and your Python implementation, there is no session span in the picture: turns will be stitched together by conversation_id.
If my understanding is wrong, please correct me. That would mean I need to change my other implementation. But I strongly suspect that your AI usage was greatly influenced by my weave-op based implementation.
- daemon.ts: replace the for-loop in `SubagentTracking.byAgentId` with `Array.find` (review #51, daemon.ts:171).
- genaiSpans.ts: `ctxWithParent` now builds on `context.active()` instead of `ROOT_CONTEXT`, so baggage on the active context propagates to children (review #51, genaiSpans.ts:138).
- parser.ts:
  * Destructure `line: m` directly in the `buildTurn` callback.
  * Trust the transcript schema and drop the `typeof === 'string'` guards around `timestamp`; it's an ISO string or missing per the spec.
  * Rename single-letter locals (`l`/`u`/`c`/`b`) to descriptive names.
  * Extract `extractAssistantTextBlocks` and reuse it from `genaiSpans`'s `assistantBlocksToText`, eliminating the duplicated block-walk.

Typecheck (`npx tsc --noEmit`) is clean.
Per review feedback from chance-wnb on PR #51, align with the Python implementation and the Weave Agents backend's design where one trace_id = one turn and conversations are stitched server-side by `gen_ai.conversation.id`. Source: weave/trace_server/agents in core (AgentConversationChatRes: "Each entry in `turns` corresponds to one trace_id, which Weave treats as one conversation turn").

Changes:
- Turn spans are now root spans. Each user prompt produces one OTel trace; multi-turn conversations are reassembled by the backend via `gen_ai.conversation.id = session_id`. Session-level metadata (cwd, source, plugin.version) is stamped on every turn span.
- Drop the `invoke_agent claude-code` session root. No more double `invoke_agent` cards in the chat view; no session-level usage aggregation; no `weave.claude_code.session.end_reason` etc.
- Subagents no longer get a wrapper `invoke_agent <subagent_type>` span. The spawning `execute_tool Agent` span IS the subagent invocation in the chat view, and the subagent's per-LLM `chat` spans attach directly under that tool span. Nested tool calls from inside the subagent (PreToolUse with `agent_id`) likewise parent under the Agent tool span.
- PreCompact stamps `weave.compaction.{summary,items_before,items_after}` as span attributes on the open turn span (or buffers them onto the next turn if no turn is active). The backend extracts these into dedicated columns and renders a `context_compacted` chat-view card; span events on a session span weren't being picked up the same way.
- Drop the trace registry and the `weave_trace_id` / `weave_parent_call_id` env-parent path entirely. Without a session span there's nothing to force-attach a prior trace ID to; conversation_id stitching covers the resume case.

Files:
- `src/genaiSpans.ts`: remove `startSessionSpan`, `startSubagentSpan`, `ctxFromSpanContext`, `isValidTraceId`, `isValidSpanId`, `addCompactionEvent`. Expand `startTurnSpan` to carry session-level attrs and produce a root span. Add `setCompactionAttrs`. Trim `WEAVE_SESSION_*` attribute keys that no longer have a host span.
- `src/daemon.ts`: rework `handleSessionStart`, `handleUserPromptSubmit`, `handlePreToolUse`, `handleSubagentStart`, `handleSubagentStop`, `handlePreCompact`, `handleStop`, `handleSessionEnd`. Drop the trace registry field and its load/upsert/resolve helpers. `SubagentTracker.span` becomes `spawningToolSpan` (the Agent tool span reference, kept for parenting chat spans even after the tool has closed).
- `src/traceRegistry.ts`: deleted.
- `README.md`: updated "What Gets Traced" hierarchy.

Build and `tsc --noEmit` are clean.
Empirical scan of 702 real subagent transcripts on disk:
- 96.9% are exactly 1 turn
- 2.3% are 2 turns
- 0 transcripts have 3+ turns

The 2-turn case happens when the parent agent's prior assistant message is carried in as pre-context on line 0 of the subagent's sidechain transcript, with the user prompt that actually fires the subagent on line 1. The parser splits this into turn 1 (pre-context) and turn 2 (real subagent work).

My earlier refactor (a13e43d) iterated `parsed.turns` and emitted chat spans for every turn. For the 2.3% of subagent runs with a pre-context line, that produced a spurious `chat` span attributed to the spawning Agent tool span, even though the underlying LLM call had happened in the *parent* agent, not the subagent. That is a misattribution.

Restore the original last-turn-only behavior (matches the pre-refactor `primaryModel()` call site).
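The last-turn-only rule can be sketched as below. The types are simplified assumptions about the parser's output, and `subagentCallsToEmit` is a hypothetical name; the point is only that a leading pre-context turn never contributes chat spans.

```typescript
// Assumed, simplified parser output.
interface AssistantCall {
  model: string;
  responseId: string;
}
interface Turn {
  calls: AssistantCall[];
}
interface ParsedTranscript {
  turns: Turn[];
}

// Hypothetical helper: chat spans for a subagent come from its final turn only,
// so a pre-context turn inherited from the parent agent is skipped.
function subagentCallsToEmit(parsed: ParsedTranscript): AssistantCall[] {
  if (parsed.turns.length === 0) return [];
  return parsed.turns[parsed.turns.length - 1].calls;
}
```

For a one-turn transcript this is identical to iterating every turn; it differs only in the two-turn pre-context case described above.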
chance-wnb
left a comment
The remaining comments are trivial. Approving first for efficiency.
Two follow-ups on the post-approval unresolved threads:
- contentBlocks: add a comment explaining why the `: []` branch exists. `message.content` is either an array of blocks (common), a bare string (legacy single-text format), or missing; the empty-array fallback keeps downstream code seeing a well-typed list instead of `undefined`.
- Rename the destructured `line: m` → `line` in `buildTurn`'s map callback. The aliasing was vestigial; `line` matches the AssistantLine interface field name and avoids the single-letter local Chance flagged.
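A minimal sketch of the `: []` branch, assuming simplified message types. The interfaces here are illustrative rather than the parser's real definitions.

```typescript
// Assumed, simplified shapes.
interface ContentBlock {
  type: string;
  text?: string;
}
interface AssistantMessage {
  // Array of blocks (common), bare string (legacy single-text format), or missing.
  content?: ContentBlock[] | string;
}

function contentBlocks(message: AssistantMessage): ContentBlock[] {
  // The `: []` branch covers both the bare-string and the missing case, so
  // callers always iterate a well-typed list and never see `undefined`.
  return Array.isArray(message.content) ? message.content : [];
}
```

With this shape a caller can write `contentBlocks(msg).filter(...)` without a guard.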
When the user resumes a Claude Code session (--continue / --resume / mid-session daemon restart), Claude Code generates a new process-level session_id and stamps every transcript line with forkedFrom.sessionId pointing at the immediate parent session. Each transcript file is named after its session id and lives in the same project directory, so the fork chain can be walked by reading sibling files' first lines.

Previously (before a13e43d) cross-process trace continuity was preserved by forcing the new session span's trace_id to match the prior one via a synthetic remote parent context. When the session span was dropped that mechanism went with it, leaving resumed turns stranded under a new conversation id and disconnected from their pre-resume turns in the agents UI.

This restores continuity at the conversation level rather than the trace-id level: at SessionStart we walk forkedFrom.sessionId to the root ancestor and use that as gen_ai.conversation.id on every span the session produces. The current process's session id stays on weave.claude_code.session.id as a debug breadcrumb. Fresh (non-forked) sessions are unaffected: the chain walk returns the current session_id and the two ids are equal.

Concretely:
- transcriptFile.ts: add readFirstTranscriptLine(), which opens a transcript path safely (O_RDONLY|O_NOFOLLOW, regular-file stat) and returns the parsed first JSON line, or undefined on any failure. Used for both the current session's transcript (with retry, since SessionStart can fire before the first line has flushed) and ancestor transcripts (no retry, because they are static by the time we walk them).
- daemon.ts: SessionState gains conversationId; resolveConversationId() walks the chain with a hard depth cap and cycle guard, retrying the first read up to 4×100ms while the transcript flushes. If a parent transcript is missing on disk (e.g., resumed across machines) the walk stops at the highest recorded parent, which is still a better stitching key than the current process's id. Logs at INFO when a resume is detected so the chosen conversation id is visible in the daemon log.
- genaiSpans.ts: TurnSpanArgs and ChatSpanArgs split sessionId (debug stamp) from conversationId (stitching key). The two were conflated before; the rename forces every call site to be explicit about which it means. Subagent chat spans build their conversation id from conversationId + ':' + agentId so subagent stitching also follows the chain root rather than the current process's session id.

Verified the chain walk against three on-disk transcripts:
- 569f2d63 (resumed) → resolves to 23ea869a (root, on disk)
- 23ea869a (root) → resolves to itself
- 121d80f6 (fresh, no forkedFrom) → resolves to itself

tsc --noEmit clean. npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
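The chain walk described above can be sketched as follows. This is a simplified, synchronous illustration: the first-line reader is injected as a function so the walk can be exercised without touching disk, the retry on the first hop is left out, and the cap value `MAX_FORK_DEPTH = 32` is an assumption (the commit only says the cap is "hard").

```typescript
// Assumed shape of a transcript's first JSON line; only forkedFrom matters here.
type FirstLine = { forkedFrom?: { sessionId?: string } };
// Injected reader: returns undefined when the transcript is missing or unreadable.
type ReadFirstLine = (sessionId: string) => FirstLine | undefined;

const MAX_FORK_DEPTH = 32; // assumed value for the hard depth cap

function resolveConversationId(sessionId: string, readFirstLine: ReadFirstLine): string {
  const seen = new Set<string>([sessionId]);
  let current = sessionId;
  for (let depth = 0; depth < MAX_FORK_DEPTH; depth++) {
    const parent = readFirstLine(current)?.forkedFrom?.sessionId;
    if (!parent) return current; // root ancestor, or transcript missing on disk
    if (seen.has(parent)) return current; // cycle guard
    seen.add(parent);
    current = parent;
  }
  return current; // depth cap reached: use the highest ancestor found so far
}
```

Note that when a parent's transcript is missing, the walk has already advanced to that parent's id before the failed read, so it returns the highest recorded parent rather than the current process's session id.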
Per the Weave Agents chat-view reference in `weave-python/weave-public`,
subagents should be a nested `invoke_agent` span — child of the parent
turn, sibling of any regular `execute_tool` calls. The chat view's
`_walk_invoke_agent` then renders the subagent as its own `agent_start`
lifecycle marker, distinct from a tool-call event. The canonical fixture
is `test_subagent_spans_render_inline_with_agent_label_inheritance` in
`tests/trace_server/test_genai_chat_view.py`.
This PR previously did the opposite: it emitted an `execute_tool Agent`
span and attached the subagent's chat spans as its children, on the
theory that "the spawning tool span IS the agent invocation." That
worked mechanically but mis-rendered subagent dispatch as a generic tool
call in the chat view, lost the `agent_start` lifecycle marker for the
subagent's identity, and didn't match the reference's structural model
used by every other GenAI integration.
The corrected span tree:
invoke_agent claude-code (turn, root)
├─ chat <model> (LLM calls)
├─ execute_tool <tool_name> (Read, Bash, Grep, ...)
└─ invoke_agent <subagent_type> (subagent dispatch)
   ├─ chat <model> (subagent LLM calls)
   └─ execute_tool <tool_name> (tools the subagent ran)
Concretely:
- `src/genaiSpans.ts`: add `startInvokeAgentSpan` builder for nested
invoke_agent spans; add `weave.claude_code.subagent.spawning_tool_call_id`
attribute key as a back-pointer from the inner invoke_agent to the
parent's Agent tool_use_id.
- `src/daemon.ts`:
* `handlePreToolUse`: when `toolName === 'Agent' && subagent_type`,
emit `invoke_agent <subagent_type>` (NOT `execute_tool Agent`).
Carry the firing prompt as `gen_ai.input.messages`.
* `handlePostToolUse` / `handlePostToolUseFailure`: when the
`tool_use_id` resolves to a subagent tracker, close the
`invoke_agent` span with the canonical `tool_response` as
`gen_ai.output.messages` (or ERROR on failure). Idempotent via the
`ended` flag so SubagentStop/PostToolUse ordering doesn't matter.
* `handleSubagentStart`: stamp `gen_ai.agent.id` on the existing
invoke_agent span (matched path) OR create an orphan invoke_agent
span under the current turn (orphan path).
* `handleSubagentStop`: emit subagent chat spans as children of the
inner invoke_agent span; remove the tracker only on the orphan
path (matched trackers wait for PostToolUse).
* Drop the `:${agentId}` suffix on subagent chats'
`gen_ai.conversation.id`; subagent chats inherit the parent's
conversation id, matching how the rest of the GenAI semconv
treats nested invocations.
* `handleSessionEnd`: close any leftover subagent invoke_agent spans
so they export instead of leaking.
* Rename `SubagentTracker.spawningToolSpan` → `invokeAgentSpan`,
add `byToolUseId` lookup on `SubagentTracking`.
- `README.md`: updated "What Gets Traced" hierarchy.
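The idempotent close mentioned for `handlePostToolUse` / `handleSubagentStop` can be sketched like this. The `EndableSpan` interface stands in for an OTel span, and the `close()` method is a hypothetical name; only the `ended` flag and the `invokeAgentSpan` field are named in the commit message.

```typescript
// Stand-in for the subset of the span interface this sketch needs.
interface EndableSpan {
  end(): void;
}

class SubagentTracker {
  ended = false;

  constructor(readonly invokeAgentSpan: EndableSpan) {}

  // Ends the invoke_agent span at most once, so it does not matter whether
  // SubagentStop or PostToolUse arrives first. Returns true only when this
  // call actually ended the span.
  close(): boolean {
    if (this.ended) return false;
    this.ended = true;
    this.invokeAgentSpan.end();
    return true;
  }
}
```

The same method can serve the `handleSessionEnd` cleanup: calling it on every leftover tracker ends only the spans that are still open.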
Verified in-process via `scripts/smoke/verify-subagent-shape.mjs`
(InMemorySpanExporter): the resulting tree matches the canonical
reference fixture exactly (turn → execute_tool sibling → invoke_agent
sibling → chat under subagent), with agent_id/spawning_tool_call_id/
input.messages/output.messages all set as expected. 11/11 assertions
pass; tsc --noEmit and npm run build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
You are absolutely right on this! The architecture was influenced by the weave-op based implementation, and the Weave SDK design has the right shape. This is fixed in the PR and will get checked in.
Hey @chance-wnb, thank you so much for the thoughtful review! I really appreciate the care you put into it. I filed the follow-ups as tickets. Landing this first; any remaining issues will be fixed in follow-up PRs.
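Summary
- Replace the Weave JS SDK with the OTel JS SDK, emitting protobuf OTLP to `/agents/otel/v1/traces` so spans land in the Weave Agents observability surface.
- Follow the OTel GenAI semantic conventions (`invoke_agent`, `chat`, `execute_tool`).
- Drop the `weave` npm dependency; add the standard `@opentelemetry/*` packages.

Span mapping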
- `claude_code.session` → no span (conversations are stitched via `gen_ai.conversation.id`)
- `claude_code.turn` → `invoke_agent claude-code` (root, one trace per user prompt)
- `chat <model>`: one per LLM API call, emitted at `Stop` from the parsed transcript with backdated timestamps and per-call usage
- `claude_code.tool.<name>` → `execute_tool <tool_name>`
- `claude_code.subagent.<type>` (nested inside Agent tool span) → `invoke_agent <subagent_type>`: child of the turn, sibling of regular `execute_tool` spans. The spawning `Agent` tool call does NOT emit an `execute_tool` span; the inner `invoke_agent` is the agent invocation. Back-pointer to the spawning tool_use_id via `weave.claude_code.subagent.spawning_tool_call_id`.
- `claude_code.permission_request` (child span) → `weave.permission_request` + `weave.permission_resolved` span events on the parent `execute_tool` span (split so each is stamped at the time it actually happened)
- PreCompact: `weave.compaction.*` attributes on the turn span open at compaction time (or the next turn, if compaction fires between turns)

Span tree
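invoke_agent claude-code (turn, root)
├─ chat <model> (LLM calls)
├─ execute_tool <tool_name> (Read, Bash, Grep, ...)
└─ invoke_agent <subagent_type> (subagent dispatch)
   ├─ chat <model> (subagent LLM calls)
   └─ execute_tool <tool_name> (tools the subagent ran)

This matches the Weave Agents chat view's reference structure (`weave/trace_server/agents/chat_view.py` and `tests/trace_server/test_genai_chat_view.py::test_subagent_spans_render_inline_with_agent_label_inheritance`), where nested `invoke_agent` spans render as their own `agent_start` lifecycle marker with the inner agent's identity, distinct from a tool-call event.

Attribute namespace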
- `gen_ai.*`: everything in the OTel GenAI semconv catalog: `gen_ai.operation.name`, `gen_ai.provider.name` (chat spans only, derived from the model id via `providerFromModel()`), `gen_ai.agent.name`, `gen_ai.agent.id`, `gen_ai.agent.version`, `gen_ai.conversation.id`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.response.id`, `gen_ai.response.finish_reasons`, `gen_ai.usage.*`, `gen_ai.tool.*`, `gen_ai.input.messages`, `gen_ai.output.messages`, `gen_ai.output.type`, `error.type`.
- `weave.claude_code.*`: Claude-Code-specific extensions with no semconv equivalent: `session.id` (current process's session id, debug breadcrumb), `cwd`, `source`, `plugin.version`, `turn.number`, `turn.tool_count`, `orphan_reason`, `display_name`, `subagent.spawning_tool_call_id` (back-pointer from the inner `invoke_agent <subagent_type>` span to the parent's `Agent` tool_use_id).

`gen_ai.request.model` and `gen_ai.agent.version` are stamped on every `invoke_agent` span so the Weave Agents UI's `agent_start` lifecycle card and Version column populate at every agent level.

Wire format
- `POST ${WANDB_BASE_URL}/agents/otel/v1/traces`
- `Content-Type: application/x-protobuf`
- `wandb-api-key: <WANDB_API_KEY>` header
- Resource attributes: `wandb.entity`, `wandb.project`, `service.name=claude-code`, `service.version=<plugin version>`

Trace continuity (resume)
Each user prompt produces its own root trace. Multi-turn conversations are stitched together server-side via `gen_ai.conversation.id`.

For resumed sessions (`claude --continue`, `claude --resume <id>`, daemon restart mid-session), Claude Code generates a new process-level `session_id` but stamps every transcript line with `forkedFrom.sessionId` pointing at the immediate parent session. At `SessionStart` the daemon walks the `forkedFrom.sessionId` chain across sibling transcript files to the root ancestor and uses that root id as `gen_ai.conversation.id` on every span the resumed session produces. The current process's session id is still stamped on each span as `weave.claude_code.session.id` so the resume is visible in the trace.

The walk has a hard depth cap and a cycle guard. If a parent transcript is missing on disk (e.g., resumed across machines) the walk stops at the highest recorded parent, which is still a better stitching key than the current process's session id. The first hop retries up to 4×100 ms while the transcript's first line is flushing; ancestor reads are not retried because ancestor transcripts are static.

Fresh (non-forked) sessions are unaffected: the chain walk returns the current session id, and `conversation.id` equals `session.id`.

Subagent correlation
`SubagentStart` carries no pointer back to the spawning `Agent` tool's `tool_use_id`. The daemon correlates the subagent's runtime `agent_id` to the open subagent tracker by content: at `PreToolUse` (Agent + subagent_type) it records the sha256 of the firing prompt; at `SubagentStart` it reads the subagent transcript's first user-message line (byte-identical to the parent's `tool_input.prompt`) and matches by `(promptHash, subagent_type)`. There is no temporal window, so matching is deterministic across loaded CI and back-to-back identical Agent calls.

If correlation fails (the parent's PreToolUse never fired, or the firing prompt couldn't be read from the subagent transcript), an orphan `invoke_agent` span is created as a direct child of the current turn span and closed at `SubagentStop`. The `weave.claude_code.orphan_reason` attribute records why correlation didn't match.

Files changed
- `src/genaiSpans.ts`: attribute key constants, span builders (`startTurnSpan`, `startInvokeAgentSpan`, `startToolSpan`, `emitChatSpan`), span event emitters, transcript-to-chat-span conversion
- `src/daemon.ts`: `NodeTracerProvider` + `BatchSpanProcessor` + `OTLPTraceExporter` replace `WeaveClient` / `saveOp` / `saveCallStart` / `saveCallEnd`; per-session state stores `Span` objects instead of UUIDv7 call IDs; new `PreCompact` handler; `resolveConversationId()` walks `forkedFrom.sessionId` to the root ancestor for resume stitching; subagent dispatch emits a nested `invoke_agent <subagent_type>` span (not an `execute_tool Agent` wrapper) and correlates by content-based prompt hash
- `src/parser.ts`: adds per-LLM-call detail (`assistantCalls()`) carrying per-call timestamps, model, usage, content blocks, response id, and finish reason so chat spans can be emitted with backdated start/end times
- `src/transcriptFile.ts`: adds `readFirstTranscriptLine()` for safe (`O_RDONLY|O_NOFOLLOW`, regular-file-only) first-line reads of ancestor and subagent transcripts
- `src/traceRegistry.ts` (deleted): no longer needed; cross-process continuity is now derived from `forkedFrom.sessionId` in the transcript itself rather than a separate disk-side mapping
- `package.json` / `package-lock.json`: drops `weave`; adds `@opentelemetry/api`, `@opentelemetry/sdk-trace-node`, `@opentelemetry/sdk-trace-base`, `@opentelemetry/resources`, `@opentelemetry/semantic-conventions`, `@opentelemetry/exporter-trace-otlp-proto` (pinned to ^0.218.0, which pulls in protobufjs ≥ 8.0.2 to clear the Socket Security CVEs flagged on earlier revisions of this PR)
- `README.md`: refreshed "What Gets Traced" hierarchy

Test plan
- `npx tsc --noEmit`: clean
- `npm audit`: 0 vulnerabilities post protobufjs bump
- Hooks (`SessionStart`, `UserPromptSubmit`, `PreToolUse`, `PermissionRequest`, `PostToolUse`, `PreCompact`, `Stop`, `SessionEnd`): confirmed `POST /agents/otel/v1/traces` with `Content-Type: application/x-protobuf` and the `wandb-api-key` header; a raw-protobuf scan of the body found `gen_ai.operation.name`, `invoke_agent`, `execute_tool`, `chat`, `weave.permission_request`, `weave.compaction`, `wandb.entity`, `wandb.project`
- Span tree checked against the canonical reference fixture (`test_subagent_spans_render_inline_with_agent_label_inheritance`) via `scripts/smoke/verify-subagent-shape.mjs`: turn → `execute_tool` sibling → nested `invoke_agent <subagent_type>` sibling → `chat` under subagent; `gen_ai.agent.id`, `weave.claude_code.subagent.spawning_tool_call_id`, `gen_ai.input.messages`, `gen_ai.output.messages` all set on the inner invoke_agent
- `resolveConversationId()` exercised against three real on-disk transcripts: a resumed session resolves to its root ancestor, an already-root session resolves to itself, and a fresh non-forked session resolves to itself

Breaking changes
- Anything that filters on `claude_code.*` op names will not see traces from this version.
- This version uses a different endpoint (`/agents/otel/v1/traces`) and a different backend (Agents observability vs. the Calls API), so traces produced by this version appear in a different surface in the Weave UI.

🤖 Generated with Claude Code
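The content-based matching described under Subagent correlation can be sketched as below. The class and method names are hypothetical, and the choice to match the oldest unmatched entry when two pending Agent calls carry identical prompts is an assumption made for this illustration; only the sha256 prompt hash and the `(promptHash, subagent_type)` key come from the description.

```typescript
import { createHash } from "node:crypto";

function promptHash(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

interface PendingSubagent {
  toolUseId: string;
  subagentType: string;
  promptHash: string;
  agentId?: string; // filled in once SubagentStart has been matched
}

// Hypothetical tracker keyed by (promptHash, subagentType) instead of timing.
class PendingSubagents {
  private pending: PendingSubagent[] = [];

  // Called at PreToolUse for an Agent call carrying a subagent_type.
  recordPreToolUse(toolUseId: string, subagentType: string, prompt: string): void {
    this.pending.push({ toolUseId, subagentType, promptHash: promptHash(prompt) });
  }

  // Called at SubagentStart with the subagent transcript's first user message.
  // Returns undefined when nothing matches (the orphan path).
  matchSubagentStart(agentId: string, subagentType: string, firstUserMessage: string): PendingSubagent | undefined {
    const hash = promptHash(firstUserMessage);
    const hit = this.pending.find(
      (p) => p.agentId === undefined && p.subagentType === subagentType && p.promptHash === hash,
    );
    if (hit) hit.agentId = agentId;
    return hit;
  }
}
```

Because the key is derived from the prompt bytes rather than arrival time, the result does not depend on how quickly the two hooks fire relative to each other.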