Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347
Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347szegedi wants to merge 18 commits into
Conversation
|
Overall package sizeSelf size: 2.42 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | pprof-format | 2.2.2 | 500.53 kB | 500.53 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
5d8416f to
3e52394
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
Really excited to see this progress! Left a few notes
| * application hands to its OTEP-4719 process-context publisher. | ||
| */ | ||
| export interface ProcessContextAttributes { | ||
| readonly 'threadlocal.schema_version': 'nodejs_v1'; |
There was a problem hiding this comment.
Minor: Should we start with nodejs_v1_dev in case we still want to "change our minds" for a while? That way if it turns out that for some reason wrapped_object_offset / tagged_size might need to be changed, we can still change it without creating confusion for the reader.
| readonly 'threadlocal.schema_version': 'nodejs_v1'; | |
| readonly 'threadlocal.schema_version': 'nodejs_v1_dev'; |
| readonly 'threadlocal.nodejs_v1.wrapped_object_offset': number; | ||
| readonly 'threadlocal.nodejs_v1.tagged_size': number; |
There was a problem hiding this comment.
Minor: I mentioned this on slack as well -- since the schema_version already identifies the expected keys and their semantics, I'm not sure namespacing again here is needed.
| readonly 'threadlocal.nodejs_v1.wrapped_object_offset': number; | |
| readonly 'threadlocal.nodejs_v1.tagged_size': number; | |
| readonly 'threadlocal.wrapped_object_offset': number; | |
| readonly 'threadlocal.tagged_size': number; |
| */ | ||
| export interface ProcessContextAttributes { | ||
| readonly 'threadlocal.schema_version': 'nodejs_v1'; | ||
| readonly 'threadlocal.attribute_key_map': readonly string[]; |
There was a problem hiding this comment.
For simplification of the reader, we're thinking of for now adopting a restriction here -- attribute_key_map can only ever grow (up to the limit of 255 keys) and never shrink. Would that be ok for our nodejs plans? E.g. do you foresee any issues?
There was a problem hiding this comment.
Not at all; thanks for raising it as a concern. Currently in my dd-trace-js bits (no PR yet, only a WIP branch at https://github.com/DataDog/dd-trace-js/tree/otel-thread-context-writer) I register the datadog.{trace_endpoint|thread_name|thread_id} keys defined in this document on tracer library startup and they remain fixed.
There was a problem hiding this comment.
More for future maintenance, maybe here on this file or as a README.md in this folder, I suggest mentioning that these files are a copy-paste of https://github.com/polarsignals/custom-labels/blob/otel-thread-ctx-wip/js/ and that in the future we plan to unify them and drop our vendored copy.
This will help everyone keep this link in mind, as otherwise it might not be obvious that we fully intend this file to be a vendored copy and in particular it shouldn't evolve separately from upstream (or at least not ideally -- would complicate merging back).
| void OtelThreadCtx::Init(Local<Object> exports) { | ||
| CtxWrap::Init(exports); | ||
| NODE_SET_METHOD(exports, "otelThreadCtxStoreAls", StoreAls); | ||
| NODE_SET_METHOD(exports, "otelThreadCtxGetStoredAlsHash", GetStoredAlsHash); |
There was a problem hiding this comment.
I'm curious -- why change the name of the class and the namespace vs upstream? (E.g. asking from the POV of "if the diff to upstream is minimal, it's way easier to keep them in-sync")
There was a problem hiding this comment.
I was lately iterating in this repo as I'm integrating with dd-trace-js; I'll soon make a follow-up PR in the polarsignals repo, especially since the planned API also didn't survive contact with the integration. (It didn't change much, but I decoupled creation of a record from its activation so that a span can create and store a single record it owns, and then make it active whenever it itself becomes active.) I was also bothered by the name CtxWrap as it being a "wrap" is an internal implementation detail of how Node.js creates "wrappers" for native objects. But I see there's still a remaining instance of the name – I'll probably fix that too :-)
There was a problem hiding this comment.
I was also looking at further minimising the diffs, and there's some low-hanging fruit, e.g. in pprof-nodejs we use Google-derived .clang-format and the polarsignals repo has none. Doing a one-time formatting in polarsignals repo reduced the diff from 412 lines to 170.
There are some things that need to stay as they are, e.g. the dd C++ namespace here in the port.
There's also a pretty big difference in the fact that polarsignals repo is JavaScript, and pprof-nodejs is rather deliberately TypeScript, so in PS we have a .js file + a .d.ts file while in pprof-nodejs we only have only a single .ts file. We can't really reconcile this lexically. I'm using Claude to keep the two in sync, it's pretty good at it. The test harness has a similar issue in addition to pprof-nodejs using Mocha while polarsignals uses Node.js built-in asserts.
Fortunately TS and JS are not that much different and the JS/TS code part is fairly small.
There was a problem hiding this comment.
This would also benefit from my note on otel-thread-ctx.cc -- making it clear via comments or in a README somewhere where this comes from and keeping the diff as minimal as possible.
Ports the in-development OpenTelemetry thread-context writer that lives on the otel-thread-ctx-node branch of polarsignals/custom-labels (szegedi fork) into this project. The two codebases will likely diverge again later; for now this is a snapshot of the current state. Structurally: - bindings/otel-thread-ctx.cc/.hh: the native addon code, namespaced in `dd::` and exposed via OtelThreadCtx::Init(exports) called from binding.cc. The thread_local otel_thread_ctx_nodejs_v1 discovery symbol stays in extern "C" at file scope so it's exported by name through the dd_pprof.node dynsym table. - ts/src/otel-thread-ctx.ts: the runWithContext / enterWithContext / makeNamedContext API, loading the native addon via node-gyp-build like the rest of this project. - ts/test/test-otel-thread-ctx.ts: mocha port of the node:test suite. Skipped wholesale on non-Linux. - binding.gyp: adds bindings/otel-thread-ctx.cc to both target source lists and the -mtls-dialect=gnu2 cflag on x86_64 Linux (required by the OTEP-4947 spec; on arm64 TLSDESC is the only dynamic TLS model so no flag is needed). Verified by mocha against the built dd_pprof.node in a Linux container (Node 22 with --experimental-async-context-frame): 35 passing.
Mirrors the test:docker mechanism in custom-labels/js: a Dockerfile under scripts/docker/ extending node:24-bookworm with python3 and build-essential, plus a launcher script that builds the image (cached), mounts the repo read-only, copies it into /tmp/work inside the container, and runs `npm install && npm test`. The host tree is never modified (no stray node_modules/, build/, out/). Node 24 is used so the full test suite — including the new OTEP-4947 thread-context tests, which need AsyncContextFrame — runs without extra Node flags. Run via `npm run test:docker`.
- Add a static_assert that offsetof(CtxWrap, record_) == sizeof(node::ObjectWrap), since that offset is part of the reader ABI. Restructure CtxWrap so record_, capacity_, and truncated_ live in a single public access section: C++ leaves cross-access-control field ordering implementation-defined, so splitting them would allow a conforming compiler to reorder the bookkeeping fields ahead of record_. - Add an acq_rel signal fence between the pointer swap and free() in the reallocate path. The pre-existing release fence only constrains prior writes; nothing was stopping the compiler from hoisting free() above the publication store, which would let a stopped reader follow self->record_ into freed memory. - Restore the [[unlikely]] annotation on the IsConstructCall() check. - Misc local cleanups (std::min/max in two spots, assert valid==1 after the memcpy instead of redundantly setting it).
The CtxWrap accessor that returns the raw record as a Uint8Array is only intended for tests and out-of-process-reader development. Naming it DebugBytes (and exposing it as wrap.debugBytes() on the JS prototype) makes that explicit at every call site.
v8::internal::Internals::kContinuationPreservedEmbedderDataOffset was introduced in Node 22. Older versions don't have ContinuationPreservedEmbedderData at all, and the TS layer already refuses to install the hook there via asyncContextFrameError, so StoreAls is never actually invoked on Node < 22 — we just need the addon to compile so the package installs.
Two more Node-version-sensitive spots blocking the prebuild against Node 18.0.0 headers: - ArrayBuffer::Data() wasn't exposed in V8 10.1 (Node 18.0). Switch to GetBackingStore()->Data(), which has been available since V8 7.4 / Node 12. The shared_ptr atomic is a per-call cost in CopyBytes (twice per CtxWrap::New) and DebugBytes — neither is a hot path. - kEmbedderDataSlotExternalPointerOffset is Node 22+. Same shape as the earlier kContinuationPreservedEmbedderDataOffset guard: publish a sentinel 0 on older Node so the addon's exported surface stays consistent across majors. A would-be reader can't reach a live record through it anyway (no CPED on Node < 22).
…removal) Node 26 ships V8 14.x, which removes the old String::Utf8Length / WriteUtf8 / NO_NULL_TERMINATION trio in favor of the V2 versions, and removes Object::GetIsolate() entirely. Switch the encode loop to the V2 forms on Node >= 24 (Node 22 ships V8 12.4 which never gets V2; Node 24's V8 13.6 has both, Node 26's V8 14.x has only V2). Replace exports->GetIsolate() with Isolate::GetCurrent() unconditionally — they're equivalent during module init and the latter is the only version that survives Node 26.
- Reformat ts/test/test-otel-thread-ctx.ts via gts (prettier). - Drop unused parameter declarations from the non-Linux stubs in ts/src/otel-thread-ctx.ts (they were carrying _-prefix names that gts's eslint still flags); TS allows fewer params on the assigned function than the declared variable's signature requires. - Use strict ==/!= equality instead of loose null-check. - Disable no-sparse-arrays in the test file: holes in attribute arrays are part of the wire format we're verifying. - Use `void` prefix on the runWithContext() call inside the sync test whose return type confuses no-floating-promises.
__attribute__((visibility("default"))) is a GCC/Clang extension that
MSVC doesn't recognize, breaking the Windows prebuild. Guard it
behind __GNUC__/__clang__. Visibility is irrelevant on Windows anyway
since the OTEP-4947 reader contract is ELF-TLSDESC and only meaningful
on Linux.
Two cases the prior unconditional describe didn't cover: - AsyncContextFrame (the writer's discovery substrate) is opt-in on Node 22/23 (requires --experimental-async-context-frame), on by default in Node 24+ (disable-able via --no-async-context-frame), and absent on Node < 22. The TS layer's asyncContextFrameError refuses to install the hook in each of those cases; the test now mirrors the same predicate so the suite is skipped instead of failing every test with "feature unavailable". - The discovery-contract test reads the addon binary from build/Release/dd_pprof.node, which exists only on the build-from-source path. The prebuild-install / node-gyp-build CI matrix uses a prebuilt binary from prebuilds/, so the path doesn't exist there. Skip that one test when the file isn't present.
Add an otelThreadCtx namespace alongside time/heap so consumers can
reach the writer (runWithContext, enterWithContext, clearContext,
appendAttributes, isContextTruncated, makeNamedContext) via
require('@datadog/pprof').otelThreadCtx without importing internal
paths. The debug-only _currentRecordBytes accessor stays unexposed.
…elpers Reshape the otelThreadCtx namespace around an explicitly-allocated CtxWrap object so consumers can cache one record per span and re-install it without allocating churn: - The native CtxWrap class is now constructable from JS via `new pprof.otelThreadCtx.CtxWrap(traceId, spanId, attributes?)`. - New top-level functions get/set/runWithContext take a wrap reference (or undefined). `getContext() === wrap` becomes the cheap identity check that replaces the previous currentSpanIdMatches dance. - The opts-form helpers (`enterWithContext`, `appendAttributes`, `clearContext`, `isContextTruncated`, `currentSpanIdMatches`) are removed at the top level. Callers go through the wrap directly: `wrap.appendAttributes(...)`, `wrap.isTruncated()`, `setContext(undefined)`. - NamedContext stays as a name→index resolver: `buildContext(opts)` returns a CtxWrap with attributes resolved positionally; `runWithContext` / `enterWithContext` / `clearContext` are kept as one-liner sugar that compose with the new top-level functions. Native: the CtxWrap class is registered with its new name (was "OtelThreadCtxWrap") and the per-instance "append" method is now "appendAttributes" for parity with the JS-side phrasing. The SpanIdMatches binding is dropped. Reasoning: under AsyncContextFrame each fork inherits the wrap by reference, so once dd-trace-js caches one CtxWrap per span and re-installs it on every storage:enter, both the allocation churn we saw in the wall profiler (PR dd-trace-js#8638) and the "different wraps in different CPEDs for the same span" stale-record edge case go away — there's exactly one record per span across the whole lifetime, and mutations via wrap.appendAttributes propagate naturally because the native realloc-on-append path updates the wrap's record_ pointer in place, never the JS wrapper.
`CtxWrap` leaks the C++ ObjectWrap implementation detail — as far as
the JS API is concerned, the object IS the thread context. Rename:
- The JS-visible class name (SetClassName) and TS interface:
`CtxWrap` → `ThreadContext`.
- The TS constructor interface: `CtxWrapCtor` → `ThreadContextCtor`.
- The addon constructor export: `addon.ctxWrap` → `addon.threadContext`.
- The non-Linux fallback class: `NoopCtxWrap` → `NoopThreadContext`.
- Parameter/variable names previously named `wrap` (or `_wrap`) →
`context` (or `_context`).
The native C++ class itself stays `CtxWrap` — that's the conventional
ObjectWrap-ish naming on the C++ side and isn't user-visible.
Error messages updated to refer to "ThreadContext" too ("not a
ThreadContext", "ThreadContext must be called with `new`", etc.).
Adds a Node.js writer for the OpenTelemetry Thread Local Context Record (OTEP-4947), ported from the in-development upstream at polarsignals/custom-labels#16.
What's in this branch
Native addon (
bindings/)otel-thread-ctx.cc+otel-thread-ctx.hh: the writer, namespaced asdd::OtelThreadCtx::Init(exports)and called frombinding.cc.otel_thread_ctx_nodejs_v1stays inextern "C"at file scope so it's exported by name through thedd_pprof.nodedynsym table. It's a 32-byte struct holding the four fields a reader needs:cped_slot(V8 isolate'sContinuationPreservedEmbedderDataslot pointer),als_handle(Global<Object>to the writer'sAsyncLocalStorage),als_identity_hash(JS identity hash for hash-bucket narrowing), andundefined_addr(cached per-isolate undefined singleton for clean "no context" detection by the reader).record_pointer in place, so every async-context frame that holds the same wrapper reference observes the new buffer (no per-CPED record divergence).binding.gypadds-mtls-dialect=gnu2on x86_64 Linux (required by OTEP-4947 for TLSDESC; on arm64 TLSDESC is the only dynamic TLS model so no flag is needed).TypeScript layer (
ts/src/otel-thread-ctx.ts)API surface (Linux only; no-op stubs elsewhere). The shape mirrors the wall profiler's
setContext/getContextpattern: callers construct oneThreadContextper logical scope (typically per trace span) and re-install the same JS reference on every async-context fire.class ThreadContext— constructable from JS vianew pprof.otelThreadCtx.ThreadContext(traceId, spanId, attributes?). Instance methods:appendAttributes(attrs)(append-only, mutates the record in place),isTruncated(), anddebugBytes()(debug-only). Lifetime is GC-managed: the underlying native record is freed when no JS or async-context-frame reference survives.getContext(): ThreadContext | undefined— returns whatever's attached to the active async-context frame.setContext(ctx)— installs (or detaches, withundefined) a context in the active async-context frame. Re-installing the same reference across many frames is cheap (no allocation); per-span caching of the context object is the intended usage.runWithContext(ctx, fn)— scoped variant:als.run-style.makeNamedContext(keys)— returns aNamedContextwithbuildContext(opts)(resolvesnamedAttributesagainst the captured key list and constructs aThreadContext) plus sugar methods (enterWithContext/runWithContext/clearContext) that composebuildContextwith the top-level functions. It also exposesprocessContextAttributes, a frozen snapshot of the OTEP-4719 entries the caller should publish.traceId/spanIdare rawUint8Array(16 and 8 bytes;Bufferworks as a subclass).attributesis positional: index N is the value for uint8 key index N on the wire; null/undefined/holes are skipped. Per-value cap of 255 UTF-8 bytes (uint8 length prefix), total attrs_data cap of 612 bytes (OTEP-recommended 640-byte total record minus 28-byte header).Process-context attributes
makeNamedContext(keys).processContextAttributesexposes a frozen snapshot ready to spread into whatever OTEP-4719 process-context publisher the application uses:Tests
Mocha suite under
ts/test/test-otel-thread-ctx.tscovering:ThreadContextconstruction and input parsing, on-the-wire record encoding (including multibyte UTF-8 truncation at the 255-byte per-value cap), the 612-byte cap andisTruncated, in-place append vs geometric realloc, async propagation,setContext/getContext/runWithContextlifecycle and idempotency,NamedContext.buildContextname resolution,processContextAttributesshape and immutability, and areadelf --dyn-symscheck that the TLS symbol is exported with the right binding / visibility / type. The whole describe block is skipped on non-Linux.A second commit adds a
scripts/docker/Dockerfile + launcher and atest:dockernpm script that builds the addon and runs the full test suite in a Linux container — useful for running the new tests from macOS dev machines. End-to-end verified at 158 passing (96 existing pprof tests + the new suite).