Skip to content

Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347

Draft
szegedi wants to merge 18 commits into
mainfrom
otel-thread-ctx
Draft

Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347
szegedi wants to merge 18 commits into
mainfrom
otel-thread-ctx

Conversation

@szegedi

@szegedi szegedi commented Jun 9, 2026

Copy link
Copy Markdown

Adds a Node.js writer for the OpenTelemetry Thread Local Context Record (OTEP-4947), ported from the in-development upstream at polarsignals/custom-labels#16.

What's in this branch

Native addon (bindings/)

  • otel-thread-ctx.cc + otel-thread-ctx.hh: the writer, namespaced as dd::OtelThreadCtx::Init(exports) and called from binding.cc.
  • The discovery TLS symbol otel_thread_ctx_nodejs_v1 stays in extern "C" at file scope so it's exported by name through the dd_pprof.node dynsym table. It's a 32-byte struct holding the four fields a reader needs: cped_slot (V8 isolate's ContinuationPreservedEmbedderData slot pointer), als_handle (Global<Object> to the writer's AsyncLocalStorage), als_identity_hash (JS identity hash for hash-bucket narrowing), and undefined_addr (cached per-isolate undefined singleton for clean "no context" detection by the reader).
  • Records use a C99 flexible-array-member tail and are right-sized to fit the encoded attribute payload, with a 36-byte attrs_data floor (one cache line total — matches the OTEP "frugal writer" guidance) and ×2 geometric growth on append, capped at the OTEP-recommended 612-byte attrs_data ceiling. Append-on-realloc updates the wrapper's internal record_ pointer in place, so every async-context frame that holds the same wrapper reference observes the new buffer (no per-CPED record divergence).
  • binding.gyp adds -mtls-dialect=gnu2 on x86_64 Linux (required by OTEP-4947 for TLSDESC; on arm64 TLSDESC is the only dynamic TLS model so no flag is needed).

TypeScript layer (ts/src/otel-thread-ctx.ts)

API surface (Linux only; no-op stubs elsewhere). The shape mirrors the wall profiler's setContext/getContext pattern: callers construct one ThreadContext per logical scope (typically per trace span) and re-install the same JS reference on every async-context fire.

  • class ThreadContext — constructable from JS via new pprof.otelThreadCtx.ThreadContext(traceId, spanId, attributes?). Instance methods: appendAttributes(attrs) (append-only, mutates the record in place), isTruncated(), and debugBytes() (debug-only). Lifetime is GC-managed: the underlying native record is freed when no JS or async-context-frame reference survives.
  • getContext(): ThreadContext | undefined — returns whatever's attached to the active async-context frame.
  • setContext(ctx) — installs (or detaches, with undefined) a context in the active async-context frame. Re-installing the same reference across many frames is cheap (no allocation); per-span caching of the context object is the intended usage.
  • runWithContext(ctx, fn) — scoped variant: als.run-style.
  • makeNamedContext(keys) — returns a NamedContext with buildContext(opts) (resolves namedAttributes against the captured key list and constructs a ThreadContext) plus sugar methods (enterWithContext / runWithContext / clearContext) that compose buildContext with the top-level functions. It also exposes processContextAttributes, a frozen snapshot of the OTEP-4719 entries the caller should publish.

traceId/spanId are raw Uint8Array (16 and 8 bytes; Buffer works as a subclass). attributes is positional: index N is the value for uint8 key index N on the wire; null/undefined/holes are skipped. Per-value cap of 255 UTF-8 bytes (uint8 length prefix), total attrs_data cap of 612 bytes (OTEP-recommended 640-byte total record minus 28-byte header).

Process-context attributes

makeNamedContext(keys).processContextAttributes exposes a frozen snapshot ready to spread into whatever OTEP-4719 process-context publisher the application uses:

{
    'threadlocal.schema_version': 'nodejs_v1',
    'threadlocal.attribute_key_map': [...keys],
    // V8 layout constants captured from the V8 headers the addon was
    // compiled against — let the reader walk our wrapper and V8's
    // OrderedHashMap layout without doing its own V8-internal-symbol
    // lookups for the pointer-compression / sandbox state.
    'threadlocal.nodejs_v1.wrapped_object_offset': 24,
    'threadlocal.nodejs_v1.tagged_size': 8,
}

Tests

Mocha suite under ts/test/test-otel-thread-ctx.ts covering: ThreadContext construction and input parsing, on-the-wire record encoding (including multibyte UTF-8 truncation at the 255-byte per-value cap), the 612-byte cap and isTruncated, in-place append vs geometric realloc, async propagation, setContext/getContext/runWithContext lifecycle and idempotency, NamedContext.buildContext name resolution, processContextAttributes shape and immutability, and a readelf --dyn-syms check that the TLS symbol is exported with the right binding / visibility / type. The whole describe block is skipped on non-Linux.

A second commit adds a scripts/docker/ Dockerfile + launcher and a test:docker npm script that builds the addon and runs the full test suite in a Linux container — useful for running the new tests from macOS dev machines. End-to-end verified at 158 passing (96 existing pprof tests + the new suite).

@datadog-prod-us1-3

datadog-prod-us1-3 Bot commented Jun 9, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 17 Pipeline jobs failed

Build | asan (24)   View in Datadog   GitHub Actions

Build | build / alpine-test-24   View in Datadog   GitHub Actions

Build | build / centos-test-22   View in Datadog   GitHub Actions

View all 17 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e03bacc | Docs | Datadog PR Page | Give us feedback!

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Overall package size

Self size: 2.42 MB
Deduped: 3.12 MB
No deduping: 3.12 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | pprof-format | 2.2.2 | 500.53 kB | 500.53 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@szegedi szegedi added the semver-minor Usually minor non-breaking improvements label Jun 11, 2026
@szegedi szegedi force-pushed the otel-thread-ctx branch 3 times, most recently from 5d8416f to 3e52394 Compare June 11, 2026 17:30

@ivoanjo ivoanjo left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really excited to see this progress! Left a few notes

Comment thread ts/src/otel-thread-ctx.ts Outdated
* application hands to its OTEP-4719 process-context publisher.
*/
export interface ProcessContextAttributes {
readonly 'threadlocal.schema_version': 'nodejs_v1';

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should we start with nodejs_v1_dev in case we still want to "change our minds" for a while? That way if it turns out that for some reason wrapped_object_offset / tagged_size might need to be changed, we can still change it without creating confusion for the reader.

Suggested change
readonly 'threadlocal.schema_version': 'nodejs_v1';
readonly 'threadlocal.schema_version': 'nodejs_v1_dev';

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, absolutely

Comment thread ts/src/otel-thread-ctx.ts Outdated
Comment on lines +58 to +59
readonly 'threadlocal.nodejs_v1.wrapped_object_offset': number;
readonly 'threadlocal.nodejs_v1.tagged_size': number;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I mentioned this on slack as well -- since the schema_version already identifies the expected keys and their semantics, I'm not sure namespacing again here is needed.

Suggested change
readonly 'threadlocal.nodejs_v1.wrapped_object_offset': number;
readonly 'threadlocal.nodejs_v1.tagged_size': number;
readonly 'threadlocal.wrapped_object_offset': number;
readonly 'threadlocal.tagged_size': number;

Comment thread ts/src/otel-thread-ctx.ts
*/
export interface ProcessContextAttributes {
readonly 'threadlocal.schema_version': 'nodejs_v1';
readonly 'threadlocal.attribute_key_map': readonly string[];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplification of the reader, we're thinking of for now adopting a restriction here -- attribute_key_map can only ever grow (up to the limit of 255 keys) and never shrink. Would that be ok for our nodejs plans? E.g. do you foresee any issues?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all; thanks for raising it as a concern. Currently in my dd-trace-js bits (no PR yet, only a WIP branch at https://github.com/DataDog/dd-trace-js/tree/otel-thread-context-writer) I register the datadog.{trace_endpoint|thread_name|thread_id} keys defined in this document on tracer library startup and they remain fixed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More for future maintenance, maybe here on this file or as a README.md in this folder, I suggest mentioning that these files are a copy-paste of https://github.com/polarsignals/custom-labels/blob/otel-thread-ctx-wip/js/ and that in the future we plan to unify them and drop our vendored copy.

This will help everyone keep this link in mind, as otherwise it might not be obvious that we fully intend this file to be a vendored copy and in particular it shouldn't evolve separately from upstream (or at least not ideally -- would complicate merging back).

Comment on lines +572 to +575
void OtelThreadCtx::Init(Local<Object> exports) {
CtxWrap::Init(exports);
NODE_SET_METHOD(exports, "otelThreadCtxStoreAls", StoreAls);
NODE_SET_METHOD(exports, "otelThreadCtxGetStoredAlsHash", GetStoredAlsHash);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious -- why change the name of the class and the namespace vs upstream? (E.g. asking from the POV of "if the diff to upstream is minimal, it's way easier to keep them in-sync")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was lately iterating in this repo as I'm integrating with dd-trace-js; I'll soon make a follow-up PR in the polarsignals repo, especially since the planned API also didn't survive contact with the integration. (It didn't change much, but I decoupled creation of a record from its activation so that a span can create and store a single record it owns, and then make it active whenever it itself becomes active.) I was also bothered by the name CtxWrap as it being a "wrap" is an internal implementation detail of how Node.js creates "wrappers" for native objects. But I see there's still a remaining instance of the name – I'll probably fix that too :-)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also looking at further minimising the diffs, and there's some low-hanging fruit, e.g. in pprof-nodejs we use Google-derived .clang-format and the polarsignals repo has none. Doing a one-time formatting in polarsignals repo reduced the diff from 412 lines to 170.

There are some things that need to stay as they are, e.g. the dd C++ namespace here in the port.

There's also a pretty big difference in the fact that polarsignals repo is JavaScript, and pprof-nodejs is rather deliberately TypeScript, so in PS we have a .js file + a .d.ts file while in pprof-nodejs we only have only a single .ts file. We can't really reconcile this lexically. I'm using Claude to keep the two in sync, it's pretty good at it. The test harness has a similar issue in addition to pprof-nodejs using Mocha while polarsignals uses Node.js built-in asserts.

Fortunately TS and JS are not that much different and the JS/TS code part is fairly small.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also benefit from my note on otel-thread-ctx.cc -- making it clear via comments or in a README somewhere where this comes from and keeping the diff as minimal as possible.

szegedi added 18 commits June 23, 2026 13:41
Ports the in-development OpenTelemetry thread-context writer that
lives on the otel-thread-ctx-node branch of polarsignals/custom-labels
(szegedi fork) into this project. The two codebases will likely
diverge again later; for now this is a snapshot of the current state.

Structurally:
- bindings/otel-thread-ctx.cc/.hh: the native addon code, namespaced
  in `dd::` and exposed via OtelThreadCtx::Init(exports) called from
  binding.cc. The thread_local otel_thread_ctx_nodejs_v1 discovery
  symbol stays in extern "C" at file scope so it's exported by name
  through the dd_pprof.node dynsym table.
- ts/src/otel-thread-ctx.ts: the runWithContext / enterWithContext /
  makeNamedContext API, loading the native addon via node-gyp-build
  like the rest of this project.
- ts/test/test-otel-thread-ctx.ts: mocha port of the node:test suite.
  Skipped wholesale on non-Linux.
- binding.gyp: adds bindings/otel-thread-ctx.cc to both target source
  lists and the -mtls-dialect=gnu2 cflag on x86_64 Linux (required by
  the OTEP-4947 spec; on arm64 TLSDESC is the only dynamic TLS model
  so no flag is needed).

Verified by mocha against the built dd_pprof.node in a Linux
container (Node 22 with --experimental-async-context-frame):
35 passing.
Mirrors the test:docker mechanism in custom-labels/js: a Dockerfile
under scripts/docker/ extending node:24-bookworm with python3 and
build-essential, plus a launcher script that builds the image
(cached), mounts the repo read-only, copies it into /tmp/work inside
the container, and runs `npm install && npm test`. The host tree is
never modified (no stray node_modules/, build/, out/).

Node 24 is used so the full test suite — including the new OTEP-4947
thread-context tests, which need AsyncContextFrame — runs without
extra Node flags.

Run via `npm run test:docker`.
- Add a static_assert that offsetof(CtxWrap, record_) ==
  sizeof(node::ObjectWrap), since that offset is part of the reader ABI.
  Restructure CtxWrap so record_, capacity_, and truncated_ live in a
  single public access section: C++ leaves cross-access-control field
  ordering implementation-defined, so splitting them would allow a
  conforming compiler to reorder the bookkeeping fields ahead of
  record_.
- Add an acq_rel signal fence between the pointer swap and free() in
  the reallocate path. The pre-existing release fence only constrains
  prior writes; nothing was stopping the compiler from hoisting free()
  above the publication store, which would let a stopped reader follow
  self->record_ into freed memory.
- Restore the [[unlikely]] annotation on the IsConstructCall() check.
- Misc local cleanups (std::min/max in two spots, assert valid==1 after
  the memcpy instead of redundantly setting it).
The CtxWrap accessor that returns the raw record as a Uint8Array is only
intended for tests and out-of-process-reader development. Naming it
DebugBytes (and exposing it as wrap.debugBytes() on the JS prototype)
makes that explicit at every call site.
v8::internal::Internals::kContinuationPreservedEmbedderDataOffset was
introduced in Node 22. Older versions don't have
ContinuationPreservedEmbedderData at all, and the TS layer already
refuses to install the hook there via asyncContextFrameError, so
StoreAls is never actually invoked on Node < 22 — we just need the
addon to compile so the package installs.
Two more Node-version-sensitive spots blocking the prebuild against
Node 18.0.0 headers:

- ArrayBuffer::Data() wasn't exposed in V8 10.1 (Node 18.0). Switch to
  GetBackingStore()->Data(), which has been available since V8 7.4 /
  Node 12. The shared_ptr atomic is a per-call cost in CopyBytes
  (twice per CtxWrap::New) and DebugBytes — neither is a hot path.
- kEmbedderDataSlotExternalPointerOffset is Node 22+. Same shape as the
  earlier kContinuationPreservedEmbedderDataOffset guard: publish a
  sentinel 0 on older Node so the addon's exported surface stays
  consistent across majors. A would-be reader can't reach a live record
  through it anyway (no CPED on Node < 22).
…removal)

Node 26 ships V8 14.x, which removes the old String::Utf8Length /
WriteUtf8 / NO_NULL_TERMINATION trio in favor of the V2 versions, and
removes Object::GetIsolate() entirely. Switch the encode loop to the
V2 forms on Node >= 24 (Node 22 ships V8 12.4 which never gets V2;
Node 24's V8 13.6 has both, Node 26's V8 14.x has only V2). Replace
exports->GetIsolate() with Isolate::GetCurrent() unconditionally —
they're equivalent during module init and the latter is the only
version that survives Node 26.
- Reformat ts/test/test-otel-thread-ctx.ts via gts (prettier).
- Drop unused parameter declarations from the non-Linux stubs in
  ts/src/otel-thread-ctx.ts (they were carrying _-prefix names that
  gts's eslint still flags); TS allows fewer params on the assigned
  function than the declared variable's signature requires.
- Use strict ==/!= equality instead of loose null-check.
- Disable no-sparse-arrays in the test file: holes in attribute arrays
  are part of the wire format we're verifying.
- Use `void` prefix on the runWithContext() call inside the sync test
  whose return type confuses no-floating-promises.
__attribute__((visibility("default"))) is a GCC/Clang extension that
MSVC doesn't recognize, breaking the Windows prebuild. Guard it
behind __GNUC__/__clang__. Visibility is irrelevant on Windows anyway
since the OTEP-4947 reader contract is ELF-TLSDESC and only meaningful
on Linux.
Two cases the prior unconditional describe didn't cover:

- AsyncContextFrame (the writer's discovery substrate) is opt-in on
  Node 22/23 (requires --experimental-async-context-frame), on by
  default in Node 24+ (disable-able via --no-async-context-frame), and
  absent on Node < 22. The TS layer's asyncContextFrameError refuses
  to install the hook in each of those cases; the test now mirrors
  the same predicate so the suite is skipped instead of failing every
  test with "feature unavailable".

- The discovery-contract test reads the addon binary from
  build/Release/dd_pprof.node, which exists only on the
  build-from-source path. The prebuild-install / node-gyp-build CI
  matrix uses a prebuilt binary from prebuilds/, so the path doesn't
  exist there. Skip that one test when the file isn't present.
Add an otelThreadCtx namespace alongside time/heap so consumers can
reach the writer (runWithContext, enterWithContext, clearContext,
appendAttributes, isContextTruncated, makeNamedContext) via
require('@datadog/pprof').otelThreadCtx without importing internal
paths. The debug-only _currentRecordBytes accessor stays unexposed.
…elpers

Reshape the otelThreadCtx namespace around an explicitly-allocated
CtxWrap object so consumers can cache one record per span and re-install
it without allocating churn:

- The native CtxWrap class is now constructable from JS via
  `new pprof.otelThreadCtx.CtxWrap(traceId, spanId, attributes?)`.
- New top-level functions get/set/runWithContext take a wrap reference
  (or undefined). `getContext() === wrap` becomes the cheap identity
  check that replaces the previous currentSpanIdMatches dance.
- The opts-form helpers (`enterWithContext`, `appendAttributes`,
  `clearContext`, `isContextTruncated`, `currentSpanIdMatches`) are
  removed at the top level. Callers go through the wrap directly:
  `wrap.appendAttributes(...)`, `wrap.isTruncated()`,
  `setContext(undefined)`.
- NamedContext stays as a name→index resolver: `buildContext(opts)`
  returns a CtxWrap with attributes resolved positionally;
  `runWithContext` / `enterWithContext` / `clearContext` are kept as
  one-liner sugar that compose with the new top-level functions.

Native: the CtxWrap class is registered with its new name (was
"OtelThreadCtxWrap") and the per-instance "append" method is now
"appendAttributes" for parity with the JS-side phrasing. The
SpanIdMatches binding is dropped.

Reasoning: under AsyncContextFrame each fork inherits the wrap by
reference, so once dd-trace-js caches one CtxWrap per span and
re-installs it on every storage:enter, both the allocation churn we
saw in the wall profiler (PR dd-trace-js#8638) and the "different
wraps in different CPEDs for the same span" stale-record edge case
go away — there's exactly one record per span across the whole
lifetime, and mutations via wrap.appendAttributes propagate
naturally because the native realloc-on-append path updates the
wrap's record_ pointer in place, never the JS wrapper.
`CtxWrap` leaks the C++ ObjectWrap implementation detail — as far as
the JS API is concerned, the object IS the thread context. Rename:

- The JS-visible class name (SetClassName) and TS interface:
  `CtxWrap` → `ThreadContext`.
- The TS constructor interface: `CtxWrapCtor` → `ThreadContextCtor`.
- The addon constructor export: `addon.ctxWrap` → `addon.threadContext`.
- The non-Linux fallback class: `NoopCtxWrap` → `NoopThreadContext`.
- Parameter/variable names previously named `wrap` (or `_wrap`) →
  `context` (or `_context`).

The native C++ class itself stays `CtxWrap` — that's the conventional
ObjectWrap-ish naming on the C++ side and isn't user-visible.

Error messages updated to refer to "ThreadContext" too ("not a
ThreadContext", "ThreadContext must be called with `new`", etc.).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-minor Usually minor non-breaking improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants