Add span-derived primary tags (CSS v1.3.0) by dougqh · Pull Request #11402 · DataDog/dd-trace-java

dougqh · 2026-05-18T14:37:04Z

Summary

Sits on top of #11389. Implements Client-Side Stats v1.3.0 span-derived primary tags on the new producer/consumer ClientStatsAggregator architecture. Users configure DD_TRACE_STATS_ADDITIONAL_TAGS (comma-separated tag keys); the tracer extracts the matching span tag values and includes them as additional aggregation dimensions on ClientGroupedStats.AdditionalMetricTags.

Mirrors the design and constants from the PoC PR (#11358), translated onto the producer/consumer split: all canonicalization (length cap → blocked sentinel, UTF8BytesString interning, cardinality cap) runs on the aggregator thread; the producer just captures raw values into a String[] parallel to the schema.

Design

Wire format: new AdditionalMetricTags field on ClientGroupedStats, emitted as repeated string of "<key>:<value>" entries (mirrors PeerTags). Schema-ordered (alphabetical by key); null slots skipped; field omitted when no slots are populated so customers who don't configure additional tags pay zero payload overhead.
Configuration:
- DD_TRACE_STATS_ADDITIONAL_TAGS / dd.trace.stats.additional.tags — comma-separated tag keys.
- DD_TRACE_STATS_ADDITIONAL_TAGS_CARDINALITY_LIMIT / dd.trace.stats.additional.tags.cardinality.limit — default 100; ≤ 0 → warn + fallback to 100.
Cardinality protection (constants from the PoC):
- MAX_ADDITIONAL_TAG_KEYS = 10 — configured-key count cap. Excess keys dropped at startup with a warn log.
- MAX_ADDITIONAL_TAG_VALUE_LENGTH = 250 — per-value length cap. Overlong values get the per-key "<key>:blocked_by_tracer" sentinel.
- Per-bucket stat-entry cap (default 100). When the bucket is full, brand-new entries with additional tags have all their present slots replaced by the per-key blocked sentinel, so they collapse into a small number of "shape" entries rather than fragmenting (or polluting the no-additional-tags base bucket).
Threading: aggregator thread is the sole writer of the table + the cardinality limiter, so the counter is a plain int (no AtomicInteger overhead).
Acknowledged spec deviation: single-global counter for per-bucket cardinality (matches the PoC). A misconfigured tag can starve another tag's admission of new entries within a bucket, but every span still gets emitted with its dimension keys preserved (values masked).
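The key cap, length cap, and blocked-sentinel rules listed above can be sketched as follows. This is a minimal illustration, not the PR's AdditionalTagsSchema: the class and method names are hypothetical, and two behaviors are assumptions (trimming whitespace around keys, and dropping the alphabetically last keys when more than 10 are configured; the description only says excess keys are dropped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative sketch only; names are hypothetical, not the PR's classes. */
final class AdditionalTagsSketch {
  static final int MAX_ADDITIONAL_TAG_KEYS = 10;
  static final int MAX_ADDITIONAL_TAG_VALUE_LENGTH = 250;

  /** Splits the comma-separated setting, dedupes, sorts alphabetically, caps at 10 keys. */
  static List<String> normalizeKeys(String configured) {
    TreeSet<String> sorted = new TreeSet<>(); // sorted + deduplicated
    if (configured != null) {
      for (String raw : configured.split(",")) {
        String key = raw.trim(); // assumption: surrounding whitespace is ignored
        if (!key.isEmpty()) {
          sorted.add(key);
        }
      }
    }
    List<String> keys = new ArrayList<>(sorted);
    // Assumption: the cap is applied after sorting, so the last keys are the ones dropped.
    return keys.size() > MAX_ADDITIONAL_TAG_KEYS
        ? new ArrayList<>(keys.subList(0, MAX_ADDITIONAL_TAG_KEYS))
        : keys;
  }

  /** Builds the "key:value" form, or the per-key sentinel when the value is too long. */
  static String canonicalize(String key, String value) {
    if (value == null) {
      return null; // null slot: the span did not carry this tag
    }
    if (value.length() > MAX_ADDITIONAL_TAG_VALUE_LENGTH) {
      return key + ":blocked_by_tracer";
    }
    return key + ":" + value;
  }

  public static void main(String[] args) {
    System.out.println(normalizeKeys("region, tenant,region,,az"));
    System.out.println(canonicalize("tenant", "acme"));
  }
}
```

In the PR the sentinels are pre-built once as UTF8BytesString at schema construction; the string concatenation here only stands in for that.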

What's new vs. PoC

All canonicalization moved to the aggregator thread (per design discussion). Producer path: unsafeGetTag(name) per configured key → String[] parallel to the schema. No length-cap work on the producer thread.
Schema is immutable, built once at construction; no per-trace sync.
Per-key blocked sentinels pre-built as UTF8BytesString at schema construction (used by length-cap collapse and bucket-cap collapse, both substituting into Canonical.additionalTagsBuffer then re-hashing).
Wire emission walks the pre-built UTF8BytesString[] on the entry, writing each non-null slot directly — no per-write byte composition.
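The "String[] parallel to the schema" layout and the null-skipping emission described above might look roughly like this. It is a sketch under stated assumptions: the class and method names are hypothetical, a plain Map stands in for the span's tag lookup (the PR uses unsafeGetTag), the lazy array allocation is an assumption, and plain String concatenation replaces the pre-built UTF8BytesString slots.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not the PR's classes. */
final class ParallelSlotsSketch {

  /** Producer side: values[i] belongs to schemaKeys[i]; null when the span lacks the tag. */
  static String[] capture(String[] schemaKeys, Map<String, String> spanTags) {
    String[] values = null; // assumption: allocate only once a tag actually matches
    for (int i = 0; i < schemaKeys.length; i++) {
      String value = spanTags.get(schemaKeys[i]);
      if (value != null) {
        if (values == null) {
          values = new String[schemaKeys.length];
        }
        values[i] = value;
      }
    }
    return values;
  }

  /** Emission: walk the slots in schema order and skip nulls; an empty result means "omit the field". */
  static List<String> wireEntries(String[] schemaKeys, String[] values) {
    List<String> entries = new ArrayList<>();
    if (values == null) {
      return entries;
    }
    for (int i = 0; i < schemaKeys.length; i++) {
      if (values[i] != null) {
        entries.add(schemaKeys[i] + ":" + values[i]);
      }
    }
    return entries;
  }

  public static void main(String[] args) {
    String[] schema = {"az", "region", "tenant"}; // already sorted by key
    Map<String, String> tags = new HashMap<>();
    tags.put("region", "us-east-1");
    tags.put("tenant", "acme");
    System.out.println(wireEntries(schema, capture(schema, tags)));
  }
}
```

Because the schema is sorted once at construction, the emitted entries come out in alphabetical key order without any per-span sorting.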

New files

dd-trace-core/src/main/java/datadog/trace/common/metrics/AdditionalTagsSchema.java
dd-trace-core/src/main/java/datadog/trace/common/metrics/AdditionalTagsCardinalityLimiter.java
dd-trace-core/src/test/java/datadog/trace/common/metrics/AdditionalTagsSchemaTest.java
dd-trace-core/src/test/java/datadog/trace/common/metrics/AdditionalTagsCardinalityLimiterTest.java
dd-trace-core/src/test/java/datadog/trace/common/metrics/AggregateTableAdditionalTagsTest.java
dd-trace-core/src/test/java/datadog/trace/common/metrics/SerializingMetricWriterAdditionalTagsTest.java

Health metric

HealthMetrics.onAdditionalTagValueCardinalityBlocked(String tagKey) — fires for both length-blocked and bucket-cap-blocked values (per masked slot).
TracerHealthMetrics surfaces this as stats.additional_tag.cardinality_blocked (untagged counter).

Benchmarks

Cardinality-isolation companions (8 producer threads, 2×15s warmup + 5×15s)

HighCardinalityResourceMetricsBenchmark and HighCardinalityPeerMetricsBenchmark (added in #11381) pin every dimension except one. The benchmarks set no additional tags, so they measure the cost of the additional-tags plumbing being threaded through the pipeline but not actually populated. Re-measured 2026-05-26 after master sync (master now includes #11381 and #11444's UTF8BytesString hashCode caching). This PR was re-run with 3 forks after a single-fork outlier showed an apparent regression; 3-fork numbers below. The rest of the stack used the standard 1-fork config.

HighCardinalityResourceMetricsBenchmark — only resource varies (~1M distinct):

	master (1f)	#11382 (1f)	#11387 (1f)	#11389 (1f)	this PR (3f)
Throughput avg (ops/s)	5,958,808 ± 383K	39,589,423 ± 2.52M	26,406,668 ± 2.55M	25,043,910 ± 1.77M	24,253,320 ± 3.70M
`onStatsAggregateDropped`	338,983,453	17,692,824	0	0	0

HighCardinalityPeerMetricsBenchmark — only peer.hostname varies (~32K distinct):

	master (1f)	#11382 (1f)	#11387 (1f)	#11389 (1f)	this PR (3f)
Throughput avg (ops/s)	9,223,002 ± 5.53M	37,856,491 ± 10.07M	23,056,495 ± 1.05M	25,504,264 ± 1.77M	25,976,286 ± 2.78M
`onStatsAggregateDropped`	185,595,358	16,431,892	0	0	0

Conclusion: on both axes this PR is within noise of #11389. A 1-fork run had landed at 21.12M (resource) / 27.50M (peer); the 3-fork re-run shows that 1 fork in 3 hits a sticky-bad JIT compilation early and stays there for all 5 measurement iterations, dragging the single-fork mean. Per-fork breakdown (resource): fork1 ~19.6M, fork2 ~26.5M, fork3 ~26.7M. Per-fork (peer): fork1 ~27.9M, fork2 ~27.4M, fork3 ~22.6M. With the outlier averaged out, the additional-tags plumbing carries no measurable cost when additionalTagsSchema is absent — the per-snapshot path is a single null check on the schema and an early return in Canonical.populateAdditionalTags.

onStatsAggregateDropped = 0 on both, confirming the cardinality cap from #11387 keeps holding through #11402's plumbing changes.

A benchmark with additionalTagsSchema populated (so the new code path actually runs) is a follow-up — the current HighCardinality* benches were authored before this feature existed and don't exercise it.

Test plan

:dd-trace-core:test --tests "datadog.trace.common.metrics.*" — all pass (existing + four new test files for the feature).
Schema normalization: alphabetical sort, dedupe, cap at 10.
Length cap: 251+ char values collapse to "<key>:blocked_by_tracer".
Per-bucket cap: 3rd unique value after a cap of 2 collapses to the sentinel; existing entries still hit normally.
Wire format: AdditionalMetricTags field present with schema-ordered "key:value" entries; omitted when nothing matches; null slots skipped.
No producer-thread regression (canonicalization stays on the aggregator).
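The per-bucket cap case in the test plan (a cap of 2, with the 3rd unique value collapsing to the sentinel while existing entries keep hitting) can be illustrated with a small single-threaded sketch. The names are hypothetical, the map stands in for the aggregate table, and whether sentinel entries themselves count toward the cap is an assumption (here they do not).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not the PR's classes. */
final class BucketCapSketch {
  private static final int DEFAULT_LIMIT = 100;

  private final int limit;
  private final Map<String, Integer> hits = new HashMap<>();
  private int admitted; // plain int: only the aggregator thread would touch it

  BucketCapSketch(int configuredLimit) {
    // Mirrors the documented fallback: a limit <= 0 falls back to 100.
    this.limit = configuredLimit > 0 ? configuredLimit : DEFAULT_LIMIT;
  }

  /** Records one hit and returns the entry it was recorded under. */
  String record(String key, String value) {
    String entry = key + ":" + value;
    if (!hits.containsKey(entry)) {
      if (admitted >= limit) {
        entry = key + ":blocked_by_tracer"; // bucket full: collapse the new value
      } else {
        admitted++; // brand-new entry admitted under the cap
      }
    }
    hits.merge(entry, 1, Integer::sum);
    return entry;
  }

  int hitsFor(String entry) {
    return hits.getOrDefault(entry, 0);
  }

  /** Called when a bucket is flushed, so the next bucket starts with a fresh budget. */
  void reset() {
    hits.clear();
    admitted = 0;
  }

  public static void main(String[] args) {
    BucketCapSketch bucket = new BucketCapSketch(2);
    System.out.println(bucket.record("tenant", "a"));
    System.out.println(bucket.record("tenant", "b"));
    System.out.println(bucket.record("tenant", "c"));
    System.out.println(bucket.record("tenant", "a"));
  }
}
```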

Notes for reviewers

The PoC PR (#11358, "implementation for span derived primary tags") ships against master and ConflatingMetricsAggregator. This branch sits on top of #11389 ("Memory-efficiency pass on ClientStatsAggregator + adversarial benchmark", the memory-efficiency stack) and ships against the producer/consumer-split ClientStatsAggregator. The behavior reachable from outside the metrics package is intended to be the same, modulo the spec deviation noted above.
Out of scope for this PR (deferred per spec): setMeasuredTag(key, value) programmatic API, DD_TAGS handling.

🤖 Generated with Claude Code

Implements the span-derived primary tags feature on the new producer/consumer architecture: users configure DD_TRACE_STATS_ADDITIONAL_TAGS (comma-separated tag keys); the tracer extracts the matching span tag values and includes them as additional aggregation dimensions on ClientGroupedStats.AdditionalMetricTags.

Design choices, matched to the PoC where reasonable:
- Wire format: repeated string of "<key>:<value>" entries, in schema (alphabetical-by-key) order; field omitted when no slots are populated. Customers who don't configure additional tags pay zero payload overhead.
- Cardinality protection: MAX_ADDITIONAL_TAG_KEYS = 10 -- configured-key count cap; MAX_ADDITIONAL_TAG_VALUE_LENGTH = 250 -- per-value length cap; DD_TRACE_STATS_ADDITIONAL_TAGS_CARDINALITY_LIMIT = 100 (configurable, <=0 -> warn + fallback) -- per-bucket stat-entry cap.
- Single-global counter for the per-bucket cap, single-threaded (aggregator thread is the sole writer of the table + limiter), so a plain int suffices -- no AtomicInteger.
- All canonicalization stays on the aggregator thread, consistent with the rest of the post-redesign pipeline: producer just captures raw String values into SpanSnapshot.additionalTagValues parallel to the schema; Canonical.populate applies the length cap and builds the per-slot UTF8BytesString "key:value" form; AggregateTable.findOrInsert applies the bucket cap by rebuilding the canonical with per-key blocked sentinels if needed.
- Acknowledged spec deviation: single-global counter rather than per-tag isolation. A misconfigured tag can starve another tag's admission of new entries within a bucket, but every span still gets emitted with its dimension keys preserved (values masked).

Adds onAdditionalTagValueCardinalityBlocked(String tagKey) callback on HealthMetrics and TracerHealthMetrics's "stats.additional_tag.cardinality_blocked" counter (length-blocks + bucket-cap blocks).

Test coverage:
- AdditionalTagsSchemaTest: empty-config sentinel, sort+dedupe+cap, per-key blocked sentinels.
- AdditionalTagsCardinalityLimiterTest: length cap behavior, counter + cap + reset, recordCardinalityBlock health-metric firing.
- AggregateTableAdditionalTagsTest: distinct/same identity, overlong values collapse to one entry, cardinality cap collapses new entries to the blocked sentinel while existing entries continue.
- SerializingMetricWriterAdditionalTagsTest: AdditionalMetricTags wire field shape, omission when empty, null-slot skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rbitrary-tags

# Conflicts:
# dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java
# dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateTable.java
# dd-trace-core/src/main/java/datadog/trace/common/metrics/Aggregator.java
# dd-trace-core/src/main/java/datadog/trace/common/metrics/ClientStatsAggregator.java
# dd-trace-core/src/main/java/datadog/trace/common/metrics/SerializingMetricWriter.java
# dd-trace-core/src/main/java/datadog/trace/core/monitor/HealthMetrics.java
# dd-trace-core/src/main/java/datadog/trace/core/monitor/TracerHealthMetrics.java

datadog-official · 2026-05-20T04:01:17Z

⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/apm-reliability/dd-trace-java | spotless

Formatting issues detected in src/main/java/datadog/trace/api/Config.java. Run './gradlew spotlessApply' to fix violations.

🔗 Commit SHA: 3b63dad

…rbitrary-tags

Drops the fixed-size additionalTagsBuffer sized at Canonical construction time. The buffer is now growable, and Canonical tracks additionalTagsCount = snapshot.additionalTagsSchema.size() per populate -- length-aware hash, match, and toEntry use the (buffer, count) pair, mirroring how peer tags already work. AggregateTable and Aggregator drop their schema parameters since Canonical no longer needs one; schema lives where it's used (ClientStatsAggregator + the snapshot).

AdditionalTagsMetricsBenchmark mirrors AdversarialMetricsBenchmark for the additional-tags hot path: two configured keys with a per-key cardinality cap of 100, unique values per op so the cap saturates fast. Catches future regressions on producer-side capture, schema.register, and the per-cycle block-counter flush. Adds an onTagCardinalityBlocked override to the shared CountingHealthMetrics so both benchmarks observe the new flush counter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
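The (buffer, count) pairing described in this commit message can be sketched as below: a reusable buffer that only grows, with hash, match, and copy-out all bounded by the current count so stale slots from an earlier, larger populate are never read. The class and method names are hypothetical, and String slots stand in for the UTF8BytesString values the PR uses.

```java
import java.util.Arrays;
import java.util.Objects;

/** Illustrative sketch only; names are hypothetical, not the PR's Canonical class. */
final class GrowableSlotsSketch {
  private String[] buffer = new String[2];
  private int count; // number of valid slots for the current populate

  /** Reuses the buffer across calls, growing it only when the schema is larger. */
  void populate(String[] values) {
    if (values.length > buffer.length) {
      buffer = new String[Math.max(values.length, buffer.length * 2)];
    }
    System.arraycopy(values, 0, buffer, 0, values.length);
    count = values.length; // slots at index >= count may hold stale data and are ignored
  }

  /** Length-aware hash: only the first count slots contribute. */
  int hash() {
    int h = count;
    for (int i = 0; i < count; i++) {
      h = 31 * h + Objects.hashCode(buffer[i]);
    }
    return h;
  }

  /** Length-aware match against the slots stored on an existing entry. */
  boolean matches(String[] entrySlots) {
    if (entrySlots.length != count) {
      return false;
    }
    for (int i = 0; i < count; i++) {
      if (!Objects.equals(buffer[i], entrySlots[i])) {
        return false;
      }
    }
    return true;
  }

  /** Copies out exactly count slots for a newly inserted entry. */
  String[] toEntry() {
    return Arrays.copyOf(buffer, count);
  }

  public static void main(String[] args) {
    GrowableSlotsSketch canonical = new GrowableSlotsSketch();
    canonical.populate(new String[] {"az:1", "region:us", "tenant:acme"});
    canonical.populate(new String[] {"az:1"});
    System.out.println(Arrays.toString(canonical.toEntry()));
  }
}
```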

…rbitrary-tags

…11387 review)

Companion to commit on dougqh/control-tag-cardinality. The @Nullable additions here apply only to the downstream lazy-errorLatencies / Canonical buffer state and therefore can't ride along on the #11387 commit; landing them on the tip where those features actually exist.
- @Nullable on errorLatencies field (lazy-init, null until first error)
- @Nullable on getErrorLatencies() return
- @Nullable on Canonical.populatePeerTags / populateAdditionalTags schema + values params

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rbitrary-tags

Ports the adversarial JMH benchmark from #11402 down to this branch so we can compare #11381 vs master on a high-cardinality, high-throughput workload. Adapted to use ConflatingMetricsAggregator (pre-rename) and the FixedAgentFeaturesDiscovery / NullSink helpers already in ConflatingMetricsAggregatorBenchmark. 8 producer threads hammer publish() with unique (service, operation, resource, peer.hostname) per op so the aggregate cache fills+evicts continuously and the inbox saturates. tearDown prints the drop counters (inboxFull vs aggregateDropped) so the test verifies the subsystem stayed bounded under attack. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rbitrary-tags

…rbitrary-tags

Resolved conflicts:
- AggregateEntry.java: dropped AtomicLongArray import (recordDurations batch API was removed upstream), kept javax.annotation.Nullable import (still used for @Nullable on the lazy errorLatencies field).
- AdversarialMetricsBenchmark.java: merged the upstream LongAdder upgrade with this branch's tagCardinalityBlocked field -- now all three counters use LongAdder (inboxFull, aggregateDropped, tagCardinalityBlocked).
- AdditionalTagsMetricsBenchmark.java: dropped the traceComputedCalls and totalSpansCounted printouts (those fields no longer exist on the shared CountingHealthMetrics class), and switched the remaining printouts to .sum() for the LongAdder backed fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…' into dougqh/metrics-arbitrary-tags # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java

…' into dougqh/metrics-arbitrary-tags

Trim per-span work on metrics aggregator publish path ConflatingMetricsAggregator.publish does a handful of redundant operations on every span. None individually is large; together they show as ~2.5% on the existing JMH benchmark once the benchmark actually exercises span.kind. - dedup span.isTopLevel(): publish() reads it into a local, then shouldComputeMetric read it again. Pass the cached value in. - resolve spanKind to String once: master called toString() twice per span (once inside spanKindEligible, once at the getPeerTags call site) and used HashSet contains on a CharSequence (which routes through equals on String). Normalize to String up front and reuse. - lazy-allocate the peer-tag list: getPeerTags() always allocated an ArrayList sized to features.peerTags() even when the span had none of those tags set. Defer allocation until the first match; return Collections.emptyList() when none hit. MetricKey already treats null/empty peerTags as emptyList, so no behavior change. Drop the spanKindEligible helper — the HashSet.contains call inlines fine in shouldComputeMetric. Update the JMH benchmark to set span.kind=client on every span. Without it the filter path short-circuits before the peer-tag and toString work, so the wins above aren't measurable. With it: baseline 6.755 us/op (CI [6.560, 6.950], stdev 0.129) optimized 6.585 us/op (CI [6.536, 6.634], stdev 0.033) 2 forks x 5 iterations x 15s. ~2.5% mean improvement and much tighter variance fork-to-fork. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Add SpanKindFilter and CoreSpan.isKind for bitmask-based kind checks Introduce SpanKindFilter -- a tiny builder-built immutable filter whose state is an int bitmask indexed by the span.kind ordinals already cached on DDSpanContext. Each include* on the builder sets one bit (1 << ordinal); the runtime check is a single AND against (1 << span's ordinal). CoreSpan.isKind(SpanKindFilter) is the new entry point. 
DDSpan overrides it to do the bit-test directly against the cached ordinal -- no virtual call, no tag-map lookup. The two existing test-only CoreSpan impls (SimpleSpan and TraceGenerator.PojoSpan, the latter in two source sets) implement isKind by reading the span.kind tag and delegating to SpanKindFilter.matches(String), which converts via DDSpanContext.spanKindOrdinalOf and does the same AND. Refactor: DDSpanContext.setSpanKindOrdinal(String) now delegates to a new package-private static spanKindOrdinalOf(String) so the same string-to-ordinal mapping serves both the tag interceptor path and SpanKindFilter.matches. This is groundwork -- nothing in the codebase calls isKind yet. The next commit will replace the HashSet-based eligibility checks in ConflatingMetricsAggregator with SpanKindFilter instances. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Use SpanKindFilter in ConflatingMetricsAggregator Replace the two ELIGIBLE_SPAN_KINDS_FOR_* HashSet<String> constants and the SPAN_KIND_INTERNAL.equals check with three SpanKindFilter instances: METRICS_ELIGIBLE_KINDS, PEER_AGGREGATION_KINDS, INTERNAL_KIND. Eligibility checks now go through span.isKind(filter), which on DDSpan is a volatile byte read against the already-cached span.kind ordinal plus a single bit-test. Also defer the span.kind tag read: previously read at the top of the publish loop and threaded through both shouldComputeMetric and the inner publish. isKind no longer needs the string, so the read can move down into the inner publish where it's still needed for the SPAN_KINDS cache key / MetricKey. Supporting changes: - DDSpanContext.spanKindOrdinalOf(String) is now public so non-DDSpan CoreSpan impls can compute the ordinal at tag-write time. - SpanKindFilter gains a public matches(byte) fast-path overload that callers with a pre-computed ordinal use directly. 
- SimpleSpan caches the ordinal in setTag(SPAN_KIND, ...), mirroring what TagInterceptor does for DDSpanContext, and its isKind now hits the byte fast path. Without this, the JMH benchmark (which uses SimpleSpan) would re-derive the ordinal on every isKind call and overstate the cost. Benchmark on the bench updated last commit (kind=client on every span, 4 forks x 5 iter x 15s): prior commit 6.585 ± 0.049 us/op this commit 6.903 ± 0.096 us/op The slight regression is a SimpleSpan-via-groovy-dispatch artifact -- the interface call to isKind through CoreSpan, then through SimpleSpan, then through SpanKindFilter.matches, doesn't fold as aggressively as a HashSet contains on a static field. In production DDSpan.isKind inlines to a context field read + ordinal byte read + bit-test, so the production path is faster than the prior HashSet approach. A DDSpan-based benchmark would show this; the existing SimpleSpan-based one doesn't. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Add DDSpan-based variant of ConflatingMetricsAggregator JMH benchmark The existing ConflatingMetricsAggregatorBenchmark uses SimpleSpan, a groovy mock. That's enough for measuring queue/CHM/MetricKey work, but it conceals the production cost of CoreSpan.isKind: SimpleSpan's isKind goes through groovy interface dispatch into SpanKindFilter.matches, while DDSpan.isKind inlines to a context byte-read + bit-test. This new benchmark uses real DDSpan instances created through a CoreTracer (with a NoopWriter so finishing doesn't reach the agent). Same shape as the SimpleSpan bench (64-span trace, span.kind=client, peer.hostname set). Numbers (2 forks x 5 iter x 15s): master: 6.428 +- 0.189 us/op (HashSet eligibility checks) this branch: 6.343 +- 0.115 us/op (SpanKindFilter bitmask) About 1.3% faster on the production path. The SimpleSpan benchmark in the same conditions shows a ~2.2% slowdown -- the mock's dispatch shape gives a misleading signal. 
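The bitmask scheme behind SpanKindFilter, as described in the commit messages above (each include sets bit 1 << ordinal; the check is a single AND), can be sketched like this. It is only an illustration: the class name and the ordinal values are hypothetical and are not the ones cached on DDSpanContext.

```java
/** Illustrative sketch only; the name and ordinal values are hypothetical. */
final class KindMaskSketch {
  // Assumed ordinals for illustration; the real mapping lives in DDSpanContext.
  static final byte SERVER = 1, CLIENT = 2, PRODUCER = 3, CONSUMER = 4, INTERNAL = 5;

  private final int kindMask;

  private KindMaskSketch(int kindMask) {
    this.kindMask = kindMask;
  }

  static Builder builder() {
    return new Builder();
  }

  /** Fast path for callers that already hold the span.kind ordinal: one shift, one AND. */
  boolean matches(byte ordinal) {
    return (kindMask & (1 << ordinal)) != 0;
  }

  static final class Builder {
    private int mask;

    /** Each include sets exactly one bit, indexed by the kind's ordinal. */
    Builder include(byte ordinal) {
      mask |= 1 << ordinal;
      return this;
    }

    KindMaskSketch build() {
      return new KindMaskSketch(mask);
    }
  }

  public static void main(String[] args) {
    KindMaskSketch peerKinds =
        builder().include(CLIENT).include(PRODUCER).include(CONSUMER).build();
    System.out.println(peerKinds.matches(CLIENT));
    System.out.println(peerKinds.matches(INTERNAL));
  }
}
```

Since the filter is immutable after build(), it can be held in a static final field and shared across threads without synchronization.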
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Tighten SpanKindFilter encapsulation Make SpanKindFilter.kindMask and its constructor private now that DDSpan.isKind no longer needs direct field access -- it delegates to SpanKindFilter.matches(byte). The Builder.build() in the same outer class still constructs instances via the private constructor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Defer MetricKey construction and cache lookups to the aggregator thread Replace the producer-side conflation pipeline with a thin per-span SpanSnapshot posted to the existing aggregator thread. The aggregator now builds the MetricKey, does the SERVICE_NAMES / SPAN_KINDS / PEER_TAGS_CACHE lookups, and updates the AggregateMetric directly -- all off the producer's hot path. What the producer does now, per span: - filter (shouldComputeMetric, resource-ignored, longRunning) - collect tag values into a SpanSnapshot (1 allocation per span) - inbox.offer(snapshot) + return error flag for forceKeep What moved off the producer: - MetricKey construction and its hash computation - SERVICE_NAMES.computeIfAbsent (UTF8 encoding of service name) - SPAN_KINDS.computeIfAbsent (UTF8 encoding of span.kind) - PEER_TAGS_CACHE lookups (peer-tag name+value UTF8 encoding) - pending/keys ConcurrentHashMap operations - Batch pooling, batch atomic ops, batch contributeTo Removed entirely: - Batch.java -- the conflation primitive is no longer needed; the aggregator's existing LRUCache<MetricKey, AggregateMetric> IS the conflation point now. - pending ConcurrentHashMap<MetricKey, Batch> - keys ConcurrentHashMap<MetricKey, MetricKey> (canonical dedup) - batchPool MessagePassingQueue<Batch> - The CommonKeyCleaner role of tracking keys.keySet() on LRU eviction -- AggregateExpiry now just reports drops to healthMetrics. Added: - SpanSnapshot: immutable value carrying the raw MetricKey inputs + a tagAndDuration long (duration | ERROR_TAG | TOP_LEVEL_TAG). 
- AggregateMetric.recordOneDuration(long tagAndDuration) -- the single-hit equivalent of the existing recordDurations(int, AtomicLongArray). - Peer-tag values flow through the snapshot as a flattened String[] of [name0, value0, name1, value1, ...]; the aggregator encodes them through PEER_TAGS_CACHE on its own thread. Benchmark results (2 forks x 5 iter x 15s): ConflatingMetricsAggregatorDDSpanBenchmark prior commit 6.343 +- 0.115 us/op this commit 2.506 +- 0.044 us/op (~60% faster) ConflatingMetricsAggregatorBenchmark (SimpleSpan) prior commit 6.585 +- 0.049 us/op this commit 3.116 +- 0.032 us/op (~53% faster) Caveat on the benchmark: without conflation, the producer pushes 1 inbox item per span instead of ~1 per 64. At the benchmark's synthetic rate the consumer can't keep up and inbox.offer silently drops. The numbers measure producer publish() latency only; consumer throughput at realistic span rates is a follow-up to validate. Tuning maxPending matters more in this design. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Report aggregator inbox-full drops via health metrics With the per-span SpanSnapshot inbox path, the producer can lose snapshots when the bounded MPSC queue is full -- silently, since inbox.offer() returns a boolean we previously ignored. The conflating-Batch design used to absorb ~64x more producer pressure per inbox slot, so this is a new failure mode worth surfacing. Wire it through the existing HealthMetrics path: - HealthMetrics.onStatsInboxFull() (no-op default). - TracerHealthMetrics gets a statsInboxFull LongAdder and a new reason tag reason:inbox_full reported under the same stats.dropped_aggregates metric used for LRU evictions. Two LongAdders, two tagged time series. - ConflatingMetricsAggregator.publish increments the counter when inbox.offer(snapshot) returns false. This doesn't fix the drop -- tuning maxPending and/or building producer-side batching are the actual fixes. 
But it makes the failure visible in the same place ops already watches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Merge branch 'master' into dougqh/conflating-metrics-producer-wins Merge branch 'dougqh/conflating-metrics-producer-wins' into dougqh/conflating-metrics-background-work Resize previousCounts for inbox-full health metric The new reason:inbox_full reportIfChanged call advances countIndex to 51, but previousCounts was still sized for 51 counters (max index 50), so the metric never emitted and the resize warning fired every flush. Bump the array to 52 and add a regression test that exercises the flush path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Skip SpanSnapshot allocation when the inbox is already at capacity publish() previously did all of the tag extraction (peer-tag pairs, HTTP method/endpoint, span kind, gRPC status) and the SpanSnapshot allocation before calling inbox.offer; on a full inbox the offer failed and everything became garbage. Early-out with an approximate size() vs capacity() check up front. The jctools MPSC queue's size() is best-effort but that's fine: under- estimation falls through to the existing offer-as-source-of-truth path, over-estimation drops a snapshot that would have fit (and onStatsInboxFull was about to fire on the next span anyway). error is computed first so the force-keep return is correct whether or not the snapshot is built. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Merge remote-tracking branch 'origin/master' into dougqh/conflating-metrics-background-work Introduce slim PeerTagSchema; capture peer-tag values not pairs Addresses sarahchen6's review comment on ConflatingMetricsAggregator extractPeerTagPairs: replaces the worst-case-allocation + trim-and-copy flat-pairs layout with a parallel-array carrier. - New PeerTagSchema: minimal carrier of String[] names. 
Two flavors -- a static INTERNAL singleton (one entry: base.service) for internal-kind spans, and per-discovery built schemas for client/producer/consumer spans. Deliberately no cardinality limiters or per-cycle state; that layers on top in a later PR. - ConflatingMetricsAggregator: caches the peer-aggregation schema keyed on reference equality of features.peerTags() -- a single volatile read + a long compare on the steady-state producer hot path, no allocation. The producer now captures only a String[] of values parallel to the schema's names; the schema reference is carried on SpanSnapshot. The prior "build worst-case pairs then trim" code is gone. - SpanSnapshot: replaces String[] peerTagPairs with PeerTagSchema + String[] peerTagValues. Producer drops the schema reference if no values fired so the consumer short-circuits on null. - Aggregator.materializePeerTags: now reads name/value pairs at the same index from (schema.names, snapshot.peerTagValues). Counts hits once for exact-size allocation; preserves the singletonList fast path for the common one-entry case (e.g. internal-kind base.service). Producer-side cost goes from "allocate String[2n] + walk + maybe trim" to "single volatile read + walk + lazy String[n] only on first hit". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Address PR #11381 review (round 2) - Aggregator.materializePeerTags: fold the firstHit-discovery nested if into a single guarded post-increment (amarziali, #3279243138). One body line: `if (values[i] != null && hitCount++ == 0) firstHit = i;`. - Drop redundant isKind(SpanKindFilter) overrides in both TraceGenerator.groovy files (amarziali, #3279264553 / #3279382648). CoreSpan.java:84 already supplies a default implementation that reads the same span.kind tag. - Bump TRACER_METRICS_MAX_PENDING default from 2048 -> 131072 to address the capacity regression amarziali flagged (#3279378375). 
Without producer-side conflation, the inbox now holds 1 SpanSnapshot per metrics-eligible span instead of 1 conflated Batch per ~64 spans; restoring effective capacity parity (~2048 * ~64 = 131072) prevents a ~64x rise in inbox-full drops at the same span rate. ~100 B per SpanSnapshot puts the worst-case heap floor at ~13 MB -- bounded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Cover inbox-full fast-path in ConflatingMetricsAggregator.publish Addresses PR #11381 review (amarziali, #3279325340 -- "Are the existing tests covering this case?"). New ConflatingMetricsAggregatorInboxFullTest constructs the aggregator with a small inbox (queueSize=8), deliberately does NOT call start() so the consumer thread never drains, then publishes enough spans to overflow the inbox. Verifies that healthMetrics.onStatsInboxFull() is called at least once -- the fast-path's `inbox.size() >= inbox.capacity()` short-circuit triggers when the producer-side queue is at capacity. Test is Java + JUnit 5 + Mockito per the project convention for new tests; uses a CoreSpan Mockito mock rather than the SimpleSpan Groovy fixture so we don't depend on Groovy-then-Java compile order from the test source set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Reconcile PeerTagSchema once per reporting cycle on the aggregator thread Addresses amarziali's review comment #3279340181 ("It would be more efficient to trigger from the other side"). The producer-side reference compare on every publish goes away; the aggregator thread reconciles the cached schema against feature discovery once per reporting cycle. - DDAgentFeaturesDiscovery: expose getLastTimeDiscovered() so callers can detect a discovery refresh without copying the peerTags Set. - PeerTagSchema: add `long lastTimeDiscovered` (plain, aggregator-only) and `hasSameTagsAs(Set)`. of(Set, long) takes the timestamp; INTERNAL uses a -1L sentinel since it's never reconciled. 
- ConflatingMetricsAggregator: * Drop the cachedPeerTagsSource volatile and the per-publish reference compare. * Producer fast path is now `cachedPeerTagSchema` volatile read + null-check; first publish takes the one-time synchronized bootstrap. * Add reconcilePeerTagSchema() that runs once per cycle on the aggregator thread: fast-path timestamp compare, slow-path set compare, bump-in-place when the set is unchanged. - Aggregator: new `Runnable onReportCycle` constructor parameter, run at the start of report() (before the flush, so any test awaiting writer.finishBucket() observes the schema in its post-reconcile state and so the next publish sees the new schema without a handoff). - Update "should create bucket for each set of peer tags" to drive two reporting cycles separated by a report() that triggers reconcile. The old test relied on per-publish reference detection, which the new design intentionally doesn't preserve -- the schema is now stable within a cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Add bootstrap + reconcile coverage for PeerTagSchema Addresses round-3 review nice-to-haves on PR #11381. - PeerTagSchemaTest: unit coverage for hasSameTagsAs() (the predicate that drives the reconcile fast/slow path split), the of(Set, long) factory, and the INTERNAL singleton. The hasSameTagsAs cases include same-content-different-Set-reference (the case the reconcile fast path relies on after a discovery refresh) and content-mismatch in either direction. - ConflatingMetricsAggregatorBootstrapTest: integration coverage for the producer-side bootstrap + aggregator-thread reconcile flow. * bootstrapHappensOnceOnFirstPublish -- three publishes against an un-started aggregator (no consumer thread, no reconciles); verifies features.peerTags() and features.getLastTimeDiscovered() are each called exactly once. 
* reconcileSkipsDeepCompareWhenTimestampMatches -- two cycles with constant features.getLastTimeDiscovered(); each post-report reconcile short-circuits on the timestamp fast path, so peerTags() is called only by bootstrap (1 total). * reconcileSurvivesTimestampBumpWhenTagsUnchanged -- timestamps bump every reconcile, forcing the slow set-compare path; the tag set stays identical, so the schema is preserved and continues to flush buckets correctly across cycles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Use writer.finishBucket() count in bootstrap test for cascade compatibility The verify(writer).add(MetricKey, AggregateMetric) signature is unique to #11381; downstream branches use AggregateEntry. Switching to verify(writer, times(2)).finishBucket() keeps the same behavioral guarantee (both cycles flushed) across the stack. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Merge branch 'master' into dougqh/conflating-metrics-background-work Preserve TRACER_METRICS_MAX_PENDING semantic + drop stale imports TRACER_METRICS_MAX_PENDING previously counted conflating Batch slots (~64 spans each). The inbox now holds 1 SpanSnapshot per slot, so multiply the configured value by LEGACY_BATCH_SIZE (64) to keep pre-existing customer overrides delivering the same effective span-throughput capacity. Default stays at 2048 logical -> 131072 snapshot slots, identical to the prior 2048 batches * 64 spans. Also drops two unused datadog.trace.core.SpanKindFilter imports left behind in TraceGenerator.groovy after the isKind() override was removed in favor of the CoreSpan default implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Add AdversarialMetricsBenchmark for capacity-bound stress testing Ports the adversarial JMH benchmark from #11402 down to this branch so we can compare #11381 vs master on a high-cardinality, high-throughput workload. 
Adapted to use ConflatingMetricsAggregator (pre-rename) and the FixedAgentFeaturesDiscovery / NullSink helpers already in ConflatingMetricsAggregatorBenchmark. 8 producer threads hammer publish() with unique (service, operation, resource, peer.hostname) per op so the aggregate cache fills+evicts continuously and the inbox saturates. tearDown prints the drop counters (inboxFull vs aggregateDropped) so the test verifies the subsystem stayed bounded under attack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Trim AdversarialMetricsBenchmark counters and clarify printout

Drop traceComputedCalls / totalSpansCounted: under 8-way contention the volatile-long ++/+= pattern was losing ~20% of updates (296M counted vs 245M reported), and the numbers duplicate signal JMH's ops/s already provides. Switch inboxFull / aggregateDropped to LongAdder so the printed drop shape (the order-of-magnitude story the bench is built to tell) is accurate under contention. Replace the stale "both forks combined for this run" string with text that matches the actual @Fork(value=1) config and notes that counters accumulate across warmup + measurement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Close PeerTagSchema reconcile race + cover the swap branch

buildPeerTagSchema previously read features.peerTags() before features.getLastTimeDiscovered(). DDAgentFeaturesDiscovery exposes those as two separate accessors against its volatile State -- a state-swap interleaving could leave the cached schema tagged with a NEWER timestamp than its names, after which the next reconcile short-circuits on the timestamp compare and misses the tag-set update until the next discovery refresh (~minute later).

Swap the read order so timestamp is captured first. With this ordering, an interleaving leaves the schema OLDER than its names instead -- the next reconcile sees a timestamp mismatch, runs the deep compare, and self-heals on the very next cycle.
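The read-order fix above can be sketched as follows. This is a minimal illustration, not the tracer's code: `Features`, `FakeFeatures`, `Schema`, `build`, and `reconcile` are hypothetical stand-ins for DDAgentFeaturesDiscovery, PeerTagSchema, buildPeerTagSchema, and reconcilePeerTagSchema. It shows the timestamp being captured before the tag names, plus the three reconcile outcomes the commit messages describe (fast path, bump in place, swap).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ReconcileOrderSketch {
  /** Hypothetical stand-in for the two separate discovery accessors. */
  interface Features {
    long getLastTimeDiscovered();
    Set<String> peerTags();
  }

  /** Test double whose timestamp and tag set can be changed between calls. */
  static final class FakeFeatures implements Features {
    long timestamp;
    Set<String> tags;
    FakeFeatures(long timestamp, String... names) {
      this.timestamp = timestamp;
      setTags(names);
    }
    void setTags(String... names) { this.tags = new HashSet<>(Arrays.asList(names)); }
    public long getLastTimeDiscovered() { return timestamp; }
    public Set<String> peerTags() { return tags; }
  }

  /** Simplified schema: tag names plus the timestamp they were stamped with. */
  static final class Schema {
    final Set<String> tags;
    long lastTimeDiscovered;
    Schema(Set<String> tags, long lastTimeDiscovered) {
      this.tags = tags;
      this.lastTimeDiscovered = lastTimeDiscovered;
    }
  }

  static Schema build(Features features) {
    // Timestamp first: if a refresh lands between the two reads, the schema
    // ends up stamped OLDER than its names, so the next reconcile re-checks.
    long timestamp = features.getLastTimeDiscovered();
    Set<String> tags = new HashSet<>(features.peerTags());
    return new Schema(tags, timestamp);
  }

  static Schema reconcile(Schema cached, Features features) {
    long timestamp = features.getLastTimeDiscovered();
    if (timestamp == cached.lastTimeDiscovered) {
      return cached; // fast path: nothing refreshed since the last check
    }
    Set<String> tags = features.peerTags();
    if (cached.tags.equals(tags)) {
      cached.lastTimeDiscovered = timestamp; // same set: bump in place
      return cached;
    }
    return new Schema(new HashSet<>(tags), timestamp); // set changed: swap
  }

  public static void main(String[] args) {
    FakeFeatures features = new FakeFeatures(1L, "peer.hostname");
    Schema cached = build(features);
    features.timestamp = 2L;
    features.setTags("peer.hostname", "peer.service");
    Schema next = reconcile(cached, features);
    System.out.println("swapped: " + (next != cached));
  }
}
```

With the opposite read order, an interleaved refresh could stamp the old names with the new timestamp, and the fast path would then hide the change until the following refresh.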
Also adds reconcileSwapsSchemaWhenTagSetChanges, which closes the test gap on the slow-path swap branch (cachedPeerTagSchema = PeerTagSchema.of(...)). End-to-end check via the writer's captured MetricKeys: pre-swap snapshot carries only peer.hostname, post-swap snapshot carries both peer.hostname and peer.service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Clarify materializePeerTags hit-counting loop

Splits the `if (values[i] != null && hitCount++ == 0)` conjunction into nested ifs. Same semantics, no codegen impact after JIT -- just visibly says what the loop is doing rather than relying on post-increment-inside-conjunction. Closes amarziali's review thread on this block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop unused Tags imports flagged by codenarc

Leftover from removing the isKind() override in TraceGenerator earlier in this session -- I dropped the SpanKindFilter import but missed datadog.trace.bootstrap.instrumentation.api.Tags, which is no longer referenced in either file. Resolves codenarcTest and codenarcTraceAgentTest UnusedImport violations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Update dd-trace-core/src/main/java/datadog/trace/common/metrics/PeerTagSchema.java

Co-authored-by: Sarah Chen <sarah.chen@datadoghq.com>

Address sarahchen6's review pass

PeerTagSchema.java: drop the duplicate Javadoc line that the GitHub UI suggestion accept inadvertently added (it added rather than replaced), collapsing back to the single intended line per sarahchen6's suggestion. Original line said "no cardinality limiters or per-cycle state" which was misleading since lastTimeDiscovered IS per-cycle state; suggestion rightly drops that clause.
Config.java: wrap the TRACER_METRICS_MAX_PENDING * LEGACY_BATCH_SIZE multiplication in Math.multiplyExact to fail fast on absurd customer overrides (>= ~33M) rather than silently wrap to a negative int and explode the MPSC queue allocation with a confusing downstream error. Per sarahchen6's suggestion citing the codex bot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Clamp TRACER_METRICS_MAX_PENDING instead of throwing on overflow

The previous Math.multiplyExact approach would fail the agent startup with ArithmeticException on absurd customer overrides (>= ~33M for the configured value). Clamping is gentler -- the agent starts successfully and just runs with a capped inbox. Long-promote the multiplication to a long so the product can't wrap, then clamp to MAX_SAFE_ARRAY_SIZE (Integer.MAX_VALUE - 8, the JDK's own SOFT_MAX_ARRAY_LENGTH convention for array allocations).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'master' into dougqh/conflating-metrics-background-work

Suppress forbiddenApis for tearDown's System.err diagnostics

AdversarialMetricsBenchmark.tearDown prints drop counters via System.err so a benchmark run shows how saturated each capacity bound was (inbox-full drops, aggregate-cache drops). forbiddenApisJmh disallows System.err by default to prevent excess logging in production code -- not a concern for a JMH benchmark, where stderr is the conventional channel for diagnostic output and matches the existing pattern in ExtractorBenchmark / InjectorBenchmark.

Annotates tearDown with @SuppressForbidden (method-scoped, not class-scoped) so the suppression is narrowly targeted to the three println calls and any future hot-path code that lands in the benchmark stays gated by the check.
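The clamp from the "Clamp TRACER_METRICS_MAX_PENDING instead of throwing on overflow" commit above can be sketched like this. It is a minimal illustration under assumptions: the class and method names (`PendingCapacitySketch`, `snapshotSlots`) are hypothetical, only the two constants come from the commit text, and handling of non-positive configured values is left out.

```java
final class PendingCapacitySketch {
  // Constants named in the commit messages above.
  static final int LEGACY_BATCH_SIZE = 64;
  static final int MAX_SAFE_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  /**
   * Converts the configured batch-slot count into snapshot slots.
   * The multiplication is done in long arithmetic so it cannot wrap,
   * then the result is capped at MAX_SAFE_ARRAY_SIZE.
   */
  static int snapshotSlots(int configuredMaxPending) {
    long product = (long) configuredMaxPending * LEGACY_BATCH_SIZE;
    return (int) Math.min(product, MAX_SAFE_ARRAY_SIZE);
  }

  public static void main(String[] args) {
    System.out.println(snapshotSlots(2048));       // 131072, the documented default
    System.out.println(snapshotSlots(40_000_000)); // 2147483639, clamped
    // For contrast, plain int arithmetic wraps to a negative value:
    System.out.println(40_000_000 * LEGACY_BATCH_SIZE); // -1734967296
  }
}
```

The cast back to int is safe because the value has already been bounded by an int-sized cap.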
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'master' into dougqh/conflating-metrics-background-work

Use DDAgentFeaturesDiscovery.state() hash for PeerTagSchema reconcile

Addresses amarziali's review on getLastTimeDiscovered(): the existing state() accessor returns a SHA-256 of the discovery response, which is a more precise change key than the timestamp. Timestamp advances on every successful refresh regardless of content; the hash only advances when something actually changed -- so reconcile fast-path now fires only on real change, not every cycle.

- PeerTagSchema: long lastTimeDiscovered -> String state. Factory signature of(Set, long) -> of(Set, String). INTERNAL carries null (it is never reconciled).
- ConflatingMetricsAggregator: read features.state() first then peerTags() (same defensive ordering rationale -- if a discovery refresh interleaves, leave the schema with stale state rather than stale tags so the next reconcile re-runs the deep compare). Objects.equals for null-tolerant comparison (state can be null before discovery has produced a response).
- DDAgentFeaturesDiscovery: drop the public getLastTimeDiscovered() accessor added on this branch -- the field stays private for the existing throttling logic in discoverIfOutdated().
- Tests updated to mock state() instead of getLastTimeDiscovered().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Convert TRACER_METRICS_MAX_PENDING rationale to /* */ block comment

Addresses amarziali's readability nit (#3289149416) -- multi-line prose reads better as a single block comment than as a stack of // lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/dougqh/conflating-metrics-background-work' into dougqh/conflating-metrics-background-work

Add cardinality-isolation companions to AdversarialMetricsBenchmark

Two new JMH benches that hold every dimension constant except one, to attribute throughput deltas to a specific axis:

- HighCardinalityResourceMetricsBenchmark: ~1M distinct resource values; service/operation/peer.hostname pinned. Exercises the aggregate-cache LRU on the resource axis specifically.
- HighCardinalityPeerMetricsBenchmark: ~32K distinct peer.hostname values; service/operation/resource pinned. Isolates the peer-tag encoding hot path (PEER_TAGS_CACHE lookups, UTF8 encoding, parallel-array capture in SpanSnapshot).

Same shape as AdversarialMetricsBenchmark (8 threads, 2x15s warmup + 5x15s measurement, 1 fork) and reuse its CountingHealthMetrics so the inbox-full vs aggregate-dropped counters print on teardown for an apples-to-apples comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'master' into dougqh/conflating-metrics-background-work

Co-authored-by: devflow.devflow-routing-intake <devflow.devflow-routing-intake@kubernetes.us1.ddbuild.io>
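The benchmarks above print drop counters that are incremented from 8 producer threads, and an earlier commit in this list moved those counters from volatile long to LongAdder. A minimal sketch of why that matters, assuming a hypothetical `DropCountersSketch` class (only the field names inboxFull and aggregateDropped come from the commit text): `++` on a volatile long is a non-atomic read-modify-write, so concurrent increments can be lost, whereas LongAdder.increment() is safe under contention.

```java
import java.util.concurrent.atomic.LongAdder;

final class DropCountersSketch {
  // Counter names taken from the commit text; the holder class is hypothetical.
  final LongAdder inboxFull = new LongAdder();
  final LongAdder aggregateDropped = new LongAdder();

  /** Increments the counter from several threads and returns the final sum. */
  static long hammer(LongAdder counter, int threads, int incrementsPerThread) {
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < incrementsPerThread; j++) {
          counter.increment(); // no lost updates, unlike volatile long ++
        }
      });
      workers[i].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join(); // join makes every increment visible to sum()
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return counter.sum();
  }

  public static void main(String[] args) {
    DropCountersSketch counters = new DropCountersSketch();
    System.err.println("inboxFull=" + hammer(counters.inboxFull, 8, 100_000));
    System.err.println("aggregateDropped=" + counters.aggregateDropped.sum());
  }
}
```

LongAdder trades a slightly more expensive sum() for cheap contended increments, which suits counters that are written on the hot path and read once at teardown.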

dd-octo-sts · 2026-05-26T18:35:25Z

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite	Status
Startup	🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Startup Time

Scenario	This PR	master	Change
insecure-bank / iast	14,021 ms	13,923 ms	+0.7%
insecure-bank / tracing	12,872 ms	13,029 ms	-1.2%
petclinic / appsec	16,695 ms	16,547 ms	+0.9%
petclinic / iast	16,572 ms	16,723 ms	-0.9%
petclinic / profiling	16,404 ms	16,517 ms	-0.7%
petclinic / tracing	15,877 ms	15,938 ms	-0.4%

Commit: 41d7967f · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

dougqh added the labels type: enhancement, comp: metrics, and tag: ai generated on May 18, 2026

dougqh force-pushed the dougqh/metrics-memory-efficiency branch from 46c04bd to 823a5d4 on May 18, 2026 19:28

dougqh force-pushed the dougqh/metrics-arbitrary-tags branch from c552e73 to 42947dd on May 18, 2026 19:30

dougqh and others added 9 commits May 20, 2026 00:16

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

ce6e24c

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

597da72

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

6cea7a7

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

e7b6771

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

fab4fe2

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

460d5b7

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

38aa16a

…rbitrary-tags

dougqh and others added 4 commits May 21, 2026 16:51

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

2f8cdd8

…rbitrary-tags

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

ba365ea

…' into dougqh/metrics-arbitrary-tags # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

da91078

…' into dougqh/metrics-arbitrary-tags

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

333fdd1

…' into dougqh/metrics-arbitrary-tags

dougqh added 5 commits May 26, 2026 16:04

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

ea420a2

…' into dougqh/metrics-arbitrary-tags

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

e262b78

…' into dougqh/metrics-arbitrary-tags

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

e01f9f0

…' into dougqh/metrics-arbitrary-tags

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

bcedd35

…' into dougqh/metrics-arbitrary-tags # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

23f9bb4

…' into dougqh/metrics-arbitrary-tags

dougqh added 6 commits May 26, 2026 17:16

Merge remote-tracking branch 'origin/dougqh/metrics-memory-efficiency…

82e650c

…' into dougqh/metrics-arbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

1a7817a

…rbitrary-tags # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

41d7967

…rbitrary-tags # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java # dd-trace-core/src/main/java/datadog/trace/common/metrics/SerializingMetricWriter.java

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

d17e282

…rbitrary-tags # Conflicts: # dd-trace-api/src/main/java/datadog/trace/api/config/GeneralConfig.java # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

9cc3930

…rbitrary-tags

Merge branch 'dougqh/metrics-memory-efficiency' into dougqh/metrics-a…

3b63dad

…rbitrary-tags # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java

dougqh wants to merge 27 commits into dougqh/metrics-memory-efficiency from dougqh/metrics-arbitrary-tags

datadog-official · 2026-05-20 (edited by datadog-datadog-prod-us1-2)

⚠️ Warnings
