Update client-side stats to use light weight Hashtable #11382

Open
dougqh wants to merge 134 commits into master from dougqh/optimize-metric-key

@dougqh dougqh commented May 15, 2026

What Does This Do

Replaces the MetricKey-based HashMap with a new AggregateTable built on the lightweight Hashtable.

Motivation

By using the lightweight Hashtable, I'm able to avoid the biggest source of allocation in client-side stats: MetricKey.

Hashtable provides utilities for searching entries without constructing a new composite key object:

First, the key components are hashed together to find the corresponding bucket.
Then the bucket is traversed to check whether each entry matches the key components.

The custom Entry can also hold the multiple fields that make up the data and metadata needed for metric aggregation and the eviction policy.

The end result is that MetricKey and AggregateMetric can be merged into a single class, AggregateEntry, which is only constructed when no matching entry exists yet.
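A minimal sketch of that lookup pattern, assuming a two-component (service, resource) key, a power-of-two bucket array, and a simple 31-based hash combiner. The class and method names are hypothetical; this is not the AggregateTable code, only an illustration of walking a bucket without building a composite key:

```java
import java.util.Objects;

// Hypothetical sketch: a chained hash table keyed by (service, resource)
// that is searched component-by-component, with no key object allocated.
final class CompositeKeyTable {
  static final class Entry {
    final long keyHash;
    final String service;
    final String resource;
    Entry next; // next entry in the same bucket chain

    Entry(long keyHash, String service, String resource) {
      this.keyHash = keyHash;
      this.service = service;
      this.resource = resource;
    }

    // Cheap hash pre-check first, then compare each component.
    boolean matches(long hash, String service, String resource) {
      return keyHash == hash
          && Objects.equals(this.service, service)
          && Objects.equals(this.resource, resource);
    }
  }

  private final Entry[] buckets;

  // capacity must be a power of two so the index can be computed with a mask.
  CompositeKeyTable(int capacityPowerOfTwo) {
    buckets = new Entry[capacityPowerOfTwo];
  }

  // Step 1: hash the components together (assumed 31-based combiner).
  static long hash(String service, String resource) {
    long h = 31L * Objects.hashCode(service);
    return 31L * h + Objects.hashCode(resource);
  }

  private int bucketIndex(long hash) {
    return (int) (hash ^ (hash >>> 32)) & (buckets.length - 1);
  }

  // Step 2: walk the bucket; allocate an Entry only on a true miss.
  Entry findOrInsert(String service, String resource) {
    long hash = hash(service, resource);
    int index = bucketIndex(hash);
    for (Entry e = buckets[index]; e != null; e = e.next) {
      if (e.matches(hash, service, resource)) {
        return e;
      }
    }
    Entry created = new Entry(hash, service, resource);
    created.next = buckets[index]; // insert at the head of the chain
    buckets[index] = created;
    return created;
  }
}
```

On a hit, the only work is one hash computation and a short chain walk; the Entry returned is the object whose counters would be mutated.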

Additional Notes

Stacked on top of master. #11381 (producer/consumer split + SpanSnapshot inbox) and #11409 (Hashtable utility) have both landed in master, so the diff shown here is only the AggregateTable / AggregateEntry / ClearSignal work plus follow-up cleanups.

Restructures the consumer-side aggregate store. Three logical commits, intended to be reviewed in order:

1. Add AggregateTable + AggregateEntry backed by Hashtable

Introduces a multi-key hash table that lets the consumer thread look up the {labels → counters} entry directly from a SpanSnapshot's raw fields: no MetricKey allocation per snapshot, no per-snapshot UTF8 cache lookups, no CHM operations. The hot-path lookup is keyHash compute → Hashtable.Support.bucket → bucket walk → matches(keyHash, snapshot), and the returned entry holds the counters, which are mutated in place.

This commit is standalone — no call sites yet, only the new classes + unit tests for hit/miss/cap-overrun/expunge/clear behavior.

2. Swap Aggregator to use AggregateTable + route disable() clear through a ClearSignal

Replaces LRUCache<MetricKey, AggregateMetric> with AggregateTable in Aggregator. Drops the AggregateExpiry listener — drop reporting (onStatsAggregateDropped) moves to the cap-overrun path inside Drainer.accept.

Threading fix bundled here: ConflatingMetricsAggregator.disable() used to call aggregator.clearAggregates() and inbox.clear() directly from the Sink's IO callback thread, racing with the aggregator thread. That race was tolerable for LinkedHashMap (worst case = corrupted internal state right before everything got cleared anyway); it's not tolerable for Hashtable (chain corruption can NPE or loop). disable() now offers a ClearSignal to the inbox so the aggregator thread itself performs the clear — preserves the single-writer invariant for AggregateTable end-to-end. The offer is best-effort; the system self-heals on a subsequent downgrade cycle if the inbox happens to be full (commented at the call site).
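A simplified sketch of the ClearSignal routing, assuming an ArrayBlockingQueue in place of the real MPSC inbox and a plain HashMap in place of AggregateTable (all names hypothetical). It shows the key property: the IO thread only enqueues a marker, and the consumer thread, the sole writer, performs both the table clear and inbox.clear():

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of keeping a single writer for the aggregate table.
final class SingleWriterAggregator {
  // Marker item; the PR uses a ClearSignal SignalItem subclass instead.
  static final Object CLEAR = new Object();

  private final BlockingQueue<Object> inbox = new ArrayBlockingQueue<>(1024);
  // Mutated only by the consumer thread, so it needs no locking.
  private final Map<String, Long> counts = new HashMap<>();

  // Producer side: best-effort, never blocks.
  boolean publish(String key) {
    return inbox.offer(key);
  }

  // IO callback thread: does NOT touch `counts`; it only asks for a clear.
  boolean disable() {
    return inbox.offer(CLEAR); // may fail if the inbox is full
  }

  // Consumer side: the single writer for `counts`.
  void drainOnce() {
    Object item;
    while ((item = inbox.poll()) != null) {
      if (item == CLEAR) {
        counts.clear();
        inbox.clear(); // discard anything queued behind the signal
      } else {
        counts.merge((String) item, 1L, Long::sum);
      }
    }
  }

  // For illustration; call from the consumer thread.
  int size() {
    return counts.size();
  }
}
```

If the offer in disable() fails because the inbox is full, nothing is cleared on that cycle; the real code relies on a later downgrade cycle to self-heal, as described above.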

Cap-overrun semantic change: the old LRUCache evicted least-recently-used in O(1). AggregateTable instead scans for a hitCount==0 entry to recycle (O(N) worst-case), and drops the new key if none exists. Practical impact: in steady state, an unrelated burst of new keys gets dropped (and reported via onStatsAggregateDropped) rather than evicting established keys. The cost trade-off is commented at the eviction site — eviction is expected rare because the cap is sized to the working set; cursor-caching is the future option if a workload runs persistently at cap. The existing test that asserted "service0 evicted in favor of service10" is updated to assert the new semantics; the other cap-related test ("evicted entry was already flushed") still passes unchanged.
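A sketch of the new cap-overrun policy under simplifying assumptions: a plain list stands in for the hash table (so lookup here is also O(N)), and report() only resets hitCount rather than flushing anything. Names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: at cap, recycle an entry with hitCount == 0,
// otherwise drop the new key instead of evicting a live one.
final class CappedTable {
  static final class Entry {
    final String key;
    int hitCount;

    Entry(String key) {
      this.key = key;
    }
  }

  private final int maxEntries;
  private final List<Entry> entries = new ArrayList<>();

  CappedTable(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  /** Returns the entry for key, or null if every slot holds a live entry. */
  Entry findOrInsert(String key) {
    for (Entry e : entries) {
      if (e.key.equals(key)) {
        e.hitCount++;
        return e;
      }
    }
    if (entries.size() >= maxEntries && !evictOneStale()) {
      return null; // caller would report onStatsAggregateDropped
    }
    Entry created = new Entry(key);
    created.hitCount = 1;
    entries.add(created);
    return created;
  }

  // O(N) worst-case scan; acceptable when the cap is sized to the working set.
  private boolean evictOneStale() {
    for (int i = 0; i < entries.size(); i++) {
      if (entries.get(i).hitCount == 0) {
        entries.remove(i);
        return true;
      }
    }
    return false;
  }

  // Stand-in for the report cycle: after flushing, hitCount goes back to 0,
  // which makes every entry recyclable until it is hit again.
  void report() {
    for (Entry e : entries) {
      e.hitCount = 0;
    }
  }
}
```

This mirrors the two test cases mentioned above: a burst of new keys at cap is dropped, while the wave of inserts after a report recycles flushed entries.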

3. Fold MetricKey + AggregateMetric into AggregateEntry

MetricKey existed for two reasons: being the LRUCache key (replaced by AggregateTable's Hashtable mechanics) and being the labels argument to MetricWriter.add (the only role left). AggregateMetric was its counter/histogram counterpart. This commit folds both into a single AggregateEntry (10 UTF8 label fields + 3 primitives + counters + histograms), changes MetricWriter.add(MetricKey, AggregateMetric) → add(AggregateEntry), and deletes MetricKey.java, MetricKeys.java, and AggregateMetric.java.

The 12 UTF8 caches that used to be split between MetricKey (9) and ConflatingMetricsAggregator (3, with overlap) are consolidated on AggregateEntry. One cache per field type now.

Latent bug fix: the prior matches(SpanSnapshot) used Objects.equals on raw fields. If the same logical key was delivered once as String and once as UTF8BytesString (different CharSequence impls of identical content), Objects.equals returned false and the table would split into two entries for the same key. The new matches uses content-equality (UTF8BytesString.toString() returns the underlying String in O(1)), collapsing them correctly.
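A small illustration of the latent bug, using StringBuilder as a stand-in for UTF8BytesString (both are CharSequence implementations that are not String). String.equals only returns true for another String, while String.contentEquals(CharSequence) compares the characters; the helper name is hypothetical:

```java
import java.util.Objects;

// Illustrates why Objects.equals can split one logical key into two entries
// when the same content arrives as different CharSequence implementations.
final class ContentEquality {
  // Null-safe content comparison of a stored String against any CharSequence.
  static boolean contentMatches(String stored, CharSequence incoming) {
    if (stored == null || incoming == null) {
      return stored == null && incoming == null;
    }
    return stored.contentEquals(incoming);
  }

  public static void main(String[] args) {
    CharSequence asBuilder = new StringBuilder("GET /users");
    // String.equals only accepts another String: prints false.
    System.out.println(Objects.equals("GET /users", asBuilder));
    // Content comparison ignores the implementation type: prints true.
    System.out.println(contentMatches("GET /users", asBuilder));
  }
}
```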

Test impact: AggregateEntry.of(...) mirrors the prior new MetricKey(...) positional args, so test diffs are mostly mechanical. About 56 test sites migrated across ConflatingMetricAggregatorTest, SerializingMetricWriterTest, and MetricsIntegrationTest.

Review polish

Follow-up commits address review feedback:

  • Use Hashtable.Support.create(maxAggregates, Support.MAX_RATIO) + Support.bucket + Support.insertHeadEntry(buckets, keyHash, entry) + Support.mutatingTableIterator to delegate to the helpers added in #11409 (Add Hashtable and LongHashingUtils utilities), which drops ~50 lines of bespoke bucket-array code.
  • Inline the report-time forEach lambda instead of a static BiConsumer constant (the JIT reuses non-capturing lambdas).
  • AggregateEntry.matches(long keyHash, SpanSnapshot) overload that pre-checks the hash, so chain walks read as one call.
  • @Nullable (javax.annotation) annotations on the four nullable label fields + their getters + of(...) parameters.
  • Objects.equals import in AggregateEntry.equals() (no more fully-qualified refs).
  • Design-trade-off comments on evictOneStale (O(N) scan rationale) and disable() (best-effort offer rationale).

Additional cleanups

A second round of review surfaced these:

  • MetricsIntegrationTest .aggregate runtime bug — the legacy entry.aggregate.recordDurations(...) form compiled under Groovy's dynamic dispatch but would have thrown MissingPropertyException at runtime. Fixed to call recordDurations directly on the entry. (Bot-flagged.)
  • Spock >> { closure } no-op assertions in ConflatingMetricAggregatorTest — the >> operator stubs a return value, so the closures verifying e.getHitCount() == X && ... were being evaluated and discarded. Wrapped 31 sites in assert so Groovy power-assert surfaces mismatches. All 41 tests still pass, so the previously-unverified assertions happened to hold. (Bot-flagged.)
  • Drop dead recordDurations(int, AtomicLongArray) batch API — vestige of master's Batch design. Production now only calls recordOneDuration(long). Migrated the three remaining test callers (AggregateEntryTest, SerializingMetricWriterTest, MetricsIntegrationTest) to loops of recordOneDuration calls, then deleted the batched method and its AtomicLongArray imports.
  • AggregateEntry.of() colon-split Javadoc warning — the test factory recovers (name, value) pairs from "name:value" strings by splitting at the first :, which is brittle if a peer-tag value contains a colon (URLs, IPv6, service:env). Added an explicit warning so callers know to keep test data colon-free in values.
  • ConflatingMetricsAggregatorDisableTest — new JUnit 5 coverage for the disable() → ClearSignal threading routing. The test fires DOWNGRADED from the test thread, waits for the no-flush window, then publishes a marker span with a distinct resource name and asserts the next flush captures only the marker — proving CLEAR actually wiped the original entry from the table. Catches both the missing-clear regression and the bucket-chain-corruption regression that the original threading race could produce.

Benchmarks

Producer publish() latency (single-threaded, 2 forks × 5 iter × 15s)

                    Prior commit (stacked base)   This PR
  SimpleSpan bench  3.116 µs/op                   3.123 µs/op
  DDSpan bench      2.506 µs/op                   2.412 µs/op

All within noise on the producer side — this PR is a consumer-side refactor, so producer publish() shouldn't move much. The structural wins (one less class, no per-snapshot MetricKey allocation on the consumer, no double-cache lookups, smaller per-entry footprint) only become visible when the consumer is hammered hard enough that snapshot processing rate matters. That's exactly what the adversarial bench measures.

AdversarialMetricsBenchmark (8 producer threads, 2×15s warmup + 5×15s, 1 fork)

Same benchmark used on #11381: high-cardinality (service, operation, resource, peer.hostname) tuples per op, random durations across 1ns–1s, and random error/topLevel flags, designed to saturate every capacity bound at once.

                                       master                     #11381 (parent)        this PR
  Throughput avg (ops/s)               395,806 ± 2,619,133        4,889,660 ± 390,175    27,915,800 ± 1,219,470
  Stdev (ops/s)                        680,180                    101,327                316,692
  onStatsInboxFull (drops at handoff)  n/a (no inbox on master)   199,862,634            2,893,855,052
  onStatsAggregateDropped              11,642,039                 84,002,323             16,301,696

  Per-iteration progression (ops/s):
    master:           warmup 2,536,145 → 205,314 → 95,888 → 47,301 → 24,378
    #11381 (parent):  4,886,778 → 4,875,195 → 4,731,827 → 4,959,992 → 4,994,511
    this PR:          28,043,909 → 28,112,828 → 27,354,721 → 28,074,395 → 27,993,147

~70× faster than master, ~5.7× faster than #11381 alone. The producer fast-path drop check (inbox.size() >= inbox.capacity()) returns immediately when the inbox is saturated, so when the consumer can drain ~5× faster (this PR's consumer-side win), the producer spins through publish() at the fast-path rate and inbox-full drops climb correspondingly (200M on #11381 → 2.9 B here). That's the design working as intended: under attack, load shedding happens at the cheapest place in the pipeline.
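A sketch of the producer fast-path check, assuming an ArrayBlockingQueue instead of the real inbox. That class has no capacity() method, so remainingCapacity() == 0 stands in for inbox.size() >= inbox.capacity(); the Supplier models deferring snapshot construction until after the check. Names are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical sketch: shed load before building the snapshot when the
// inbox is already saturated.
final class FastPathProducer {
  private final ArrayBlockingQueue<Object> inbox;
  final LongAdder inboxFull = new LongAdder();

  FastPathProducer(int capacity) {
    inbox = new ArrayBlockingQueue<>(capacity);
  }

  boolean publish(Supplier<Object> snapshotFactory) {
    // Cheap pre-check: no snapshot is allocated when the inbox is full.
    if (inbox.remainingCapacity() == 0) {
      inboxFull.increment();
      return false;
    }
    // The pre-check is racy across producers, so offer() can still fail.
    if (!inbox.offer(snapshotFactory.get())) {
      inboxFull.increment();
      return false;
    }
    return true;
  }

  Object poll() {
    return inbox.poll();
  }
}
```

Under saturation each rejected publish() costs one size check and one counter increment, which is why inbox-full drops scale with how fast the producers can loop.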

The aggregate-cache drop count actually fell (84M → 16M) because the faster consumer keeps the table from staying at cap as often — fewer snapshots arrive at a cap-overrun state with no stale entry to recycle.

Per-iteration shape stays flat (27.4M–28.1M ops/s, no warmup → measurement degradation), confirming the design holds steady-state under sustained load.

Cardinality-isolation companions (8 producer threads, 2×15s warmup + 5×15s, 1 fork)

HighCardinalityResourceMetricsBenchmark and HighCardinalityPeerMetricsBenchmark (added in #11381) pin every dimension except one to attribute throughput deltas to a specific axis. Re-measured 2026-05-26 after the master sync (which now includes #11381 and #11444's UTF8BytesString hashCode caching).

HighCardinalityResourceMetricsBenchmark — only resource varies (~1M distinct), service/operation/peer.hostname pinned:

                            master                   this PR
  Throughput avg (ops/s)    5,958,808 ± 382,650      39,589,423 ± 2,522,726
  onStatsInboxFull          300,279,917              4,159,380,122
  onStatsAggregateDropped   338,983,453              17,692,824

  Per-iteration ops/s (5 measurement iterations):
    master:   5,994K → 6,088K → 5,819K → 5,916K → 5,977K
    this PR:  38,789K → 39,009K → 39,882K → 40,311K → 39,956K

HighCardinalityPeerMetricsBenchmark — only peer.hostname varies (~32K distinct), service/operation/resource pinned:

                            master                   this PR
  Throughput avg (ops/s)    9,223,002 ± 5,530,752    37,856,491 ± 10,070,619
  onStatsInboxFull          781,593,320              3,988,966,412
  onStatsAggregateDropped   185,595,358              16,431,892

  Per-iteration ops/s (5 measurement iterations):
    master:   10,924K → 10,660K → 8,190K → 8,107K → 8,234K
    this PR:  39,775K → 36,223K → 39,538K → 39,724K → 34,023K

+564 % over master on the resource axis; +311 % on the peer axis. This PR runs at ~38–40 M ops/s on either bench because the consumer is no longer the bottleneck (one MetricKey allocation per snapshot eliminated; AggregateTable.findOrInsert walks buckets by keyHash instead). Note that master here is the post-#11381 baseline (the producer/consumer split landed on master separately), so the relative comparison is "AggregateTable consumer redesign" not "the whole CSS rework" — the latter is a much larger multiplier when measured against pre-#11381 master.

The drop-counter shape is consistent with the design intent: master's onStatsAggregateDropped is high (339 M / 186 M), which indicates the LRU cache thrashes under high cardinality. This PR's drops shift overwhelmingly to inbox-full (235× / 243× the aggregate-dropped count), showing that the AggregateTable absorbs cardinality pressure on the consumer side and backpressure surfaces only at the cheap producer fast-path.

Net code delta: +1280 / −903 = +377 lines across 16 files. The growth is dominated by new test coverage (AggregateTableTest, AggregateEntryTest) plus the consolidated UTF8 caches landing on AggregateEntry; the production-code core (AggregateEntry's additions minus the deleted MetricKey, MetricKeys, and AggregateMetric) is roughly flat.

Known memory items addressed downstream

Two memory concerns visible at this layer of the stack are addressed in subsequent PRs — flagging here so reviewers don't worry about them in isolation:

  • Two Histogram instances per entry are eagerly allocated (worst case a ~4 MB heap floor at the default maxAggregates=2048 × 2 histograms × ~1 KB per DDSketch). Most entries never see an error, so the errorLatencies histogram usually goes unused. Fixed in #11389 (Memory-efficiency pass on ClientStatsAggregator + adversarial benchmark), which makes errorLatencies lazy: it stays null until the first error is recorded, and getErrorLatencies() returns a shared empty histogram in the no-error case.

  • PEER_TAGS_CACHE worst case is 64 outer × 512 inner = 32 K cached UTF8BytesString peer-tag pairs, heap-pinned for the JVM lifetime. Past the inner cap, the per-name LRU starts thrashing under unbounded cardinality, and the cache size itself is the only memory backstop. Fixed in #11387 (Per-component / tag cardinality limits in client-side stats), which adds per-tag TagCardinalityHandler budgets that fold overflow values into a "<tag>:blocked_by_tracer" sentinel; cardinality is bounded per reporting interval rather than per JVM lifetime, and worst-case cache occupancy collapses to the configured budget.
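For reviewers unfamiliar with the #11387 approach, a rough sketch of a per-tag budget that resets every reporting interval. The class name, the Set-based bookkeeping, and the exact sentinel formatting are assumptions for illustration, not the TagCardinalityHandler API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a per-tag cardinality budget, reset per interval.
final class TagBudget {
  private final int maxDistinctValuesPerTag;
  private final Map<String, Set<String>> seen = new HashMap<>();

  TagBudget(int maxDistinctValuesPerTag) {
    this.maxDistinctValuesPerTag = maxDistinctValuesPerTag;
  }

  /** Returns "tag:value", or "tag:blocked_by_tracer" once the budget is spent. */
  String admit(String tag, String value) {
    Set<String> values = seen.computeIfAbsent(tag, t -> new HashSet<>());
    // Already-seen values stay admitted; new values need remaining budget.
    if (values.contains(value) || values.size() < maxDistinctValuesPerTag) {
      values.add(value);
      return tag + ":" + value;
    }
    return tag + ":blocked_by_tracer";
  }

  /** Called once per reporting interval, so the bound is per interval. */
  void reset() {
    seen.clear();
  }
}
```

Because every overflow value maps to one sentinel, the number of distinct pairs downstream caches can see per interval is at most the budget plus one.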

Test plan

  • ./gradlew :dd-trace-core:test --tests 'datadog.trace.common.metrics.*' passes (incl. the new AggregateTableTest and AggregateEntryTest)
  • ./gradlew :dd-trace-core:compileJava :dd-trace-core:compileTestGroovy :dd-trace-core:compileJmhJava :dd-trace-core:compileTraceAgentTestGroovy all green
  • ./gradlew spotlessCheck clean
  • CI muzzle / integration suites
  • Validate stats.dropped_aggregates semantics at high cardinality (especially the new "drop new on cap overrun" path vs. the old "evict LRU" path)

🤖 Generated with Claude Code

dougqh and others added 7 commits May 15, 2026 12:06
ConflatingMetricsAggregator.publish does a handful of redundant operations on
every span. None individually is large; together they show as ~2.5% on the
existing JMH benchmark once the benchmark actually exercises span.kind.

- dedup span.isTopLevel(): publish() reads it into a local, then shouldComputeMetric
  read it again. Pass the cached value in.
- resolve spanKind to String once: master called toString() twice per span (once
  inside spanKindEligible, once at the getPeerTags call site) and used HashSet
  contains on a CharSequence (which routes through equals on String). Normalize
  to String up front and reuse.
- lazy-allocate the peer-tag list: getPeerTags() always allocated an ArrayList
  sized to features.peerTags() even when the span had none of those tags set.
  Defer allocation until the first match; return Collections.emptyList() when
  none hit. MetricKey already treats null/empty peerTags as emptyList, so no
  behavior change.
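A minimal sketch of the lazy-allocation change, assuming span tags are available as a Map<String, String> and pairs are rendered as name:value strings (both assumptions; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: allocate the result list only when the first
// configured peer tag is actually present on the span.
final class PeerTags {
  static List<String> collect(List<String> configuredPeerTags, Map<String, String> spanTags) {
    List<String> result = null;
    for (String name : configuredPeerTags) {
      String value = spanTags.get(name);
      if (value != null) {
        if (result == null) {
          result = new ArrayList<>(configuredPeerTags.size());
        }
        result.add(name + ":" + value);
      }
    }
    if (result == null) {
      return Collections.emptyList(); // shared immutable instance, no allocation
    }
    return result;
  }
}
```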

Drop the spanKindEligible helper — the HashSet.contains call inlines fine in
shouldComputeMetric.

Update the JMH benchmark to set span.kind=client on every span. Without it the
filter path short-circuits before the peer-tag and toString work, so the wins
above aren't measurable. With it:

  baseline   6.755 us/op (CI [6.560, 6.950], stdev 0.129)
  optimized  6.585 us/op (CI [6.536, 6.634], stdev 0.033)

2 forks x 5 iterations x 15s. ~2.5% mean improvement and much tighter variance
fork-to-fork.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce SpanKindFilter -- a tiny builder-built immutable filter whose state
is an int bitmask indexed by the span.kind ordinals already cached on
DDSpanContext. Each include* on the builder sets one bit (1 << ordinal); the
runtime check is a single AND against (1 << span's ordinal).
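A sketch of the filter shape described above. The ordinal values and the include* method names are illustrative assumptions; the real ordinals are the ones cached on DDSpanContext:

```java
// Hypothetical sketch of a builder-built, immutable span.kind bitmask filter.
final class KindFilter {
  static final byte SERVER = 0;
  static final byte CLIENT = 1;
  static final byte PRODUCER = 2;
  static final byte CONSUMER = 3;
  static final byte INTERNAL = 4;

  // One bit per included span.kind ordinal; never mutated after build().
  private final int kindMask;

  private KindFilter(int kindMask) {
    this.kindMask = kindMask;
  }

  // Runtime check: one shift plus a single AND against the cached ordinal.
  boolean matches(byte ordinal) {
    // Java masks int shift distances to 5 bits, so ordinals outside 0..31
    // (including -1 for "unknown") would otherwise alias onto other bits.
    return ordinal >= 0 && ordinal < 32 && (kindMask & (1 << ordinal)) != 0;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private int mask;

    Builder include(byte ordinal) {
      mask |= 1 << ordinal;
      return this;
    }

    Builder includeClient() { return include(CLIENT); }
    Builder includeProducer() { return include(PRODUCER); }
    Builder includeConsumer() { return include(CONSUMER); }

    // The private constructor is reachable from this nested class.
    KindFilter build() {
      return new KindFilter(mask);
    }
  }
}
```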

CoreSpan.isKind(SpanKindFilter) is the new entry point. DDSpan overrides it
to do the bit-test directly against the cached ordinal -- no virtual call,
no tag-map lookup. The two existing test-only CoreSpan impls (SimpleSpan
and TraceGenerator.PojoSpan, the latter in two source sets) implement isKind
by reading the span.kind tag and delegating to SpanKindFilter.matches(String),
which converts via DDSpanContext.spanKindOrdinalOf and does the same AND.

Refactor: DDSpanContext.setSpanKindOrdinal(String) now delegates to a new
package-private static spanKindOrdinalOf(String) so the same string-to-ordinal
mapping serves both the tag interceptor path and SpanKindFilter.matches.

This is groundwork -- nothing in the codebase calls isKind yet. The next
commit will replace the HashSet-based eligibility checks in
ConflatingMetricsAggregator with SpanKindFilter instances.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the two ELIGIBLE_SPAN_KINDS_FOR_* HashSet<String> constants and the
SPAN_KIND_INTERNAL.equals check with three SpanKindFilter instances:
METRICS_ELIGIBLE_KINDS, PEER_AGGREGATION_KINDS, INTERNAL_KIND. Eligibility
checks now go through span.isKind(filter), which on DDSpan is a volatile
byte read against the already-cached span.kind ordinal plus a single bit-test.

Also defer the span.kind tag read: previously read at the top of the publish
loop and threaded through both shouldComputeMetric and the inner publish.
isKind no longer needs the string, so the read can move down into the inner
publish where it's still needed for the SPAN_KINDS cache key / MetricKey.

Supporting changes:

- DDSpanContext.spanKindOrdinalOf(String) is now public so non-DDSpan CoreSpan
  impls can compute the ordinal at tag-write time.
- SpanKindFilter gains a public matches(byte) fast-path overload that callers
  with a pre-computed ordinal use directly.
- SimpleSpan caches the ordinal in setTag(SPAN_KIND, ...), mirroring what
  TagInterceptor does for DDSpanContext, and its isKind now hits the byte
  fast path. Without this, the JMH benchmark (which uses SimpleSpan) would
  re-derive the ordinal on every isKind call and overstate the cost.

Benchmark, using the bench updated in an earlier commit (kind=client on every span,
4 forks x 5 iter x 15s):

  prior commit  6.585 ± 0.049 us/op
  this commit   6.903 ± 0.096 us/op

The slight regression is a SimpleSpan-via-groovy-dispatch artifact -- the
interface call to isKind through CoreSpan, then through SimpleSpan, then
through SpanKindFilter.matches, doesn't fold as aggressively as a HashSet
contains on a static field. In production DDSpan.isKind inlines to a context
field read + ordinal byte read + bit-test, so the production path is faster
than the prior HashSet approach. A DDSpan-based benchmark would show this;
the existing SimpleSpan-based one doesn't.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing ConflatingMetricsAggregatorBenchmark uses SimpleSpan, a groovy
mock. That's enough for measuring queue/CHM/MetricKey work, but it conceals
the production cost of CoreSpan.isKind: SimpleSpan's isKind goes through
groovy interface dispatch into SpanKindFilter.matches, while DDSpan.isKind
inlines to a context byte-read + bit-test.

This new benchmark uses real DDSpan instances created through a CoreTracer
(with a NoopWriter so finishing doesn't reach the agent). Same shape as the
SimpleSpan bench (64-span trace, span.kind=client, peer.hostname set).

Numbers (2 forks x 5 iter x 15s):

  master:        6.428 +- 0.189 us/op  (HashSet eligibility checks)
  this branch:   6.343 +- 0.115 us/op  (SpanKindFilter bitmask)

About 1.3% faster on the production path. The SimpleSpan benchmark in the
same conditions shows a ~2.2% slowdown -- the mock's dispatch shape gives a
misleading signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make SpanKindFilter.kindMask and its constructor private now that DDSpan.isKind
no longer needs direct field access -- it delegates to SpanKindFilter.matches(byte).

The Builder.build() in the same outer class still constructs instances via the
private constructor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the producer-side conflation pipeline with a thin per-span SpanSnapshot
posted to the existing aggregator thread. The aggregator now builds the
MetricKey, does the SERVICE_NAMES / SPAN_KINDS / PEER_TAGS_CACHE lookups, and
updates the AggregateMetric directly -- all off the producer's hot path.

What the producer does now, per span:

  - filter (shouldComputeMetric, resource-ignored, longRunning)
  - collect tag values into a SpanSnapshot (1 allocation per span)
  - inbox.offer(snapshot) + return error flag for forceKeep

What moved off the producer:

  - MetricKey construction and its hash computation
  - SERVICE_NAMES.computeIfAbsent (UTF8 encoding of service name)
  - SPAN_KINDS.computeIfAbsent (UTF8 encoding of span.kind)
  - PEER_TAGS_CACHE lookups (peer-tag name+value UTF8 encoding)
  - pending/keys ConcurrentHashMap operations
  - Batch pooling, batch atomic ops, batch contributeTo

Removed entirely:

  - Batch.java -- the conflation primitive is no longer needed; the
    aggregator's existing LRUCache<MetricKey, AggregateMetric> IS the
    conflation point now.
  - pending ConcurrentHashMap<MetricKey, Batch>
  - keys ConcurrentHashMap<MetricKey, MetricKey> (canonical dedup)
  - batchPool MessagePassingQueue<Batch>
  - The CommonKeyCleaner role of tracking keys.keySet() on LRU eviction --
    AggregateExpiry now just reports drops to healthMetrics.

Added:

  - SpanSnapshot: immutable value carrying the raw MetricKey inputs + a
    tagAndDuration long (duration | ERROR_TAG | TOP_LEVEL_TAG).
  - AggregateMetric.recordOneDuration(long tagAndDuration) -- the single-hit
    equivalent of the existing recordDurations(int, AtomicLongArray).
  - Peer-tag values flow through the snapshot as a flattened String[] of
    [name0, value0, name1, value1, ...]; the aggregator encodes them through
    PEER_TAGS_CACHE on its own thread.
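A sketch of the tagAndDuration packing, assuming the two flags live in bits 62 and 61; the actual bit positions of ERROR_TAG and TOP_LEVEL_TAG are not stated here, so treat these values as placeholders:

```java
// Hypothetical sketch of packing a duration and two flags into one long.
final class TagAndDuration {
  static final long ERROR_TAG = 1L << 62;       // assumed bit position
  static final long TOP_LEVEL_TAG = 1L << 61;   // assumed bit position
  static final long DURATION_MASK = TOP_LEVEL_TAG - 1; // bits 0..60

  // Durations must be below 2^61 ns; higher bits would collide with flags.
  static long pack(long durationNanos, boolean error, boolean topLevel) {
    long packed = durationNanos & DURATION_MASK;
    if (error) {
      packed |= ERROR_TAG;
    }
    if (topLevel) {
      packed |= TOP_LEVEL_TAG;
    }
    return packed;
  }

  static long duration(long packed) {
    return packed & DURATION_MASK;
  }

  static boolean isError(long packed) {
    return (packed & ERROR_TAG) != 0;
  }

  static boolean isTopLevel(long packed) {
    return (packed & TOP_LEVEL_TAG) != 0;
  }
}
```

One primitive field keeps the snapshot small and lets recordOneDuration(long) receive everything it needs in a single argument.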

Benchmark results (2 forks x 5 iter x 15s):

  ConflatingMetricsAggregatorDDSpanBenchmark
    prior commit  6.343 +- 0.115 us/op
    this commit   2.506 +- 0.044 us/op  (~60% faster)

  ConflatingMetricsAggregatorBenchmark (SimpleSpan)
    prior commit  6.585 +- 0.049 us/op
    this commit   3.116 +- 0.032 us/op  (~53% faster)

Caveat on the benchmark: without conflation, the producer pushes 1 inbox
item per span instead of ~1 per 64. At the benchmark's synthetic rate the
consumer can't keep up and inbox.offer silently drops. The numbers measure
producer publish() latency only; consumer throughput at realistic span rates
is a follow-up to validate. Tuning maxPending matters more in this design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With the per-span SpanSnapshot inbox path, the producer can lose snapshots
when the bounded MPSC queue is full -- silently, since inbox.offer() returns
a boolean we previously ignored. The conflating-Batch design used to absorb
~64x more producer pressure per inbox slot, so this is a new failure mode
worth surfacing.

Wire it through the existing HealthMetrics path:

- HealthMetrics.onStatsInboxFull() (no-op default).
- TracerHealthMetrics gets a statsInboxFull LongAdder and a new reason tag
  reason:inbox_full reported under the same stats.dropped_aggregates metric
  used for LRU evictions. Two LongAdders, two tagged time series.
- ConflatingMetricsAggregator.publish increments the counter when
  inbox.offer(snapshot) returns false.
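A sketch of the wiring, with hypothetical names standing in for HealthMetrics, TracerHealthMetrics, and the publish call site. The default no-op methods keep existing implementations compiling, and one LongAdder per drop reason maps to one tagged time series:

```java
import java.util.Queue;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for HealthMetrics: new hooks default to no-ops.
interface StatsHealth {
  default void onStatsAggregateDropped() {}

  default void onStatsInboxFull() {}
}

// Hypothetical stand-in for TracerHealthMetrics: one LongAdder per reason.
final class CountingStatsHealth implements StatsHealth {
  final LongAdder aggregateDropped = new LongAdder();
  final LongAdder inboxFull = new LongAdder();

  @Override
  public void onStatsAggregateDropped() {
    aggregateDropped.increment();
  }

  @Override
  public void onStatsInboxFull() {
    inboxFull.increment();
  }
}

final class InboxPublisher {
  // The offer() result used to be ignored; now a rejection is counted.
  static boolean publish(Queue<Object> inbox, Object snapshot, StatsHealth health) {
    boolean accepted = inbox.offer(snapshot);
    if (!accepted) {
      health.onStatsInboxFull();
    }
    return accepted;
  }
}
```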

This doesn't fix the drop -- tuning maxPending and/or building producer-side
batching are the actual fixes. But it makes the failure visible in the same
place ops already watches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dougqh dougqh added the labels type: enhancement (Enhancements and improvements), comp: core (Tracer core), tag: performance (Performance related changes), tag: no release notes (Changes to exclude from release notes), comp: metrics (Metrics), and tag: ai generated (Largely based on code generated by an AI or LLM) on May 15, 2026
dougqh and others added 3 commits May 18, 2026 11:19
Two general-purpose utilities used by the client-side stats aggregator
work (PR #11382 and follow-ups), extracted into their own change so the
metrics-specific PRs can build on a smaller, reviewable foundation.

  - Hashtable: a generic chained-bucket table abstraction
    keyed by a 64-bit hash, with a public abstract Entry type so client
    code can subclass it for higher-arity keys. The metrics aggregator
    uses it to back its AggregateTable.

  - LongHashingUtils: chained 64-bit hash combiners with primitive
    overloads (boolean, short, int, long, Object). Used in place of
    varargs combiners to avoid Object[] allocation and boxing on the
    hot path.
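A sketch of chained hash combining with primitive overloads. The 31-based formula is an assumption and may differ from LongHashingUtils; the point is that each overload takes its argument directly, so a fixed-arity call allocates no varargs Object[] and the primitive overloads do not box:

```java
// Hypothetical sketch of chained 64-bit hash combiners.
final class HashChain {
  static long addToHash(long hash, int value) {
    return 31L * hash + value;
  }

  static long addToHash(long hash, long value) {
    return 31L * hash + Long.hashCode(value);
  }

  static long addToHash(long hash, short value) {
    return 31L * hash + Short.hashCode(value);
  }

  static long addToHash(long hash, boolean value) {
    return 31L * hash + Boolean.hashCode(value);
  }

  static long addToHash(long hash, Object value) {
    return 31L * hash + (value == null ? 0 : value.hashCode());
  }

  // Fixed-arity helper: same result as chaining by hand, no Object[] created.
  static long hash(Object a, Object b, Object c) {
    return addToHash(addToHash(addToHash(0L, a), b), c);
  }
}
```

Because each primitive overload uses the same hashCode as the boxed type, chaining primitives and chaining their boxed equivalents produce the same hash.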

No callers within internal-api itself yet -- the metrics aggregator PR
will introduce the first usages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dougqh and others added 3 commits May 18, 2026 15:18
Standalone classes for swapping the consumer-side LRUCache<MetricKey,
AggregateMetric> with a multi-key Hashtable in the next commit. No call sites
use them yet.

- AggregateEntry extends Hashtable.Entry, holds the canonical MetricKey, the
  mutable AggregateMetric, and copies of the 13 raw SpanSnapshot fields for
  matches(). The 64-bit lookup hash is computed via chained
  LongHashingUtils.addToHash calls (no varargs, no boxing of short/boolean).
- AggregateTable wraps a Hashtable.Entry[] from Hashtable.Support.create.
  findOrInsert(SpanSnapshot) walks the bucket comparing raw fields, falling
  back to MetricKeys.fromSnapshot on a true miss. On cap overrun, it scans
  for an entry with hitCount==0 and unlinks it; if none, it returns null and
  the caller drops the data point.
- MetricKeys.fromSnapshot extracts the canonicalization logic (DDCache
  lookups + UTF8 encoding) from Aggregator.buildMetricKey, so the helper can
  be called from AggregateTable on miss.

This also commits Hashtable and LongHashingUtils (added earlier, previously
uncommitted) and lifts Hashtable.Entry / Hashtable.Support visibility so
client code outside datadog.trace.util can build higher-arity tables -- the
case the javadoc describes but the original visibility didn't actually
support. Specifically: Entry is now public abstract with a protected ctor;
keyHash, next(), and setNext() are public; Support's create / clear /
bucketIndex / bucketIterator / mutatingBucketIterator methods are public.

Tests: AggregateTableTest covers hit, miss, distinct-by-spanKind, peer-tag
identity (including null vs non-null), cap overrun with stale victim, cap
overrun with no victim (returns null), expungeStaleAggregates, forEach,
clear, and that the canonical MetricKey is built at insert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace LRUCache<MetricKey, AggregateMetric> with the AggregateTable added
in the prior commit. The hot path in Drainer.accept becomes:

  AggregateMetric aggregate = aggregates.findOrInsert(snapshot);
  if (aggregate != null) {
      aggregate.recordOneDuration(snapshot.tagAndDuration);
      dirty = true;
  } else {
      healthMetrics.onStatsAggregateDropped();
  }

On the steady-state hit path the lookup is a 64-bit hash compute + bucket
walk + matches(snapshot) -- no MetricKey allocation, no SERVICE_NAMES /
SPAN_KINDS / PEER_TAGS_CACHE lookups. The canonical MetricKey is now built
once per unique key at insert time, in MetricKeys.fromSnapshot.

Behavioral change in the cap-overrun path
-----------------------------------------

The old LRUCache evicted least-recently-used: at cap, a new insert would
push out the oldest entry regardless of whether it was live or stale.
AggregateTable instead scans for a hitCount==0 entry to recycle, and drops
the new key if none exists. Practical impact: in the common case where
the table holds a stable set of recurring keys, an unrelated burst of new
keys is dropped (and reported via onStatsAggregateDropped) rather than
evicting the established keys. The existing test that asserted "service0
evicted in favor of service10" is updated to assert the new semantics.
The other cap-related test ("should not report dropped aggregate when
evicted entry was already flushed") still passes unchanged: after report()
clears all entries to hitCount=0, the next wave of inserts recycles them.

Threading fix
-------------

ConflatingMetricsAggregator.disable() used to call aggregator.clearAggregates()
and inbox.clear() directly from the Sink's IO event thread, racing with the
aggregator thread mid-write. The race was tolerable for LinkedHashMap; it
is not for AggregateTable (chain corruption can NPE or loop). disable()
now offers a ClearSignal to the inbox so the aggregator thread itself
performs the table clear and the inbox.clear(). Adds one SignalItem
subclass + one branch in Drainer.accept; preserves the single-writer
invariant for AggregateTable end-to-end.

Removed: LRUCache import, AggregateExpiry inner class, the static
buildMetricKey / materializePeerTags / encodePeerTag helpers (now in
MetricKeys).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MetricKey existed for two reasons -- the prior LRUCache key role (now handled
by AggregateTable's Hashtable.Entry mechanics) and as the labels argument
to MetricWriter.add. The first is gone; the second is the only thing keeping
MetricKey alive. Fold its UTF8-encoded label fields onto AggregateEntry,
change MetricWriter.add to take AggregateEntry directly, and delete
MetricKey + MetricKeys.

What AggregateEntry now holds
-----------------------------

- 10 UTF8BytesString label fields (resource, service, operationName,
  serviceSource, type, spanKind, httpMethod, httpEndpoint, grpcStatusCode,
  and a List<UTF8BytesString> peerTags for serialization).
- 3 primitives (httpStatusCode, synthetic, traceRoot).
- AggregateMetric (the value being accumulated).
- The raw String[] peerTagPairs is retained alongside the encoded peerTags
  -- matches() compares it positionally against the snapshot's pairs; the
  encoded form is only consumed by the writer.

matches(SpanSnapshot) compares the entry's UTF8 forms to the snapshot's raw
String / CharSequence fields via content-equality (UTF8BytesString.toString()
returns the underlying String in O(1)). This closes a latent bug in the
prior raw-vs-raw matches(): if one snapshot delivered a tag value as String
and a later snapshot delivered the same content as UTF8BytesString, the old
Objects.equals would return false and the table would split into two
entries. Content-equality matching collapses them into one.

Consolidated caches
-------------------

The static UTF8 caches that used to live partly on MetricKey (RESOURCE_CACHE,
OPERATION_CACHE, SERVICE_SOURCE_CACHE, TYPE_CACHE, KIND_CACHE,
HTTP_METHOD_CACHE, HTTP_ENDPOINT_CACHE, GRPC_STATUS_CODE_CACHE, SERVICE_CACHE)
and partly on ConflatingMetricsAggregator (SERVICE_NAMES, SPAN_KINDS,
PEER_TAGS_CACHE) are all now on AggregateEntry. The split was duplicating
work -- SERVICE_NAMES and SERVICE_CACHE both cached service-name to
UTF8BytesString. One cache per field now.

API change: MetricWriter.add
----------------------------

Was: add(MetricKey key, AggregateMetric aggregate)
Now: add(AggregateEntry entry)

The aggregate lives on the entry. Single-arg.

SerializingMetricWriter reads the same UTF8 fields off AggregateEntry that it
previously read off MetricKey; the wire format is byte-identical.

Test impact
-----------

AggregateEntry.of(...) takes the same 13 positional args new MetricKey(...)
took, so test diffs are mostly mechanical:
  new MetricKey(args) -> AggregateEntry.of(args)
  writer.add(key, _)  -> writer.add(entry)

ValidatingSink in SerializingMetricWriterTest now iterates List<AggregateEntry>
directly. ConflatingMetricAggregatorTest's Spock matchers (~36 sites) rely
on AggregateEntry.equals comparing the 13 label fields (not the aggregate)
so the mock matches by labels regardless of the aggregate state at call time;
post-invocation closures verify aggregate state.

Benchmarks (2 forks x 5 iter x 15s)
-----------------------------------

The change is consumer-thread only; producer publish() is unchanged.

  SimpleSpan bench:   3.123 +- 0.025 us/op   (prior: 3.119 +- 0.018)
  DDSpan bench:       2.412 +- 0.022 us/op   (prior: 2.463 +- 0.041)

Both within noise -- the win is structural (one less class, one less
allocation per miss, one fewer cache layer) rather than benchmarked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dougqh dougqh force-pushed the dougqh/optimize-metric-key branch from 050a998 to 3738c85 Compare May 18, 2026 19:20
@dougqh dougqh changed the base branch from dougqh/conflating-metrics-background-work to dougqh/util-hashtable May 18, 2026 19:21
dougqh and others added 7 commits May 18, 2026 15:40
LongHashingUtilsTest (14 cases):
  - hashCodeX null sentinel + non-null pass-through
  - all primitive hash() overloads match the boxed Java hashCodes
  - hash(Object...) 2/3/4/5-arg overloads match the chained addToHash
    formula they are documented to constant-fold to
  - addToHash(long, primitive) overloads match the Object-version
  - linear-accumulation invariant (31 * h + v) holds across a sequence
  - iterable / deprecated int[] / deprecated Object[] variants match
    chained addToHash
  - intHash treats null as 0 (observable via hash(null, "x"))
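The chained formula these tests pin down can be sketched as follows. The helper names (intHash, addToHash, hash) echo the test descriptions, but the signatures are assumptions, as are the seed of 1 and the Object-typed parameters. They were chosen so the result lines up with java.util.Objects.hash. The real LongHashingUtils may differ.

```java
final class ChainedHash {
  // Hypothetical stand-in for intHash: null contributes 0.
  static int intHash(Object o) {
    return o == null ? 0 : o.hashCode();
  }

  // One step of the linear-accumulation invariant: 31 * h + v.
  static int addToHash(int h, Object o) {
    return 31 * h + intHash(o);
  }

  // Fixed-arity overload: every argument must be folded in, including the last one.
  static int hash(Object o0, Object o1, Object o2, Object o3, Object o4) {
    int h = 1; // assumed seed, matching Arrays.hashCode / Objects.hash
    h = addToHash(h, o0);
    h = addToHash(h, o1);
    h = addToHash(h, o2);
    h = addToHash(h, o3);
    h = addToHash(h, o4); // a dropped-last-argument bug would skip this step
    return h;
  }
}
```

Comparing against Objects.hash is a cheap way to catch a dropped-argument regression. Changing only the fifth argument must change the result.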

HashtableTest (24 cases across 5 nested classes):
  - D1: insert/get/remove/insertOrReplace/clear/forEach, in-place value
    mutation, null-key handling, hash-collision chaining with disambig-
    uating equals, remove-from-collided-chain leaves siblings intact
  - D2: pair-key identity, remove(pair), insertOrReplace matches on
    both parts, forEach
  - Support: capacity rounds up to a power of two, bucketIndex stays
    in range across a wide hash sample, clear nulls every slot
  - BucketIterator: walks only matching-hash entries in a chain, throws
    NoSuchElementException when exhausted
  - MutatingBucketIterator: remove from head-of-chain unlinks, replace
    swaps the entry while preserving chain, remove() without prior
    next() throws IllegalStateException
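A minimal sketch of the chaining behavior these tests cover follows. It assumes an intrusive design in which each entry stores its own hash and next pointer. The class and method names are hypothetical, not Hashtable's API. "Aa" and "BB" share String.hashCode() 2112, so they exercise collision chaining with a disambiguating equals.

```java
final class ChainedTable {
  // Intrusive entry: key, cached hash, mutable value, and the next link in the chain.
  static final class Entry {
    final String key;
    final int hash;
    long value;
    Entry next;

    Entry(String key, int hash) {
      this.key = key;
      this.hash = hash;
    }
  }

  private final Entry[] buckets;

  ChainedTable(int capacityPowerOfTwo) {
    buckets = new Entry[capacityPowerOfTwo];
  }

  private int bucketIndex(int hash) {
    return hash & (buckets.length - 1); // always in range for a power-of-two length
  }

  Entry get(String key) {
    int h = key.hashCode();
    for (Entry e = buckets[bucketIndex(h)]; e != null; e = e.next) {
      // Cheap hash check first; equals() disambiguates true collisions.
      if (e.hash == h && e.key.equals(key)) {
        return e;
      }
    }
    return null;
  }

  Entry getOrInsert(String key) {
    Entry found = get(key);
    if (found != null) {
      return found; // callers mutate value in place
    }
    int h = key.hashCode();
    int i = bucketIndex(h);
    Entry e = new Entry(key, h);
    e.next = buckets[i]; // push onto the head of the chain
    buckets[i] = e;
    return e;
  }

  boolean remove(String key) {
    int h = key.hashCode();
    int i = bucketIndex(h);
    Entry prev = null;
    for (Entry e = buckets[i]; e != null; prev = e, e = e.next) {
      if (e.hash == h && e.key.equals(key)) {
        // Unlink only this entry; siblings in the chain stay reachable.
        if (prev == null) {
          buckets[i] = e.next;
        } else {
          prev.next = e.next;
        }
        return true;
      }
    }
    return false;
  }
}
```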

Tests live in internal-api/src/test/java/datadog/trace/util and use the
already-present JUnit 5 setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring the new util/ files in line with google-java-format
(tabs → spaces, line wrapping, javadoc list markup) so
spotlessCheck passes in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compares Hashtable.D1 and Hashtable.D2 against equivalent HashMap
usage for add, update, and iterate operations. Each benchmark thread
owns its own map (Scope.Thread), but @threads(8) is used so the
allocation/GC pressure that Hashtable is designed to avoid surfaces
in the throughput numbers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Guard Support.sizeFor against overflow and use Integer.highestOneBit;
  reject capacities above 1 << 30 instead of looping forever.
- Add braces around single-statement while bodies in BucketIterator.
- Split HashtableBenchmark into HashtableD1Benchmark / HashtableD2Benchmark.
- Add regression tests for Support.sizeFor bounds.
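An overflow-safe power-of-two sizing helper in the spirit of the Support.sizeFor fix could look like the sketch below. The class name, the handling of zero and negative inputs, and the exception type are assumptions, not the committed code.

```java
final class Sizing {
  // Largest power of two representable as a positive int.
  static final int MAX_CAPACITY = 1 << 30;

  // Rounds the requested capacity up to a power of two without looping.
  static int sizeFor(int requested) {
    if (requested < 0 || requested > MAX_CAPACITY) {
      throw new IllegalArgumentException("capacity out of range: " + requested);
    }
    if (requested <= 1) {
      return 1;
    }
    int highest = Integer.highestOneBit(requested);
    // Already a power of two: keep it; otherwise double the highest set bit.
    // The range check above guarantees the shift cannot overflow.
    return highest == requested ? requested : highest << 1;
  }
}
```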

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 5-arg Object overload was forwarding only obj0..obj3 to the int
overload, silently dropping obj4. Also align LongHashingUtils.hash 3-arg
signature with its 2/4/5-arg siblings (int parameters) and strengthen
the 5-arg HashingUtilsTest to detect the missing-arg regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Split D1Tests and D2Tests into HashtableD1Test and HashtableD2Test;
  extract shared test entry classes into HashtableTestEntries.
- Reduce visibility of LongHashingUtils.hash(int...) chaining overloads
  to package-private; they are internal building blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dougqh and others added 5 commits May 22, 2026 11:31
Addresses amarziali's review on getLastTimeDiscovered(): the existing
state() accessor returns a SHA-256 of the discovery response, which is
a more precise change key than the timestamp. Timestamp advances on
every successful refresh regardless of content; the hash only advances
when something actually changed -- so reconcile fast-path now fires
only on real change, not every cycle.

- PeerTagSchema: long lastTimeDiscovered -> String state. Factory
  signature of(Set, long) -> of(Set, String). INTERNAL carries null
  (it is never reconciled).
- ConflatingMetricsAggregator: read features.state() first then
  peerTags() (same defensive ordering rationale -- if a discovery
  refresh interleaves, leave the schema with stale state rather than
  stale tags so the next reconcile re-runs the deep compare).
  Objects.equals for null-tolerant comparison (state can be null
  before discovery has produced a response).
- DDAgentFeaturesDiscovery: drop the public getLastTimeDiscovered()
  accessor added on this branch -- the field stays private for the
  existing throttling logic in discoverIfOutdated().
- Tests updated to mock state() instead of getLastTimeDiscovered().
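The reconcile fast path described above can be sketched like this. The Discovery interface and Schema class are hypothetical stand-ins for DDAgentFeaturesDiscovery and PeerTagSchema. Only two details come from the commit message: the ordering (state() before peerTags()) and the null-tolerant Objects.equals comparison.

```java
import java.util.Objects;
import java.util.Set;

final class SchemaReconciler {
  // Hypothetical view of the discovery component.
  interface Discovery {
    String state(); // content hash of the last discovery response; may be null

    Set<String> peerTags();
  }

  // Hypothetical immutable schema snapshot.
  static final class Schema {
    final Set<String> tags;
    final String state;

    Schema(Set<String> tags, String state) {
      this.tags = tags;
      this.state = state;
    }
  }

  static Schema reconcile(Schema current, Discovery discovery) {
    // Read state first: if a refresh lands between the two reads, the schema ends up
    // with stale state but fresh tags, so the next reconcile re-runs the deep compare.
    String state = discovery.state();
    if (Objects.equals(state, current.state)) {
      return current; // fast path: the response content has not changed
    }
    Set<String> tags = discovery.peerTags();
    if (tags.equals(current.tags)) {
      return new Schema(current.tags, state); // same tags; just record the new state
    }
    return new Schema(tags, state);
  }
}
```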

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses amarziali's readability nit (#3289149416) -- multi-line
prose reads better as a single block comment than as a stack of //
lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ound-work' into dougqh/conflating-metrics-background-work
Two new JMH benches that hold every dimension constant except one,
to attribute throughput deltas to a specific axis:

- HighCardinalityResourceMetricsBenchmark: ~1M distinct resource
  values; service/operation/peer.hostname pinned. Exercises the
  aggregate-cache LRU on the resource axis specifically.
- HighCardinalityPeerMetricsBenchmark: ~32K distinct peer.hostname
  values; service/operation/resource pinned. Isolates the peer-tag
  encoding hot path (PEER_TAGS_CACHE lookups, UTF8 encoding,
  parallel-array capture in SpanSnapshot).

Same shape as AdversarialMetricsBenchmark (8 threads, 2x15s warmup +
5x15s measurement, 1 fork) and reuse its CountingHealthMetrics so the
inbox-full vs aggregate-dropped counters print on teardown for an
apples-to-apples comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ound-work' into dougqh/optimize-metric-key

# Conflicts:
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/PeerTagSchema.java
#	dd-trace-core/src/test/java/datadog/trace/common/metrics/PeerTagSchemaTest.java
Comment thread dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateEntry.java Outdated
Comment thread dd-trace-core/src/test/java/datadog/trace/common/metrics/AggregateEntries.java Outdated
Comment thread dd-trace-core/src/test/java/datadog/trace/common/metrics/AggregateTableTest.java Outdated
Contributor

@sarahchen6 left a comment


Left a few nits, but otherwise looks reasonable! Pre-approving...

Also this was Claude's advice for the failing agent integration test:

Fix options (in order of invasiveness):
1. Add a public test-support module: extract a small AggregateEntryTestSupport factory into a
   test-support jar that traceAgentTest can depend on, keeping the production classes package-private.
2. Make testSchema + forSnapshot public on their classes: simplest, but widens the API surface of
   intentionally-internal classes.
3. Rewrite the test to not reference these classes directly: drive the test entirely via
   ConflatingMetricsAggregator publishing real DDSpans, which avoids any dependency on the internal
   AggregateEntry/PeerTagSchema types.

Option 3 is the cleanest because it also makes the integration test more realistic (it exercises the full
pipeline instead of bypassing the producer path).

dougqh and others added 3 commits May 26, 2026 11:59
Addresses sarahchen6 review:
- AggregateEntry.java:380: early-return on null-or-empty `a`, then check
  `b` once, dropping the two split null branches and the duplicate
  String/UTF8BytesString instanceof checks.
- AggregateEntry.java:398: String is a CharSequence, so the general
  contentEquals(UTF8BytesString, CharSequence) already handles both. Migrate
  the five service / spanKind / httpMethod / httpEndpoint / grpcStatusCode
  call sites in matches() to it and delete the String-specific helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sarahchen6 review on AggregateEntries.java:13: the prior name
reads too close to the production AggregateEntry class. Pick a more
test-flavored name. Touches the file itself + the 8 callers across
ConflatingMetricAggregatorTest and SerializingMetricWriterTest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sarahchen6 review on AggregateTableTest:237 and
ConflatingMetricsAggregatorDisableTest:143: comments narrated the prior-
behavior-and-fix path that led to each test, but the test itself is
self-evident -- a future reader only needs the expected behavior. Keep
the behavior summary, drop the "Regression:" / "prior CLEAR handler ..."
flavor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from dougqh/conflating-metrics-background-work to master May 26, 2026 16:40
…ric-key

# Conflicts:
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/AggregateMetric.java
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/Aggregator.java
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/ConflatingMetricsAggregator.java
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/PeerTagSchema.java
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/SpanSnapshot.java
#	dd-trace-core/src/test/groovy/datadog/trace/common/metrics/AggregateMetricTest.groovy
#	dd-trace-core/src/test/groovy/datadog/trace/common/metrics/ConflatingMetricAggregatorTest.groovy
#	dd-trace-core/src/test/java/datadog/trace/common/metrics/ConflatingMetricsAggregatorBootstrapTest.java
#	dd-trace-core/src/test/java/datadog/trace/common/metrics/PeerTagSchemaTest.java
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 26, 2026

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Startup Time

Scenario                  This PR     master      Change
insecure-bank / iast      14,012 ms   14,037 ms   -0.2%
insecure-bank / tracing   12,959 ms   13,055 ms   -0.7%
petclinic / appsec        16,527 ms   16,329 ms   +1.2%
petclinic / iast          16,609 ms   16,633 ms   -0.1%
petclinic / profiling     16,387 ms   16,531 ms   -0.9%
petclinic / tracing       14,819 ms   16,014 ms   -7.5%

Commit: d5065f26 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

dougqh and others added 3 commits May 26, 2026 16:01
The class itself is package-private, so the public modifier on these
constants is meaningless and misleads about the actual access surface.
All six call sites (ConflatingMetricsAggregator + tests) are in the
same package and continue to compile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eliminates the dual-equality-contract maintenance hazard on
AggregateEntry. Production code never invoked equals/hashCode --
AggregateTable bucketing goes through keyHash + matches(SpanSnapshot)
directly. The contract existed only to support Spock mock argument
matchers in tests.

- Delete equals/hashCode from production AggregateEntry; class stays
  final.
- Make peerTagNames/peerTagValues fields package-private so a sibling
  helper in the same package can read them.
- Add src/test AggregateEntryTestUtils.equals/hashCode that
  implements the same field-wise contract (raw-array based, consistent
  with hashOf) for tests.
- Update Spock argument matchers from `writer.add(fixture)` to
  `writer.add({ AggregateEntryTestUtils.equals(it, fixture) })`. For
  loop-driven expectations, hoist the fixture into a per-iteration
  `def expected = ...` local so it's captured by value rather than by
  reference to the loop variable.
- Update the JUnit contract tests to drive the helper directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* @param endBucket exclusive upper bound; must be in {@code [startBucket, buckets.length]}.
*/
public static final <TEntry extends Hashtable.Entry>
MutatingTableIterator<TEntry> mutatingTableIterator(
Contributor Author


FYI to reviewers: I added this routine to scan over a subset of buckets. It's used by the
AggregateTable eviction policy to resume the eviction sweep where the prior one left off.
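One way to do incremental eviction over bucket ranges is sketched below, with the range scan inlined rather than going through an iterator. Each call scans a bounded window of buckets starting at a saved cursor and wraps around, so successive sweeps pick up where the prior one stopped. The cursor field, the window size and the isStale predicate are hypothetical. This is not AggregateTable's actual eviction policy.

```java
import java.util.function.Predicate;

final class IncrementalSweeper<E> {
  // Hypothetical chained bucket entry.
  static final class Node<T> {
    final T value;
    Node<T> next;

    Node(T value, Node<T> next) {
      this.value = value;
      this.next = next;
    }
  }

  final Node<E>[] buckets;
  private int cursor; // first bucket the next sweep will visit

  @SuppressWarnings("unchecked")
  IncrementalSweeper(int bucketCount) {
    buckets = (Node<E>[]) new Node[bucketCount];
  }

  /** Scans at most maxBuckets buckets from the cursor, unlinking stale entries. */
  int sweep(int maxBuckets, Predicate<E> isStale) {
    int evicted = 0;
    int n = Math.min(maxBuckets, buckets.length);
    for (int step = 0; step < n; step++) {
      int b = cursor;
      Node<E> prev = null;
      for (Node<E> node = buckets[b]; node != null; node = node.next) {
        if (isStale.test(node.value)) {
          if (prev == null) {
            buckets[b] = node.next;
          } else {
            prev.next = node.next;
          }
          evicted++;
        } else {
          prev = node;
        }
      }
      cursor = (cursor + 1) % buckets.length; // wrap so the next sweep resumes here
    }
    return evicted;
  }
}
```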

@dougqh removed the "tag: no release notes" label May 26, 2026
dougqh added a commit that referenced this pull request May 26, 2026
Mirrors the #11382 cleanup. Production AggregateEntry never invokes
equals/hashCode -- AggregateTable bucketing goes through keyHash +
Canonical.matches directly. The contract existed only to support
Spock mock argument matchers.

- Delete equals/hashCode from production AggregateEntry; class stays
  final.
- Add src/test AggregateEntryTestUtils.equals/hashCode that implements
  the same field-wise contract (peerTags compared as an encoded list,
  consistent with hashOf on this branch).
- Update Spock argument matchers from `writer.add(AggregateEntry.of(...))`
  to `writer.add({ AggregateEntryTestUtils.equals(it, AggregateEntry.of(...)) })`.
- For loop-driven expectations, hoist the fixture into a per-iteration
  `def expected = ...` local so it's captured by value rather than by
  reference to the loop variable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dougqh and others added 3 commits May 26, 2026 16:49
Both classes existed only to support tests against AggregateEntry --
one for positional-args fixture construction, the other for value-
based equality matching. The split was artificial; folding them into
a single AggregateEntryTestUtils removes a file and gives test sites
one place to look for AggregateEntry test helpers.

- Move `of(...)` into AggregateEntryTestUtils alongside the existing
  `equals(a, b)` / `hashCode(e)` helpers.
- Delete AggregateEntryFixtures.java.
- Rename 51 caller sites across ConflatingMetricAggregatorTest and
  SerializingMetricWriterTest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two doc-only additions surfacing design context that reviewers
would otherwise have to reconstruct:

- AggregateEntry: name the "5 responsibilities concentrated on one
  object" tradeoff explicitly (UTF8 caches + label fields + raw
  peerTag arrays + encoded peerTag list + counter/histogram state).
  Prior MetricKey + AggregateMetric design allocated two objects per
  unique key on miss; folding them yields one. The class is wider as
  a result; that's the trade we chose.

- AggregateEntry + AggregateTable: note that the single-writer
  invariant is convention-enforced -- the @SuppressFBWarnings
  documents the assumption but nothing checks the calling thread at
  runtime. Point to ClearSignal as the explicit mechanism for
  funneling cross-thread mutators back onto the aggregator thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On the miss path, AggregateTable.findOrInsert computed the snapshot
hash for the lookup, then AggregateEntry.forSnapshot computed it
again via the same hashOf(s) call to set keyHash on the new entry.
Three reads per snapshot field on a miss (findOrInsert hashOf +
forSnapshot hashOf + constructor canonicalize), with two of those
also paying for the per-call Arrays.hashCode(peerTagValues).

Pass the hash that findOrInsert already computed into forSnapshot
instead. Two reads per field on miss, one Arrays.hashCode(peerTagValues)
per miss. Kept a no-arg forSnapshot overload for test callers that
don't have a precomputed hash on hand.
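The miss-path change can be sketched as a lookup that hashes the snapshot's raw fields once. It then passes that hash to the entry factory, so the new entry never recomputes it. Snapshot, Entry, hashOf and forSnapshot below are simplified, hypothetical shapes with only two label fields. On a hit, no composite key object is allocated.

```java
import java.util.Arrays;
import java.util.Objects;

final class MultiKeyTable {
  // Hypothetical raw snapshot fields (a real snapshot carries many more).
  static final class Snapshot {
    final String service;
    final String[] peerTagValues;

    Snapshot(String service, String... peerTagValues) {
      this.service = service;
      this.peerTagValues = peerTagValues;
    }
  }

  static final class Entry {
    final int keyHash;
    final String service;
    final String[] peerTagValues;
    long hits;
    Entry next;

    private Entry(int keyHash, Snapshot s) {
      this.keyHash = keyHash;
      this.service = s.service;
      this.peerTagValues = s.peerTagValues.clone();
    }

    // Takes the hash the caller already computed instead of calling hashOf again.
    static Entry forSnapshot(Snapshot s, int keyHash) {
      return new Entry(keyHash, s);
    }

    boolean matches(int hash, Snapshot s) {
      return keyHash == hash
          && Objects.equals(service, s.service)
          && Arrays.equals(peerTagValues, s.peerTagValues);
    }
  }

  static int hashOf(Snapshot s) {
    return 31 * Objects.hashCode(s.service) + Arrays.hashCode(s.peerTagValues);
  }

  final Entry[] buckets = new Entry[16];

  Entry findOrInsert(Snapshot s) {
    int hash = hashOf(s); // computed exactly once per snapshot
    int i = hash & (buckets.length - 1);
    for (Entry e = buckets[i]; e != null; e = e.next) {
      if (e.matches(hash, s)) {
        return e; // hit: caller mutates counters in place
      }
    }
    Entry created = Entry.forSnapshot(s, hash); // miss: reuse the lookup hash
    created.next = buckets[i];
    buckets[i] = created;
    return created;
  }
}
```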

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// the aggregator drains existing snapshots and ships them on the next report cycle; the
// sink rejects that payload and fires DOWNGRADED again, which retries disable() against a
// now-empty inbox. Worst case: one extra reporting cycle of stale data.
inbox.offer(CLEAR);
Contributor


CLEAR is a completable future that's reused somehow (but never reset). I think this should be documented to avoid a future race if the usage changes. In any case, if the inbox is full, this offer will be silently dropped.

* the same content array is deterministic so the recomputed value matches. {@code int} writes are
* atomic per JLS.
*/
private int cachedHashCode;
Contributor


Should this be volatile for correctness?
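For context on the volatile question, the field comment above describes the racy single-check idiom, which java.lang.String uses for its own cached hash. A minimal sketch follows, assuming the hash depends only on content that never changes after construction. The class is illustrative, not the PR's code. A non-volatile int works because int writes are atomic, and a thread that still sees 0 simply recomputes the same value.

```java
import java.util.Arrays;

final class CachedHashKey {
  private final String[] content; // immutable after construction
  private int cachedHashCode; // 0 means "not computed yet"; intentionally not volatile

  CachedHashKey(String... content) {
    this.content = content.clone();
  }

  @Override
  public int hashCode() {
    int h = cachedHashCode; // read the field exactly once into a local
    if (h == 0) {
      h = Arrays.hashCode(content); // deterministic, so racing threads agree
      cachedHashCode = h; // atomic int write; worst case another thread recomputes
    }
    return h;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CachedHashKey && Arrays.equals(content, ((CachedHashKey) o).content);
  }
}
```

The idiom relies on reading the field once into a local. Without volatile, a second read of the field could still observe the default 0. It also assumes the hashed content never changes.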

}

@SuppressFBWarnings("AT_NONATOMIC_64BIT_PRIMITIVE")
void clear() {
Contributor


The peerTagValues are not nulled out in clear(). It's not clear to me whether that's on purpose or a leftover.

*/
private static boolean contentEquals(UTF8BytesString a, CharSequence b) {
if (a == null || a.length() == 0) {
return b == null || b.length() == 0;
Contributor


null and empty did not have the same semantics before. For instance, httpMethod could be null, meaning that we were not aggregating at all, which is different from having an empty string (which ended up being treated as a bucket of its own). I'd like to be sure that we're not regressing the way aggregation is done.
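To make the concern concrete, here is a sketch contrasting the two semantics. collapsing() mirrors the early return in the excerpt above, where null and empty compare equal. distinguishing() keeps null (label absent) separate from "" (label present but empty). Both are illustrative helpers written against CharSequence, not the PR's code.

```java
final class LabelEquality {
  // Null and empty are interchangeable: an absent label and an empty label share a bucket.
  static boolean collapsing(CharSequence a, CharSequence b) {
    if (a == null || a.length() == 0) {
      return b == null || b.length() == 0;
    }
    return b != null && a.toString().contentEquals(b);
  }

  // Null means "label absent"; "" is a distinct, real bucket.
  static boolean distinguishing(CharSequence a, CharSequence b) {
    if (a == null || b == null) {
      return a == b;
    }
    return a.toString().contentEquals(b);
  }
}
```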


Labels

comp: core (Tracer core), comp: metrics (Metrics), tag: ai generated (Largely based on code generated by an AI or LLM), tag: performance (Performance related changes), type: enhancement (Enhancements and improvements)


3 participants