
chore(datadog_metrics sink): switch series v2 and sketches to zstd compression #24956

Merged

vladimir-dd merged 9 commits into master from vladimir-dd/metrics-v2-zstd on Mar 24, 2026

Conversation

@vladimir-dd (Contributor) commented Mar 18, 2026

Summary

Rationale: switch Series v2 (/api/v2/series) and Sketches (/api/beta/sketches) to zstd compression.

  • Add DatadogMetricsCompression enum (Zlib/Zstd) in config.rs with compressor(), content_encoding(), and max_compressed_size() methods
  • Add compression() method on DatadogMetricsEndpoint: Series v2 and Sketches → Zstd, Series v1 → Zlib
  • Add max_compressed_size(n) for each scheme: Zlib uses the DEFLATE stored-block worst-case formula; Zstd mirrors the ZSTD_compressBound C macro
  • Propagate content_encoding through DatadogMetricsRequest and the request builder instead of hardcoding "deflate"
  • Make DatadogMetricsEncoder::new() infallible — production limits from payload_limits() are always valid; remove CreateError and validate_payload_size_limits
  • Track buffered_bound for all compressor types (zstd 128KB blocks, zlib 4KB BufWriter) to avoid underestimating compressed payload size
  • Fix SMP regression benchmark (statsd_to_datadog_metrics): switch to ingress_throughput, which is a better default benchmark of overall throughput

Compressed size capacity estimate:

The encoder needs to decide whether accepting one more metric would exceed the compressed payload limit, without being able to back out a compressor write. The estimate splits into two parts:

  1. Bytes already flushed to the output buffer (get_ref().len()) — exact compressed size
  2. Bytes still in the compressor's internal buffer — estimated via max_compressed_size(buffered_bound + n) (worst-case upper bound)

All compressors buffer internally before flushing (zstd: 128 KB per block, zlib: 4 KB BufWriter). buffered_bound tracks an upper bound on uncompressed bytes not yet visible in get_ref().len(), resetting to n when a flush is detected.
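The two-part estimate and the buffered_bound bookkeeping described above can be sketched as follows. The function names and the toy bound are illustrative, not the exact identifiers in the sink:

```rust
/// Part 1 + part 2 of the estimate: exact flushed bytes, plus a worst-case
/// compression bound over everything that may still sit in the compressor's
/// internal buffer together with the metric being considered.
fn would_exceed_compressed_limit(
    flushed_len: usize,    // get_ref().len(): compressed bytes already flushed (exact)
    buffered_bound: usize, // upper bound on uncompressed bytes not yet flushed
    n: usize,              // uncompressed size of the metric being considered
    compressed_limit: usize,
    max_compressed_size: &dyn Fn(usize) -> usize, // per-scheme worst-case bound
) -> bool {
    flushed_len + max_compressed_size(buffered_bound + n) > compressed_limit
}

/// Bookkeeping after a write of `n` uncompressed bytes: accumulate, or reset
/// to `n` (not 0) when a flush is detected, since the triggering write may
/// straddle the block boundary.
fn update_buffered_bound(buffered_bound: &mut usize, n: usize, flush_detected: bool) {
    *buffered_bound = if flush_detected { n } else { *buffered_bound + n };
}

fn main() {
    // Toy bound for demonstration only (not a real compression bound).
    let bound = |n: usize| n + n / 10 + 64;
    let mut buffered = 0usize;
    update_buffered_bound(&mut buffered, 10_000, false);
    assert!(!would_exceed_compressed_limit(0, buffered, 1_000, 100_000, &bound));
    update_buffered_bound(&mut buffered, 2_000, true); // flush detected: reset to n
    assert_eq!(buffered, 2_000);
}
```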

Tests added:

  • max_compressed_size_is_upper_bound: empirically validates both Zlib and Zstd formulas are true upper bounds using incompressible (Xorshift64) data, and are not overly conservative (slack ≤ 1% + 64 bytes)
  • zstd_v2_payload_never_exceeds_512kb_with_incompressible_data: end-to-end test with real 512KB limit, verifies payload ≤ 512KB (safety) and > 95% utilization (efficiency) using high-entropy printable ASCII metric names
  • compressed_limit_is_respected_regardless_of_compressor_internal_buffering: regression test for zstd's 128KB internal buffering — uses a 512-byte compressed limit where get_ref().len() stays 0 throughout, verifying the encoder stops after a handful of metrics (not 100)
  • zstd_buffered_bound_resets_to_last_metric_size_after_block_flush: white-box test directly verifying buffered_bound resets to exactly n (not 0) after a zstd block flush
  • encode_series_v2_breaks_out_when_limit_reached_compressed: verifies the hot-path compressed-limit check works correctly for the zstd path
  • encoding_check_for_payload_limit_edge_cases_v2: proptest that any Series v2 payload decompresses cleanly with zstd and stays within configured limits
  • v2_series_default_limits_split_large_batches: validates 120k metrics are correctly split across multiple batches with v2 limits
  • default_batch_config_uses_endpoint_specific_size_limits / v1_batch_config_uses_v1_size_limit / explicit_max_bytes_applies_to_both_endpoints: verify per-endpoint batch size limits
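The incompressible-data trick used by the first test can be sketched with a standard Marsaglia xorshift64 generator. The struct and test wiring here are illustrative, not the repository's exact test code:

```rust
/// Marsaglia xorshift64: a tiny PRNG whose output is effectively
/// incompressible, making it a good worst-case input when validating
/// compressed-size upper bounds.
struct Xorshift64(u64);

impl Xorshift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Fill `buf` with high-entropy bytes.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

fn main() {
    let mut rng = Xorshift64(0x9E37_79B9_7F4A_7C15); // any non-zero seed works
    let mut buf = vec![0u8; 4096];
    rng.fill(&mut buf);
    // High-entropy data should hit the vast majority of byte values in 4 KiB.
    let distinct = buf.iter().collect::<std::collections::HashSet<_>>().len();
    assert!(distinct > 200);
}
```

In the actual test, such a buffer would be run through each compressor and the compressed length asserted to stay at or below max_compressed_size(buf.len()), with the "slack ≤ 1% + 64 bytes" check guarding against an overly loose bound.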
Correctness analysis

V1/zlib path preserved

  • Series(V1).compression() and Sketches.compression() both return Zlib — no change in compressor selection
  • Zlib.content_encoding() returns "deflate" — same as the previously hardcoded Content-Encoding header
  • Zlib.compressor() returns Compression::zlib_default().into() — identical to the old get_compressor()
  • write_payload_header / write_payload_footer still emit JSON wrapping ({"series":[ / ]}) for V1, nothing for V2/Sketches
  • The zlib max_compressed_size(n) formula is algebraically identical to the old n + max_compressed_overhead_len(n):
    both compute n + (1 + n.saturating_sub(6) / 16384) * 5
  • The only behavioral change for zlib: buffered_bound now makes the compressed-size estimate slightly more conservative by accounting for the ~4 KB BufWriter buffer. This is more correct than before and the impact is negligible against the 3.2 MB compressed limit
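As a sanity check of the identity claimed above, the shared zlib formula can be evaluated directly. The function names are illustrative stand-ins for the old and new identifiers:

```rust
/// DEFLATE stored-block worst case: each stored block of up to 16 KiB costs
/// 5 bytes of header, with zlib's fixed wrapper overhead folded into the
/// first block term.
fn zlib_max_compressed_size(n: usize) -> usize {
    n + (1 + n.saturating_sub(6) / 16_384) * 5
}

/// The old formulation: overhead computed separately, then added to `n`.
fn max_compressed_overhead_len(n: usize) -> usize {
    (1 + n.saturating_sub(6) / 16_384) * 5
}

fn main() {
    // n + overhead(n) == max_compressed_size(n) for all n, including the
    // 3.2 MB compressed limit region.
    for n in [0usize, 6, 7, 16_384, 16_390, 3_200_000] {
        assert_eq!(n + max_compressed_overhead_len(n), zlib_max_compressed_size(n));
    }
}
```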

V2/zstd path

  • The ZSTD_compressBound formula (n + (n >> 8) + correction for <128KB) matches the C library macro exactly
  • buffered_bound tracking is sound: accumulates on each write (+= n), resets to n (not 0) when a flush is detected — because the triggering write may straddle the block boundary, n is a safe upper bound on what remains buffered
  • Header/footer bytes written to the compressor are tracked in buffered_bound (header via try_encode, footer is 0 bytes for V2)
  • reset_state() creates the correct compressor for the endpoint (was previously always zlib via Default)
  • finish() retains its existing safety net: if the payload exceeds the compressed limit after finalization, it returns TooLarge with a recommended split count
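The zstd bound above, transcribed directly from the ZSTD_COMPRESSBOUND C macro (the Rust function name is illustrative):

```rust
/// Mirror of zstd's ZSTD_COMPRESSBOUND macro:
///   (n) + ((n) >> 8) + ((n) < 128 KiB ? ((128 KiB - (n)) >> 11) : 0)
/// The low-size margin compensates for per-block overhead dominating on
/// inputs smaller than one 128 KiB block.
fn zstd_max_compressed_size(n: usize) -> usize {
    const BLOCK: usize = 128 * 1024;
    let low_size_margin = if n < BLOCK { (BLOCK - n) >> 11 } else { 0 };
    n + (n >> 8) + low_size_margin
}

fn main() {
    assert_eq!(zstd_max_compressed_size(0), 64); // empty input still needs framing room
    assert_eq!(zstd_max_compressed_size(128 * 1024), 128 * 1024 + 512);
    // Past one block the bound is simply n + n/256.
    assert!(zstd_max_compressed_size(512 * 1024) >= 512 * 1024);
}
```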

Removed code

  • CreateError / FailedToBuild: construction is now infallible since limits always come from payload_limits()
  • validate_payload_size_limits: no longer needed — with_payload_limits() is gated behind #[cfg(test)], production code always uses well-known API limits
  • is_series(): only consumer was the removed validate_payload_size_limits
  • get_compressor() / max_compressed_overhead_len() / max_compression_overhead_len(): replaced by DatadogMetricsCompression::compressor() and max_compressed_size()

Vector configuration

sinks:
  datadog_metrics:
    type: datadog_metrics
    inputs: [...]
    default_api_key: "${DD_API_KEY}"
    series_api_version: v2  # now correctly uses zstd

How did you test this PR?

  • Unit tests: all datadog metrics encoder tests pass (cargo test --no-default-features --features sinks-datadog_metrics).

End-to-end correctness test (branch)

Ran scripts/validate_dd_metrics_correctness.py against the real Datadog API. All 18 metric checks passed for both v1 and v2, with identical values:

Metric                            v1      v2
counter                           50.0    50.0   ✅
gauge                             42.5    42.5   ✅
set                               1.0     1.0    ✅
dist (avg/count/sum/min/max)      ✅
histogram (count/avg)             ✅
summary (sum/count/ratio)         ✅
multi-tag counter (group:a/b/*)   ✅
multi-tag gauge (group:a/b)       ✅

All 18 metrics match between v1 and v2.

v1/zlib vs v2/zstd performance benchmark (branch)

Ran scripts/benchmark_dd_metrics_v1_v2.py against the real API at 50k events/sec, 2 repeats, 15s warmup, 60s measure:

Metric               v1/zlib    v2/zstd    Delta
Sent events/s        50,922     50,311     -1.2% (≈ equal)
Compressed bytes/s*  3.33 MB/s  1.11 MB/s  -66.6% (better compression)
Avg CPU %            169.7      131.7      -22.4%
Avg RSS (MB)         7,334      2,478      -66.2%
Peak RSS (MB)        10,162     2,710      -73.3%
Delivery ratio       1.27       1.20       ≈ equal
HTTP requests/s      10.4       124.4      +1093% (expected: smaller 512KB batches vs 3.2MB)

* bytes_sent() in the DD metrics service was changed from request_encoded_size() (uncompressed) to request_wire_size() (compressed/on-the-wire).

Key takeaway: v2 delivers the same metric throughput as v1 while using 22% less CPU, 66% less memory, and 67% less bandwidth. The higher HTTP request rate is expected due to the smaller v2 payload limit (512KB vs 3.2MB).

SMP regression benchmark

The statsd_to_datadog_metrics SMP benchmark reported a -69% drop in egress_throughput (compressed bytes received by the blackhole), while ingress_throughput increased by ~75%:

ingress_throughput benchmark: [screenshot, 2026-03-20]

egress_throughput benchmark (the "regression" here is an improvement: OPW sends out 3x fewer bytes): [screenshot, 2026-03-20]

Change Type

  • New feature

Is this a breaking change?

  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 18, 2026
@vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 14 times, most recently from c4c80b6 to fa052b6, on March 18, 2026 19:28
@vladimir-dd changed the title from "feat(datadog_metrics sink): add zstd compression for series v2 endpoint" to "feat(datadog_metrics sink): switch series v2 endpoint to zstd compression" on Mar 18, 2026
@vladimir-dd (Contributor, Author) commented:

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76fb1c59bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 3 times, most recently from f5faf86 to 783f621, on March 19, 2026 09:34
@github-actions github-actions bot added the domain: releasing Anything related to releasing Vector label Mar 19, 2026
@vladimir-dd changed the title from "feat(datadog_metrics sink): switch series v2 endpoint to zstd compression" to "WIP: feat(datadog_metrics sink): switch series v2 endpoint to zstd compression" on Mar 19, 2026
@vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 5 times, most recently from 67c992a to eccada1, on March 19, 2026 16:41
@vladimir-dd changed the title from "WIP: feat(datadog_metrics sink): switch series v2 endpoint to zstd compression" to "chore(datadog_metrics sink): switch series v2 endpoint to zstd compression" on Mar 20, 2026
@vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch from 4042a17 to d2df3d5 on March 20, 2026 07:56
@vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch from d2df3d5 to 48bdb12 on March 20, 2026 08:28
@vladimir-dd vladimir-dd marked this pull request as ready for review March 20, 2026 08:28
@vladimir-dd vladimir-dd requested a review from a team as a code owner March 20, 2026 08:28

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48bdb12f7e


vladimir-dd and others added 2 commits March 20, 2026 10:13
…zero limits

Start proptest ranges at 1 instead of 0 for uncompressed_limit and
compressed_limit. The old validate_payload_size_limits rejected zero
limits, but with_payload_limits is now infallible, so finish() can
panic on division-by-zero when computing recommended_splits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sion

Sketches endpoint now uses zstd instead of zlib, matching Series v2.
Only Series v1 remains on zlib.

Validated against real Datadog API: 36/36 correctness checks passed,
all 18 metrics match between v1 and v2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vladimir-dd changed the title from "chore(datadog_metrics sink): switch series v2 endpoint to zstd compression" to "chore(datadog_metrics sink): switch series v2 and sketches to zstd compression" on Mar 20, 2026
@pront (Member) commented Mar 20, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4f8e56d64


@pront (Member) left a comment:

@codex review


vladimir-dd and others added 4 commits March 23, 2026 16:39
…zstd

The changelog incorrectly stated that Sketches continue to use zlib,
but the code routes Sketches to zstd compression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…common

Move zlib and zstd compression-bound constants from inline locals in
DatadogMetricsCompression::max_compressed_size to
lib/vector-common/src/constants.rs with descriptive names and doc
comments linking to their specifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move use statements from inside test helper function bodies to the top
of the test module, as is conventional in Rust.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pront (Member) left a comment:

/ci-run-e2e-datadog-metrics

I believe we need to update the test code to use the correct decompression method based on the API version (v2 - zstd, v1 - zlib)

@vladimir-dd (Contributor, Author) commented Mar 24, 2026

> I believe we need to update the test code to use the correct decompression method based on the API version (v2 - zstd, v1 - zlib)

Seems like is_zstd should correctly guess the right decompression method based on the first bytes.

@vladimir-dd vladimir-dd requested a review from pront March 24, 2026 15:49
@pront pront enabled auto-merge March 24, 2026 17:34
@pront (Member) left a comment:
🚀

@pront pront added this pull request to the merge queue Mar 24, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 24, 2026
…zstd

The decompress_payload helper in integration_tests.rs hardcoded zlib
decompression, but Series v2 now uses zstd. Auto-detect the compression
format via is_zstd so tests work for both zlib and zstd payloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vladimir-dd vladimir-dd enabled auto-merge March 24, 2026 18:43
@vladimir-dd vladimir-dd added this pull request to the merge queue Mar 24, 2026
Merged via the queue into master with commit 8bac1db Mar 24, 2026
58 checks passed
@vladimir-dd vladimir-dd deleted the vladimir-dd/metrics-v2-zstd branch March 24, 2026 19:48
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2026
