chore(datadog_metrics sink): switch series v2 and sketches to zstd compression #24956
vladimir-dd merged 9 commits into master
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76fb1c59bd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: 48bdb12f7e
…zero limits
Start proptest ranges at 1 instead of 0 for uncompressed_limit and compressed_limit. The old validate_payload_size_limits rejected zero limits, but with_payload_limits is now infallible, so finish() can panic on division-by-zero when computing recommended_splits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
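To illustrate the failure mode this commit guards against, here is a minimal sketch; the function name and formula are illustrative, not the sink's actual code:

```rust
// Illustrative only: a split count computed by integer division panics in
// Rust when the divisor is zero, which is why the proptest ranges for
// compressed_limit now start at 1 instead of 0.
fn recommended_splits(compressed_len: usize, compressed_limit: usize) -> usize {
    // With compressed_limit == 0 this line panics: "attempt to divide by zero".
    compressed_len / compressed_limit
}

fn main() {
    // A 10 MB payload against a 3.2 MB limit suggests 3 splits.
    assert_eq!(recommended_splits(10_000_000, 3_200_000), 3);
}
```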
…sion
Sketches endpoint now uses zstd instead of zlib, matching Series v2. Only Series v1 remains on zlib. Validated against the real Datadog API: 36/36 correctness checks passed, and all 18 metrics match between v1 and v2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: a4f8e56d64
changelog.d/24956_datadog_metrics_zstd_series_v2.enhancement.md
…zstd
The changelog incorrectly stated that Sketches continue to use zlib, but the code routes Sketches to zstd compression.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…common
Move zlib and zstd compression-bound constants from inline locals in DatadogMetricsCompression::max_compressed_size to lib/vector-common/src/constants.rs, with descriptive names and doc comments linking to their specifications.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move use statements from inside test helper function bodies to the top of the test module, as is conventional in Rust.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pront left a comment:
/ci-run-e2e-datadog-metrics
I believe we need to update the test code to use the correct decompression method based on the API version (v2 - zstd, v1 - zlib).
…zstd
The decompress_payload helper in integration_tests.rs hardcoded zlib decompression, but Series v2 now uses zstd. Auto-detect the compression format via is_zstd so tests work for both zlib and zstd payloads.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
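The auto-detection in this commit can be done by checking the zstd frame magic number (the little-endian constant 0xFD2FB528). This is an illustrative sketch, not the actual `decompress_payload` helper:

```rust
// Illustrative sketch: a zstd frame always begins with the magic number
// 0xFD2FB528, serialized little-endian as the bytes 28 B5 2F FD. A zlib
// stream never starts with these bytes, so this is enough to route a
// payload to the right decompressor in tests.
fn is_zstd(payload: &[u8]) -> bool {
    payload.len() >= 4 && payload[..4] == [0x28, 0xB5, 0x2F, 0xFD]
}

fn main() {
    let zstd_frame = [0x28, 0xB5, 0x2F, 0xFD, 0x00];
    let zlib_stream = [0x78, 0x9C, 0x01, 0x00]; // 0x78 0x9C: common zlib header
    assert!(is_zstd(&zstd_frame));
    assert!(!is_zstd(&zlib_stream));
}
```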
Summary
Rationale: Switch Series v2 (`/api/v2/series`) and Sketches (`/api/beta/sketches`) to zstd compression.

- Add a `DatadogMetricsCompression` enum (Zlib/Zstd) in `config.rs` with `compressor()`, `content_encoding()`, and `max_compressed_size()` methods
- Add a `compression()` method on `DatadogMetricsEndpoint`: Series v2 and Sketches → Zstd, Series v1 → Zlib
- Implement `max_compressed_size(n)` for each scheme: Zlib uses the DEFLATE stored-block worst-case formula; Zstd mirrors the `ZSTD_compressBound` C macro
- Thread `content_encoding` through `DatadogMetricsRequest` and the request builder instead of hardcoding `"deflate"`
- Make `DatadogMetricsEncoder::new()` infallible: production limits from `payload_limits()` are always valid; remove `CreateError` and `validate_payload_size_limits`
- Track `buffered_bound` for all compressor types (zstd 128KB blocks, zlib 4KB BufWriter) to avoid underestimating compressed payload size
- SMP benchmark (`statsd_to_datadog_metrics`): switch to `ingress_throughput`, which is a better default benchmark of overall throughput

Compressed size capacity estimate:
The encoder needs to decide whether accepting one more metric would exceed the compressed payload limit, without being able to back out a compressor write. The estimate splits into two parts:
- Compressed bytes already flushed to the output buffer (`get_ref().len()`): an exact compressed size
- Bytes still buffered inside the compressor: covered by `max_compressed_size(buffered_bound + n)` (a worst-case upper bound)

All compressors buffer internally before flushing (zstd: 128 KB per block, zlib: 4 KB BufWriter).
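The two-part estimate can be sketched as follows; the function and parameter names are hypothetical, since the real encoder keeps this state internally:

```rust
// Hypothetical sketch of the capacity check: compressed bytes already flushed
// are exact, while bytes still inside the compressor are covered by a
// per-scheme worst-case bound on (buffered_bound + n) uncompressed bytes.
fn would_exceed(
    flushed_len: usize,     // get_ref().len(): exact compressed size so far
    buffered_bound: usize,  // upper bound on bytes buffered in the compressor
    n: usize,               // size of the metric about to be written
    compressed_limit: usize,
    max_compressed_size: impl Fn(usize) -> usize, // per-scheme worst case
) -> bool {
    flushed_len + max_compressed_size(buffered_bound + n) > compressed_limit
}

fn main() {
    // Stand-in worst case: the DEFLATE stored-block formula from this PR.
    let zlib_bound = |n: usize| n + (1 + n.saturating_sub(6) / 16384) * 5;
    // 1 KiB buffered plus a 256-byte metric against a 512 KiB limit: fits.
    assert!(!would_exceed(0, 1024, 256, 512 << 10, zlib_bound));
    // Already at the limit: even a 1-byte write must be rejected.
    assert!(would_exceed(512 << 10, 0, 1, 512 << 10, zlib_bound));
}
```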
`buffered_bound` tracks an upper bound on uncompressed bytes not yet visible in `get_ref().len()`, resetting to `n` when a flush is detected.

Tests added:
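For context, the incompressible-data generator used by the upper-bound tests can be sketched like this; the seed and usage are made up, and the real tests feed the bytes through the actual compressors:

```rust
// Illustrative Xorshift64 generator (Marsaglia). Its output is high-entropy,
// so a compressor cannot shrink it, which makes it a good probe for checking
// that a max_compressed_size formula is a true upper bound.
struct Xorshift64(u64);

impl Xorshift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

fn main() {
    // The stream is deterministic for a fixed non-zero seed, so test
    // failures are reproducible.
    let mut a = Xorshift64(42);
    let mut b = Xorshift64(42);
    let x: Vec<u64> = (0..4).map(|_| a.next_u64()).collect();
    let y: Vec<u64> = (0..4).map(|_| b.next_u64()).collect();
    assert_eq!(x, y);
    // A non-zero state never collapses to zero.
    assert!(x.iter().all(|&v| v != 0));
}
```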
- `max_compressed_size_is_upper_bound`: empirically validates that both the Zlib and Zstd formulas are true upper bounds using incompressible (Xorshift64) data, and that they are not overly conservative (slack ≤ 1% + 64 bytes)
- `zstd_v2_payload_never_exceeds_512kb_with_incompressible_data`: end-to-end test with the real 512KB limit; verifies the payload is ≤ 512KB (safety) and > 95% utilization (efficiency) using high-entropy printable ASCII metric names
- `compressed_limit_is_respected_regardless_of_compressor_internal_buffering`: regression test for zstd's 128KB internal buffering; uses a 512-byte compressed limit where `get_ref().len()` stays 0 throughout, verifying the encoder stops after a handful of metrics (not 100)
- `zstd_buffered_bound_resets_to_last_metric_size_after_block_flush`: white-box test directly verifying `buffered_bound` resets to exactly `n` (not 0) after a zstd block flush
- `encode_series_v2_breaks_out_when_limit_reached_compressed`: verifies the hot-path compressed-limit check works correctly for the zstd path
- `encoding_check_for_payload_limit_edge_cases_v2`: proptest that any Series v2 payload decompresses cleanly with zstd and stays within configured limits
- `v2_series_default_limits_split_large_batches`: validates that 120k metrics are correctly split across multiple batches with v2 limits
- `default_batch_config_uses_endpoint_specific_size_limits` / `v1_batch_config_uses_v1_size_limit` / `explicit_max_bytes_applies_to_both_endpoints`: verify per-endpoint batch size limits

Correctness analysis
V1/zlib path preserved
- `Series(V1).compression()` returns `Zlib`: no change in compressor selection for the v1 path
- `Zlib.content_encoding()` returns `"deflate"`: same as the previously hardcoded `Content-Encoding` header
- `Zlib.compressor()` returns `Compression::zlib_default().into()`: identical to the old `get_compressor()`
- `write_payload_header`/`write_payload_footer` still emit the JSON wrapping (`{"series":[` / `]}`) for V1, and nothing for V2/Sketches
- The `max_compressed_size(n)` formula is algebraically identical to the old `n + max_compressed_overhead_len(n)`: both compute `n + (1 + n.saturating_sub(6) / 16384) * 5`
- `buffered_bound` now makes the compressed-size estimate slightly more conservative by accounting for the ~4 KB BufWriter buffer. This is more correct than before, and the impact is negligible against the 3.2 MB compressed limit

V2/zstd path
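The `ZSTD_compressBound` mirror discussed in this section can be sketched as a direct translation of the C macro (illustrative; the sink keeps its own copy of this constant logic):

```rust
// Rust translation of the ZSTD_COMPRESSBOUND macro from zstd.h:
//   srcSize + (srcSize >> 8) + margin
// where margin tapers from 64 bytes down to 0 as srcSize approaches the
// 128 KiB block size, and is 0 for larger inputs.
fn zstd_compress_bound(src_size: usize) -> usize {
    let margin = if src_size < 128 << 10 {
        ((128 << 10) - src_size) >> 11
    } else {
        0
    };
    src_size + (src_size >> 8) + margin
}

fn main() {
    // margin is 64 at src_size == 0 and 0 once src_size reaches 128 KiB.
    assert_eq!(zstd_compress_bound(0), 64);
    assert_eq!(zstd_compress_bound(128 << 10), 131072 + 512);
    // The bound never undercuts the input size.
    assert!(zstd_compress_bound(512 << 10) >= 512 << 10);
}
```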
- The `ZSTD_compressBound` formula (`n + (n >> 8)` plus a correction for inputs under 128KB) matches the C library macro exactly
- `buffered_bound` tracking is sound: it accumulates on each write (`+= n`) and resets to `n` (not 0) when a flush is detected, because the triggering write may straddle the block boundary, so `n` is a safe upper bound on what remains buffered
- Header and footer writes are also counted toward `buffered_bound` (the header via `try_encode`; the footer is 0 bytes for V2)
- `reset_state()` creates the correct compressor for the endpoint (it was previously always zlib via `Default`)
- `finish()` retains its existing safety net: if the payload exceeds the compressed limit after finalization, it returns `TooLarge` with a recommended split count

Removed code
- `CreateError`/`FailedToBuild`: construction is now infallible since limits always come from `payload_limits()`
- `validate_payload_size_limits`: no longer needed; `with_payload_limits()` is gated behind `#[cfg(test)]`, and production code always uses well-known API limits
- `is_series()`: its only consumer was the removed `validate_payload_size_limits`
- `get_compressor()`/`max_compressed_overhead_len()`/`max_compression_overhead_len()`: replaced by `DatadogMetricsCompression::compressor()` and `max_compressed_size()`

Vector configuration
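No user-facing configuration changes are needed for the compression switch; a minimal sink setup of the kind this PR affects looks like the following sketch (values and component names are illustrative):

```toml
# Hypothetical minimal config: the zstd switch is internal to the sink,
# so the datadog_metrics sink needs no new options.
[sources.statsd_in]
type = "statsd"
address = "0.0.0.0:8125"
mode = "udp"

[sinks.dd_metrics]
type = "datadog_metrics"
inputs = ["statsd_in"]
default_api_key = "${DD_API_KEY}"
```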
How did you test this PR?
Unit tests (`cargo test --no-default-features --features sinks-datadog_metrics`).

End-to-end correctness test (branch)
Ran `scripts/validate_dd_metrics_correctness.py` against the real Datadog API. All 18 metric checks passed for both v1 and v2, with identical values: all 18 metrics match between v1 and v2.
v1/zlib vs v2/zstd performance benchmark (branch)
Ran `scripts/benchmark_dd_metrics_v1_v2.py` against the real API at 50k events/sec, 2 repeats, 15s warmup, 60s measure. Note: `bytes_sent()` in the DD metrics service was changed from `request_encoded_size()` (uncompressed) to `request_wire_size()` (compressed/on-the-wire).

Key takeaway: v2 delivers the same metric throughput as v1 while using 22% less CPU, 66% less memory, and 67% less bandwidth. The higher HTTP request rate is expected due to the smaller v2 payload limit (512KB vs 3.2MB).
SMP regression benchmark
The `statsd_to_datadog_metrics` SMP benchmark reported a -69% drop in `egress_throughput` (compressed bytes received by the blackhole), while `ingress_throughput` increased by ~75%.

ingress_throughput benchmark:
[chart: ingress_throughput comparison]

egress_throughput benchmark (the "regression" here is an improvement: OPW sends out 3x fewer bytes):
[chart: egress_throughput comparison]