Skip to content

Add observation and ser/de metrics for serde calls#5320

Open
MasterPtato wants to merge 1 commit into
stack/slop-claude-opus-4-8-medium-chore-add-profiling-cargo-profile-for-heaptrack-leak-investigation-xsoysnqsfrom
stack/add-observation-and-ser-de-metrics-for-serde-calls-ztnuwvts
Open

Add observation and ser/de metrics for serde calls#5320
MasterPtato wants to merge 1 commit into
stack/slop-claude-opus-4-8-medium-chore-add-profiling-cargo-profile-for-heaptrack-leak-investigation-xsoysnqsfrom
stack/add-observation-and-ser-de-metrics-for-serde-calls-ztnuwvts

Conversation

@MasterPtato

@MasterPtato MasterPtato commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

No description provided.

@MasterPtato

MasterPtato commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Stack for rivet-dev/rivet

Current stack:

Dependencies:

Get stack: forklift get 5320
Push local edits: forklift submit
Merge when ready: forklift merge 5320

@MasterPtato MasterPtato force-pushed the stack/add-observation-and-ser-de-metrics-for-serde-calls-ztnuwvts branch from 7988eba to a0c66b8 Compare June 23, 2026 20:34
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

This PR introduces serde/observability wrapper macros (json_to_vec!, json_from_slice!, etc.) that instrument serialization/deserialization calls with timing and byte-size Prometheus metrics, plus an observe! macro for general-purpose duration tracking. It also deletes the ~2500-line UPS broadcast simulation module (sim.rs), which appears to be a clean removal with no dangling references.


engine/packages/util/src/serde.rsjson_from_slice! and bare_from_slice! double-evaluate $value

macro_rules! json_from_slice {
    ($value:expr) => {{
        let __bind = $value;
        // ... records __bind.len() for metrics ...
        $crate::observe!(serde_json::from_slice($value))  // BUG: should be from_slice(__bind)
    }};
}

$value is evaluated twice: once when bound to __bind (to measure length) and again when passed directly to serde_json::from_slice. The same bug exists in bare_from_slice!. Compare to json_from_str!, which correctly uses __bind in the second position. Current call sites all pass simple variable references so there is no runtime impact today, but any caller that passes an expression with side effects (e.g., a channel receive or iter.next().unwrap()) would evaluate it twice. The fix is to replace $value with __bind in the observe! invocation for both macros.


engine/packages/util/src/lib.rsobserve_with! docstring shows arguments in the wrong order

The docstring example:

/// observe_with!(task(), |dt, location| {
///     if dt > Duration::from_secs(10) { ... }
/// });

...but the actual macro signature is ($cb:expr, $($tt:tt)*) — callback first, then the expression to time. The correct invocation is observe_with!(|dt, location| { ... }, task()). A developer copying the docstring example will get a compile error (or silently pass task() as the callback).


engine/packages/util/src/lib.rs — unclosed code fence in observe! docstring

The doc comment opens a second ``` (after the closing ``` of the first example block) but never closes it:

/// Supports async work.
///	Use `observe_with!` for callback.
/// ```

Rustdoc will treat everything from that ``` onward as a code block, breaking the rendered docs and likely failing cargo test --doc.


engine/packages/util/src/metrics.rs — byte-size bucket arrays should be a named constant in buckets.rs (CLAUDE.md)

SERIALIZE_SIZE and DESERIALIZE_SIZE each define an identical inline vec![16.0, 32.0, ..., 16777216.0] bucket array. The project CLAUDE.md says:

Reuse existing histogram bucket constants from engine/packages/metrics/src/buckets.rs... Add a new constant to buckets.rs only if no existing constant covers the value range.

No existing constant covers byte sizes, so a new one (e.g., BYTE_SIZE_BUCKETS) should be added to buckets.rs and referenced from both metrics. Having the same array duplicated inline in two places means a future bucket adjustment would require changes in two spots.


engine/packages/util/src/metrics.rslocation label cardinality may exceed safe bounds over time (CLAUDE.md)

The location label (file!():line!():column!()) is bounded by compile-time call sites, not runtime values, so it won't explode the way actor_id would. That said, it is not in CLAUDE.md's explicit safe-label list (pool_name, workflow_name, activity_name, etc.), and as observe! gets adopted more broadly, each new instrumented call site adds a distinct time series to OBSERVATION_DURATION and LONG_OBSERVATION_DURATION. Worth discussing with the team whether a coarser label (e.g., module name or an explicit string tag passed to the macro) would better balance observability precision against series cardinality.


engine/packages/util/src/serde.rs — copy-paste doc comments on deserialization macros

json_from_str!, json_from_slice!, and bare_from_slice! all carry the doc comment /// Wraps serde_json::to_vec with observability. — wrong in both direction (these are from_* deserializers) and format (the function named is to_vec, not from_str/from_slice). These should say Wraps serde_json::from_str, Wraps serde_json::from_slice, and Wraps serde_bare::from_slice respectively.

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

PR Review: Add observation and ser/de metrics for serde calls

Good direction overall — centralizing serde calls behind instrumented macros makes it easy to get latency/size distributions across the whole engine without touching each call site individually. The sim.rs removal is clean. A few concrete issues to address:

Bug: double evaluation in json_from_slice! (engine/packages/util/src/serde.rs)

The $value expression is expanded twice — once into __bind (used for the size metric) and once directly into serde_json::from_slice($value) inside observe!. For a &[u8] reference this is harmless in practice, but it is inconsistent with json_from_str! (which correctly passes __bind to observe!) and will break if a caller passes an expression with side effects. Fix: change serde_json::from_slice($value) to serde_json::from_slice(__bind).

Metrics naming: no package prefix (engine/packages/util/src/metrics.rs)

observation_duration, long_observation_duration, serialize_size, deserialize_size are very generic names registered into the shared REGISTRY. All existing engine metrics use package-prefixed names (pegboard_*, gasoline_*, etc.). These four should be prefixed with rivet_util_ to avoid collisions and match convention.

Cardinality concern: location label on the generic observation histograms

Source-code locations are compile-time bounded, but the two generic histograms (OBSERVATION_DURATION / LONG_OBSERVATION_DURATION) invite broad adoption and will accumulate one histogram series per unique call site as they spread. The serde-specific SERIALIZE_SIZE / DESERIALIZE_SIZE histograms are fine because their call sites are finite and intentional. Consider whether the generic observation ones need the location label or could use a caller-supplied operation name instead.

Copy-pasted doc comments are incorrect (engine/packages/util/src/serde.rs)

Three macros have wrong doc headers: json_from_str! says "Wraps serde_json::to_vec" (should say from_str), json_from_slice! says "Wraps serde_json::to_vec" (should say from_slice), bare_from_slice! says "Wraps serde_bare::to_vec" (should say from_slice).

Malformed doc block on observe! (engine/packages/util/src/lib.rs)

The doc block has a tab character before "Use observe_with! for callback." and the closing triple-backtick has no matching opening fence, so rustdoc renders it broken. Separately, "Supports async work" is misleading: the macro evaluates its argument synchronously; async work is only timed correctly if the caller writes .await inside the invocation (e.g. observe!(task().await)).

Minor: inconsistent location formatting

observe! builds the location string with format!() directly, while observe_with! uses $crate::location!().to_string(). Both produce the same result through different paths. Using location!() everywhere would be cleaner.

No issues with the serde call replacements

The mechanical replacements throughout gasoline, cache, and api-peer look correct. The util -> rivet-util rename in depot/Cargo.toml is expected.

@railway-app

railway-app Bot commented Jun 23, 2026

Copy link
Copy Markdown

🚅 Deployed to the rivet-pr-5320 environment in rivet-frontend

Service Status Web Updated (UTC)
website ✅ Success (View Logs) Web Jun 24, 2026 at 7:09 pm
frontend-cloud 😴 Sleeping (View Logs) Web Jun 24, 2026 at 6:47 pm
kitchen-sink 😴 Sleeping (View Logs) Web Jun 24, 2026 at 6:26 pm
frontend-inspector 😴 Sleeping (View Logs) Web Jun 24, 2026 at 5:54 pm
ladle ✅ Success (View Logs) Web Jun 23, 2026 at 8:25 pm
mcp-hub ✅ Success (View Logs) Web Jun 23, 2026 at 8:24 pm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant