Capture ClickHouse profile events in query summaries. #10172
bnaecker
left a comment
This is great! I completely forgot that we already received and parsed these event messages.
I have a few suggestions, but they're fairly minor. Thanks!
oximeter/db/benches/oxql.rs
fn bench_metric() -> BenchMetric {
    match std::env::var("BENCH_METRIC").as_deref() {
        Ok("cpu") => BenchMetric::CpuTime,
        _ => BenchMetric::Latency,
I'd probably match explicitly here, rather than the _ wildcard, and fail if the benchmark isn't a supported one.
oximeter/db/benches/oxql.rs
fn bench_metric() -> BenchMetric {
    match std::env::var("BENCH_METRIC").as_deref() {
        Ok("cpu") => BenchMetric::CpuTime,
Should the string match the variant name? E.g. "cpu_time"?
Yes, this is better. Updated.
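For reference, the two suggestions above (match explicitly, and name the string after the variant) could be combined roughly as follows. This is only a sketch, not the code that landed in the PR: the `"latency"` string, the choice to default to latency when the variable is unset, and the `parse_bench_metric` helper are all assumptions made for illustration.

```rust
/// Which quantity the benchmark measures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BenchMetric {
    /// Wall clock latency.
    Latency,
    /// Total CPU time.
    CpuTime,
}

/// Parse the raw value of `BENCH_METRIC` (hypothetical helper).
///
/// Assumption: an unset variable means "latency"; any other
/// unrecognized string is an error rather than a silent fallback.
fn parse_bench_metric(raw: Option<&str>) -> Result<BenchMetric, String> {
    match raw {
        None | Some("latency") => Ok(BenchMetric::Latency),
        Some("cpu_time") => Ok(BenchMetric::CpuTime),
        Some(other) => Err(format!(
            "unsupported BENCH_METRIC {other:?}; expected \"latency\" or \"cpu_time\""
        )),
    }
}

fn bench_metric() -> BenchMetric {
    // `.ok()` treats a non-Unicode value the same as an unset variable.
    let raw = std::env::var("BENCH_METRIC").ok();
    parse_bench_metric(raw.as_deref()).unwrap_or_else(|e| panic!("{e}"))
}
```

Splitting the parsing from the environment lookup keeps the failure case easy to test without touching process state.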
s.profile_summary
    .get("UserTimeMicroseconds")
    .copied()
    .unwrap_or(0)
I'm a little suspicious of the unwrap_or(0). If the key isn't there, e.g., because of a change when we upgrade ClickHouse, I would want to know. Maybe at least an eprintln!() would help? Same note below too.
I agreed with you and tried this out, but very occasionally we seem to get missing profile events for a query here and there, rarely enough that it's hard to reproduce. I'm leaving the unwrap_or with a comment for now.
Hmm, ok that's frustrating. Maybe add an eprintln! or something too, so we have a notification if / when it keeps happening. Up to you though.
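One possible shape for that, keeping the zero fallback but making a missing key visible on stderr. This is a sketch under assumptions: the map type `HashMap<String, u64>` and both helper names are hypothetical, and summing `UserTimeMicroseconds` with `SystemTimeMicroseconds` follows the "user + system time" note in this review rather than the actual benchmark code.

```rust
use std::collections::HashMap;

/// Look up one profile event, falling back to 0 but warning on stderr
/// so that a missing key does not go unnoticed (hypothetical helper).
fn profile_event_or_zero(profile_summary: &HashMap<String, u64>, key: &str) -> u64 {
    match profile_summary.get(key) {
        Some(value) => *value,
        None => {
            eprintln!(
                "warning: profile event {key:?} missing from query summary; counting it as 0"
            );
            0
        }
    }
}

/// Total CPU time as user + system time, in microseconds.
/// Assumption: the summary maps event names to `u64` totals.
fn cpu_time_micros(profile_summary: &HashMap<String, u64>) -> u64 {
    profile_event_or_zero(profile_summary, "UserTimeMicroseconds")
        + profile_event_or_zero(profile_summary, "SystemTimeMicroseconds")
}
```

Since the missing events are reported as rare, the warning should stay quiet in normal runs while still leaving a trace if a ClickHouse upgrade renames or drops a key.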
/// Wall clock latency.
Latency,
/// Total cpu time.
CpuTime,
Let's make a note that this is user + system time as reported by the DB.
5e8371b to
b77e857
ClickHouse includes a collection of profile events by default when using the native TCP client. This patch captures those events, aggregates them by type, and includes the aggregated profile events in the optional query profile section. We also make use of these profile summaries in the oxql benchmark, adding a new benchmark type that measures query CPU usage rather than latency.
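As a rough illustration of the aggregation the commit message describes, the sketch below sums event values per event name. It is not the patch's implementation: the `ProfileEvent` struct, its fields, the `u64` value type, and the reading of "by type" as "by event name" are all assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one profile event received from the server.
struct ProfileEvent {
    name: String,
    value: u64,
}

/// Sum the values of all events that share a name, producing one
/// total per event name (assumed meaning of "aggregating by type").
fn aggregate_profile_events(events: &[ProfileEvent]) -> BTreeMap<String, u64> {
    let mut summary = BTreeMap::new();
    for event in events {
        *summary.entry(event.name.clone()).or_insert(0u64) += event.value;
    }
    summary
}
```

A `BTreeMap` is used here only so the summary iterates in a stable, sorted order when printed; any map type would work.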
b77e857 to
20549dc
Context: I wanted to evaluate #10110 more rigorously, and Claude noticed that we already had access to ClickHouse profiling events. Looking at CPU profiles for that patch showed that its latency improvements came at the cost of higher CPU use, which is annoying but useful to know.