@FRosner
Contributor

@FRosner FRosner commented Nov 27, 2025

Changes

When using counters that represent a global rate (benchmark::Counter::kIsRate), before this PR the rate was effectively computed per thread, because counter finalization was passed the sum of the seconds (wall or CPU time) spent across all threads. This breaks the definition of a global rate. In addition, with kAvgThreadsRate the rate is divided by the number of threads again, yielding nonsensical results.

This is a regression introduced by #1836. This PR fixes it by dividing the total seconds count by the number of threads before passing it to the counter finalization, which then computes the rates etc.
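
A minimal sketch of the difference, assuming each thread reports its own elapsed seconds. The helper names below are hypothetical and are not part of the library; they only illustrate why the summed time has to be divided by the thread count before a global rate is computed.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: uses the seconds summed over all threads as the
// denominator. With N threads running concurrently this yields a
// per-thread rate, not a global one.
double RateFromSummedSeconds(double total_count,
                             const std::vector<double>& thread_seconds) {
  assert(!thread_seconds.empty());
  const double summed =
      std::accumulate(thread_seconds.begin(), thread_seconds.end(), 0.0);
  return total_count / summed;
}

// Hypothetical helper: divides the summed seconds by the number of threads
// first, which approximates the elapsed time of the whole run.
double RateFromAveragedSeconds(double total_count,
                               const std::vector<double>& thread_seconds) {
  assert(!thread_seconds.empty());
  const double summed =
      std::accumulate(thread_seconds.begin(), thread_seconds.end(), 0.0);
  const double averaged = summed / static_cast<double>(thread_seconds.size());
  return total_count / averaged;
}
```

For four threads that each run for 2 s and together process 8 items, the first helper gives 1 item/s (what a single thread achieves) and the second gives 4 items/s (what the process achieves).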

We're also fixing the test expectations.

References


v /= (cpu_time / num_threads);
}
if ((c.flags & Counter::kAvgThreads) != 0) {
v /= num_threads;
Member

i should know this but i've lost track: can flags be both IsRate and AvgThreads? if so, are we then dividing twice incorrectly?

Contributor Author

> i should know this but i've lost track: can flags be both IsRate and AvgThreads

IIUC, yes, that's what kAvgThreadsRate will do:

kAvgThreadsRate = kIsRate | kAvgThreads,

> if so, are we then dividing twice incorrectly?

I think it is correct. For IsRate we are multiplying by the number of threads (note the brackets, a / (b / c) = a / b * c). Then for kAvgThreads we are dividing by the number of threads again a / b * c / c = a / b, so we get what we'd expect for the per-thread average?

But this should just be tested in some unit tests. I need to check where the existing tests are.
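
The cancellation argued above can be checked with a small stand-alone sketch. The enum values and the name FinishSketch are illustrative assumptions; only the relation kAvgThreadsRate = kIsRate | kAvgThreads and the two divisions are taken from the diff under review.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the counter flags (numeric values assumed).
enum Flags : std::uint32_t {
  kDefaults = 0,
  kIsRate = 1u << 0,
  kAvgThreads = 1u << 1,
  kAvgThreadsRate = kIsRate | kAvgThreads,
};

// Hypothetical model of the finalization step: summed_seconds is the time
// added up over all threads, num_threads is the thread count.
double FinishSketch(double value, std::uint32_t flags, double summed_seconds,
                    double num_threads) {
  assert(summed_seconds > 0.0 && num_threads > 0.0);
  double v = value;
  if ((flags & kIsRate) != 0) {
    v /= (summed_seconds / num_threads);  // a / (b / c) == a / b * c
  }
  if ((flags & kAvgThreads) != 0) {
    v /= num_threads;  // a / b * c / c == a / b
  }
  return v;
}
```

With value = 10, summed_seconds = 10 and num_threads = 10, kIsRate gives 10 (the global rate) and kAvgThreadsRate gives 1 (the per-thread average of that rate), so the second division undoes the multiplication rather than dividing twice.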

Collaborator

@LebedevRI LebedevRI left a comment

The lossless way to do this would be to introduce kIsThreadInvariant.

@FRosner
Contributor Author

FRosner commented Nov 27, 2025

> The lossless way to do this would be to introduce kIsThreadInvariant.

Let's continue the high level discussion in #2080 (comment), since you left a longer response over there. I don't think introducing a new flag is the way to go here.

@FRosner FRosner changed the title from "Update counter.cc" to "#2080: Fix rate and thread rate counter aggregates" Nov 27, 2025
@LebedevRI
Collaborator

(@dmah42 after merging #2089 the diff will make more sense..)

@dmah42
Member

dmah42 commented Dec 8, 2025

merged 2089

@LebedevRI LebedevRI marked this pull request as ready for review December 8, 2025 20:12
@LebedevRI
Collaborator

Well, this does what it claims to.
I think this will be correct for manual/wall-time/thread-time timers,
i'm not sure how ->MeasureProcessCPUTime() interacts with ->Threads().
Does the semantics change make sense? If so, i think this is it.

@FRosner
Contributor Author

FRosner commented Dec 9, 2025

> i'm not sure how ->MeasureProcessCPUTime() interacts with ->Threads().
> Does the semantics change make sense? If so, i think this is it.

Is that a question for me? I haven't used MeasureProcessCPUTime, so I'd need to take a look. Is it just using a different "clock" to measure the total time?

@dmah42
Member

dmah42 commented Dec 9, 2025

agreed, this does what the issue suggested. we still need some documentation in the docs somewhere, and yes please check the ProcessCPUTime also makes sense.

@FRosner
Contributor Author

FRosner commented Dec 9, 2025

Thank you so much for adding all the tests @LebedevRI and @dmah42 and sorry I didn't get to it earlier. I updated the PR description and will look into the docs and check ProcessCPUTime before marking it as ready for review.

@FRosner
Contributor Author

FRosner commented Dec 9, 2025

If I understand the docs correctly, MeasureProcessCPUTime affects only the way the number of required iterations is computed, right? I ran a few combinations on 1.9.4 (not this branch) and it seems that the setting has no effect on the counters.

#include <chrono>
#include <thread>

#include <benchmark/benchmark.h>

static void BM_ExampleTiming(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize(1 + 2);
        // Each iteration sleeps for one second, so it uses almost no CPU time.
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        // Reported time in seconds; only used with ->UseManualTime().
        state.SetIterationTime(1);
    }
    state.counters["counter"] = benchmark::Counter(1);
    state.counters["counter_rate"] = benchmark::Counter(1, benchmark::Counter::kIsRate);
    state.counters["counter_thread_rate"] = benchmark::Counter(1, benchmark::Counter::kAvgThreadsRate);
}

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10);

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseManualTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseRealTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->MeasureProcessCPUTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseManualTime()
    ->MeasureProcessCPUTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseRealTime()
    ->MeasureProcessCPUTime();
---------------------------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------
BM_ExampleTiming/threads:1                           1003846688 ns        40700 ns           10 counter=1 counter_rate=2.457k/s counter_thread_rate=2.457k/s
BM_ExampleTiming/threads:10                          1004385959 ns        40400 ns           10 counter=10 counter_rate=24.7525k/s counter_thread_rate=2.47525k/s
BM_ExampleTiming/manual_time/threads:1               1000000000 ns        54000 ns            1 counter=1 counter_rate=1/s counter_thread_rate=1/s
BM_ExampleTiming/manual_time/threads:10              1000000000 ns        44500 ns           10 counter=10 counter_rate=1/s counter_thread_rate=0.1/s
BM_ExampleTiming/real_time/threads:1                 1005048707 ns        43000 ns            1 counter=1 counter_rate=0.994977/s counter_thread_rate=0.994977/s
BM_ExampleTiming/real_time/threads:10                1002049363 ns        24600 ns           10 counter=10 counter_rate=0.997955/s counter_thread_rate=0.0997955/s
BM_ExampleTiming/process_time/threads:1              1002829254 ns        50700 ns           10 counter=1 counter_rate=1.97239k/s counter_thread_rate=1.97239k/s
BM_ExampleTiming/process_time/threads:10             1002060634 ns       310600 ns           10 counter=10 counter_rate=3.21958k/s counter_thread_rate=321.958/s
BM_ExampleTiming/process_time/manual_time/threads:1  1000000000 ns        64000 ns            1 counter=1 counter_rate=1/s counter_thread_rate=1/s
BM_ExampleTiming/process_time/manual_time/threads:10 1000000000 ns       406800 ns           10 counter=10 counter_rate=1/s counter_thread_rate=0.1/s
BM_ExampleTiming/process_time/real_time/threads:1    1003886083 ns        50000 ns            1 counter=1 counter_rate=0.996129/s counter_thread_rate=0.996129/s
BM_ExampleTiming/process_time/real_time/threads:10   1004308770 ns       307700 ns           10 counter=10 counter_rate=0.99571/s counter_thread_rate=0.099571/s

The rates are consistent as long as you don't use CPU time for the rate calculation (which makes sense given my sleep / manual timing). So I think we're good on this front?
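
As a sanity check on the table, the reported counter_rate values appear consistent with dividing the counter by (per-iteration time × iterations), using the CPU column by default and the Time column under manual or real time. The helper name RateFromRow is hypothetical, and this is only arithmetic on the printed rows, not a description of the library's code path.

```cpp
#include <cassert>

// Hypothetical helper: recomputes a rate from one printed row.
// counter          - value in the "counter=" column
// ns_per_iteration - the Time or CPU column, in nanoseconds
// iterations       - the Iterations column
double RateFromRow(double counter, double ns_per_iteration, long iterations) {
  assert(ns_per_iteration > 0.0 && iterations > 0);
  const double seconds =
      ns_per_iteration * 1e-9 * static_cast<double>(iterations);
  return counter / seconds;
}
```

For example, RateFromRow(10, 40400, 10) is roughly 24752.5, in line with the 24.7525k/s shown for threads:10 under CPU time, and RateFromRow(10, 1000000000, 10) is roughly 1, in line with the 1/s shown for manual_time/threads:10. In both cases ten threads that each contribute one count end up with a rate that looks like a single thread's, which is the per-thread behaviour the PR description refers to.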

@FRosner
Contributor Author

FRosner commented Dec 9, 2025

I updated the docs in 894fa39. Let me know if you'd like me to add an example or if that's enough :)

@LebedevRI
Collaborator

> If I understand the docs correctly, MeasureProcessCPUTime affects only the way the number of required iterations is computed, right?

As the name suggests, it measures the Process CPU Time,
aka the time of all the threads that may have been created
in the function-under-benchmark.
I don't know how it's supposed to interact with ->Threads().

@dmah42
Member

dmah42 commented Dec 10, 2025

> If I understand the docs correctly, MeasureProcessCPUTime affects only the way the number of required iterations is computed, right?

> As the name suggests, it measures the Process CPU Time, aka the time of all the threads that may have been created in the function-under-benchmark. I don't know how it's supposed to interact with ->Threads().

at this point i'm not sure either.

the docs say "// Measure the total CPU consumption, use it to decide for how long to
// run the benchmark loop. This will always measure to no less than the
// time spent by the main thread in single-threaded case."

the difference is (on Linux) between getrusage for Process time and using clock_gettime for Thread time (the default iirc).

i'm afraid i'll need to let you decide how this should correspond to Threads and timing outputs.
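
For reference, a rough sketch of the two clock sources mentioned above, written against the POSIX calls directly. The function names are hypothetical and this is not the library's timer code; it only shows that one call accounts for the whole process while the other accounts for the calling thread alone.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

// CPU seconds (user + system) consumed by every thread of the process.
double ProcessCpuSeconds() {
  struct rusage ru {};
  const int rc = getrusage(RUSAGE_SELF, &ru);
  assert(rc == 0);
  (void)rc;
  const double user = static_cast<double>(ru.ru_utime.tv_sec) +
                      static_cast<double>(ru.ru_utime.tv_usec) * 1e-6;
  const double sys = static_cast<double>(ru.ru_stime.tv_sec) +
                     static_cast<double>(ru.ru_stime.tv_usec) * 1e-6;
  return user + sys;
}

// CPU seconds consumed by the calling thread only.
double ThreadCpuSeconds() {
  struct timespec ts {};
  const int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  assert(rc == 0);
  (void)rc;
  return static_cast<double>(ts.tv_sec) +
         static_cast<double>(ts.tv_nsec) * 1e-9;
}
```

With N busy benchmark threads, the process-wide figure would be expected to grow roughly N times faster than any single thread's figure, which is why its interaction with ->Threads() and rate counters needs a deliberate decision.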

@LebedevRI LebedevRI merged commit 3e7dac6 into google:main Dec 10, 2025
135 of 136 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG] Rate counters are per-thread in multi-threaded benchmarks, kAvgThreadsRate does not make sense
