fix(callgrind): initialize stack on instrumentation start by not-matthias · Pull Request #15 · CodSpeedHQ/valgrind-codspeed

not-matthias · 2026-05-29T15:09:23Z

fix(callgrind): seed shadow call stack at instrumentation start
test(callgrind): add regression tests for runtime obj-skip

not-matthias · 2026-05-29T15:19:09Z

@greptile review

codspeed-hq · 2026-05-29T15:20:41Z

Merging this PR will degrade performance by 73.05%

⚡ 2 improved benchmarks
❌ 1 regressed benchmark
✅ 37 untouched benchmarks
⏩ 80 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
❌	`test_valgrind[valgrind-3.25.1, echo Hello, World!, full-with-inline]`	733.6 ms	51,937.6 ms	-98.59%
⚡	`test_valgrind[valgrind.codspeed, python3 testdata/test.py, full-with-inline]`	8.7 s	7 s	+25.01%
⚡	`test_valgrind[valgrind-3.25.1, python3 testdata/test.py, full-no-inline]`	7.5 s	6.8 s	+10.88%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing cod-2714-investigate-pytest-flamegraph-regression-final (ae9ee5c) with master (d2dd609)

¹ 80 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

Copilot

Pull request overview

Fixes a runtime obj-skip leak in Callgrind where, when CALLGRIND_START_INSTRUMENTATION is invoked several frames deep, subsequent returns underflow the (empty) shadow call stack and force-push the returned-into function — leaking functions from skipped objects as top-level fn= blocks. The fix seeds the shadow call stack from the native stack at the OFF→ON transition, and adds matching pop-time context restoration plus regression tests.

Changes:

Add CLG_(reconstruct_call_stack_from_native) that walks VG_(get_StackTrace) and seeds one jcc=0 call entry per native frame (with push_cxt for non-skipped frames); wire it into VG_USERREQ__START_INSTRUMENTATION. Add a new pop_call_stack branch to restore cxt/fn_sp for these seeded entries, and a new CLG_(get_fn_node_for_addr) helper plus exposed CLG_(new_recursion) / CLG_(insert_bbcc_into_hash).
Add two C-based regression tests (runtime_obj_skip_c, runtime_obj_skip_underflow) with companion shared libraries, vgtest configs, expected outputs, and Makefile/build wiring; extend filter_stderr to strip new diagnostic log lines.
Minor .gitignore additions and a new CLG_DEBUG trace in handleUnderflow.
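
The underflow described above can be illustrated with a toy model. This is a sketch only: the `ShadowStack` class, its methods, and the `skipme_*` frame names are hypothetical stand-ins, not Callgrind's data structures, and the model ignores contexts, stack pointers, and cost attribution.

```python
class ShadowStack:
    """Toy stand-in for a shadow call stack (hypothetical, not Callgrind's layout)."""

    def __init__(self, skipped):
        self.frames = []                 # oldest frame first
        self.skipped = set(skipped)      # functions that live in skipped objects
        self.top_level = []              # functions that would become top-level fn= blocks

    def seed(self, native_frames):
        """Seed one entry per native frame, oldest first."""
        self.frames = list(native_frames)

    def ret(self, returned_into):
        """Handle a return; an empty stack models the underflow path."""
        if self.frames:
            self.frames.pop()
            return
        # Underflow: the returned-into function is force-pushed as a new root,
        # even when it belongs to a skipped object.
        self.top_level.append(returned_into)


def leaked_after_unwind(seed_first):
    """Start 'instrumentation' three frames deep, then unwind back to main."""
    native = ["main", "skipme_outer", "skipme_inner"]  # oldest -> newest
    stack = ShadowStack(skipped={"skipme_outer", "skipme_inner"})
    if seed_first:
        stack.seed(native)
    for returned_into in ["skipme_outer", "main"]:
        stack.ret(returned_into)
    return [fn for fn in stack.top_level if fn in stack.skipped]


print(leaked_after_unwind(seed_first=False))  # ['skipme_outer']
print(leaked_after_unwind(seed_first=True))   # []
```

Without seeding, the first return finds an empty stack and the skipped function surfaces as a root; with seeding, both returns pop an existing entry and nothing leaks.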

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.

File	Description
callgrind/main.c	Call new reconstruction helper from `START_INSTRUMENTATION`
callgrind/callstack.c	New stack reconstruction helper; restore cxt for seeded entries in `pop_call_stack`
callgrind/fn.c	New `CLG_(get_fn_node_for_addr)` resolving raw IPs to `fn_node`
callgrind/bbcc.c	Expose `new_recursion`/`insert_bbcc_into_hash` via `CLG_()`; add underflow debug log
callgrind/global.h	Declarations for new exported helpers
callgrind/tests/runtime_obj_skip_c.{c,vgtest,…}	Regression: first BB after START in a skipped object
callgrind/tests/runtime_obj_skip_c_lib.c	Skipped library that flips instrumentation on
callgrind/tests/runtime_obj_skip_underflow.{c,vgtest,…}	Regression: START deep in a recursive skipped lib
callgrind/tests/runtime_obj_skip_underflow_lib.c	Recursive skipped library to exercise underflow channel
callgrind/tests/Makefile.am	Build the new tests and shared libraries
callgrind/tests/filter_stderr	Drop new diagnostic log prefixes from stderr
.gitignore	Ignore new build artifacts and dump files

greptile-apps · 2026-05-29T15:27:54Z

Greptile Summary

This PR seeds callgrind's shadow call stack from the native stack trace on CALLGRIND_START_INSTRUMENTATION, preventing stack underflows when instrumentation begins deep inside already-executing frames (e.g. a benchmark runner calling START several libpython frames deep). It also adds C-based regression tests to cover both the obj-skip leak and the underflow channel.

callstack.c: adds CLG_(reconstruct_call_stack_from_native) which walks the native stack bottom-up and pushes SP-only entries for skipped/JIT frames and full push_cxt entries for non-skipped frames, with pop_call_stack extended to restore cxt from these seeded entries.
bbcc.c: promotes new_recursion and insert_bbcc_into_hash from static to module-exported (CLG_-prefixed) symbols; adds debug logging to handleUnderflow.
callgrind/tests: two new C test programs (runtime_obj_skip_c, runtime_obj_skip_underflow) with shared-library shims, wired into Makefile.am with .so build rules, .vgtest files, and filter_stderr patterns.

Confidence Score: 5/5

The change is safe to merge. The core reconstruction logic is well-guarded, the ensure_stack_size/push_cxt ordering matches the established pattern, sentinel zeroing maintains push_cxt's invariant, and the new regression tests cover both the direct and underflow-channel leaks.

All changed paths are additive (new function, new else-if branch, new test programs). The seeding runs only once per START event and is guarded by cs->sp != 0. The obj-skip mirroring exactly matches bbcc.c's comparison logic. No existing paths are altered in a breaking way.

callgrind/callstack.c and callgrind/global.h are the most critical — the pop_call_stack else-if branch and the public API exposure in global.h are the areas worth a final read-through.

Important Files Changed

Filename	Overview
callgrind/callstack.c	Core fix: adds CLG_(reconstruct_call_stack_from_native) and extends pop_call_stack with an else-if branch for seeded entries. Logic is well-commented and ordering of ensure_stack_size before push_cxt matches the established pattern.
callgrind/bbcc.c	Promotes new_recursion and insert_bbcc_into_hash from static to CLG_-prefixed public symbols, and adds debug logging to handleUnderflow. Both promoted functions are only called within bbcc.c — no new external consumers.
callgrind/fn.c	Adds CLG_(get_fn_node_for_addr) to resolve a raw IP to a fn_node without a BB, with correct fallback for anonymous JIT frames using a stack-local buffer.
callgrind/main.c	Calls CLG_(reconstruct_call_stack_from_native)(tid) immediately after set_instrument_state on START_INSTRUMENTATION. Ordering is correct — reconstruction happens before the first instrumented BB is executed.
callgrind/global.h	Adds declarations for CLG_(new_recursion), CLG_(insert_bbcc_into_hash), CLG_(get_fn_node_for_addr), and CLG_(reconstruct_call_stack_from_native). The first two have no callers outside bbcc.c.
callgrind/tests/Makefile.am	Wires up runtime_obj_skip_c and runtime_obj_skip_underflow into check_PROGRAMS and check_DATA with shared-lib build rules and correct LDADD/LDFLAGS/DEPENDENCIES for each.
callgrind/tests/runtime_obj_skip_c.vgtest	Post-check correctly guards with test -f before grepping, avoiding the false-pass on missing output file noted in the previous review.
callgrind/tests/runtime_obj_skip_underflow.vgtest	Same structure as runtime_obj_skip_c.vgtest with file-existence guard in the post-check. Correctly tests the underflow-channel leak path.
callgrind/tests/filter_stderr	Extended with a regex to suppress new diagnostic log lines emitted by the verbose obj-skip / cxt / underflow tracing paths.

Sequence Diagram

sequenceDiagram
    participant Client as Client Thread
    participant Valgrind as Valgrind/CLG
    participant CallStack as Shadow Call Stack
    participant FnStack as Fn/Ctx Stack

    Client->>Valgrind: CALLGRIND_START_INSTRUMENTATION
    Valgrind->>Valgrind: CLG_(set_instrument_state)(True)
    Valgrind->>Valgrind: CLG_(reconstruct_call_stack_from_native)(tid)
    Valgrind->>Valgrind: VG_(get_StackTrace)(tid, ips[], sps[])
    loop for each frame (bottom-up: oldest to newest)
        Valgrind->>Valgrind: CLG_(get_fn_node_for_addr)(ips[frame])
        Valgrind->>Valgrind: latch obj-skip flag if not checked
        Valgrind->>CallStack: "ensure_stack_size(cs->sp + 1)"
        alt frame is NOT skipped
            Valgrind->>FnStack: CLG_(push_cxt)(fn) saves old cxt, pushes fn, updates cxt
        end
        Valgrind->>CallStack: "write ce->jcc=0, ce->sp=sps[frame+1], ce->ret_addr"
        Valgrind->>CallStack: "cs->sp++, sentinel entry[sp].cxt = 0"
    end
    Valgrind-->>Client: return (instrumentation active)
    Client->>Client: execute benchmark code (instrumented)
    loop for each RET unwinding a seeded frame
        Valgrind->>CallStack: CLG_(pop_call_stack)()
        alt "jcc != 0"
            Valgrind->>FnStack: restore cxt + fn_stack (normal path)
        else "lower_entry->cxt != 0 (seeded non-skipped frame)"
            Valgrind->>FnStack: restore cxt + fn_stack from saved values
        else "cxt == 0 (seeded skipped/JIT frame)"
            Note over Valgrind,FnStack: no cxt change needed
        end
    end
    Client->>Valgrind: CALLGRIND_STOP_INSTRUMENTATION
    Valgrind->>Valgrind: CLG_(set_instrument_state)(False)

Reviews (11): Last reviewed commit: "test(callgrind): add regression tests fo..."

greptile-apps · 2026-05-29T15:28:09Z

Greptile Summary

This PR fixes callgrind's shadow call stack initialization when CALLGRIND_START_INSTRUMENTATION is invoked from deep inside a native call stack. Without seeding, each ret past an unseen frame trips handleUnderflow and leaks skipped-object functions as top-level fn= blocks in the output.

callstack.c: Adds CLG_(reconstruct_call_stack_from_native) that walks VG_(get_StackTrace) bottom-up and pushes a jcc=0 call entry per frame; non-skipped frames also get a push_cxt call so the context chain is seeded. A matching else if (lower_entry->cxt != 0) branch in pop_call_stack restores context correctly when these synthetic entries are unwound.
fn.c / bbcc.c: Adds CLG_(get_fn_node_for_addr) for IP-to-fn_node resolution without a BB, and exports new_recursion/insert_bbcc_into_hash so they are accessible across translation units.
Tests: Adds two C-level regression tests (runtime_obj_skip_c, runtime_obj_skip_underflow) that build a separate shared library, register it for obj-skip, start instrumentation from inside the skipped lib, and assert no fn=skipme_* blocks leak into the dump.
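
The assertion that no `fn=skipme_*` block leaks into the dump, together with the missing-file concern raised in the review, can be sketched as follows. This is an illustration under stated assumptions, not the PR's `.vgtest` post-check: the helper name `leaked_functions`, the dump file name, and the sample dump lines are hypothetical, and the pattern assumes `fn=` lines may carry an optional `(id)` prefix.

```python
import re
import tempfile
from pathlib import Path

# Matches top-level "fn=" lines naming a skipme_* function, with or without
# a "(id)" prefix. "cfn=" (callee) lines are deliberately not matched.
LEAK = re.compile(r"^fn=(?:\(\d+\)\s*)?skipme_")


def leaked_functions(dump_path):
    """Return leaked fn= lines; a missing dump is an error, not a pass."""
    path = Path(dump_path)
    if not path.is_file():
        raise FileNotFoundError(f"no callgrind dump at {path}")
    return [line for line in path.read_text().splitlines() if LEAK.match(line)]


with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "callgrind.out.demo"  # hypothetical file name
    dump.write_text("fn=(1) main\nfn=(2) skipme_inner\ncfn=(3) skipme_outer\n")
    leaks = leaked_functions(dump)

    missing_is_error = False
    try:
        leaked_functions(Path(tmp) / "absent")
    except FileNotFoundError:
        missing_is_error = True

print(leaks)             # ['fn=(2) skipme_inner']
print(missing_is_error)  # True
```

Raising on a missing file is the design point: a plain search over a nonexistent dump finds no matches and would otherwise look like a clean result.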

Confidence Score: 4/5

Safe to merge; the core fix is well-reasoned and the two regression tests directly cover the reported leak channels.

The seeded-entry logic in pop_call_stack correctly distinguishes normal skip entries (cxt=0, no restore) from seeded non-skipped entries (cxt≠0, restore via new branch) and avoids interfering with the normal jcc path. The reconstruction loop ordering (push_cxt before ensure_stack_size) is safe under current constants but creates a silent coupling between CLG_RECON_MAX_FRAMES and N_CALL_STACK_INITIAL_ENTRIES. The ce->nonskipped field is left uninitialized in seeded entries and the static buffer in get_fn_node_for_addr relies on an implicit strdup contract — both minor, non-blocking concerns. Test post-scripts give a false-OK if the callgrind output file is absent.

callgrind/callstack.c (the new reconstruct_call_stack_from_native function and the pop_call_stack else-if branch) and callgrind/fn.c (the static buffer in CLG_(get_fn_node_for_addr)).

Important Files Changed

Filename	Overview
callgrind/callstack.c	Core change: adds `reconstruct_call_stack_from_native` to seed the shadow call stack on instrumentation start, and extends `pop_call_stack` with a new `else if (cxt != 0)` branch to restore context when seeded entries unwind. Logic is sound but has two latent issues: `ce->nonskipped` is not explicitly zeroed, and `push_cxt` is called before `ensure_stack_size`.
callgrind/fn.c	Adds `CLG_(get_fn_node_for_addr)` to resolve raw IPs to `fn_node`s for use by the stack reconstruction path. Logic is correct; the static buffer for anonymous-address formatting is safe under the current strdup contract but fragile by design.
callgrind/bbcc.c	Promotes `new_recursion` and `insert_bbcc_into_hash` from `static` to exported `CLG_`-prefixed functions; adds a debug-log line in `handleUnderflow`. All call sites updated consistently.
callgrind/main.c	Calls `CLG_(reconstruct_call_stack_from_native)(tid)` immediately after `CLG_(set_instrument_state)` on `VG_USERREQ__START_INSTRUMENTATION`. Ordering is correct and the `cs->sp != 0` guard prevents double-seeding.
callgrind/global.h	Declares the new exported symbols. `CLG_(new_recursion)` and `CLG_(insert_bbcc_into_hash)` are only used within `bbcc.c` in this PR; their export appears forward-looking.
callgrind/tests/Makefile.am	Adds build rules for two new shared libraries and their test binaries; links with `-l:*.so -ldl` and `RPATH=$ORIGIN` so tests find the libs at runtime without install.
callgrind/tests/runtime_obj_skip_c.vgtest	Regression test for the obj-skip leak via the `cxt==0` push path. Post-check grep can produce a false-OK if the callgrind output file is missing.
callgrind/tests/runtime_obj_skip_underflow.vgtest	Regression test for the underflow-channel obj-skip leak. Same false-OK risk as the sister test when the output file is absent.

Sequence Diagram

sequenceDiagram
    participant Client as Client code
    participant VG as Valgrind core
    participant CLG as Callgrind

    Client->>VG: CALLGRIND_START_INSTRUMENTATION
    VG->>CLG: handle_client_request(START)
    CLG->>CLG: set_instrument_state(ON)
    CLG->>VG: VG_(get_StackTrace)(tid, ips[], sps[])
    VG-->>CLG: n frames [bottom to top]
    loop "frame = n-1 downto 0 (oldest first)"
        CLG->>CLG: get_fn_node_for_addr(ips[frame])
        CLG->>CLG: "check obj_skip list -> fn->skip"
        alt fn not skipped
            CLG->>CLG: "push_cxt(fn) -> seed cxt chain"
        end
        CLG->>CLG: "push call_entry(jcc=0, sp=sps[frame+1])"
    end
    Note over CLG: shadow call stack seeded
    Client->>Client: execute benchmark (instrumented)
    loop each RET in instrumented code
        CLG->>CLG: unwind_call_stack(sp)
        CLG->>CLG: pop_call_stack()
        alt "jcc != 0"
            CLG->>CLG: restore cxt + fn_sp (normal path)
        else "cxt != 0 (seeded entry)"
            CLG->>CLG: restore cxt + fn_sp (new path)
        else "cxt == 0 (skipped entry)"
            CLG->>CLG: no cxt restore
        end
    end

Reviews (2): Last reviewed commit: "test(callgrind): add regression tests fo..."

art049

lgtm! we need to carefully test this before putting it ga though

not-matthias · 2026-06-02T12:55:37Z

@codspeedbot investigate the regressions / flakiness

codspeed-hq · 2026-06-02T12:55:42Z

Regression Investigation

I analyzed the 4 benchmarks flagged by CodSpeed (2 regressions, 2 improvements) by comparing values across multiple runs on this branch and examining the flamegraphs. These are walltime flakiness, not regressions caused by the code changes in this PR.

Evidence: Cross-run variance on the same branch

test_valgrind[valgrind-3.25.1, echo Hello, World!, full-with-inline] (reported as -97.54%):

Run	Commit	Value
`6a19b668` (May 29)	`21ab8bb`	748.9 ms
`6a1ecc6f` (Jun 2)	`641c4d2`	71.6 s
`6a1ed177` (Jun 2)	`d324ccf`	29.8 s
BASE (master)	`d2dd609`	733.6 ms

This benchmark swings from ~750ms to 30–72s on the same branch code, confirming it's environmental noise rather than a code regression.

test_valgrind[valgrind.codspeed, python3 testdata/test.py, full-no-inline] (reported as -70.8%):

Run	Commit	Value
`6a1ecc6f` (Jun 2)	`641c4d2`	6.7 s
`6a1ed177` (Jun 2)	`d324ccf`	24.6 s
BASE (master)	`d2dd609`	7.2 s

Same pattern — the previous branch run measured 6.7s (in line with the 7.2s base), then it jumped to 24.6s on the very next run.

The "improvements" are also flaky

valgrind-3.26.0, python3 test.py, full-no-inline (+65.8%): The base value of 50.3s was itself an outlier — earlier runs on both branches measured ~5.5–6.7s. The HEAD value of 30.3s is also elevated.
valgrind.codspeed, python3 test.py, full-with-inline (+21%): 8.7s → 7.2s, within normal variance.

Flamegraph analysis confirms flakiness

When the echo Hello, World! benchmark "regresses" (29.8s vs 733ms), the flamegraph shows vgCallgrind_setup_bbcc dominating at 9.4s self time (31%) with only 2 calls, and cachesim_setref at 5.1s (17%). In the normal (733ms) base run, neither function even appears in the top 10 — the time is spent in expected startup work (debug info loading, JIT compilation). This pattern is consistent with system-level interference (memory pressure, cache effects, scheduling contention) inflating walltime unpredictably during valgrind's instrumented execution.

Conclusion

All 4 flagged changes (both regressions and improvements) are walltime noise affecting the full-with-inline and full-no-inline configurations. The PR's code changes do not cause any consistent performance degradation. These benchmarks can be safely acknowledged on CodSpeed.

not-matthias · 2026-06-02T13:00:36Z

@codspeedbot investigate why they are so flaky

codspeed-hq · 2026-06-02T13:00:41Z

Why These Benchmarks Are So Flaky

I did a deep analysis across 5 runs (3 on this branch, 2 on master), comparing flamegraphs for both normal and outlier measurements. The flakiness is structural — it stems from the nature of what's being benchmarked and how it's measured.

The Smoking Gun: Flamegraph Comparison

Normal run (echo Hello, World!, full-with-inline @ 714ms on master):

Time dominated by startup: debug info loading (50%), JIT compilation (25%)
vgCallgrind_setup_bbcc doesn't even appear in the top 10

Outlier run (same benchmark @ 71.6s on this branch — 100× slower):

vgCallgrind_setup_bbcc: 20.6s self time (28.9%) with only 2 calls
cachesim_setref: 6.5s with 6 calls
vgPlain_scheduler self time: 3.7s (vs ~2ms normally)

The same 2 calls to setup_bbcc that normally take microseconds are taking 10 seconds each. This is not a code regression — it's system-level interference inflating walltime for functions that happen to be running when the disturbance hits.

This same pattern appears on master too: valgrind-3.26.0, python3 test.py, full-no-inline measured 50.3s on the latest master run vs its normal ~5.5s, with the identical flamegraph signature (setup_bbcc at 13.1s/26% with 2 calls).

Root Causes

1. Profiler-on-profiler stacking
The benchmarks run samply (walltime profiler) → valgrind --tool=callgrind (instrumentation profiler) → guest program. Samply's signal-based sampling interrupts valgrind's tight scheduler dispatch loop (vgPlain_scheduler). In outlier runs, vgPlain_scheduler accumulates 3.7–3.9s of self time (7–16%), vs negligible in normal runs — this is time spent handling profiler signals between JIT block dispatches.

2. Single-shot measurement with no warmup or repetition
Each benchmark is one valgrind --tool=callgrind ... invocation, measured once. There's no warmup run and no averaging across iterations. A single transient system disturbance during the run produces a permanent outlier with no way to filter it out.

3. Cache simulation amplifies system noise
The full-* configs enable --cache-sim=yes, which simulates the I1, D1, and LL caches. The simulation's own working set (cache state arrays, cost counters) must stay in the physical CPU cache to perform well. When profiler signals or system noise evict this hot data, subsequent cache simulation accesses miss in the real cache, and the slowdown cascades. This would explain why cachesim_setref (6 calls) takes 6.5s in outlier runs while being invisible in normal runs.

4. Short benchmarks amplify relative noise
echo Hello, World! normally takes ~700ms. If a system hiccup adds 29s of overhead, that shows as a -97% regression. The same 29s added to a 300s benchmark would only be -10%. The shortest benchmarks will always be the noisiest in relative terms.

5. Transient system-level interference
Even on dedicated codspeed-macro runners, transient memory pressure, hypervisor-level scheduling, or NUMA effects can cause dramatic walltime inflation for memory-intensive workloads. Valgrind's JIT cache + cache simulation data structures make it one of the most memory-sensitive workloads possible.
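
The arithmetic behind root cause 4 can be worked through directly. This sketch assumes the reported change is computed as `base / head - 1`, which is consistent with the figures in the performance table (733.6 ms to 51,937.6 ms giving -98.59%); the function name `relative_change` is hypothetical.

```python
def relative_change(base_s, head_s):
    """Change as base/head - 1; negative means HEAD is slower (assumed formula)."""
    return base_s / head_s - 1.0


disturbance_s = 29.0  # the same absolute hiccup applied to both baselines
for base_s in (0.7, 300.0):
    change = relative_change(base_s, base_s + disturbance_s)
    print(f"{base_s:6.1f} s baseline: {change:+.1%}")
# prints:
#    0.7 s baseline: -97.6%
#  300.0 s baseline: -8.8%
```

Under this formula the long benchmark moves by roughly 9%, the same order as the 10% quoted above, while the short one loses almost all of its score to the identical disturbance.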

Possible Mitigations

Mitigation	Impact	Effort
Add iteration/repetition — run each benchmark 3–5× and take the median	High — filters transient outliers	Medium — requires changes to the exec harness or a wrapper script
Add a warmup run — discard the first iteration	Medium — avoids cold-start outliers	Low
Drop the shortest benchmarks — `echo Hello, World!` is inherently too short for stable walltime measurement under valgrind	Medium — eliminates the worst offenders	Low
Separate profiler configs from non-profiler configs — run `full-*` configs without the samply profiler, or accept higher variance for them	Medium — reduces profiler-on-profiler interference	Low
Increase timeout and add outlier detection — flag runs where any benchmark is >5× its historical median	Low (detection only) — doesn't prevent flakiness but avoids false regression reports	Low
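
Two of the mitigations above, taking the median of repeated runs and flagging values above 5× the historical median, can be sketched as follows. This is a minimal illustration, not the CodSpeed harness: the function names and the `factor` parameter are hypothetical, and the sample values are walltimes quoted earlier in this thread, reused only as example data.

```python
from statistics import median


def median_of_runs(samples_s):
    """Collapse repeated measurements into one value that ignores a lone spike."""
    return median(samples_s)


def is_outlier(value_s, history_s, factor=5.0):
    """Flag a measurement that exceeds `factor` times the historical median."""
    return value_s > factor * median(history_s)


# Walltimes in seconds for the echo benchmark, taken from the runs listed above.
repetitions = [0.7489, 71.6, 0.7336]
history = [0.7336, 0.7489, 0.714]

print(median_of_runs(repetitions))  # 0.7489
print(is_outlier(71.6, history))    # True
print(is_outlier(0.7489, history))  # False
```

With three repetitions a single 71.6 s spike no longer decides the reported value, and the outlier check catches it before it is reported as a regression.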

CodSpeed runs benchmarks with --instr-atstart=no and fires CALLGRIND_START_INSTRUMENTATION several frames deep (inside libpython, or behind a V8/JIT trampoline). The shadow stack starts at 0 while real frames exist, so every later return underflows and the inclusive cost collapses onto a phantom root.

- Reconstruct the shadow stack from the native stack at the OFF->ON transition (CLG_(reconstruct_call_stack_from_native)), seeding each frame's entry SP so it pops correctly instead of underflowing.
- Name anonymous JIT frames by address in get_fn_node_for_addr (mirroring the BB path) instead of "???", so the seeded root frame (e.g. __codspeed_root_frame__) is preserved and stays backend-symbolicatable via perf-<pid>.map.
- Add --obj-skip and CALLGRIND_ADD_OBJ_SKIP to exclude whole objects (the node binary, libpython) from the call graph.

- runtime_obj_skip_c: a fn from a skipped object must not leak into the dump as a top-level fn= block when it is the first BB after START (the cxt==0 force-push path)
- runtime_obj_skip_underflow: a RET past an empty call stack (handleUnderflow) must not re-leak the skipped fn -- the Python 3.14 deep recursive interpreter-dispatch shape

Both exercise --obj-skip / CALLGRIND_ADD_OBJ_SKIP from a separately-linked .so. Also filters callgrind diagnostic logs from test stderr and gitignores the test build artifacts (binaries, .so, logs).

not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch 2 times, most recently from 35de9c7 to 651fea8 Compare May 29, 2026 15:15

not-matthias marked this pull request as ready for review May 29, 2026 15:17

not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 651fea8 to c6fcb3b Compare May 29, 2026 15:17

not-matthias requested a review from Copilot May 29, 2026 15:18

Copilot started reviewing on behalf of not-matthias May 29, 2026 15:18

not-matthias requested review from GuillaumeLagrange and art049 May 29, 2026 15:20

Copilot AI reviewed May 29, 2026

greptile-apps Bot reviewed May 29, 2026

Comment thread callgrind/callstack.c Outdated

Comment thread callgrind/callstack.c Outdated

Comment thread callgrind/fn.c Outdated

greptile-apps Bot reviewed May 29, 2026

Comment thread callgrind/callstack.c Outdated

Comment thread callgrind/callstack.c

Comment thread callgrind/fn.c

Comment thread callgrind/tests/runtime_obj_skip_c.vgtest Outdated

art049 approved these changes Jun 2, 2026

Comment thread callgrind/bbcc.c

not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 21ab8bb to 641c4d2 Compare June 2, 2026 12:22

not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from d324ccf to 310ac3d Compare June 2, 2026 13:02

not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch 2 times, most recently from 6c726b7 to ae9ee5c Compare June 2, 2026 16:23

greptile-apps Bot reviewed Jun 2, 2026

Comment thread callgrind/tests/Makefile.am

not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 14832cf to ae9ee5c Compare June 3, 2026 09:39

not-matthias merged commit ae9ee5c into master Jun 3, 2026
18 of 22 checks passed

not-matthias deleted the cod-2714-investigate-pytest-flamegraph-regression-final branch June 3, 2026 11:03

3 participants
