fix(callgrind): initialize stack on instrumentation start#15
Conversation
not-matthias
commented
May 29, 2026
- fix(callgrind): seed shadow call stack at instrumentation start
- test(callgrind): add regression tests for runtime obj-skip
35de9c7 to
651fea8
Compare
651fea8 to
c6fcb3b
Compare
|
@greptile review |
Merging this PR will degrade performance by 73.05%
Warning Please fix the performance issues or acknowledge them on CodSpeed. Performance Changes
Tip Investigate this regression by commenting Comparing Footnotes
|
There was a problem hiding this comment.
Pull request overview
Fixes a runtime obj-skip leak in Callgrind where, when CALLGRIND_START_INSTRUMENTATION is invoked several frames deep, subsequent returns underflow the (empty) shadow call stack and force-push the returned-into function — leaking functions from skipped objects as top-level fn= blocks. The fix seeds the shadow call stack from the native stack at the OFF→ON transition, and adds matching pop-time context restoration plus regression tests.
Changes:
- Add
CLG_(reconstruct_call_stack_from_native)that walksVG_(get_StackTrace)and seeds onejcc=0call entry per native frame (withpush_cxtfor non-skipped frames); wire it intoVG_USERREQ__START_INSTRUMENTATION. Add a newpop_call_stackbranch to restorecxt/fn_spfor these seeded entries, and a newCLG_(get_fn_node_for_addr)helper plus exposedCLG_(new_recursion)/CLG_(insert_bbcc_into_hash). - Add two C-based regression tests (
runtime_obj_skip_c,runtime_obj_skip_underflow) with companion shared libraries, vgtest configs, expected outputs, and Makefile/build wiring; extendfilter_stderrto strip new diagnostic log lines. - Minor
.gitignoreadditions and a newCLG_DEBUGtrace inhandleUnderflow.
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| callgrind/main.c | Call new reconstruction helper from START_INSTRUMENTATION |
| callgrind/callstack.c | New stack reconstruction helper; restore cxt for seeded entries in pop_call_stack |
| callgrind/fn.c | New CLG_(get_fn_node_for_addr) resolving raw IPs to fn_node |
| callgrind/bbcc.c | Expose new_recursion/insert_bbcc_into_hash via CLG_(); add underflow debug log |
| callgrind/global.h | Declarations for new exported helpers |
| callgrind/tests/runtime_obj_skip_c.{c,vgtest,…} | Regression: first BB after START in a skipped object |
| callgrind/tests/runtime_obj_skip_c_lib.c | Skipped library that flips instrumentation on |
| callgrind/tests/runtime_obj_skip_underflow.{c,vgtest,…} | Regression: START deep in a recursive skipped lib |
| callgrind/tests/runtime_obj_skip_underflow_lib.c | Recursive skipped library to exercise underflow channel |
| callgrind/tests/Makefile.am | Build the new tests and shared libraries |
| callgrind/tests/filter_stderr | Drop new diagnostic log prefixes from stderr |
| .gitignore | Ignore new build artifacts and dump files |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Greptile SummaryThis PR seeds callgrind's shadow call stack from the native stack trace on
Confidence Score: 5/5The change is safe to merge. The core reconstruction logic is well-guarded, the ensure_stack_size/push_cxt ordering matches the established pattern, sentinel zeroing maintains push_cxt's invariant, and the new regression tests cover both the direct and underflow-channel leaks. All changed paths are additive (new function, new else-if branch, new test programs). The seeding runs only once per START event and is guarded by cs->sp != 0. The obj-skip mirroring exactly matches bbcc.c's comparison logic. No existing paths are altered in a breaking way. callgrind/callstack.c and callgrind/global.h are the most critical — the pop_call_stack else-if branch and the public API exposure in global.h are the areas worth a final read-through. Important Files Changed
|
Greptile SummaryThis PR fixes callgrind's shadow call stack initialization when
Confidence Score: 4/5Safe to merge; the core fix is well-reasoned and the two regression tests directly cover the reported leak channels. The seeded-entry logic in
Important Files Changed
|
art049
left a comment
There was a problem hiding this comment.
lgtm! we need to carefully test this before putting it ga though
21ab8bb to
641c4d2
Compare
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
|
@codspeedbot investigate the regresions / flakyness |
Regression InvestigationI analyzed the 4 benchmarks flagged by CodSpeed (2 regressions, 2 improvements) by comparing values across multiple runs on this branch and examining the flamegraphs. These are walltime flakiness, not regressions caused by the code changes in this PR. Evidence: Cross-run variance on the same branch
This benchmark swings from ~750ms to 30–72s on the same branch code, confirming it's environmental noise rather than a code regression.
Same pattern — the previous branch run measured 6.7s (in line with the 7.2s base), then it jumped to 24.6s on the very next run. The "improvements" are also flaky
Flamegraph analysis confirms flakinessWhen the ConclusionAll 4 flagged changes (both regressions and improvements) are walltime noise affecting the Summary
Was this response helpful? React with 👍 or 👎 to help us improve CodSpeedBot. |
|
@codspeedbot investigate why they are so flaky |
Why These Benchmarks Are So FlakyI did a deep analysis across 5 runs (3 on this branch, 2 on master), comparing flamegraphs for both normal and outlier measurements. The flakiness is structural — it stems from the nature of what's being benchmarked and how it's measured. The Smoking Gun: Flamegraph ComparisonNormal run (
Outlier run (same benchmark @ 71.6s on this branch — 100× slower):
The same 2 calls to This same pattern appears on master too: Root Causes1. Profiler-on-profiler stacking 2. Single-shot measurement with no warmup or repetition 3. Cache simulation amplifies system noise exponentially 4. Short benchmarks amplify relative noise 5. Transient system-level interference Possible Mitigations
Summary
Was this response helpful? React with 👍 or 👎 to help us improve CodSpeedBot. |
CodSpeed runs benchmarks with --instr-atstart=no and fires CALLGRIND_START_INSTRUMENTATION several frames deep (inside libpython, or behind a V8/JIT trampoline). The shadow stack starts at 0 while real frames exist, so every later return underflows and the inclusive cost collapses onto a phantom root. - Reconstruct the shadow stack from the native stack at the OFF->ON transition (CLG_(reconstruct_call_stack_from_native)), seeding each frame's entry SP so it pops correctly instead of underflowing. - Name anonymous JIT frames by address in get_fn_node_for_addr (mirroring the BB path) instead of "???", so the seeded root frame (e.g. __codspeed_root_frame__) is preserved and stays backend-symbolicatable via perf-<pid>.map. - Add --obj-skip and CALLGRIND_ADD_OBJ_SKIP to exclude whole objects (the node binary, libpython) from the call graph.
d324ccf to
310ac3d
Compare
- runtime_obj_skip_c: a fn from a skipped object must not leak into the dump as a top-level fn= block when it is the first BB after START (the cxt==0 force-push path) - runtime_obj_skip_underflow: a RET past an empty call stack (handleUnderflow) must not re-leak the skipped fn -- the Python 3.14 deep recursive interpreter-dispatch shape Both exercise --obj-skip / CALLGRIND_ADD_OBJ_SKIP from a separately-linked .so. Also filters callgrind diagnostic logs from test stderr and gitignores the test build artifacts (binaries, .so, logs).
6c726b7 to
ae9ee5c
Compare
14832cf to
ae9ee5c
Compare