
fix(callgrind): initialize stack on instrumentation start #15

Merged: not-matthias merged 2 commits into master from cod-2714-investigate-pytest-flamegraph-regression-final on Jun 3, 2026


@not-matthias
Member

  • fix(callgrind): seed shadow call stack at instrumentation start
  • test(callgrind): add regression tests for runtime obj-skip

@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch 2 times, most recently from 35de9c7 to 651fea8 Compare May 29, 2026 15:15
@not-matthias not-matthias marked this pull request as ready for review May 29, 2026 15:17
@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 651fea8 to c6fcb3b Compare May 29, 2026 15:17
@not-matthias not-matthias requested a review from Copilot May 29, 2026 15:18
@not-matthias
Member Author

@greptile review

@codspeed-hq

codspeed-hq Bot commented May 29, 2026

Merging this PR will degrade performance by 73.05%

⚡ 2 improved benchmarks
❌ 1 regressed benchmark
✅ 37 untouched benchmarks
⏩ 80 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| test_valgrind[valgrind-3.25.1, echo Hello, World!, full-with-inline] | 733.6 ms | 51,937.6 ms | -98.59% |
| test_valgrind[valgrind.codspeed, python3 testdata/test.py, full-with-inline] | 8.7 s | 7 s | +25.01% |
| test_valgrind[valgrind-3.25.1, python3 testdata/test.py, full-no-inline] | 7.5 s | 6.8 s | +10.88% |
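The Efficiency column appears to follow `base / head - 1`. That formula is inferred from the numbers in this thread, not taken from CodSpeed documentation, so treat it as an assumption. A minimal sketch that checks it against the first row (the other rows show rounded times, so they only match approximately):

```python
def efficiency(base_ms: float, head_ms: float) -> float:
    """Relative change as a fraction, using the assumed formula base/head - 1."""
    if head_ms <= 0:
        raise ValueError("head time must be positive")
    return base_ms / head_ms - 1.0


if __name__ == "__main__":
    # First table row: 733.6 ms on BASE, 51,937.6 ms on HEAD.
    print(f"{efficiency(733.6, 51937.6):+.2%}")
```

Under this formula a slowdown can never be worse than -100%, which is why a roughly 70× slower run still reads as about -98.6%.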

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing cod-2714-investigate-pytest-flamegraph-regression-final (ae9ee5c) with master (d2dd609)


Footnotes

  1. 80 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


Copilot AI left a comment


Pull request overview

Fixes a runtime obj-skip leak in Callgrind where, when CALLGRIND_START_INSTRUMENTATION is invoked several frames deep, subsequent returns underflow the (empty) shadow call stack and force-push the returned-into function — leaking functions from skipped objects as top-level fn= blocks. The fix seeds the shadow call stack from the native stack at the OFF→ON transition, and adds matching pop-time context restoration plus regression tests.

Changes:

  • Add CLG_(reconstruct_call_stack_from_native) that walks VG_(get_StackTrace) and seeds one jcc=0 call entry per native frame (with push_cxt for non-skipped frames); wire it into VG_USERREQ__START_INSTRUMENTATION. Add a new pop_call_stack branch to restore cxt/fn_sp for these seeded entries, and a new CLG_(get_fn_node_for_addr) helper plus exposed CLG_(new_recursion) / CLG_(insert_bbcc_into_hash).
  • Add two C-based regression tests (runtime_obj_skip_c, runtime_obj_skip_underflow) with companion shared libraries, vgtest configs, expected outputs, and Makefile/build wiring; extend filter_stderr to strip new diagnostic log lines.
  • Minor .gitignore additions and a new CLG_DEBUG trace in handleUnderflow.
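The underflow described in the overview can be illustrated with a toy model. This is a simplified sketch, not Callgrind's data structures: `ShadowStack`, `count_underflows`, and the frame names are hypothetical, and the real code tracks stack pointers and contexts rather than names.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowStack:
    """Toy model of a profiler's shadow call stack."""
    frames: list = field(default_factory=list)
    underflows: int = 0

    def seed(self, native_frames):
        # Seed once, oldest frame first, and only if the stack is still empty.
        if not self.frames:
            self.frames.extend(native_frames)

    def ret(self):
        if self.frames:
            return self.frames.pop()
        # Nothing to pop: the caller was never recorded by the profiler.
        self.underflows += 1
        return None


def count_underflows(native_frames, seed_first):
    """Start 'instrumentation' len(native_frames) deep, then return out of every frame."""
    stack = ShadowStack()
    if seed_first:
        stack.seed(native_frames)
    for _ in native_frames:
        stack.ret()
    return stack.underflows


if __name__ == "__main__":
    frames = ["main", "runner", "skipme_start"]  # hypothetical names, oldest first
    print(count_underflows(frames, seed_first=False))  # every return underflows
    print(count_underflows(frames, seed_first=True))   # no underflow after seeding
```

In the unseeded case each return finds an empty stack, which corresponds to the situation where the real tool falls back to its underflow handling.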

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
|---|---|
| callgrind/main.c | Call new reconstruction helper from START_INSTRUMENTATION |
| callgrind/callstack.c | New stack reconstruction helper; restore cxt for seeded entries in pop_call_stack |
| callgrind/fn.c | New CLG_(get_fn_node_for_addr) resolving raw IPs to fn_node |
| callgrind/bbcc.c | Expose new_recursion/insert_bbcc_into_hash via CLG_(); add underflow debug log |
| callgrind/global.h | Declarations for new exported helpers |
| callgrind/tests/runtime_obj_skip_c.{c,vgtest,…} | Regression: first BB after START in a skipped object |
| callgrind/tests/runtime_obj_skip_c_lib.c | Skipped library that flips instrumentation on |
| callgrind/tests/runtime_obj_skip_underflow.{c,vgtest,…} | Regression: START deep in a recursive skipped lib |
| callgrind/tests/runtime_obj_skip_underflow_lib.c | Recursive skipped library to exercise underflow channel |
| callgrind/tests/Makefile.am | Build the new tests and shared libraries |
| callgrind/tests/filter_stderr | Drop new diagnostic log prefixes from stderr |
| .gitignore | Ignore new build artifacts and dump files |


@greptile-apps

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR seeds callgrind's shadow call stack from the native stack trace on CALLGRIND_START_INSTRUMENTATION, preventing stack underflows when instrumentation begins deep inside already-executing frames (e.g. a benchmark runner calling START several libpython frames deep). It also adds C-based regression tests to cover both the obj-skip leak and the underflow channel.

  • callstack.c: adds CLG_(reconstruct_call_stack_from_native) which walks the native stack bottom-up and pushes SP-only entries for skipped/JIT frames and full push_cxt entries for non-skipped frames, with pop_call_stack extended to restore cxt from these seeded entries.
  • bbcc.c: promotes new_recursion and insert_bbcc_into_hash from static to module-exported (CLG_-prefixed) symbols; adds debug logging to handleUnderflow.
  • callgrind/tests: two new C test programs (runtime_obj_skip_c, runtime_obj_skip_underflow) with shared-library shims, wired into Makefile.am with .so build rules, .vgtest files, and filter_stderr patterns.

Confidence Score: 5/5

The change is safe to merge. The core reconstruction logic is well-guarded, the ensure_stack_size/push_cxt ordering matches the established pattern, sentinel zeroing maintains push_cxt's invariant, and the new regression tests cover both the direct and underflow-channel leaks.

All changed paths are additive (new function, new else-if branch, new test programs). The seeding runs only once per START event and is guarded by cs->sp != 0. The obj-skip mirroring exactly matches bbcc.c's comparison logic. No existing paths are altered in a breaking way.

callgrind/callstack.c and callgrind/global.h are the most critical — the pop_call_stack else-if branch and the public API exposure in global.h are the areas worth a final read-through.

Important Files Changed

| Filename | Overview |
|---|---|
| callgrind/callstack.c | Core fix: adds CLG_(reconstruct_call_stack_from_native) and extends pop_call_stack with an else-if branch for seeded entries. Logic is well-commented and ordering of ensure_stack_size before push_cxt matches the established pattern. |
| callgrind/bbcc.c | Promotes new_recursion and insert_bbcc_into_hash from static to CLG_-prefixed public symbols, and adds debug logging to handleUnderflow. Both promoted functions are only called within bbcc.c; there are no new external consumers. |
| callgrind/fn.c | Adds CLG_(get_fn_node_for_addr) to resolve a raw IP to a fn_node without a BB, with correct fallback for anonymous JIT frames using a stack-local buffer. |
| callgrind/main.c | Calls CLG_(reconstruct_call_stack_from_native)(tid) immediately after set_instrument_state on START_INSTRUMENTATION. Ordering is correct: reconstruction happens before the first instrumented BB is executed. |
| callgrind/global.h | Adds declarations for CLG_(new_recursion), CLG_(insert_bbcc_into_hash), CLG_(get_fn_node_for_addr), and CLG_(reconstruct_call_stack_from_native). The first two have no callers outside bbcc.c. |
| callgrind/tests/Makefile.am | Wires up runtime_obj_skip_c and runtime_obj_skip_underflow into check_PROGRAMS and check_DATA with shared-lib build rules and correct LDADD/LDFLAGS/DEPENDENCIES for each. |
| callgrind/tests/runtime_obj_skip_c.vgtest | Post-check correctly guards with test -f before grepping, avoiding the false-pass on missing output file noted in the previous review. |
| callgrind/tests/runtime_obj_skip_underflow.vgtest | Same structure as runtime_obj_skip_c.vgtest with file-existence guard in the post-check. Correctly tests the underflow-channel leak path. |
| callgrind/tests/filter_stderr | Extended with a regex to suppress new diagnostic log lines emitted by the verbose obj-skip / cxt / underflow tracing paths. |

Sequence Diagram

sequenceDiagram
    participant Client as Client Thread
    participant Valgrind as Valgrind/CLG
    participant CallStack as Shadow Call Stack
    participant FnStack as Fn/Ctx Stack

    Client->>Valgrind: CALLGRIND_START_INSTRUMENTATION
    Valgrind->>Valgrind: CLG_(set_instrument_state)(True)
    Valgrind->>Valgrind: CLG_(reconstruct_call_stack_from_native)(tid)
    Valgrind->>Valgrind: VG_(get_StackTrace)(tid, ips[], sps[])
    loop for each frame (bottom-up: oldest to newest)
        Valgrind->>Valgrind: CLG_(get_fn_node_for_addr)(ips[frame])
        Valgrind->>Valgrind: latch obj-skip flag if not checked
        Valgrind->>CallStack: "ensure_stack_size(cs->sp + 1)"
        alt frame is NOT skipped
            Valgrind->>FnStack: CLG_(push_cxt)(fn) saves old cxt, pushes fn, updates cxt
        end
        Valgrind->>CallStack: "write ce->jcc=0, ce->sp=sps[frame+1], ce->ret_addr"
        Valgrind->>CallStack: "cs->sp++, sentinel entry[sp].cxt = 0"
    end
    Valgrind-->>Client: return (instrumentation active)
    Client->>Client: execute benchmark code (instrumented)
    loop for each RET unwinding a seeded frame
        Valgrind->>CallStack: CLG_(pop_call_stack)()
        alt "jcc != 0"
            Valgrind->>FnStack: restore cxt + fn_stack (normal path)
        else "lower_entry->cxt != 0 (seeded non-skipped frame)"
            Valgrind->>FnStack: restore cxt + fn_stack from saved values
        else "cxt == 0 (seeded skipped/JIT frame)"
            Note over Valgrind,FnStack: no cxt change needed
        end
    end
    Client->>Valgrind: CALLGRIND_STOP_INSTRUMENTATION
    Valgrind->>Valgrind: CLG_(set_instrument_state)(False)

Reviews (11): Last reviewed commit: "test(callgrind): add regression tests fo..." | Re-trigger Greptile

Comment thread callgrind/callstack.c Outdated
Comment thread callgrind/callstack.c Outdated
Comment thread callgrind/fn.c Outdated
@greptile-apps

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR fixes callgrind's shadow call stack initialization when CALLGRIND_START_INSTRUMENTATION is invoked from deep inside a native call stack. Without seeding, each ret past an unseen frame trips handleUnderflow and leaks skipped-object functions as top-level fn= blocks in the output.

  • callstack.c: Adds CLG_(reconstruct_call_stack_from_native) that walks VG_(get_StackTrace) bottom-up and pushes a jcc=0 call entry per frame; non-skipped frames also get a push_cxt call so the context chain is seeded. A matching else if (lower_entry->cxt != 0) branch in pop_call_stack restores context correctly when these synthetic entries are unwound.
  • fn.c / bbcc.c: Adds CLG_(get_fn_node_for_addr) for IP-to-fn_node resolution without a BB, and exports new_recursion/insert_bbcc_into_hash so they are accessible across translation units.
  • Tests: Adds two C-level regression tests (runtime_obj_skip_c, runtime_obj_skip_underflow) that build a separate shared library, register it for obj-skip, start instrumentation from inside the skipped lib, and assert no fn=skipme_* blocks leak into the dump.
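The leak assertion the tests make can be sketched as a small checker. This is an illustration only: the actual tests use shell post-checks in the `.vgtest` files, and the names `leaked_functions` and `check_dump` are hypothetical. It assumes callgrind output lines of the form `fn=name` or the compressed `fn=(id) name`.

```python
import re
from pathlib import Path

# Matches "fn=name" and the compressed form "fn=(id) name"; a bare "fn=(id)" has no name.
FN_LINE = re.compile(r"^fn=(?:\(\d+\)\s*)?(?P<name>\S.*)?$")


def leaked_functions(dump_text: str, prefix: str = "skipme_") -> list:
    """Return the fn= names in the dump that start with the given prefix."""
    leaked = []
    for line in dump_text.splitlines():
        match = FN_LINE.match(line.strip())
        if match and match.group("name") and match.group("name").startswith(prefix):
            leaked.append(match.group("name"))
    return leaked


def check_dump(path: str, prefix: str = "skipme_") -> str:
    dump = Path(path)
    if not dump.is_file():
        # A missing file must fail; "no match" on nothing would otherwise pass.
        return "FAIL: output file missing"
    leaked = leaked_functions(dump.read_text(), prefix)
    return "OK" if not leaked else f"FAIL: leaked {leaked}"
```

Checking for the file first matters because a search that finds nothing in a nonexistent file looks the same as a clean dump.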

Confidence Score: 4/5

Safe to merge; the core fix is well-reasoned and the two regression tests directly cover the reported leak channels.

The seeded-entry logic in pop_call_stack correctly distinguishes normal skip entries (cxt=0, no restore) from seeded non-skipped entries (cxt≠0, restore via new branch) and avoids interfering with the normal jcc path. The reconstruction loop ordering (push_cxt before ensure_stack_size) is safe under current constants but creates a silent coupling between CLG_RECON_MAX_FRAMES and N_CALL_STACK_INITIAL_ENTRIES. The ce->nonskipped field is left uninitialized in seeded entries and the static buffer in get_fn_node_for_addr relies on an implicit strdup contract — both minor, non-blocking concerns. Test post-scripts give a false-OK if the callgrind output file is absent.
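The three-way distinction at pop time can be modelled in a few lines. This is a toy sketch under stated assumptions: `Entry`, `seed_entries`, and `pop_action` are hypothetical names, `None` stands in for a zero `jcc` or `cxt`, and the real entries also carry stack pointers and return addresses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Toy shadow-stack entry; field names echo the review, not Callgrind's structs."""
    fn: str
    jcc: Optional[str]        # call-cost record; None stands in for jcc == 0
    saved_cxt: Optional[str]  # context to restore on pop; None stands in for cxt == 0


def seed_entries(native_frames, skipped):
    """Build seeded entries oldest-first; native_frames is given innermost-first."""
    entries, cxt = [], "<root>"
    for fn in reversed(native_frames):
        if fn in skipped:
            entries.append(Entry(fn, jcc=None, saved_cxt=None))
        else:
            # Remember the context that was active before this frame was pushed.
            entries.append(Entry(fn, jcc=None, saved_cxt=cxt))
            cxt = fn
    return entries, cxt


def pop_action(entry):
    if entry.jcc is not None:
        return "restore (normal call)"
    if entry.saved_cxt is not None:
        return "restore (seeded, non-skipped)"
    return "no restore (skipped)"
```

For example, seeding `["skipme_start", "runner", "main"]` with `skipme_start` skipped leaves the active context at `runner`, and popping the skipped entry changes nothing.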

callgrind/callstack.c (the new reconstruct_call_stack_from_native function and the pop_call_stack else-if branch) and callgrind/fn.c (the static buffer in CLG_(get_fn_node_for_addr)).

Important Files Changed

| Filename | Overview |
|---|---|
| callgrind/callstack.c | Core change: adds reconstruct_call_stack_from_native to seed the shadow call stack on instrumentation start, and extends pop_call_stack with a new else if (cxt != 0) branch to restore context when seeded entries unwind. Logic is sound but has two latent issues: ce->nonskipped is not explicitly zeroed, and push_cxt is called before ensure_stack_size. |
| callgrind/fn.c | Adds CLG_(get_fn_node_for_addr) to resolve raw IPs to fn_nodes for use by the stack reconstruction path. Logic is correct; the static buffer for anonymous-address formatting is safe under the current strdup contract but fragile by design. |
| callgrind/bbcc.c | Promotes new_recursion and insert_bbcc_into_hash from static to exported CLG_-prefixed functions; adds a debug-log line in handleUnderflow. All call sites updated consistently. |
| callgrind/main.c | Calls CLG_(reconstruct_call_stack_from_native)(tid) immediately after CLG_(set_instrument_state) on VG_USERREQ__START_INSTRUMENTATION. Ordering is correct and the cs->sp != 0 guard prevents double-seeding. |
| callgrind/global.h | Declares the new exported symbols. CLG_(new_recursion) and CLG_(insert_bbcc_into_hash) are only used within bbcc.c in this PR; their export appears forward-looking. |
| callgrind/tests/Makefile.am | Adds build rules for two new shared libraries and their test binaries; links with -l:*.so -ldl and RPATH=$ORIGIN so tests find the libs at runtime without install. |
| callgrind/tests/runtime_obj_skip_c.vgtest | Regression test for the obj-skip leak via the cxt==0 push path. Post-check grep can produce a false-OK if the callgrind output file is missing. |
| callgrind/tests/runtime_obj_skip_underflow.vgtest | Regression test for the underflow-channel obj-skip leak. Same false-OK risk as the sister test when the output file is absent. |

Sequence Diagram

sequenceDiagram
    participant Client as Client code
    participant VG as Valgrind core
    participant CLG as Callgrind

    Client->>VG: CALLGRIND_START_INSTRUMENTATION
    VG->>CLG: handle_client_request(START)
    CLG->>CLG: set_instrument_state(ON)
    CLG->>VG: VG_(get_StackTrace)(tid, ips[], sps[])
    VG-->>CLG: n frames [bottom to top]
    loop "frame = n-1 downto 0 (oldest first)"
        CLG->>CLG: get_fn_node_for_addr(ips[frame])
        CLG->>CLG: "check obj_skip list -> fn->skip"
        alt fn not skipped
            CLG->>CLG: "push_cxt(fn) -> seed cxt chain"
        end
        CLG->>CLG: "push call_entry(jcc=0, sp=sps[frame+1])"
    end
    Note over CLG: shadow call stack seeded
    Client->>Client: execute benchmark (instrumented)
    loop each RET in instrumented code
        CLG->>CLG: unwind_call_stack(sp)
        CLG->>CLG: pop_call_stack()
        alt "jcc != 0"
            CLG->>CLG: restore cxt + fn_sp (normal path)
        else "cxt != 0 (seeded entry)"
            CLG->>CLG: restore cxt + fn_sp (new path)
        else "cxt == 0 (skipped entry)"
            CLG->>CLG: no cxt restore
        end
    end

Reviews (2): Last reviewed commit: "test(callgrind): add regression tests fo..." | Re-trigger Greptile

Comment thread callgrind/callstack.c Outdated
Comment thread callgrind/callstack.c
Comment thread callgrind/fn.c
Comment thread callgrind/tests/runtime_obj_skip_c.vgtest Outdated
Member

@art049 art049 left a comment


lgtm! we need to carefully test this before putting it ga though

Comment thread callgrind/bbcc.c
@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 21ab8bb to 641c4d2 Compare June 2, 2026 12:22

@not-matthias
Member Author

@codspeedbot investigate the regressions / flakiness

@codspeed-hq

codspeed-hq Bot commented Jun 2, 2026

@codspeedbot investigate the regressions / flakiness

Regression Investigation

I analyzed the 4 benchmarks flagged by CodSpeed (2 regressions, 2 improvements) by comparing values across multiple runs on this branch and examining the flamegraphs. These changes reflect walltime flakiness, not regressions caused by the code changes in this PR.

Evidence: Cross-run variance on the same branch

test_valgrind[valgrind-3.25.1, echo Hello, World!, full-with-inline] (reported as -97.54%):

| Run | Commit | Value |
|---|---|---|
| 6a19b668 (May 29) | 21ab8bb | 748.9 ms |
| 6a1ecc6f (Jun 2) | 641c4d2 | 71.6 s |
| 6a1ed177 (Jun 2) | d324ccf | 29.8 s |
| BASE (master) | d2dd609 | 733.6 ms |

This benchmark swings from ~750ms to 30–72s on the same branch code, confirming it's environmental noise rather than a code regression.

test_valgrind[valgrind.codspeed, python3 testdata/test.py, full-no-inline] (reported as -70.8%):

| Run | Commit | Value |
|---|---|---|
| 6a1ecc6f (Jun 2) | 641c4d2 | 6.7 s |
| 6a1ed177 (Jun 2) | d324ccf | 24.6 s |
| BASE (master) | d2dd609 | 7.2 s |

Same pattern — the previous branch run measured 6.7s (in line with the 7.2s base), then it jumped to 24.6s on the very next run.

The "improvements" are also flaky

  • valgrind-3.26.0, python3 test.py, full-no-inline (+65.8%): The base value of 50.3s was itself an outlier — earlier runs on both branches measured ~5.5–6.7s. The HEAD value of 30.3s is also elevated.
  • valgrind.codspeed, python3 test.py, full-with-inline (+21%): 8.7s → 7.2s, within normal variance.

Flamegraph analysis confirms flakiness

When the echo Hello, World! benchmark "regresses" (29.8s vs 733ms), the flamegraph shows vgCallgrind_setup_bbcc dominating at 9.4s self time (31%) with only 2 calls, and cachesim_setref at 5.1s (17%). In the normal (733ms) base run, neither function even appears in the top 10 — the time is spent in expected startup work (debug info loading, JIT compilation). This pattern is consistent with system-level interference (memory pressure, cache effects, scheduling contention) inflating walltime unpredictably during valgrind's instrumented execution.

Conclusion

All 4 flagged changes (both regressions and improvements) are walltime noise affecting the full-with-inline and full-no-inline configurations. The PR's code changes do not cause any consistent performance degradation. These benchmarks can be safely acknowledged on CodSpeed.

Summary
  • Investigating regressions — Analysis complete — regressions are walltime flakiness, not code-related
  • Handling comment — Comment handling finished successfully


@not-matthias
Member Author

@codspeedbot investigate why they are so flaky

@codspeed-hq

codspeed-hq Bot commented Jun 2, 2026

@codspeedbot investigate why they are so flaky

Why These Benchmarks Are So Flaky

I did a deep analysis across 5 runs (3 on this branch, 2 on master), comparing flamegraphs for both normal and outlier measurements. The flakiness is structural — it stems from the nature of what's being benchmarked and how it's measured.

The Smoking Gun: Flamegraph Comparison

Normal run (echo Hello, World!, full-with-inline @ 714ms on master):

  • Time dominated by startup: debug info loading (50%), JIT compilation (25%)
  • vgCallgrind_setup_bbcc doesn't even appear in the top 10

Outlier run (same benchmark @ 71.6s on this branch — 100× slower):

  • vgCallgrind_setup_bbcc: 20.6s self time (28.9%) with only 2 calls
  • cachesim_setref: 6.5s with 6 calls
  • vgPlain_scheduler self time: 3.7s (vs ~2ms normally)

The same 2 calls to setup_bbcc that normally take microseconds are taking 10 seconds each. This is not a code regression — it's system-level interference inflating walltime for functions that happen to be running when the disturbance hits.

This same pattern appears on master too: valgrind-3.26.0, python3 test.py, full-no-inline measured 50.3s on the latest master run vs its normal ~5.5s, with the identical flamegraph signature (setup_bbcc at 13.1s/26% with 2 calls).

Root Causes

1. Profiler-on-profiler stacking
The benchmarks run samply (walltime profiler) → valgrind --tool=callgrind (instrumentation profiler) → guest program. Samply's signal-based sampling interrupts valgrind's tight scheduler dispatch loop (vgPlain_scheduler). In outlier runs, vgPlain_scheduler accumulates 3.7–3.9s of self time (7–16%), vs negligible in normal runs — this is time spent handling profiler signals between JIT block dispatches.

2. Single-shot measurement with no warmup or repetition
Each benchmark is one valgrind --tool=callgrind ... invocation, measured once. There's no warmup run and no averaging across iterations. A single transient system disturbance during the run produces a permanent outlier with no way to filter it out.

3. Cache simulation amplifies system noise
The full-* configs enable --cache-sim=yes, which simulates L1/L2/LL caches. The simulation's own working set (cache state arrays, cost counters) must stay in the physical CPU cache to perform well. When profiler signals or system noise evict this hot data, every subsequent cache simulation access becomes a cache miss — creating a cascading slowdown. This explains why cachesim_setref (6 calls) takes 6.5s in outlier runs vs being invisible normally.

4. Short benchmarks amplify relative noise
echo Hello, World! normally takes ~700ms. If a system hiccup adds 29s of overhead, that shows as a -97% regression. The same 29s added to a 300s benchmark would show as only about -9%. The shortest benchmarks will always be the noisiest in relative terms.
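The arithmetic behind this point can be worked through directly. The sketch below assumes the reported change is computed as `base / head - 1`, which matches the percentages quoted in this thread but is an inference, not a documented formula; `reported_change` is a hypothetical helper.

```python
def reported_change(base_s: float, overhead_s: float) -> float:
    """Relative change when a fixed overhead is added, assuming base/head - 1."""
    head_s = base_s + overhead_s
    return base_s / head_s - 1.0


if __name__ == "__main__":
    # The same 29 s disturbance applied to a short and a long benchmark.
    for base_s in (0.7, 300.0):
        print(f"base {base_s:>6.1f} s -> {reported_change(base_s, 29.0):+.1%}")
```

With a 0.7 s base the disturbance comes out near -97.6%, while with a 300 s base it is close to -8.8%.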

5. Transient system-level interference
Even on dedicated codspeed-macro runners, transient memory pressure, hypervisor-level scheduling, or NUMA effects can cause dramatic walltime inflation for memory-intensive workloads. Valgrind's JIT cache + cache simulation data structures make it one of the most memory-sensitive workloads possible.

Possible Mitigations

| Mitigation | Impact | Effort |
|---|---|---|
| Add iteration/repetition: run each benchmark 3–5× and take the median | High: filters transient outliers | Medium: requires changes to the exec harness or a wrapper script |
| Add a warmup run: discard the first iteration | Medium: avoids cold-start outliers | Low |
| Drop the shortest benchmarks: echo Hello, World! is inherently too short for stable walltime measurement under valgrind | Medium: eliminates the worst offenders | Low |
| Separate profiler configs from non-profiler configs: run full-* configs without the samply profiler, or accept higher variance for them | Medium: reduces profiler-on-profiler interference | Low |
| Increase timeout and add outlier detection: flag runs where any benchmark is >5× its historical median | Low (detection only): doesn't prevent flakiness but avoids false regression reports | Low |
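Two of these mitigations, taking the median of repeated runs and flagging values above 5× the historical median, can be sketched as follows. This is a hedged illustration, not part of the benchmark harness: `summarize` and `is_outlier` are hypothetical helpers, and the repeated-run sample mixes values from this thread with made-up ones.

```python
from statistics import median


def summarize(samples_s):
    """Median of repeated walltime samples; robust to a single transient outlier."""
    if not samples_s:
        raise ValueError("need at least one sample")
    return median(samples_s)


def is_outlier(value_s, history_s, factor=5.0):
    """Flag a measurement above factor x the historical median (5x as in the table)."""
    return bool(history_s) and value_s > factor * median(history_s)


if __name__ == "__main__":
    # Hypothetical three-run sample in which one run hit a disturbance.
    print(summarize([0.73, 71.6, 0.75]))
    # Historical values for the echo benchmark as quoted in this thread, in seconds.
    history = [0.7336, 0.7489, 0.714]
    print(is_outlier(71.6, history))
    print(is_outlier(0.75, history))
```

The median only helps when outliers are the minority of samples; if two of three runs are disturbed, the median is itself an outlier, which is where the detection check is useful.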
Summary
  • Investigating benchmark flakiness — Root cause analysis complete — identified 5 structural causes of walltime flakiness
  • Submitting findings — Detailed flakiness analysis submitted as PR comment
  • Handling comment — Comment handling finished successfully


CodSpeed runs benchmarks with --instr-atstart=no and fires
CALLGRIND_START_INSTRUMENTATION several frames deep (inside libpython, or
behind a V8/JIT trampoline). The shadow stack starts at 0 while real frames
exist, so every later return underflows and the inclusive cost collapses onto
a phantom root.

- Reconstruct the shadow stack from the native stack at the OFF->ON transition
  (CLG_(reconstruct_call_stack_from_native)), seeding each frame's entry SP so
  it pops correctly instead of underflowing.
- Name anonymous JIT frames by address in get_fn_node_for_addr (mirroring the
  BB path) instead of "???", so the seeded root frame (e.g.
  __codspeed_root_frame__) is preserved and stays backend-symbolicatable via
  perf-<pid>.map.
- Add --obj-skip and CALLGRIND_ADD_OBJ_SKIP to exclude whole objects (the node
  binary, libpython) from the call graph.
@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from d324ccf to 310ac3d Compare June 2, 2026 13:02
- runtime_obj_skip_c: a fn from a skipped object must not leak into the
  dump as a top-level fn= block when it is the first BB after START
  (the cxt==0 force-push path)
- runtime_obj_skip_underflow: a RET past an empty call stack
  (handleUnderflow) must not re-leak the skipped fn -- the Python 3.14
  deep recursive interpreter-dispatch shape

Both exercise --obj-skip / CALLGRIND_ADD_OBJ_SKIP from a separately-linked
.so. Also filters callgrind diagnostic logs from test stderr and gitignores
the test build artifacts (binaries, .so, logs).
@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch 2 times, most recently from 6c726b7 to ae9ee5c Compare June 2, 2026 16:23
Comment thread callgrind/tests/Makefile.am
@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 14832cf to ae9ee5c Compare June 3, 2026 09:39
@not-matthias not-matthias merged commit ae9ee5c into master Jun 3, 2026
18 of 22 checks passed
@not-matthias not-matthias deleted the cod-2714-investigate-pytest-flamegraph-regression-final branch June 3, 2026 11:03

3 participants