JIT: Assertion failure Stack underflow (depth = -1) #149049

@devdanzin

Crash report

What happened?

It's possible to cause an assertion failure in a patched JIT build by running the short code below.

Necessary patch

diff --git a/Include/internal/pycore_backoff.h b/Include/internal/pycore_backoff.h
index 38dd82f6fc8..e1ef449e346 100644
--- a/Include/internal/pycore_backoff.h
+++ b/Include/internal/pycore_backoff.h
@@ -125,7 +125,7 @@ trigger_backoff_counter(void)
 // For example, 4095 does not work for the nqueens benchmark on pyperformance
 // as we always end up tracing the loop iteration's
 // exhaustion iteration. Which aborts our current tracer.
-#define JUMP_BACKWARD_INITIAL_VALUE 4000
+#define JUMP_BACKWARD_INITIAL_VALUE 63
 #define JUMP_BACKWARD_INITIAL_BACKOFF 6
 static inline _Py_BackoffCounter
 initial_jump_backoff_counter(_PyOptimizationConfig *opt_config)
@@ -153,7 +153,7 @@ initial_resume_backoff_counter(_PyOptimizationConfig *opt_config)
  * Must be larger than ADAPTIVE_COOLDOWN_VALUE,
  * otherwise when a side exit warms up we may construct
  * a new trace before the Tier 1 code has properly re-specialized. */
-#define SIDE_EXIT_INITIAL_VALUE 4000
+#define SIDE_EXIT_INITIAL_VALUE 63
 #define SIDE_EXIT_INITIAL_BACKOFF 6

 static inline _Py_BackoffCounter
diff --git a/Include/internal/pycore_optimizer.h b/Include/internal/pycore_optimizer.h
index 7c2e0e95a80..04dda1eb3a0 100644
--- a/Include/internal/pycore_optimizer.h
+++ b/Include/internal/pycore_optimizer.h
@@ -304,7 +304,7 @@ PyAPI_FUNC(void) _Py_Executors_InvalidateCold(PyInterpreterState *interp);
 // Used as the threshold to trigger executor invalidation when
 // executor_creation_counter is greater than this value.
 // This value is arbitrary and was not optimized.
-#define JIT_CLEANUP_THRESHOLD 1000
+#define JIT_CLEANUP_THRESHOLD 10000

 int _Py_uop_analyze_and_optimize(
     _PyThreadStateImpl *tstate,
diff --git a/Include/internal/pycore_optimizer_types.h b/Include/internal/pycore_optimizer_types.h
index a722652cc81..37976ba9f48 100644
--- a/Include/internal/pycore_optimizer_types.h
+++ b/Include/internal/pycore_optimizer_types.h
@@ -24,7 +24,7 @@ extern "C" {
 // progress (and inserting a new ENTER_EXECUTOR instruction). In practice, this
 // is the "maximum amount of polymorphism" that an isolated trace tree can
 // handle before rejoining the rest of the program.
-#define MAX_CHAIN_DEPTH 4
+#define MAX_CHAIN_DEPTH 16

 /* Symbols */
 /* See explanation in optimizer_symbols.c */

MRE:

def f1():
    def victim(a=0, b=float("nan"), c=2):
        return (a + b) / c

    for _ in range(90):
        res = victim()

for i in range(30):
    f1()

Backtrace:

Stack underflow (depth = -1) at Python/executor_cases.c.h:8520

Program received signal SIGABRT, Aborted.

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=0) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (threadid=<optimized out>, signo=6) at ./nptl/pthread_kill.c:89
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:100
#3  0x00007ffff7c45e2e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c28888 in __GI_abort () at ./stdlib/abort.c:77
#5  0x0000555555e8d4d4 in _Py_assert_within_stack_bounds (frame=frame@entry=0x7e8ff6fe5318, stack_pointer=stack_pointer@entry=0x7e8ff6fe5378, filename=<optimized out>,
    lineno=lineno@entry=8520) at Python/ceval.c:1005
#6  0x0000555555f77788 in _PyTier2Interpreter (current_executor=<optimized out>, frame=<optimized out>, stack_pointer=<optimized out>, tstate=<optimized out>)
    at Python/executor_cases.c.h:8520
#7  0x0000555555e917bb in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:5941
#8  0x0000555555e89af8 in _PyEval_EvalFrame (tstate=0x555556a754b8 <_PyRuntime+360760>, frame=0x7e8ff6fe5220, throwflag=0) at ./Include/internal/pycore_ceval.h:118
#9  _PyEval_Vector (tstate=<optimized out>, func=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0) at Python/ceval.c:2124
#10 0x0000555555e89515 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=0x7c7ff7089fc0) at Python/ceval.c:686
#11 0x000055555617dad0 in run_eval_code_obj (tstate=tstate@entry=0x555556a754b8 <_PyRuntime+360760>, co=co@entry=0x7d1ff701ab50, globals=globals@entry=0x7c7ff7089fc0,
    locals=locals@entry=0x7c7ff7089fc0) at Python/pythonrun.c:1369
#12 0x000055555617cc9c in run_mod (mod=<optimized out>, filename=<optimized out>, globals=<optimized out>, locals=<optimized out>, flags=<optimized out>, arena=<optimized out>,
    interactive_src=<optimized out>, generate_new_source=<optimized out>) at Python/pythonrun.c:1472

Output from running with PYTHON_LLTRACE=4 and PYTHON_OPT_DEBUG=4:

Claude's exploration of related reproducers

Here are the additional reproducers — every one is independently confirmed to abort on unmodified main with Stack underflow (depth = -1) at Python/executor_cases.c.h:8520 and to pass with the fix applied:

v2 — TOS recorded float, NOS int dividend (int / float)

def f1():
    def victim(a=2, b=0, c=float("nan")):
        return a / (b + c)
    for _ in range(90):
        victim()

for i in range(30):
    f1()

Tests the _GUARD_TOS_FLOAT side of the bug (mirror of the original).

v3 — % instead of / (NB_REMAINDER)

def f1():
    def victim(a=0, b=float("nan"), c=2):
        return (a + b) % c
    for _ in range(90):
        victim()

for i in range(30):
    f1()

Confirms the bug fires on the remainder branch too — same is_truediv || is_remainder guard logic.

v5 — minimal: no intermediate addition

def f1():
    def victim(b=float("nan"), c=2):
        return b / c
    for _ in range(90):
        victim()

for i in range(30):
    f1()

Drops the (a + b) indirection — the bug doesn't need a chained binop, just a recorded-float operand.

v6 — opposite direction: int / float

def f1():
    def victim(a=2, b=float("nan")):
        return a / b
    for _ in range(90):
        victim()

for i in range(30):
    f1()

TOS-side guard variant of v5.

v7 — no defaults, explicit args

def f1():
    def victim(b, c):
        return b / c

    nan = float("nan")
    for _ in range(90):
        victim(nan, 2)

for i in range(30):
    f1()

Shows defaults aren't part of the trigger — any consistent (float, int) call pair into a hot truediv works.

Negative cases (don't crash on baseline) worth mentioning

  • Same shape with float("inf") instead of float("nan") — passes. Likely because inf becomes a safe const somewhere upstream and gets folded out before the buggy path.
  • Same shape with a regular float literal like 2.5 — passes. _RECORD_TOS_TYPE doesn't fire on operands that the optimizer already knows are constants/safe-floats, so the speculative _GUARD_*_FLOAT branch isn't reached.

Both negatives are useful as a clue that the bug requires the operand to be a JIT_SYM_RECORDED_TYPE_TAG (float) rather than a safe-const float — i.e., it has to come in via _RECORD_*_TYPE from a _LOAD_FAST_BORROW'd local.

Claude analysis and suggested fix

Diagnosis & fix

Bug. The Tier 2 abstract interpreter for _BINARY_OP in Python/optimizer_bytecodes.c (introduced by commit 95cbd4a232d, gh-146393, GH-146397 — speculative float-divide narrowing) can drop the actual binary op from the trace.

The handler optionally emits ADD_OP(_GUARD_TOS_FLOAT) / ADD_OP(_GUARD_NOS_FLOAT) when an operand has a recorded probable type of float, then falls through to type-analysis branches. Only the is_truediv && lhs_float && rhs_float branch emits an ADD_OP for a specialized binary op. The other branches just compute a result-type symbol and return.

The optimizer_analysis.c driver auto-copies *this_instr to the output buffer only when out_buffer.next == out_ptr — i.e., when no ADD_OP was called that iteration. Adding only a guard therefore suppresses the auto-copy and the original _BINARY_OP disappears.
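The auto-copy rule can be illustrated with a small toy model. This is only a sketch, not CPython code: `optimize`, `buggy_binary_op_handler`, and the list-of-strings trace encoding are hypothetical names introduced for illustration, and the real driver works on uop buffers in C.

```python
# Toy model of the auto-copy rule (hypothetical, not the real optimizer).

def optimize(trace, handlers):
    """Run each uop through its handler and copy it verbatim only when the
    handler emitted nothing, mirroring `out_buffer.next == out_ptr`."""
    out = []
    for uop in trace:
        mark = len(out)                 # plays the role of out_ptr
        handler = handlers.get(uop)
        if handler is not None:
            handler(out)                # may emit uops (the ADD_OP analogue)
        if len(out) == mark:            # nothing emitted: auto-copy the uop
            out.append(uop)
    return out


def buggy_binary_op_handler(out):
    # Emits only the speculative guard and never a replacement for the
    # binary op itself, so the auto-copy is suppressed.
    out.append("_GUARD_NOS_FLOAT")


trace = ["_LOAD_FAST_BORROW", "_LOAD_FAST_BORROW", "_BINARY_OP", "_POP_TOP_NOP"]
optimized = optimize(trace, {"_BINARY_OP": buggy_binary_op_handler})
print(optimized)
```

In this model the guard takes the place of `_BINARY_OP` in the output instead of preceding it, which is the shape of the failure described above.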

Concrete failing trace (from your MRE — (a + 0.0_nan) / 2, hot in victim):

The optimizer's abstract stack still tracks _BINARY_OP's (lhs, rhs -- res, l, r) (+1 net), but the emitted uops are only _GUARD_NOS_FLOAT (0) + _POP_TOP_NOP (-1) + _POP_TOP_FLOAT (-1) = -2. After the (a+b)/c macro the abstract stack thinks there is one return value at [ -1 ]; the physical stack is actually empty. _MAKE_HEAP_SAFE (the first half of RETURN_VALUE) then reads stack_pointer[-1] and decrements stack_pointer past Stackbase, tripping _Py_assert_within_stack_bounds at Python/executor_cases.c.h:8520.
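The stack arithmetic above can be checked with a short bookkeeping sketch. The per-uop net effects are the ones quoted in the paragraph; the starting depth of 2 (the two operands) and the `return_pop` step standing in for the return sequence consuming one value are assumptions made for illustration.

```python
# Net stack effect of each uop, as quoted in the analysis.
EFFECT = {
    "_GUARD_NOS_FLOAT": 0,
    "_BINARY_OP": +1,       # (lhs, rhs -- res, l, r)
    "_POP_TOP_NOP": -1,
    "_POP_TOP_FLOAT": -1,
    "return_pop": -1,       # assumption: the return sequence pops one value
}


def depths(uops, start=2):
    """Return (uop, depth after uop) pairs, starting with both operands
    on the stack (start=2 is an assumption)."""
    depth = start
    history = []
    for uop in uops:
        depth += EFFECT[uop]
        history.append((uop, depth))
    return history


buggy = ["_GUARD_NOS_FLOAT", "_POP_TOP_NOP", "_POP_TOP_FLOAT", "return_pop"]
fixed = ["_GUARD_NOS_FLOAT", "_BINARY_OP", "_POP_TOP_NOP", "_POP_TOP_FLOAT",
         "return_pop"]

for name, uops in (("buggy", buggy), ("fixed", fixed)):
    print(name, depths(uops))
```

Under these assumptions the buggy sequence ends at depth -1, matching the reported `Stack underflow (depth = -1)`, while the sequence that keeps `_BINARY_OP` never goes below zero.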

Reproduces only when lhs (or rhs) is a JIT_SYM_RECORDED_TYPE_TAG of float (sym_has_type false, sym_get_probable_type == float) and the other operand is anything sym_matches_type doesn't see as float, which is exactly what you get with (int + float) / int after _RECORD_*_TYPE runs in the second BINARY_OP.

Fix. In Python/optimizer_bytecodes.c, restructure the _BINARY_OP handler so that every non-specialized path also explicitly emits ADD_OP(_BINARY_OP, oparg, 0). The only branch that omits it is the float/float truediv path, which already emits its own _BINARY_OP_TRUEDIV_FLOAT* replacement. (The identical edit is applied to the generated Python/optimizer_cases.c.h; make regen-optimizer-cases produces the same content modulo whitespace.)
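The shape of the fix can be sketched in the same toy style. This is a hypothetical model, not the actual C handler: `fixed_binary_op_handler`, its parameters, and the `"SPECIALIZED_TRUEDIV"` placeholder (standing in for whichever specialized truediv uop the real code emits) are all assumptions made for illustration.

```python
# Hypothetical model of the restructured handler: every path emits exactly
# one uop that performs the operation, with an optional guard before it.

def fixed_binary_op_handler(guard, both_float_truediv):
    """Return the uops emitted for one binary op.

    guard: name of a speculative guard uop to emit first, or None.
    both_float_truediv: True only for the float/float truediv path.
    """
    out = []
    if guard is not None:
        out.append(guard)
    if both_float_truediv:
        out.append("SPECIALIZED_TRUEDIV")   # placeholder for the replacement uop
    else:
        out.append("_BINARY_OP")            # the explicit emit the fix adds
    return out


# Recorded-float lhs with a non-float rhs, as in the MRE.
mre_case = fixed_binary_op_handler("_GUARD_NOS_FLOAT", both_float_truediv=False)
# No guard and no specialization: the op is still present.
plain_case = fixed_binary_op_handler(None, both_float_truediv=False)
# Float/float truediv: only the specialized replacement is emitted.
float_case = fixed_binary_op_handler(None, both_float_truediv=True)

print(mre_case, plain_case, float_case)
```

The point of the restructuring is that emitting a guard no longer decides whether the operation itself survives; the handler emits the op explicitly rather than relying on the driver's auto-copy.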

Verification.

  • Change location: Python/optimizer_bytecodes.c:319+ (and the generated Python/optimizer_cases.c.h case _BINARY_OP:).
  • After fix, the optimizer trace at the offending site is _GUARD_NOS_FLOAT + _BINARY_OP + _POP_TOP_NOP + _POP_TOP_FLOAT, restoring the +1/-1/-1 net effect that matches the abstract stack.
  • MRE no longer aborts. Four hand-built variants (truediv with NOS recorded float, truediv with TOS recorded float, remainder mirror, infinity instead of NaN) also pass with the fix; on baseline, at least the truediv ones crash.
  • test_optimizer and test_capi.test_opt show no new regressions: the only test_capi.test_opt failures (test_resume, test_call_super) reproduce on a clean rebuild of unmodified main — pre-existing, unrelated to this fix.

The minimal source diff is in Python/optimizer_bytecodes.c lines 319–385 (re-nested as if {...} else { ADD_OP(_BINARY_OP, oparg, 0); ... }); the generated optimizer_cases.c.h is updated to match.

Found using lafleur.

CPython versions tested on:

3.15, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.15.0a8+ (heads/main-dirty:804c213c893, Apr 27 2026, 07:24:58) [Clang 21.1.2 (2ubuntu6)]

Labels: interpreter-core, topic-JIT, type-crash
