ProjectTorreyPines · mgyoo86 · Mar 12, 2026
diff --git a/docs/src/basics/safety-rules.md b/docs/src/basics/safety-rules.md
@@ -105,20 +105,20 @@ end
 end
 ```
 
-## Debugging with POOL_DEBUG
+## Debugging with RUNTIME_CHECK
 
-Enable runtime safety checks during development:
+Enable runtime safety checks during development by setting the `runtime_check` preference:
 
-```julia
-using AdaptiveArrayPools
-AdaptiveArrayPools.POOL_DEBUG[] = true
-
-@with_pool pool function test()
-    v = acquire!(pool, Float64, 100)
-    return v  # Will warn about returning pool-backed array
-end
+```toml
+# LocalPreferences.toml
+[AdaptiveArrayPools]
+runtime_check = 1   # 0 = off (default), 1 = on
 ```
 
+**Restart Julia** after changing this setting. When enabled, returning a pool-backed array from a `@with_pool` block throws a `PoolRuntimeEscapeError` with the exact source location.
+
+See [Safety](../features/safety.md) for full details on what `RUNTIME_CHECK = 1` enables (poisoning, structural invalidation, escape detection, borrow tracking).
+
 ## acquire! vs unsafe_acquire!
 
 | Function | Returns | Best For |

diff --git a/docs/src/features/configuration.md b/docs/src/features/configuration.md
@@ -4,10 +4,13 @@ AdaptiveArrayPools can be configured via `LocalPreferences.toml`:
 
 ```toml
 [AdaptiveArrayPools]
-use_pooling = false  # ⭐ Primary: Disable pooling entirely
-cache_ways = 8       # Advanced: N-way cache size (default: 4)
+use_pooling = false      # ⭐ Primary: Disable pooling entirely
+runtime_check = 1        # Safety: Enable runtime safety checks
+cache_ways = 8           # Advanced: N-way cache size (default: 4)
 ```
 
+All compile-time preferences require **restarting Julia** to take effect.
+
 ## Compile-time: STATIC_POOLING (⭐ Primary)
 
 **The most important configuration.** Completely disable pooling to make `acquire!` behave like standard allocation.
@@ -50,26 +53,40 @@ Use `pooling_enabled(pool)` to check if pooling is active.
 
 All pooling code is **completely eliminated at compile time** (zero overhead).
 
-## Runtime: MAYBE_POOLING
+## Compile-time: RUNTIME_CHECK
 
-Only affects `@maybe_with_pool`. Toggle without restart.
+Enable runtime safety checks to catch pool-escape bugs. See [Safety](safety.md) for full details.
+
+```toml
+# LocalPreferences.toml
+[AdaptiveArrayPools]
+runtime_check = 1      # enable (0 = off, 1 = on)
+# runtime_check = true  # also accepted
+```
+
+Or programmatically:
 
 ```julia
-MAYBE_POOLING[] = false  # Disable
-MAYBE_POOLING[] = true   # Enable (default)
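+using Preferences, AdaptiveArrayPools  # the package must be loaded to be referenced by name
+Preferences.set_preferences!(AdaptiveArrayPools, "runtime_check" => 1; force = true)  # force: overwrite an existing value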
+# Restart Julia for changes to take effect
 ```
 
-## Runtime: POOL_DEBUG
+Accepts both `Bool` and `Int` values — internally normalized to `Int`:
+- `false` / `0` → off (zero overhead, all safety branches eliminated)
+- `true` / `1` → on (poisoning + invalidation + escape detection + borrow tracking)
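+
+A rough sketch of that normalization, assuming only `0`/`1` (or `false`/`true`) are meaningful. `normalize_runtime_check` is a hypothetical name used for illustration, not part of the AdaptiveArrayPools API, and rejecting other integers is an assumption of this sketch:
+
+```julia
+# Hypothetical helper: maps a raw `runtime_check` preference value to an Int.
+normalize_runtime_check(x::Bool) = Int(x)  # false -> 0, true -> 1 (Bool method wins over Integer)
+
+function normalize_runtime_check(x::Integer)
+    # Assumption: only the binary levels 0 and 1 are accepted.
+    x in (0, 1) || throw(ArgumentError("runtime_check must be 0/1 or false/true, got $x"))
+    return Int(x)
+end
+
+normalize_runtime_check(true)  # 1
+normalize_runtime_check(0)     # 0
+```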
+
+The safety level is baked into the pool type parameter: `AdaptiveArrayPool{0}` or `AdaptiveArrayPool{1}`. This enables dead-code elimination — at `RUNTIME_CHECK = 0`, all safety branches are completely removed by the compiler.
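+
+A minimal, self-contained sketch of why a type parameter gives zero overhead. `DemoPool` and `release!` are hypothetical stand-ins, not the package's types; the `S >= 1 && ...` guard follows the same pattern the package uses internally:
+
+```julia
+# Hypothetical stand-in for AdaptiveArrayPool{S}: S is a type-level constant.
+struct DemoPool{S} end
+
+function release!(::DemoPool{S}, v::Vector{Float64}) where {S}
+    # `S >= 1` is known at compile time for each concrete DemoPool{S},
+    # so for DemoPool{0} the compiler can drop this branch entirely.
+    S >= 1 && fill!(v, NaN)
+    return v
+end
+
+v = ones(3)
+release!(DemoPool{0}(), v)  # v unchanged: [1.0, 1.0, 1.0]
+release!(DemoPool{1}(), v)  # v poisoned:  [NaN, NaN, NaN]
+```
+
+With `S = 0`, the comparison folds to `false`, so inspecting `@code_llvm release!(DemoPool{0}(), v)` (from `InteractiveUtils`) should show no code for the fill.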
+
+## Runtime: MAYBE_POOLING
 
-Enable safety validation to catch direct returns of pool-backed arrays.
+Only affects `@maybe_with_pool`. Toggle without restart.
 
 ```julia
-POOL_DEBUG[] = true   # Enable safety checks (development)
-POOL_DEBUG[] = false  # Disable (default, production)
+MAYBE_POOLING[] = false  # Disable
+MAYBE_POOLING[] = true   # Enable (default)
 ```
 
-When enabled, returning a pool-backed array from a `@with_pool` block will throw an error.
-
 ## Compile-time: CACHE_WAYS (Julia 1.10 / CUDA only)
 
 Configure the N-way cache size for `unsafe_acquire!`. **On Julia 1.11+ CPU, this setting has no effect** — the `setfield!`-based wrapper reuse supports unlimited dimension patterns with zero allocation.
@@ -99,6 +116,6 @@ set_cache_ways!(8)
 | Setting | Scope | Restart? | Priority | Affects |
 |---------|-------|----------|----------|---------|
 | `use_pooling` | Compile-time | Yes | ⭐ Primary | All macros, `acquire!` behavior |
+| `runtime_check` | Compile-time | Yes | Safety | Poisoning, invalidation, escape detection, borrow tracking |
 | `cache_ways` | Compile-time | Yes | Advanced | `unsafe_acquire!` N-D caching (Julia 1.10 / CUDA only) |
 | `MAYBE_POOLING` | Runtime | No | Optional | `@maybe_with_pool` only |
-| `POOL_DEBUG` | Runtime | No | Debug | Safety validation |
diff --git a/docs/src/features/multi-threading.md b/docs/src/features/multi-threading.md
@@ -270,7 +270,7 @@ If you encounter unexpected behavior:
 
 1. **Check pool placement**: Is `@with_pool` inside or outside `@threads`?
 2. **Check pool sharing**: Is the same pool variable accessed from multiple Tasks?
-3. **Enable POOL_DEBUG**: `POOL_DEBUG[] = true` catches some (not all) misuse patterns
+3. **Enable RUNTIME_CHECK**: Set `runtime_check = 1` in `LocalPreferences.toml` (restart required) to catch escape bugs
 
 ---
 
@@ -281,4 +281,4 @@ If you encounter unexpected behavior:
 - `@threads` creates one Task per thread → pools are reused within the block
 - **Always place `@with_pool` inside `@threads`**, not outside
 - Thread-local pools are **not an alternative** due to stack discipline requirements
-- Correct usage is the user's responsibility (no runtime checks for performance)
+- Correct usage is the user's responsibility (enable `RUNTIME_CHECK` during development to catch bugs)
diff --git a/docs/src/features/safety.md b/docs/src/features/safety.md
@@ -1,6 +1,6 @@
 # Pool Safety
 
-AdaptiveArrayPools catches pool-escape bugs at **two levels**: compile-time (macro analysis) and runtime (configurable safety levels).
+AdaptiveArrayPools catches pool-escape bugs at **two levels**: compile-time (macro analysis) and runtime (configurable via `RUNTIME_CHECK`).
 
 ## Compile-Time Detection
 
@@ -65,67 +65,103 @@ end
 end
 ```
 
-## Runtime Safety Levels
+## Runtime Safety (`RUNTIME_CHECK`)
 
 For bugs the compiler can't catch (e.g., values hidden behind opaque function calls), runtime safety provides configurable protection via the type parameter `S` in `AdaptiveArrayPool{S}`.
 
-### Level Overview
+### Binary System
 
-| Level | Name | CPU | CUDA | Overhead |
-|-------|------|-----|------|----------|
-| **0** | off | No-op (all branches dead-code-eliminated) | Same | Zero |
-| **1** | guard | `resize!(v,0)` + `setfield!` invalidation | NaN/sentinel poisoning + cache clear | ~5ns/slot |
-| **2** | full | Level 1 + data poisoning + escape detection at scope exit | Level 1 + device-pointer overlap check | Moderate |
-| **3** | debug | Level 2 + acquire call-site tracking | Same | Moderate+ |
+| `RUNTIME_CHECK` | State | What Happens | Overhead |
+|:-:|-------|--------------|----------|
+| **0** | off | All safety branches dead-code-eliminated | **Zero** |
+| **1** | on | Poisoning + structural invalidation + escape detection + borrow tracking | Moderate (invalidation is ~5ns/slot; poisoning, escape checks, and call-site recording add more) |
 
-### Why CPU and CUDA Differ at Level 1
+`RUNTIME_CHECK` is a **compile-time constant** — not a runtime toggle. At `RUNTIME_CHECK = 0`, the JIT eliminates all safety branches completely. No `Ref` reads, no conditional branches, no overhead whatsoever.
 
-Both achieve the same goal — **make stale references fail loudly** — but use different mechanisms:
+### Enabling Runtime Safety
 
-| | CPU | CUDA |
-|---|-----|------|
-| **Strategy** | Structural invalidation | Data poisoning |
-| **Mechanism** | `resize!(v, 0)` shrinks backing vector to length 0; `setfield!(:size, (0,))` zeroes the array dimensions | `CUDA.fill!(v, NaN)` / `typemax` / `true` fills backing CuVector with sentinel values |
-| **Stale access result** | `BoundsError` (array has length 0) | Reads `NaN` or `typemax` (obviously wrong data) |
-| **Why not the other way?** | CPU `resize!` is cheap (~0 cost) | CUDA `resize!` calls `CUDA.Mem.free()` — destroys the pooled VRAM allocation |
-| **Cache invalidation** | View length/dims zeroed | N-way view cache entries cleared to `nothing` |
+Set the `runtime_check` preference in `LocalPreferences.toml` and **restart Julia**:
+
+```toml
+# LocalPreferences.toml
+[AdaptiveArrayPools]
+runtime_check = 1     # enable all safety checks
+# runtime_check = true  # also accepted (normalized to 1 internally)
+```
 
-### Setting the Level
+Or programmatically:
 
 ```julia
-using AdaptiveArrayPools
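+using Preferences, AdaptiveArrayPools  # the package must be loaded to be referenced by name
+Preferences.set_preferences!(AdaptiveArrayPools, "runtime_check" => 1; force = true)  # force: overwrite an existing value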
+# Restart Julia for changes to take effect
+```
 
-# Enable full safety on CPU + all GPU devices (preserves cached arrays, zero-copy)
-set_safety_level!(2)
+!!! warning "Restart Required"
+    `RUNTIME_CHECK` is baked into the pool type at compile time (`AdaptiveArrayPool{S}`). Changing the preference **requires restarting Julia** — it cannot be toggled at runtime.
 
-# Back to zero overhead everywhere
-set_safety_level!(0)
-```
+### What `RUNTIME_CHECK = 1` Enables
 
-The pool type parameter `S` is a compile-time constant. At `S=0`, the JIT eliminates all safety branches via dead-code elimination — true zero overhead with no `Ref` reads or conditional branches.
+When safety is on, the following protections apply. Items 1 to 3 run at `@with_pool` scope exit; borrow tracking (item 4) records each `acquire!` call site as it happens:
 
-### Data Poisoning (Level 2+, CPU)
+#### 1. Data Poisoning
 
-At Level 1, CPU relies on **structural invalidation** (`resize!` + `setfield!`) which makes stale views throw `BoundsError`. At Level 2+, CPU additionally **poisons** the backing vector data with sentinel values (`NaN`, `typemax`, all-`true` for `BitVector`) *before* structural invalidation. This catches stale access through `unsafe_acquire!` wrappers on Julia 1.10 where `setfield!` on Array is unavailable.
+Released arrays are filled with detectable sentinel values **before** structural invalidation:
 
-CUDA already poisons at Level 1 (its primary invalidation strategy), so no additional poisoning step is needed at Level 2.
+| Element Type | Poison Value | Detection |
+|-------------|-------------|-----------|
+| `Float64`, `Float32`, `Float16` | `NaN` | `isnan(x)` returns `true` |
+| `Int64`, `Int32`, etc. | `typemax(T)` | Obviously wrong value |
+| `ComplexF64`, `ComplexF32` | `NaN + NaN*im` | `isnan(real(x))` |
+| `Bool` | `true` | All-true is suspicious |
+| Other types | `zero(T)` | Generic fallback |
+
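+The table amounts to a dispatch rule on the element type. This sketch uses hypothetical names (`poison_value`, `poison!`) and follows the shape of the CUDA extension's `_cuda_poison_value`; it illustrates the idea and is not the package's CPU implementation:
+
+```julia
+# Hypothetical sentinel selection by element type.
+poison_value(::Type{T}) where {T <: AbstractFloat} = T(NaN)
+poison_value(::Type{T}) where {T <: Integer} = typemax(T)
+poison_value(::Type{Bool}) = true  # more specific than the Integer method
+poison_value(::Type{Complex{T}}) where {T <: AbstractFloat} = Complex{T}(NaN, NaN)
+poison_value(::Type{T}) where {T} = zero(T)  # generic fallback
+
+# Fill a released buffer with its element type's sentinel.
+poison!(v::AbstractArray) = fill!(v, poison_value(eltype(v)))
+
+buf = [1.0, 2.0, 3.0]
+poison!(buf)
+any(isnan, buf)  # true
+```
+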
+#### 2. Structural Invalidation
+
+After poisoning, stale references are made to fail loudly:
+
+| | CPU | CUDA |
+|---|-----|------|
+| **Mechanism** | `resize!(v, 0)` shrinks backing vector; `setfield!(:size, (0,))` zeroes array dimensions | `_resize_to_fit!(v, 0)` shrinks logical length (GPU memory preserved) |
+| **Stale access** | `BoundsError` (array has length 0) | `BoundsError` (logical length 0); poisoned data visible on re-acquire |
+| **arr_wrapper** | Dimensions set to `(0,)` / `(0,0)` | Same |
+| **Why different?** | CPU `resize!` is cheap (~0 cost) | CUDA `resize!` would call `CUDA.Mem.free()` — destroys pooled VRAM |
 
-### Escape Detection (Level 2+)
+#### 3. Escape Detection
 
 At every `@with_pool` scope exit, the return value is inspected for overlap with pool-backed memory. Recursively checks `Tuple`, `NamedTuple`, `Dict`, `Pair`, `Set`, and `AbstractArray` elements.
 
-Level 3 additionally records each `acquire!` call-site, so the error message pinpoints the exact source line and expression that allocated the escaping array.
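+```julia
+using AdaptiveArrayPools
+
+opaque_function(x) = x  # stand-in for any call the macro cannot see through
+
+# Throws PoolRuntimeEscapeError at scope exit (requires runtime_check = 1)
+@with_pool pool begin
+    v = acquire!(pool, Float64, 100)
+    opaque_function(v)  # returns v through an opaque call
+end
+```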
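+
+The recursive walk can be sketched in plain Julia. `escapes` is a hypothetical name, and the sketch substitutes `Base.mightalias` (an unexported Base internal) for the package's pointer-overlap check, so treat it as an illustration of the traversal only:
+
+```julia
+# Hypothetical: does `x`, or anything nested inside it, share memory with `buf`?
+escapes(x, buf) = false  # scalars, strings, and other leaf values
+
+function escapes(x::AbstractArray, buf::AbstractArray)
+    Base.mightalias(x, buf) && return true
+    # Containers of non-isbits elements (e.g. Vector{Any}) are walked element-wise.
+    isbitstype(eltype(x)) && return false
+    return any(el -> escapes(el, buf), x)
+end
+
+escapes(x::Union{Tuple, NamedTuple}, buf) = any(el -> escapes(el, buf), x)
+escapes(x::Pair, buf) = escapes(x.first, buf) || escapes(x.second, buf)
+escapes(x::Union{AbstractDict, AbstractSet}, buf) = any(el -> escapes(el, buf), x)  # Dicts yield Pairs
+
+buf = zeros(100)  # stands in for a pool's backing vector
+escapes((1, view(buf, 1:10)), buf)      # true: a view into buf is nested in a Tuple
+escapes(collect(view(buf, 1:10)), buf)  # false: collect() returns an owned copy
+```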
 
-### Legacy: `POOL_DEBUG`
+#### 4. Borrow Tracking
 
-`POOL_DEBUG[] = true` triggers Level 2 escape detection regardless of `S`. For new code, prefer `set_safety_level!(2)`.
+Each `acquire!` call-site is recorded, so escape error messages pinpoint the exact source line and expression that allocated the escaping array:
+
+```
+PoolEscapeError (runtime, RUNTIME_CHECK >= 1)
+
+    SubArray{Float64, 1, ...}
+      ← backed by Float64 pool memory, will be reclaimed at scope exit
+      ← acquired at src/solver.jl:42
+        v = acquire!(pool, Float64, n)
+
+  Fix: Wrap with collect() to return an owned copy, or compute a scalar result.
+```
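+
+Call-site capture of this kind can be sketched with a macro that reads `__source__`. `@with_callsite` is hypothetical; it only mimics the `"file:line\nexpr"` string format that, per the extension code, the package's macro stores in `pool._pending_callsite`:
+
+```julia
+# Hypothetical macro: evaluates `ex` and returns it together with a
+# "file:line\nexpr" description of where it was written.
+macro with_callsite(ex)
+    site = string(__source__.file, ":", __source__.line, "\n", ex)
+    return :(($(esc(ex)), $site))
+end
+
+val, site = @with_callsite sum(1:10)
+println(site)  # file:line on the first line, then the expression `sum(1:10)`
+```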
 
 ## Recommended Workflow
 
-```julia
-# Development / Testing: catch bugs early
-set_safety_level!(2)   # or 3 for call-site info in error messages
+```toml
+# Development / Testing (LocalPreferences.toml):
+[AdaptiveArrayPools]
+runtime_check = 1     # catch bugs early; restart Julia after changing

-# Production: zero overhead
-set_safety_level!(0)   # all safety branches eliminated by the compiler
+# Production: edit the same entry (a TOML file cannot repeat a table header)
+# runtime_check = 0   # zero overhead; all safety branches eliminated
 ```
diff --git a/docs/src/reference/api.md b/docs/src/reference/api.md
@@ -51,7 +51,7 @@ Default element type is `Float64` (CPU) or `Float32` (CUDA).
 |--------|-------------|
 | `STATIC_POOLING` | Compile-time constant to disable all pooling. (alias: `USE_POOLING`) |
 | `MAYBE_POOLING` | Runtime `Ref{Bool}` for `@maybe_with_pool`. (alias: `MAYBE_POOLING_ENABLED`) |
-| `POOL_DEBUG` | Runtime `Ref{Bool}` to enable safety validation. |
+| `RUNTIME_CHECK` | Compile-time `Int` constant (0=off, 1=on). Set via `runtime_check` preference. Restart required. |
 | `set_cache_ways!(n)` | Set N-way cache size (Julia 1.10 / CUDA only; no effect on Julia 1.11+ CPU). |
 
 ---

diff --git a/ext/AdaptiveArrayPoolsCUDAExt/debug.jl b/ext/AdaptiveArrayPoolsCUDAExt/debug.jl
@@ -3,28 +3,24 @@
 # ==============================================================================
 # CUDA-specific safety implementations for CuAdaptiveArrayPool{S}.
 #
-# Safety levels on CUDA differ from CPU:
-# - Level 0: Zero overhead (all branches dead-code-eliminated)
-# - Level 1: Poisoning (NaN/sentinel fill) + structural invalidation via
-#            _resize_to_fit!(vec, 0) + arr_wrappers invalidation (setfield!(:dims, zeros))
-# - Level 2: Poisoning + escape detection (_validate_pool_return for CuArrays)
-# - Level 3: Full + borrow call-site registry + debug messages
+# Binary safety system (S=0 off, S=1 all checks):
+# - S=0: Zero overhead (all branches dead-code-eliminated)
+# - S=1: Poisoning + structural invalidation + escape detection + borrow tracking
 #
-# Key difference: CPU uses resize!(v, 0) at Level 1 to invalidate stale SubArrays.
+# Key difference: CPU uses resize!(v, 0) at S=1 to invalidate stale SubArrays.
 # On CUDA, resize!(CuVector, 0) would free GPU memory, so we use
 # _resize_to_fit!(vec, 0) instead — sets dims to (0,) while preserving
 # the GPU allocation (maxsize). Poisoning fills sentinel data before the shrink.
 # arr_wrappers are invalidated by setting wrapper dims to zeros (matches CPU pattern).
 
-using AdaptiveArrayPools: _safety_level, _validate_pool_return,
+using AdaptiveArrayPools: _runtime_check, _validate_pool_return,
     _set_pending_callsite!, _maybe_record_borrow!,
     _invalidate_released_slots!, _zero_dims_tuple,
     _throw_pool_escape_error,
-    POOL_DEBUG, POOL_SAFETY_LV,
     PoolRuntimeEscapeError
 
 # ==============================================================================
-# Poisoning: Fill released CuVectors with sentinel values (Level 1+)
+# Poisoning: Fill released CuVectors with sentinel values (S=1)
 # ==============================================================================
 
 _cuda_poison_value(::Type{T}) where {T <: AbstractFloat} = T(NaN)
@@ -45,12 +41,12 @@ Fill a CuVector with a detectable sentinel value (NaN for floats, typemax for in
 end
 
 # ==============================================================================
-# _invalidate_released_slots! for CuTypedPool (Level 1+)
+# _invalidate_released_slots! for CuTypedPool (S=1)
 # ==============================================================================
 #
 # Overrides the no-op fallback in base. On CUDA:
-# - Level 0: no-op (base _rewind_typed_pool! gates with S >= 1, so never called)
-# - Level 1+: poison released CuVectors + invalidate arr_wrappers
+# - S=0: no-op (base _rewind_typed_pool! gates with S >= 1, so never called)
+# - S=1: poison released CuVectors + invalidate arr_wrappers
 # - NO resize!(cuv, 0) — would free GPU memory; use _resize_to_fit! instead
 
 @noinline function AdaptiveArrayPools._invalidate_released_slots!(
@@ -79,22 +75,22 @@ end
 end
 
 # ==============================================================================
-# Borrow Tracking: Call-site recording (Level 3)
+# Borrow Tracking: Call-site recording (S=1)
 # ==============================================================================
 #
 # Overrides the no-op AbstractArrayPool fallbacks.
 # The macro injects pool._pending_callsite = "file:line\nexpr" before acquire calls.
 # These functions flush that pending info into the borrow log.
 
-"""Record pending callsite for borrow tracking (compiles to no-op when S < 3)."""
+"""Record pending callsite for borrow tracking (compiles to no-op when S=0)."""
 @inline function AdaptiveArrayPools._set_pending_callsite!(pool::CuAdaptiveArrayPool{S}, msg::String) where {S}
-    S >= 3 && isempty(pool._pending_callsite) && (pool._pending_callsite = msg)
+    S >= 1 && isempty(pool._pending_callsite) && (pool._pending_callsite = msg)
     return nothing
 end
 
-"""Flush pending callsite into borrow log (compiles to no-op when S < 3)."""
+"""Flush pending callsite into borrow log (compiles to no-op when S=0)."""
 @inline function AdaptiveArrayPools._maybe_record_borrow!(pool::CuAdaptiveArrayPool{S}, tp::AbstractTypedPool) where {S}
-    S >= 3 && _cuda_record_borrow_from_pending!(pool, tp)
+    S >= 1 && _cuda_record_borrow_from_pending!(pool, tp)
     return nothing
 end
 
@@ -118,14 +114,14 @@ end
 end
 
 # ==============================================================================
-# Escape Detection: _validate_pool_return for CuArrays (Level 2+)
+# Escape Detection: _validate_pool_return for CuArrays (S=1)
 # ==============================================================================
 #
 # CuArray views share the same device buffer, so device pointer overlap
 # detection works correctly. pointer(::CuArray) returns CuPtr{T}.
 
 function AdaptiveArrayPools._validate_pool_return(val, pool::CuAdaptiveArrayPool{S}) where {S}
-    (S >= 2 || POOL_DEBUG[]) || return nothing
+    S >= 1 || return nothing
     _validate_cuda_return(val, pool)
     return nothing
 end

diff --git a/ext/AdaptiveArrayPoolsCUDAExt/macros.jl b/ext/AdaptiveArrayPoolsCUDAExt/macros.jl
@@ -16,12 +16,12 @@ Uses Val dispatch for compile-time resolution and full inlining.
 @inline AdaptiveArrayPools._get_pool_for_backend(::Val{:cuda}) = get_task_local_cuda_pool()
 
 # ==============================================================================
-# Pool Type Registration for Closureless Union Splitting
+# Pool Type Registration for Compile-Time Type Assertion
 # ==============================================================================
 #
 # `_pool_type_for_backend` is called at macro expansion time to determine the
-# concrete pool type for closureless `let`/`if isa` chain generation.
-# This enables `@with_pool :cuda` to generate `if _raw isa CuAdaptiveArrayPool{0} ...`
-# instead of closure-based `_dispatch_pool_scope`.
+# concrete pool type for direct type assertion in macro-generated code.
+# This enables `@with_pool :cuda` to generate `pool::CuAdaptiveArrayPool{S}`
+# where S is determined by the compile-time const `RUNTIME_CHECK`.
 
 AdaptiveArrayPools._pool_type_for_backend(::Val{:cuda}) = CuAdaptiveArrayPool