20 changes: 10 additions & 10 deletions docs/src/basics/safety-rules.md
@@ -105,20 +105,20 @@ end
end
```

## Debugging with POOL_DEBUG
## Debugging with RUNTIME_CHECK

Enable runtime safety checks during development:
Enable runtime safety checks during development by setting the `runtime_check` preference:

```julia
using AdaptiveArrayPools
AdaptiveArrayPools.POOL_DEBUG[] = true

@with_pool pool function test()
v = acquire!(pool, Float64, 100)
return v # Will warn about returning pool-backed array
end
```toml
# LocalPreferences.toml
[AdaptiveArrayPools]
runtime_check = 1 # 0 = off (default), 1 = on
```

**Restart Julia** after changing this setting. When enabled, returning a pool-backed array from a `@with_pool` block throws a `PoolRuntimeEscapeError` with the exact source location.

See [Safety](../features/safety.md) for full details on what `RUNTIME_CHECK = 1` enables (poisoning, structural invalidation, escape detection, borrow tracking).

## acquire! vs unsafe_acquire!

| Function | Returns | Best For |
43 changes: 30 additions & 13 deletions docs/src/features/configuration.md
@@ -4,10 +4,13 @@ AdaptiveArrayPools can be configured via `LocalPreferences.toml`:

```toml
[AdaptiveArrayPools]
use_pooling = false # ⭐ Primary: Disable pooling entirely
cache_ways = 8 # Advanced: N-way cache size (default: 4)
use_pooling = false # ⭐ Primary: Disable pooling entirely
runtime_check = 1 # Safety: Enable runtime safety checks
cache_ways = 8 # Advanced: N-way cache size (default: 4)
```

All compile-time preferences require **restarting Julia** to take effect.

## Compile-time: STATIC_POOLING (⭐ Primary)

**The most important configuration.** Completely disable pooling to make `acquire!` behave like standard allocation.
@@ -50,26 +53,40 @@ Use `pooling_enabled(pool)` to check if pooling is active.

All pooling code is **completely eliminated at compile time** (zero overhead).

## Runtime: MAYBE_POOLING
## Compile-time: RUNTIME_CHECK

Only affects `@maybe_with_pool`. Toggle without restart.
Enable runtime safety checks to catch pool-escape bugs. See [Safety](safety.md) for full details.

```toml
# LocalPreferences.toml
[AdaptiveArrayPools]
runtime_check = 1 # enable (0 = off, 1 = on)
# runtime_check = true # also accepted
```

Or programmatically:

```julia
MAYBE_POOLING[] = false # Disable
MAYBE_POOLING[] = true # Enable (default)
using Preferences
Preferences.set_preferences!(AdaptiveArrayPools, "runtime_check" => 1)
# Restart Julia for changes to take effect
```

## Runtime: POOL_DEBUG
Accepts both `Bool` and `Int` values — internally normalized to `Int`:
- `false` / `0` → off (zero overhead, all safety branches eliminated)
- `true` / `1` → on (poisoning + invalidation + escape detection + borrow tracking)

The safety level is baked into the pool type parameter: `AdaptiveArrayPool{0}` or `AdaptiveArrayPool{1}`. This enables dead-code elimination — at `RUNTIME_CHECK = 0`, all safety branches are completely removed by the compiler.
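The effect of carrying the safety level in a type parameter can be sketched with plain Julia, without the package. `ToyPool` and `record_borrow!` below are hypothetical names used only for illustration; the guard mirrors the `S >= 1 && ...` pattern used in the CUDA extension, under the assumption that the CPU pool follows the same shape.

```julia
# Hypothetical stand-in for a pool whose safety level S lives in the type.
struct ToyPool{S}
    log::Vector{String}
end
ToyPool{S}() where {S} = ToyPool{S}(String[])

# S is part of the type, so each specialization sees `S >= 1` as a constant
# and the compiler can drop the branch entirely when S == 0.
function record_borrow!(pool::ToyPool{S}, msg::String) where {S}
    S >= 1 || return nothing
    push!(pool.log, msg)
    return nothing
end

off = ToyPool{0}()
on = ToyPool{1}()
record_borrow!(off, "solver.jl:42")   # no-op for S = 0
record_borrow!(on, "solver.jl:42")    # recorded for S = 1
```

Inspecting `@code_typed record_borrow!(off, "x")` in a REPL is one way to check that the `S = 0` specialization contains no trace of the `push!`.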

## Runtime: MAYBE_POOLING

Enable safety validation to catch direct returns of pool-backed arrays.
Only affects `@maybe_with_pool`. Toggle without restart.

```julia
POOL_DEBUG[] = true # Enable safety checks (development)
POOL_DEBUG[] = false # Disable (default, production)
MAYBE_POOLING[] = false # Disable
MAYBE_POOLING[] = true # Enable (default)
```
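For contrast with the compile-time preferences, a runtime `Ref{Bool}` toggle is read on every call, which is why it can change without a restart. The sketch below uses hypothetical stand-ins (`TOY_MAYBE_POOLING`, `buffer_source`) rather than the package's own definitions.

```julia
# Hypothetical stand-in for a runtime toggle; not the package's definition.
const TOY_MAYBE_POOLING = Ref(true)

# Stand-in for the decision a `@maybe_with_pool`-style block might make on entry.
buffer_source() = TOY_MAYBE_POOLING[] ? :pool : :heap

first_choice = buffer_source()     # reads the Ref while it is `true`
TOY_MAYBE_POOLING[] = false        # flipped at runtime, no restart involved
second_choice = buffer_source()    # the next call observes the new value
```

The price of this flexibility is one `Ref` read and one branch per call, which the compile-time settings avoid.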

When enabled, returning a pool-backed array from a `@with_pool` block will throw an error.

## Compile-time: CACHE_WAYS (Julia 1.10 / CUDA only)

Configure the N-way cache size for `unsafe_acquire!`. **On Julia 1.11+ CPU, this setting has no effect** — the `setfield!`-based wrapper reuse supports unlimited dimension patterns with zero allocation.
@@ -99,6 +116,6 @@ set_cache_ways!(8)
| Setting | Scope | Restart? | Priority | Affects |
|---------|-------|----------|----------|---------|
| `use_pooling` | Compile-time | Yes | ⭐ Primary | All macros, `acquire!` behavior |
| `runtime_check` | Compile-time | Yes | Safety | Poisoning, invalidation, escape detection, borrow tracking |
| `cache_ways` | Compile-time | Yes | Advanced | `unsafe_acquire!` N-D caching (Julia 1.10 / CUDA only) |
| `MAYBE_POOLING` | Runtime | No | Optional | `@maybe_with_pool` only |
| `POOL_DEBUG` | Runtime | No | Debug | Safety validation |
4 changes: 2 additions & 2 deletions docs/src/features/multi-threading.md
@@ -270,7 +270,7 @@ If you encounter unexpected behavior:

1. **Check pool placement**: Is `@with_pool` inside or outside `@threads`?
2. **Check pool sharing**: Is the same pool variable accessed from multiple Tasks?
3. **Enable POOL_DEBUG**: `POOL_DEBUG[] = true` catches some (not all) misuse patterns
3. **Enable RUNTIME_CHECK**: Set `runtime_check = 1` in `LocalPreferences.toml` (restart required) to catch escape bugs

---

@@ -281,4 +281,4 @@ If you encounter unexpected behavior:
- `@threads` creates one Task per thread → pools are reused within the block
- **Always place `@with_pool` inside `@threads`**, not outside
- Thread-local pools are **not an alternative** due to stack discipline requirements
- Correct usage is the user's responsibility (no runtime checks for performance)
- Correct usage is the user's responsibility (enable `RUNTIME_CHECK` during development to catch bugs)
112 changes: 74 additions & 38 deletions docs/src/features/safety.md
@@ -1,6 +1,6 @@
# Pool Safety

AdaptiveArrayPools catches pool-escape bugs at **two levels**: compile-time (macro analysis) and runtime (configurable safety levels).
AdaptiveArrayPools catches pool-escape bugs at **two levels**: compile-time (macro analysis) and runtime (configurable via `RUNTIME_CHECK`).

## Compile-Time Detection

@@ -65,67 +65,103 @@ end
end
```

## Runtime Safety Levels
## Runtime Safety (`RUNTIME_CHECK`)

For bugs the compiler can't catch (e.g., values hidden behind opaque function calls), runtime safety provides configurable protection via the type parameter `S` in `AdaptiveArrayPool{S}`.

### Level Overview
### Binary System

| Level | Name | CPU | CUDA | Overhead |
|-------|------|-----|------|----------|
| **0** | off | No-op (all branches dead-code-eliminated) | Same | Zero |
| **1** | guard | `resize!(v,0)` + `setfield!` invalidation | NaN/sentinel poisoning + cache clear | ~5ns/slot |
| **2** | full | Level 1 + data poisoning + escape detection at scope exit | Level 1 + device-pointer overlap check | Moderate |
| **3** | debug | Level 2 + acquire call-site tracking | Same | Moderate+ |
| `RUNTIME_CHECK` | State | What Happens | Overhead |
|:-:|-------|--------------|----------|
| **0** | off | All safety branches dead-code-eliminated | **Zero** |
| **1** | on | Poisoning + structural invalidation + escape detection + borrow tracking | ~5ns/slot |

### Why CPU and CUDA Differ at Level 1
`RUNTIME_CHECK` is a **compile-time constant** — not a runtime toggle. At `RUNTIME_CHECK = 0`, the JIT eliminates all safety branches completely. No `Ref` reads, no conditional branches, no overhead whatsoever.

Both achieve the same goal — **make stale references fail loudly** — but use different mechanisms:
### Enabling Runtime Safety

| | CPU | CUDA |
|---|-----|------|
| **Strategy** | Structural invalidation | Data poisoning |
| **Mechanism** | `resize!(v, 0)` shrinks backing vector to length 0; `setfield!(:size, (0,))` zeroes the array dimensions | `CUDA.fill!(v, NaN)` / `typemax` / `true` fills backing CuVector with sentinel values |
| **Stale access result** | `BoundsError` (array has length 0) | Reads `NaN` or `typemax` (obviously wrong data) |
| **Why not the other way?** | CPU `resize!` is cheap (~0 cost) | CUDA `resize!` calls `CUDA.Mem.free()` — destroys the pooled VRAM allocation |
| **Cache invalidation** | View length/dims zeroed | N-way view cache entries cleared to `nothing` |
Set the `runtime_check` preference in `LocalPreferences.toml` and **restart Julia**:

```toml
# LocalPreferences.toml
[AdaptiveArrayPools]
runtime_check = 1 # enable all safety checks
# runtime_check = true # also accepted (normalized to 1 internally)
```

### Setting the Level
Or programmatically:

```julia
using AdaptiveArrayPools
using Preferences
Preferences.set_preferences!(AdaptiveArrayPools, "runtime_check" => 1)
# Restart Julia for changes to take effect
```

# Enable full safety on CPU + all GPU devices (preserves cached arrays, zero-copy)
set_safety_level!(2)
!!! warning "Restart Required"
`RUNTIME_CHECK` is baked into the pool type at compile time (`AdaptiveArrayPool{S}`). Changing the preference **requires restarting Julia** — it cannot be toggled at runtime.

# Back to zero overhead everywhere
set_safety_level!(0)
```
### What `RUNTIME_CHECK = 1` Enables

The pool type parameter `S` is a compile-time constant. At `S=0`, the JIT eliminates all safety branches via dead-code elimination — true zero overhead with no `Ref` reads or conditional branches.
When safety is on, `@with_pool` scope exit triggers the following protections:

### Data Poisoning (Level 2+, CPU)
#### 1. Data Poisoning

At Level 1, CPU relies on **structural invalidation** (`resize!` + `setfield!`) which makes stale views throw `BoundsError`. At Level 2+, CPU additionally **poisons** the backing vector data with sentinel values (`NaN`, `typemax`, all-`true` for `BitVector`) *before* structural invalidation. This catches stale access through `unsafe_acquire!` wrappers on Julia 1.10 where `setfield!` on Array is unavailable.
Released arrays are filled with detectable sentinel values **before** structural invalidation:

CUDA already poisons at Level 1 (its primary invalidation strategy), so no additional poisoning step is needed at Level 2.
| Element Type | Poison Value | Detection |
|-------------|-------------|-----------|
| `Float64`, `Float32`, `Float16` | `NaN` | `isnan(x)` returns `true` |
| `Int64`, `Int32`, etc. | `typemax(T)` | Obviously wrong value |
| `ComplexF64`, `ComplexF32` | `NaN + NaN*im` | `isnan(real(x))` |
| `Bool` | `true` | All-true is suspicious |
| Other types | `zero(T)` | Generic fallback |

#### 2. Structural Invalidation

After poisoning, stale references are made to fail loudly:

| | CPU | CUDA |
|---|-----|------|
| **Mechanism** | `resize!(v, 0)` shrinks backing vector; `setfield!(:size, (0,))` zeroes array dimensions | `_resize_to_fit!(v, 0)` shrinks logical length (GPU memory preserved) |
| **Stale access** | `BoundsError` (array has length 0) | `BoundsError` (logical length 0); poisoned data visible on re-acquire |
| **arr_wrapper** | Dimensions set to `(0,)` / `(0,0)` | Same |
| **Why different?** | CPU `resize!` is cheap (~0 cost) | CUDA `resize!` would call `CUDA.Mem.free()` — destroys pooled VRAM |

### Escape Detection (Level 2+)
#### 3. Escape Detection

At every `@with_pool` scope exit, the return value is inspected for overlap with pool-backed memory. Recursively checks `Tuple`, `NamedTuple`, `Dict`, `Pair`, `Set`, and `AbstractArray` elements.

Level 3 additionally records each `acquire!` call-site, so the error message pinpoints the exact source line and expression that allocated the escaping array.
```julia
# Throws PoolRuntimeEscapeError at scope exit
@with_pool pool begin
v = acquire!(pool, Float64, 100)
opaque_function(v) # returns v through opaque call
end
```
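The recursive inspection described above can be approximated in plain Julia. This is a minimal sketch, not the package's `_validate_pool_return`: `escapes` is a hypothetical helper, and it relies on `Base.mightalias`, a non-exported Base function that conservatively tests whether two arrays may share memory.

```julia
# Hypothetical helper: does `val`, or anything nested inside it, alias `backing`?
escapes(val, backing) = false   # scalars and unrecognized types
escapes(val::AbstractArray{<:Number}, backing) = Base.mightalias(val, backing)
escapes(val::AbstractArray, backing) = any(x -> escapes(x, backing), val)
escapes(val::Union{Tuple, NamedTuple}, backing) = any(x -> escapes(x, backing), values(val))
escapes(val::Pair, backing) = escapes(val.first, backing) || escapes(val.second, backing)
escapes(val::AbstractDict, backing) = any(p -> escapes(p, backing), val)
escapes(val::AbstractSet, backing) = any(x -> escapes(x, backing), val)

backing = zeros(Float64, 16)   # stand-in for pool-owned storage
v = view(backing, 1:4)         # stand-in for an acquired array

direct = escapes(v, backing)                     # the view itself
nested = escapes((1, (result = v,)), backing)    # hidden inside containers
copied = escapes(collect(v), backing)            # an owned copy
```

Returning `collect(v)` or a scalar such as `sum(v)` passes this kind of check because neither shares memory with the backing storage.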

### Legacy: `POOL_DEBUG`
#### 4. Borrow Tracking

`POOL_DEBUG[] = true` triggers Level 2 escape detection regardless of `S`. For new code, prefer `set_safety_level!(2)`.
Each `acquire!` call-site is recorded, so escape error messages pinpoint the exact source line and expression that allocated the escaping array:

```
PoolEscapeError (runtime, RUNTIME_CHECK >= 1)

SubArray{Float64, 1, ...}
← backed by Float64 pool memory, will be reclaimed at scope exit
← acquired at src/solver.jl:42
v = acquire!(pool, Float64, n)

Fix: Wrap with collect() to return an owned copy, or compute a scalar result.
```
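Why `collect()` is the suggested fix can be seen without the package. The sketch below simulates poisoning with a plain `fill!` on a stand-in backing vector; it assumes nothing about the pool's internals beyond the `NaN` sentinel for `Float64` listed in the poison table.

```julia
# Stand-in for pool-owned storage; the package manages this internally.
backing = zeros(Float64, 8)

v = view(backing, 1:4)   # what an escaping pool-backed view looks like
owned = collect(v)       # independent copy with its own memory

# Simulate the pool poisoning released Float64 storage at scope exit.
fill!(backing, NaN)

view_poisoned = all(isnan, v)      # the view reads the poisoned data
copy_poisoned = any(isnan, owned)  # the copy was taken before poisoning
```

The copy costs one allocation, so it belongs at the boundary where a result leaves the `@with_pool` block, not inside the hot loop.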

## Recommended Workflow

```julia
# Development / Testing: catch bugs early
set_safety_level!(2) # or 3 for call-site info in error messages
```toml
# Development / Testing (LocalPreferences.toml):
[AdaptiveArrayPools]
runtime_check = 1 # catch bugs early — restart Julia after changing

# Production: zero overhead
set_safety_level!(0) # all safety branches eliminated by the compiler
# Production:
[AdaptiveArrayPools]
runtime_check = 0 # zero overhead — all safety branches eliminated
```
2 changes: 1 addition & 1 deletion docs/src/reference/api.md
@@ -51,7 +51,7 @@ Default element type is `Float64` (CPU) or `Float32` (CUDA).
|--------|-------------|
| `STATIC_POOLING` | Compile-time constant to disable all pooling. (alias: `USE_POOLING`) |
| `MAYBE_POOLING` | Runtime `Ref{Bool}` for `@maybe_with_pool`. (alias: `MAYBE_POOLING_ENABLED`) |
| `POOL_DEBUG` | Runtime `Ref{Bool}` to enable safety validation. |
| `RUNTIME_CHECK` | Compile-time `Int` constant (0=off, 1=on). Set via `runtime_check` preference. Restart required. |
| `set_cache_ways!(n)` | Set N-way cache size (Julia 1.10 / CUDA only; no effect on Julia 1.11+ CPU). |

---
34 changes: 15 additions & 19 deletions ext/AdaptiveArrayPoolsCUDAExt/debug.jl
@@ -3,28 +3,24 @@
# ==============================================================================
# CUDA-specific safety implementations for CuAdaptiveArrayPool{S}.
#
# Safety levels on CUDA differ from CPU:
# - Level 0: Zero overhead (all branches dead-code-eliminated)
# - Level 1: Poisoning (NaN/sentinel fill) + structural invalidation via
# _resize_to_fit!(vec, 0) + arr_wrappers invalidation (setfield!(:dims, zeros))
# - Level 2: Poisoning + escape detection (_validate_pool_return for CuArrays)
# - Level 3: Full + borrow call-site registry + debug messages
# Binary safety system (S=0 off, S=1 all checks):
# - S=0: Zero overhead (all branches dead-code-eliminated)
# - S=1: Poisoning + structural invalidation + escape detection + borrow tracking
#
# Key difference: CPU uses resize!(v, 0) at Level 1 to invalidate stale SubArrays.
# On CUDA, resize!(CuVector, 0) would free GPU memory, so we use
# _resize_to_fit!(vec, 0) instead — sets dims to (0,) while preserving
# the GPU allocation (maxsize). Poisoning fills sentinel data before the shrink.
# arr_wrappers are invalidated by setting wrapper dims to zeros (matches CPU pattern).

using AdaptiveArrayPools: _safety_level, _validate_pool_return,
using AdaptiveArrayPools: _runtime_check, _validate_pool_return,
_set_pending_callsite!, _maybe_record_borrow!,
_invalidate_released_slots!, _zero_dims_tuple,
_throw_pool_escape_error,
POOL_DEBUG, POOL_SAFETY_LV,
PoolRuntimeEscapeError

# ==============================================================================
# Poisoning: Fill released CuVectors with sentinel values (Level 1+)
# Poisoning: Fill released CuVectors with sentinel values (S=1)
# ==============================================================================

_cuda_poison_value(::Type{T}) where {T <: AbstractFloat} = T(NaN)
@@ -45,12 +41,12 @@ Fill a CuVector with a detectable sentinel value (NaN for floats, typemax for in
end

# ==============================================================================
# _invalidate_released_slots! for CuTypedPool (Level 1+)
# _invalidate_released_slots! for CuTypedPool (S=1)
# ==============================================================================
#
# Overrides the no-op fallback in base. On CUDA:
# - Level 0: no-op (base _rewind_typed_pool! gates with S >= 1, so never called)
# - Level 1+: poison released CuVectors + invalidate arr_wrappers
# - S=0: no-op (base _rewind_typed_pool! gates with S >= 1, so never called)
# - S=1: poison released CuVectors + invalidate arr_wrappers
# - NO resize!(cuv, 0) — would free GPU memory; use _resize_to_fit! instead

@noinline function AdaptiveArrayPools._invalidate_released_slots!(
@@ -79,22 +75,22 @@ end
end

# ==============================================================================
# Borrow Tracking: Call-site recording (Level 3)
# Borrow Tracking: Call-site recording (S=1)
# ==============================================================================
#
# Overrides the no-op AbstractArrayPool fallbacks.
# The macro injects pool._pending_callsite = "file:line\nexpr" before acquire calls.
# These functions flush that pending info into the borrow log.

"""Record pending callsite for borrow tracking (compiles to no-op when S < 3)."""
"""Record pending callsite for borrow tracking (compiles to no-op when S=0)."""
@inline function AdaptiveArrayPools._set_pending_callsite!(pool::CuAdaptiveArrayPool{S}, msg::String) where {S}
S >= 3 && isempty(pool._pending_callsite) && (pool._pending_callsite = msg)
S >= 1 && isempty(pool._pending_callsite) && (pool._pending_callsite = msg)
return nothing
end

"""Flush pending callsite into borrow log (compiles to no-op when S < 3)."""
"""Flush pending callsite into borrow log (compiles to no-op when S=0)."""
@inline function AdaptiveArrayPools._maybe_record_borrow!(pool::CuAdaptiveArrayPool{S}, tp::AbstractTypedPool) where {S}
S >= 3 && _cuda_record_borrow_from_pending!(pool, tp)
S >= 1 && _cuda_record_borrow_from_pending!(pool, tp)
return nothing
end

@@ -118,14 +114,14 @@ end
end

# ==============================================================================
# Escape Detection: _validate_pool_return for CuArrays (Level 2+)
# Escape Detection: _validate_pool_return for CuArrays (S=1)
# ==============================================================================
#
# CuArray views share the same device buffer, so device pointer overlap
# detection works correctly. pointer(::CuArray) returns CuPtr{T}.

function AdaptiveArrayPools._validate_pool_return(val, pool::CuAdaptiveArrayPool{S}) where {S}
(S >= 2 || POOL_DEBUG[]) || return nothing
S >= 1 || return nothing
_validate_cuda_return(val, pool)
return nothing
end
8 changes: 4 additions & 4 deletions ext/AdaptiveArrayPoolsCUDAExt/macros.jl
@@ -16,12 +16,12 @@ Uses Val dispatch for compile-time resolution and full inlining.
@inline AdaptiveArrayPools._get_pool_for_backend(::Val{:cuda}) = get_task_local_cuda_pool()

# ==============================================================================
# Pool Type Registration for Closureless Union Splitting
# Pool Type Registration for Compile-Time Type Assertion
# ==============================================================================
#
# `_pool_type_for_backend` is called at macro expansion time to determine the
# concrete pool type for closureless `let`/`if isa` chain generation.
# This enables `@with_pool :cuda` to generate `if _raw isa CuAdaptiveArrayPool{0} ...`
# instead of closure-based `_dispatch_pool_scope`.
# concrete pool type for direct type assertion in macro-generated code.
# This enables `@with_pool :cuda` to generate `pool::CuAdaptiveArrayPool{S}`
# where S is determined by the compile-time const `RUNTIME_CHECK`.

AdaptiveArrayPools._pool_type_for_backend(::Val{:cuda}) = CuAdaptiveArrayPool