Releases: ProjectTorreyPines/AdaptiveArrayPools.jl
v0.3.1
What's New
Metal.jl Support (#34)
Full Apple Silicon GPU backend — same API as CUDA (`@with_pool :metal`), lazy/typed-lazy modes, full safety suite, and task-local multi-device pools. Requires Julia 1.11+. See 📖 Metal Backend.
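A minimal usage sketch. How the `:metal` backend selector composes with the pool binding is an assumption on my part; the surrounding form follows the `@with_pool pool function ...` example from the v0.1.2 notes, and `gpu_sum` is illustrative:

```julia
using AdaptiveArrayPools, Metal  # Julia 1.11+, Apple Silicon only

# Sketch only — `:metal` selects the task-local Metal pool, mirroring
# the CUDA backend's API as described in these notes.
@with_pool :metal pool function gpu_sum(n)
    tmp = acquire!(pool, Float32, n)  # pool-backed GPU buffer
    tmp .= 1.0f0
    return sum(tmp)                   # pool rewinds when the call returns
end
```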
Structural Mutation Detection (#35)
Catches `resize!`/`push!`/`append!` on pool-backed arrays:
- Compile-time: `PoolMutationError` at macro expansion (zero cost)
- Runtime: wrapper-vs-backing divergence check at rewind — advisory `@warn` with `maxlog=1` (pool self-heals on next `acquire!`)
Works across CPU, CUDA, and Metal. See 📖 Pool Safety.
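A sketch of the misuse these checks catch, assuming `acquire!` returns a pool-backed `Vector` as described above (`bad_idea` is illustrative):

```julia
using AdaptiveArrayPools

@with_pool pool function bad_idea()
    v = acquire!(pool, Float64, 16)
    # Structural mutation of a pool-backed array: raised as a
    # PoolMutationError at macro expansion when statically detectable;
    # otherwise flagged at rewind by the wrapper-vs-backing divergence
    # check (advisory @warn, maxlog=1), after which the pool self-heals
    # on the next acquire!.
    push!(v, 0.0)
    return sum(v)
end
```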
Other
- CUDA extension now gated behind Julia 1.11+ to match Metal
What's Changed
- (feat): Metal.jl (Apple Silicon GPU) backend support by @mgyoo86 in #34
- (feat): compile-time and runtime structural mutation detection by @mgyoo86 in #35
Full Changelog: v0.3.0...v0.3.1
v0.3.0
⚠️ Breaking Changes
Default `acquire!` returns `Array`
- `acquire!` now returns `Array{T,N}` instead of `SubArray`/`ReshapedArray`. On Julia 1.11+, `Array` is a mutable struct, enabling zero-allocation `setfield!` reuse — the same guarantee views had, with better FFI/`ccall` and type-constraint compatibility.
- New `acquire_view!` provides explicit opt-in to the old view behavior.
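A sketch contrasting the new default with the opt-in view path (return types per the notes above; `kernel` is illustrative):

```julia
@with_pool pool function kernel(n)
    a = acquire!(pool, Float64, n)       # v0.3.0+: returns Array{Float64,1}
    v = acquire_view!(pool, Float64, n)  # explicit opt-in to the old view behavior
    fill!(a, 0.0)                        # plain Array: FFI/ccall and type
    fill!(v, 1.0)                        # constraints accept it directly
    return sum(a) + sum(v)
end
```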
Remove `unsafe_*` API
- The entire `unsafe_*` API (`unsafe_acquire!`, `unsafe_zeros!`, `unsafe_ones!`, `unsafe_similar!`) is removed with no deprecation period.
Migration Guide
- `unsafe_acquire!` → `acquire!`
- `unsafe_zeros!` → `zeros!`
- `unsafe_ones!` → `ones!`
- `unsafe_similar!` → `similar!`
- If you relied on `acquire!` returning views: `acquire!` → `acquire_view!`
See the full 📖 Migration Guide for details.
What's Changed
Full Changelog: v0.2.6...v0.3.0
v0.2.6
Eliminate try-finally Overhead in @with_pool
@with_pool now uses direct rewind insertion instead of try-finally, enabling compiler inlining for ~15-25% speedup on hot-loop patterns.
New @safe_with_pool / @safe_maybe_with_pool macros preserve the old try-finally behavior for code that needs guaranteed exception cleanup.
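A sketch of when to prefer each variant, assuming both macros share the `@with_pool` calling form (`fast_step` and `may_throw` are illustrative):

```julia
# Hot loop: direct rewind insertion, inlinable (no try-finally).
@with_pool pool function fast_step(n)
    tmp = acquire!(pool, Float64, n)
    return sum(tmp)
end

# Code that may throw: @safe_with_pool keeps the old try-finally
# behavior, so the pool rewinds even on the error path.
@safe_with_pool pool function may_throw(x)
    tmp = acquire!(pool, Float64, 8)
    x < 0 && error("negative input")
    return sum(tmp)
end
```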
What's Changed
Full Changelog: v0.2.5...v0.2.6
v0.2.5
Fix Compile-Time Explosion in Nested @with_pool
Nested or @inline @with_pool could cause compile-time explosion due to exponential macro expansion. Fixed by replacing union splitting with a single compile-time type assertion.
Safety system simplified from 4-tier (0–3) to binary `RUNTIME_CHECK` (0 = off, 1 = on) via LocalPreferences.toml. `POOL_DEBUG`, `POOL_SAFETY_LV`, and `set_safety_level!` are removed.
See 📖 Safety and 📖 Configuration docs for details.
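A sketch of the corresponding LocalPreferences.toml entry. The section/key layout follows the standard Preferences.jl convention; only the `RUNTIME_CHECK` name is from these notes:

```toml
# LocalPreferences.toml (alongside your Project.toml)
[AdaptiveArrayPools]
RUNTIME_CHECK = 1  # 0 = off, 1 = on (replaces the old 4-tier POOL_SAFETY_LV)
```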
What's Changed
Full Changelog: v0.2.4...v0.2.5
v0.2.4
Hotfix: Eliminate Core.Box Allocation in @inline @with_pool
v0.2.3 introduced a Core.Box boxing regression (32–48 bytes) in @inline @with_pool functions, caused by closure-based dispatch interacting with try/finally boundaries after inlining.
Replaced with closureless union splitting on both CPU and CUDA — zero allocation on Julia 1.12+. On Julia <1.12, a minor 16-byte let-scope overhead may appear at top-level but disappears inside functions and hot loops.
What's Changed
Full Changelog: v0.2.3...v0.2.4
v0.2.3
New Features
Pool Safety System
Two-layer safety for catching pool misuse:
- Compile-time (`STATIC_POOL_CHECKS`): `@with_pool` AST analysis detects escaped pool-backed variables at macro-expansion time — zero runtime cost.
- Runtime (`POOL_SAFETY_LV`): progressive protection levels — L1 invalidates released slots via `resize!`/`setfield!`, L2 adds full borrow tracking with escape detection.
- Type-parameterized `AdaptiveArrayPool{S}`: encodes the safety level as a type parameter, enabling dead-code elimination at `S = 0` for true zero overhead. CPU and CUDA.
Unlimited Zero-Alloc Dimension Patterns
On Julia 1.11+, both CPU and CUDA now use setfield!-based wrapper reuse — unlimited dimension patterns per slot are zero-allocation on the hot path. The old 4-way set-associative cache (with its eviction limit) is removed on both backends.
Zero-Alloc reshape! Support (CPU & CUDA)
`reshape!(pool, A, dims...)` now works on both CPU and CUDA. Same-dim reshapes are in-place; cross-dim reshapes reuse cached wrappers — zero allocation after warmup.
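A sketch of the `reshape!` path described above (dims and the function name are illustrative):

```julia
@with_pool pool function reshape_demo()
    a  = acquire!(pool, Float64, 12)  # 1-D pool-backed buffer
    m  = reshape!(pool, a, 3, 4)      # cross-dim: reuses a cached wrapper
    m2 = reshape!(pool, m, 4, 3)      # zero allocation after warmup
    return size(m), size(m2)
end
```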
GPU Memory Preservation
_resize_to_fit! avoids GPU reallocation on CuVector shrink (CUDA resize! below 25% capacity triggers alloc→copy→free). Preserves VRAM across pool rewind cycles.
Limitations
The setfield!-based wrapper reuse and reshape! zero-allocation features require Julia 1.11+ on CPU. On Julia 1.10 (LTS), the CPU path retains the previous N-way set-associative cache behavior with CACHE_WAYS=4 eviction limit. CUDA is unaffected (always uses the new path).
What's Changed
- (refac): Slot-first architecture: remove view cache, isolate legacy by @mgyoo86 in #23
- (fix): `USE_POOLING=false` path fixes and 2-tier toggle rename by @mgyoo86 in #24
- (feat): add 2-tier pool safety — compile-time escape detection + runtime validation by @mgyoo86 in #25
- (feat): Type-parameterized safety dispatch (`Pool{S}`) — CPU path by @mgyoo86 in #26
- (feat): pool safety dispatch — CUDA path by @mgyoo86 in #27
- (feat): Avoid GPU realloc on CuVector shrink by @mgyoo86 in #28
- (feat): CUDA arr_wrappers — zero-alloc `CuArray` reuse via `setfield!` by @mgyoo86 in #29
Full Changelog: v0.2.2...v0.2.3
v0.2.2
New Feature
`reshape!(pool, A, dims...)` — Zero-allocation array reshaping via the pool wrapper cache. On Julia 1.11+, cross-dim reshapes reuse cached wrappers with no allocation after warmup. (docs 📖)
Code Quality
- Adopt Runic.jl formatter with CI enforcement
What's Changed
Full Changelog: v0.2.1...v0.2.2
v0.2.1
Bug Fix
Fallback Type Memory Leak
- Fixed a memory leak where non-fixed-slot types (e.g., `ForwardDiff.Dual`) were not properly reclaimed on `rewind!` during repeated `@with_pool` calls. This caused `n_active` to grow unboundedly in workloads like `ForwardDiff.gradient` with pooled interpolation.
- No performance regression — the fix is resolved entirely at compile time. CUDA extension included.
What's Changed
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's New
Lazy Selective Rewind
@with_pool now defers per-type checkpoints until first acquire! and rewinds only the pools actually touched, instead of all 8 fixed-slot types. Up to 6.5× faster in common patterns with helper functions.
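To make the claim concrete: with lazy selective rewind, a block that only ever acquires one element type checkpoints and rewinds only that type's pool (a sketch; `helper!` and `lazy_demo` are illustrative):

```julia
# Only Float64 is touched inside the block, so only the Float64
# fixed-slot pool is checkpointed (lazily, at first acquire!) and
# rewound — not all 8 fixed-slot types as before.
helper!(pool, n) = acquire!(pool, Float64, n)  # helper acquisitions count too

@with_pool pool function lazy_demo(n)
    a = helper!(pool, n)
    b = acquire!(pool, Float64, n)
    return sum(a) + sum(b)
end
```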
Bitmask-Aware Type Tracking
Per-type bitmask tracking replaces the boolean _untracked_flags system. When helper functions acquire types already tracked by the macro, the fast typed path is preserved via subset check (untracked ⊆ tracked).
Internal Naming Refactor
Internal identifiers renamed to reflect lazy-selective-rewind architecture (e.g. _mark_untracked! → _record_type_touch!, _depth_only_checkpoint! → _lazy_checkpoint!). Magic hex literals replaced with named constants. No user-facing API changes.
What's Changed
- (feat): Bitmask-Aware Untracked Tracking for `@with_pool` by @mgyoo86 in #16
- (perf): Dynamic Selective Rewind & Typed-Fallback Optimization by @mgyoo86 in #17
- (refactor): rename internals to match evolved architecture by @mgyoo86 in #18
Full Changelog: v0.1.2...v0.2.0
v0.1.2
What's New
⚠️ Breaking: Unified Bit Type API
`acquire!(pool, Bit, n)` now returns `BitVector` instead of `SubArray{Bool}`.
Why: native `BitVector` uses SIMD-optimized chunk algorithms, making operations like `count()`, `sum()`, and bitwise broadcasting 10×–100× faster than `SubArray{Bool}` views.
```julia
# Before (v0.1.1): returned SubArray{Bool}
# After  (v0.1.2): returns native BitVector (SIMD-optimized)
@with_pool pool function foo()
    bv = acquire!(pool, Bit, 10_000)
    # Operations on packed bits are significantly faster
    c = count(bv)  # 10x–100x speedup vs the view behavior
end
```
Migration: No code changes needed for typical usage. Only affects code explicitly type-checking for `SubArray`.
What's Changed
Full Changelog: v0.1.1...v0.1.2