Releases: ProjectTorreyPines/AdaptiveArrayPools.jl

v0.3.1

14 Mar 21:21

What's New

Metal.jl Support (#34)

Full Apple Silicon GPU backend — same API as CUDA (@with_pool :metal), lazy/typed-lazy modes, full safety suite, and task-local multi-device pools. Requires Julia 1.11+. See 📖 Metal Backend.

Structural Mutation Detection (#35)

Catches resize!/push!/append! on pool-backed arrays:

  • Compile-time: PoolMutationError at macro expansion (zero cost)
  • Runtime: wrapper-vs-backing divergence check at rewind — advisory @warn with maxlog=1 (pool self-heals on next acquire!)

Works across CPU, CUDA, and Metal. See 📖 Pool Safety.
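The runtime half of this check can be pictured with a toy model. The sketch below is not the package's implementation: `ToySlot`, `toy_acquire!`, and `toy_rewind_check` are hypothetical names, and the divergence test is reduced to a plain length comparison. It only illustrates the idea of recording a length at acquire time and issuing a one-time advisory warning at rewind if it no longer matches.

```julia
# Toy model only: NOT the AdaptiveArrayPools.jl internals.
mutable struct ToySlot
    array::Vector{Float64}   # array handed to the user
    expected_len::Int        # length recorded at acquire time
end

function toy_acquire!(slot::ToySlot, n::Int)
    resize!(slot.array, n)   # the pool itself is allowed to resize
    slot.expected_len = n
    return slot.array
end

# Returns true when the user changed the array's length behind the pool's back.
function toy_rewind_check(slot::ToySlot)
    diverged = length(slot.array) != slot.expected_len
    if diverged
        # Advisory only: warn once, do not throw.
        @warn "pool-backed array was structurally mutated" maxlog=1
    end
    return diverged
end

slot = ToySlot(Float64[], 0)
v = toy_acquire!(slot, 4)
clean = toy_rewind_check(slot)     # nothing mutated yet
push!(v, 1.0)                      # structural mutation by user code
mutated = toy_rewind_check(slot)   # detected at "rewind"
```

In this toy, the next `toy_acquire!` resets both the length and the recorded value, which loosely mirrors the "self-heals on next acquire!" note above.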

Other

  • CUDA extension now gated behind Julia 1.11+ to match Metal

What's Changed

  • (feat): Metal.jl (Apple Silicon GPU) backend support by @mgyoo86 in #34
  • (feat): compile-time and runtime structural mutation detection by @mgyoo86 in #35

Full Changelog: v0.3.0...v0.3.1

v0.3.0

13 Mar 21:14

⚠️ Breaking Changes

Default acquire! returns Array

  • acquire! now returns Array{T,N} instead of SubArray/ReshapedArray. On Julia 1.11+, Array is a mutable struct enabling zero-allocation setfield! reuse — the same guarantee views had, with better FFI/ccall and type constraint compatibility.
  • New acquire_view! provides explicit opt-in to the old view behavior.

Remove unsafe_* API

  • The entire unsafe_* API (unsafe_acquire!, unsafe_zeros!, unsafe_ones!, unsafe_similar!) is removed with no deprecation period.

Migration Guide

unsafe_acquire!  →  acquire!
unsafe_zeros!    →  zeros!
unsafe_ones!     →  ones!
unsafe_similar!  →  similar!

If you relied on acquire! returning views: acquire! → acquire_view!

See the full 📖 Migration Guide for details.

What's Changed

  • Default acquire! → Array, remove unsafe_* API by @mgyoo86 in #33

Full Changelog: v0.2.6...v0.3.0

v0.2.6

13 Mar 17:37

Eliminate try-finally Overhead in @with_pool

@with_pool now uses direct rewind insertion instead of try-finally, enabling compiler inlining for ~15-25% speedup on hot-loop patterns.

New @safe_with_pool / @safe_maybe_with_pool macros preserve the old try-finally behavior for code that needs guaranteed exception cleanup.
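The trade-off between the two styles can be sketched with a toy pool. Everything here is hypothetical (`ToyScopePool`, `toy_checkpoint`, `toy_rewind!`, `direct_scope`, `safe_scope`) and does not reproduce the macros' actual expansion. It only shows that a rewind placed directly after the body is skipped when the body throws, while a try-finally rewind always runs.

```julia
# Toy model only: NOT the code generated by @with_pool / @safe_with_pool.
mutable struct ToyScopePool
    n_active::Int
end

toy_checkpoint(p::ToyScopePool) = p.n_active
toy_rewind!(p::ToyScopePool, mark::Int) = (p.n_active = mark; nothing)

# Direct rewind insertion: no try-finally, so the rewind is skipped on a throw.
function direct_scope(f, p::ToyScopePool)
    mark = toy_checkpoint(p)
    result = f(p)
    toy_rewind!(p, mark)
    return result
end

# try-finally: the rewind runs even when the body throws.
function safe_scope(f, p::ToyScopePool)
    mark = toy_checkpoint(p)
    try
        return f(p)
    finally
        toy_rewind!(p, mark)
    end
end

# A body that "acquires" one slot and then fails.
failing(p) = (p.n_active += 1; error("boom"))

p_direct = ToyScopePool(0)
try
    direct_scope(failing, p_direct)
catch
end

p_safe = ToyScopePool(0)
try
    safe_scope(failing, p_safe)
catch
end
```

After the failing body, `p_direct.n_active` is still 1 while `p_safe.n_active` is back to 0, which is the kind of cleanup guarantee the safe variants are described as preserving.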

What's Changed

  • Remove try-finally from @with_pool for inlining performance by @mgyoo86 in #32

Full Changelog: v0.2.5...v0.2.6

v0.2.5

12 Mar 21:21

Fix Compile-Time Explosion in Nested @with_pool

Nested or @inline @with_pool could cause compile-time explosion due to exponential macro expansion. Fixed by replacing union splitting with a single compile-time type assertion.

The safety system is simplified from 4 tiers (0–3) to a binary RUNTIME_CHECK (0=off, 1=on), set via LocalPreferences.toml. POOL_DEBUG, POOL_SAFETY_LV, and set_safety_level! are removed.

See 📖 Safety and 📖 Configuration docs for details.

What's Changed

  • (fix): compile-time explosion in nested @with_pool + binary RUNTIME_CHECK by @mgyoo86 in #31

Full Changelog: v0.2.4...v0.2.5

v0.2.4

12 Mar 04:13

Hotfix: Eliminate Core.Box Allocation in @inline @with_pool

v0.2.3 introduced a Core.Box boxing regression (32–48 bytes) in @inline @with_pool functions, caused by closure-based dispatch interacting with try/finally boundaries after inlining.

Replaced with closureless union splitting on both CPU and CUDA — zero allocation on Julia 1.12+. On Julia <1.12, a minor 16-byte let-scope overhead may appear at top-level but disappears inside functions and hot loops.

What's Changed

  • (fix): Closureless union splitting — eliminate Core.Box allocation in @with_pool by @mgyoo86 in #30

Full Changelog: v0.2.3...v0.2.4

v0.2.3

11 Mar 20:49

New Features

Pool Safety System

Two-layer safety for catching pool misuse:

  • Compile-time (STATIC_POOL_CHECKS): @with_pool AST analysis detects escaped pool-backed variables at macro-expansion time — zero runtime cost.
  • Runtime (POOL_SAFETY_LV): progressive protection levels — L1 invalidates released slots via resize!/setfield!, L2 adds full borrow tracking with escape detection.
  • Type-parameterized AdaptiveArrayPool{S}: encodes safety level as a type parameter, enabling dead-code elimination at S=0 for true zero overhead. CPU and CUDA.
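The type-parameter idea in the last bullet can be illustrated in isolation. The sketch below uses a hypothetical `ToySafetyPool{S}` and `toy_use`; it is not the `AdaptiveArrayPool{S}` definition. Because `S` is part of the concrete type, the `S >= 1` branch is a compile-time constant in each specialization, so the compiler can drop the check entirely for `S = 0`.

```julia
# Toy model only: NOT the AdaptiveArrayPool{S} definition.
mutable struct ToySafetyPool{S}
    released::Bool
end

@inline function toy_use(p::ToySafetyPool{S}) where {S}
    if S >= 1
        # Constant-folded per specialization: absent from the S == 0 method body.
        p.released && error("use after release")
    end
    return :ok
end

unchecked = ToySafetyPool{0}(true)   # safety off: misuse goes unnoticed
checked   = ToySafetyPool{1}(true)   # safety on: misuse is caught

unchecked_result = toy_use(unchecked)

threw = try
    toy_use(checked)
    false
catch
    true
end
```

Running `@code_typed toy_use(unchecked)` in a REPL is one way to inspect whether the check has been removed for the `S = 0` case.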

Unlimited Zero-Alloc Dimension Patterns

On Julia 1.11+, both CPU and CUDA now use setfield!-based wrapper reuse — unlimited dimension patterns per slot are zero-allocation on the hot path. The old 4-way set-associative cache (with its eviction limit) is removed on both backends.

Zero-Alloc reshape! Support (CPU & CUDA)

reshape!(pool, A, dims...) now works on both CPU and CUDA. Same-dim reshapes are in-place; cross-dim reshapes reuse cached wrappers — zero allocation after warmup.

GPU Memory Preservation

_resize_to_fit! avoids GPU reallocation on CuVector shrink (CUDA resize! below 25% capacity triggers alloc→copy→free). Preserves VRAM across pool rewind cycles.
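A CPU-only sketch of the general policy, under the assumption that the fix amounts to keeping the backing allocation on shrink and changing only a logical length. `ToyBuffer` and `toy_resize_to_fit!` are hypothetical names; the internals of `_resize_to_fit!` are not shown in these notes and may differ.

```julia
# Toy model only: a Vector stands in for device memory.
mutable struct ToyBuffer
    storage::Vector{Float32}   # backing allocation (capacity)
    len::Int                   # logical length exposed to the user
end

function toy_resize_to_fit!(buf::ToyBuffer, n::Int)
    if n > length(buf.storage)
        resize!(buf.storage, n)   # growing needs more backing memory
    end
    buf.len = n                   # shrinking only changes the logical length
    return buf
end

buf = ToyBuffer(Float32[], 0)
toy_resize_to_fit!(buf, 1_000)    # grow once
toy_resize_to_fit!(buf, 10)       # shrink: backing storage is kept
capacity_after_shrink = length(buf.storage)
toy_resize_to_fit!(buf, 800)      # regrow within capacity: no reallocation
```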

Limitations

The setfield!-based wrapper reuse and reshape! zero-allocation features require Julia 1.11+ on CPU. On Julia 1.10 (LTS), the CPU path retains the previous N-way set-associative cache behavior with CACHE_WAYS=4 eviction limit. CUDA is unaffected (always uses the new path).

What's Changed

  • (refac): Slot-first architecture: remove view cache, isolate legacy by @mgyoo86 in #23
  • (fix): USE_POOLING=false path fixes and 2-tier toggle rename by @mgyoo86 in #24
  • (feat): add 2-tier pool safety — compile-time escape detection + runtime validation by @mgyoo86 in #25
  • (feat): Type-parameterized safety dispatch (Pool{S}) — CPU path by @mgyoo86 in #26
  • (feat): pool safety dispatch — CUDA path by @mgyoo86 in #27
  • (feat): Avoid GPU realloc on CuVector shrink by @mgyoo86 in #28
  • (feat): CUDA arr_wrappers — Zero-Alloc CuArray Reuse via setfield! by @mgyoo86 in #29

Full Changelog: v0.2.2...v0.2.3

v0.2.2

06 Mar 08:57

New Feature

  • reshape!(pool, A, dims...) — Zero-allocation array reshaping via pool wrapper cache. On Julia 1.11+, cross-dim reshapes reuse cached wrappers with no allocation after warmup. (docs 📖)

Code Quality

  • Adopt Runic.jl formatter with CI enforcement

What's Changed

Full Changelog: v0.2.1...v0.2.2

v0.2.1

04 Mar 06:21

Bug Fix

Fallback Type Memory Leak

  • Fixed a memory leak where non-fixed-slot types (e.g., ForwardDiff.Dual) were not properly reclaimed on rewind! during repeated @with_pool calls. This caused n_active to grow unboundedly in workloads like ForwardDiff.gradient with pooled interpolation.
  • No performance regression — the fix is resolved entirely at compile time. CUDA extension included.

What's Changed

  • (fix): fallback type memory leak in checkpoint/rewind cycle by @mgyoo86 in #19

Full Changelog: v0.2.0...v0.2.1

v0.2.0

18 Feb 21:55

What's New

Lazy Selective Rewind

@with_pool now defers per-type checkpoints until first acquire! and rewinds only the pools actually touched, instead of all 8 fixed-slot types. Up to 6.5× faster in common patterns with helper functions.

Bitmask-Aware Type Tracking

Per-type bitmask tracking replaces the boolean _untracked_flags system. When helper functions acquire types already tracked by the macro, the fast typed path is preserved via subset check (untracked ⊆ tracked).
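The subset check (untracked ⊆ tracked) maps onto a single bitwise operation. The sketch below assumes one bit per element type. The constants and `toy_is_subset` are hypothetical, chosen for illustration, and are not the package's named constants.

```julia
# Toy model only: one bit per element type (assumed layout).
const TOY_FLOAT64 = UInt16(1) << 0
const TOY_FLOAT32 = UInt16(1) << 1
const TOY_INT64   = UInt16(1) << 2

# untracked ⊆ tracked  <=>  no bit is set in `untracked` that is missing from `tracked`
toy_is_subset(untracked::UInt16, tracked::UInt16) = (untracked & ~tracked) == 0x0000

macro_tracked = TOY_FLOAT64 | TOY_INT64   # types the macro already tracks

helper_same = TOY_FLOAT64                 # helper touches an already-tracked type
helper_new  = TOY_FLOAT32                 # helper touches a type the macro did not see

fast_path_kept = toy_is_subset(helper_same, macro_tracked)
needs_fallback = !toy_is_subset(helper_new, macro_tracked)
```

A boolean flag can only say "something untracked happened"; the mask also says which types were involved, which is what lets the first case stay on the fast path.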

Internal Naming Refactor

Internal identifiers renamed to reflect lazy-selective-rewind architecture (e.g. _mark_untracked! → _record_type_touch!, _depth_only_checkpoint! → _lazy_checkpoint!). Magic hex literals replaced with named constants. No user-facing API changes.

What's Changed

  • (feat): Bitmask-Aware Untracked Tracking for @with_pool by @mgyoo86 in #16
  • (perf): Dynamic Selective Rewind & Typed-Fallback Optimization by @mgyoo86 in #17
  • (refactor): rename internals to match evolved architecture by @mgyoo86 in #18

Full Changelog: v0.1.2...v0.2.0

v0.1.2

31 Jan 18:33

What's New

⚠️ Breaking: Unified Bit Type API

acquire!(pool, Bit, n) now returns BitVector instead of SubArray{Bool}.

Why: Native BitVector utilizes SIMD-optimized chunk algorithms, making operations like count(), sum(), and bitwise broadcasting 10×–100× faster compared to SubArray{Bool} views.

using AdaptiveArrayPools

# Before (v0.1.1): Returned SubArray{Bool}
# After  (v0.1.2): Returns native BitVector (SIMD optimized)

@with_pool pool function foo()
    bv = acquire!(pool, Bit, 10_000)

    # Operations on packed bits are significantly faster
    # (10x to 100x vs. the old view behavior)
    c = count(bv)
    return c
end

Migration: No code changes needed for typical usage. Only affects code explicitly type-checking for SubArray.
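The difference between the two representations can be seen with Base alone, without the pool. The sketch below builds the same 10,000 flags as a `SubArray` over a `Vector{Bool}` and as a `BitVector`, then checks that `count` agrees. It makes no timing claims; the variable names are illustrative.

```julia
# Base-only illustration: no pool involved.
n = 10_000
flags = [isodd(i) for i in 1:n]     # Vector{Bool}, one byte per element

bool_view = view(flags, 1:n)        # SubArray over Vector{Bool} (old return style)
bits = BitVector(flags)             # packed: 64 flags per UInt64 chunk

same_count = count(bool_view) == count(bits)

# Code that dispatches or checks on SubArray is the only kind that would notice the change.
view_is_subarray = bool_view isa SubArray
bits_is_subarray = bits isa SubArray
```

To compare speed on a given machine, a benchmarking tool such as BenchmarkTools.jl can be pointed at `count(bool_view)` and `count(bits)`.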

What's Changed

  • ⚠️ (refac): return BitVector for performance by @mgyoo86 in #15

Full Changelog: v0.1.1...v0.1.2