
(fix): eliminate Core.Box allocation in @inline @with_pool functions #30

Merged: mgyoo86 merged 5 commits into master from fix/with_pool_alloc on Mar 12, 2026

mgyoo86 commented Mar 12, 2026

Problem

Functions defined as @inline @with_pool pool function ... incurred heap allocations (32–48 bytes per scope) from Core.Box boxing. When the compiler inlined the _dispatch_pool_scope closure into outer callers across try/finally boundaries, it lost type information and boxed the closure-captured variables.

This was a regression from v0.2.2, where AdaptiveArrayPool was non-parametric and didn't need union splitting.

Fix

Replace the closure-based _dispatch_pool_scope(pool -> body, getter) with a closureless let/if isa chain directly in the macro expansion:

let _raw = get_task_local_pool()
    if _raw isa AdaptiveArrayPool{0}
        let pool = _raw::AdaptiveArrayPool{0}; body end
    elseif _raw isa AdaptiveArrayPool{1}
        ...
    end
end

No closure → no boxing → zero allocation.
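
The structural difference can be sketched with a toy example. Everything here is an assumption made for illustration: `ToyPool{S}` stands in for `AdaptiveArrayPool{S}`, `toy_get_pool` for `get_task_local_pool`, and `toy_dispatch` for the closure-based helper. It shows the shape of the two dispatch styles, not the actual macro output, and whether the toy closure version allocates depends on the Julia version and inlining decisions.

```julia
# Hypothetical stand-in for AdaptiveArrayPool{S}; not the package's type.
struct ToyPool{S}
    buf::Vector{Float64}
end

# Type-unstable getter: callers only know the result is a small Union.
const TOY_STATE = Ref{Union{ToyPool{0},ToyPool{1}}}(ToyPool{0}(zeros(4)))
toy_get_pool() = TOY_STATE[]

# Closure-based dispatch (the shape being removed): the body is wrapped in
# an anonymous function and invoked from a helper after the isa check.
function toy_dispatch(f, getter)
    raw = getter()
    if raw isa ToyPool{0}
        return f(raw)
    elseif raw isa ToyPool{1}
        return f(raw)
    end
    error("unexpected pool type")
end

# `x` is captured by the anonymous function passed to the helper.
sum_closure(x) = toy_dispatch(pool -> sum(pool.buf) + x, toy_get_pool)

# Closureless dispatch (the shape described above): the isa chain is written
# inline, so no anonymous function exists to capture `x`.
function sum_inline(x)
    let _raw = toy_get_pool()
        if _raw isa ToyPool{0}
            let pool = _raw::ToyPool{0}
                return sum(pool.buf) + x
            end
        elseif _raw isa ToyPool{1}
            let pool = _raw::ToyPool{1}
                return sum(pool.buf) + x
            end
        end
    end
    error("unexpected pool type")
end
```

Both functions compute the same value; only the second avoids creating a closure around the body.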

Allocation comparison (per @with_pool scope, fixed cost)

Version            v0.2.2   v0.2.3 (closure)   This PR
Julia 1.12         0B       32B                0B
Julia 1.10/1.11    16B      48B                16B

  • Cost is fixed per scope, not per acquire! call (a scope with 1 acquire! costs the same as one with 8)
  • Hot loops inside functions: 0B on all versions (the compiler eliminates the let scope)

Test changes

  • _ZERO_ALLOC_THRESHOLD: 0 on Julia ≥1.12, 16 on <1.12
  • Hot loop tests wrapped in function barriers for consistent 0B measurement
  • All 23 zero-allocation tests pass on 1.10, 1.11, and 1.12
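
A version-dependent threshold combined with a function barrier might look roughly like the sketch below. The definition of `_ZERO_ALLOC_THRESHOLD` is an assumption reconstructed from the values listed above, and `hot_loop!` and `measure_hot_loop` are hypothetical names that stand in for a real pool workload; this is not the contents of test_zero_allocation.jl.

```julia
using Test

# Assumed definition; the PR text gives only the values (0 on >= 1.12, 16 below).
const _ZERO_ALLOC_THRESHOLD = VERSION >= v"1.12-" ? 0 : 16

# Hypothetical workload standing in for a hot loop that uses a pool.
function hot_loop!(buf::Vector{Float64}, n::Int)
    s = 0.0
    for i in 1:n
        buf[1] = i      # reuse preallocated storage on every iteration
        s += buf[1]
    end
    return s
end

# Function barrier: the measurement runs inside a function with concretely
# typed arguments instead of at global scope, where untyped globals can add
# allocations unrelated to the code under test.
function measure_hot_loop(buf::Vector{Float64}, n::Int)
    hot_loop!(buf, n)                     # warm up so compilation is not counted
    return @allocated hot_loop!(buf, n)
end

buf = zeros(4)
measure_hot_loop(buf, 10)                 # compile the barrier function itself
@test measure_hot_loop(buf, 1_000) <= _ZERO_ALLOC_THRESHOLD
```

Comparing against a threshold rather than a literal 0 lets the same test run on every supported Julia version.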

mgyoo86 added 2 commits March 11, 2026 19:59
Replace closure-based _dispatch_pool_scope with inline if/elseif/let
chain to eliminate Core.Box boxing when @inline @with_pool functions
are inlined into callers crossing try/finally boundaries.

Allocation improvements (per @with_pool scope, fixed cost):
- Julia 1.12: 32B → 0B (@inline), 0B unchanged (noinline)
- Julia 1.10/1.11: 48B → 16B (@inline), 16B unchanged (noinline)

Update test_zero_allocation.jl with version-dependent threshold
(_ZERO_ALLOC_THRESHOLD: 0 on ≥1.12, 16 on <1.12) and function
barriers for hot loop tests.
@mgyoo86 mgyoo86 requested a review from Copilot March 12, 2026 03:00

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.97%. Comparing base (a3da1a3) to head (85d5175).
⚠️ Report is 1 commit behind head on master.

@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
+ Coverage   96.54%   96.97%   +0.43%     
==========================================
  Files          14       14              
  Lines        2632     2645      +13     
==========================================
+ Hits         2541     2565      +24     
+ Misses         91       80      -11     
Files with missing lines   Coverage Δ
src/macros.jl              98.90% <100.00%> (-0.10%) ⬇️

... and 5 files with indirect coverage changes

Copilot AI left a comment

Pull request overview

Eliminates heap allocations caused by Core.Box when @with_pool-wrapped functions are inlined, by replacing a closure-based dispatch with closureless union-splitting in the macro expansion.

Changes:

  • Reworked @with_pool macro expansion to avoid _dispatch_pool_scope closures and prevent boxing across try/finally.
  • Added/adjusted allocation regression tests for @inline @with_pool and version-dependent thresholds / function barriers.
  • Expanded CI matrix to include Julia 1.11 explicitly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File                           Description
src/macros.jl                  Replaces closure-based pool dispatch with a closureless let + isa chain to avoid boxing/allocations.
test/test_zero_allocation.jl   Adds regression tests for @inline @with_pool and version-specific allocation thresholds/function barriers.
test/test_reshape.jl           Adds a function barrier for stable @allocated measurement.
.github/workflows/CI.yml       Adds Julia 1.11 to CI matrix for coverage of the affected behavior.

mgyoo86 added 3 commits March 11, 2026 20:17
Add _pool_type_for_backend trait so _wrap_with_dispatch generates
correct isa checks per backend (AdaptiveArrayPool for CPU,
CuAdaptiveArrayPool for CUDA). Removes closure-based
_dispatch_pool_scope from both paths.

Without this fix, @with_pool :cuda hit TypeError at runtime:
  expected AdaptiveArrayPool{3}, got CuAdaptiveArrayPool{0}

_pool_type_for_backend returns nothing (instead of error) for unloaded
backends, so _wrap_with_dispatch falls back to closure-based dispatch.
Fixes LTS CI failure where @macroexpand @with_pool :cuda ran without
CUDA extension loaded.
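
The trait-plus-fallback idea in these commit messages can be sketched as follows. All names are hypothetical (`toy_pool_type_for_backend`, `toy_wrap_with_dispatch`, `toy_dispatch_fallback`, `ToyCpuPool`, `ToyCuPool`), as are the `Val`-based signature and the `max_s` parameter; the real `_pool_type_for_backend` and `_wrap_with_dispatch` may be written quite differently.

```julia
# Hypothetical pool types standing in for AdaptiveArrayPool / CuAdaptiveArrayPool.
struct ToyCpuPool{S} end
struct ToyCuPool{S} end

# Trait: map a backend tag to its pool type. The generic fallback returns
# `nothing` instead of throwing, so an unloaded backend is not an error at
# macro-expansion time.
toy_pool_type_for_backend(::Val) = nothing
toy_pool_type_for_backend(::Val{:cpu}) = ToyCpuPool
# A CUDA extension would add a method like this when loaded (assumption):
# toy_pool_type_for_backend(::Val{:cuda}) = ToyCuPool

# Build the dispatch expression: an inline isa chain when the pool type is
# known, otherwise a call to a closure-based fallback helper.
function toy_wrap_with_dispatch(backend::Symbol, getter::Symbol, body; max_s = 1)
    T = toy_pool_type_for_backend(Val(backend))
    if T === nothing
        return :(toy_dispatch_fallback(pool -> $body, $getter))
    end
    # Fold from the last branch outwards into nested if/else blocks.
    chain = :(error("unexpected pool type"))
    for s in max_s:-1:0
        chain = quote
            if _raw isa $T{$s}
                let pool = _raw::$T{$s}
                    $body
                end
            else
                $chain
            end
        end
    end
    return :(let _raw = $getter(); $chain end)
end

# Usage: expand for a loaded backend and evaluate the resulting expression.
toy_getter() = ToyCpuPool{1}()
ex = toy_wrap_with_dispatch(:cpu, :toy_getter, :(typeof(pool)))
eval(ex)
```

Because the isa checks are generated from the backend's own pool type, a CUDA pool is never compared against the CPU type, which is the kind of mismatch behind the TypeError quoted above.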
@mgyoo86 mgyoo86 merged commit b63aebe into master Mar 12, 2026
14 checks passed
@mgyoo86 mgyoo86 deleted the fix/with_pool_alloc branch March 12, 2026 03:48