
(feat): Avoid GPU realloc on CuVector shrink#28

Merged
mgyoo86 merged 3 commits into master from feat/cuda_resize
Mar 11, 2026

Conversation


@mgyoo86 mgyoo86 commented Mar 11, 2026

Background

Pool operations frequently call resize! on backing vectors. On CPU, shrinking preserves capacity, so it is essentially free. On CUDA, shrinking below 25% of capacity triggers a full GPU reallocation (alloc → copy → free), and resize!(v, 0) in particular always reallocates, defeating the pool's zero-allocation guarantee.
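The asymmetry can be illustrated directly (illustrative only; the reallocation behavior is as described in this PR for CUDA.jl v5.x):

```julia
using CUDA

v = CUDA.zeros(Float32, 1000)   # GPU buffer sized for 1000 elements
resize!(v, 500)                 # n ≥ cap÷4: logical-size change, cheap
resize!(v, 100)                 # n < cap÷4: alloc → copy → free on the GPU
resize!(v, 0)                   # always reallocates, even though nothing useful is freed
```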

Solution

_resize_without_shrink! changes only the logical size (dims) on shrink, preserving the GPU allocation. When the vector grows back to the same size, the existing memory is reused with no reallocation.
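A minimal sketch of the idea, assuming the CuArray internal :dims field noted under Compatibility (not the exact PR code):

```julia
# Shrink: mutate only the logical size; the GPU buffer and maxsize survive,
# so growing back within maxsize later reuses the same allocation.
function _resize_without_shrink!(v::CuVector, n::Integer)
    n >= 0 || throw(ArgumentError("new length must be non-negative"))
    if n <= length(v)
        setfield!(v, :dims, (Int(n),))  # dims-only shrink, no GPU realloc
    else
        resize!(v, n)                   # grow: delegate to CUDA.jl's resize!
    end
    return v
end
```

Grow-back is free because CUDA.jl's resize! sees the preserved maxsize and finds the requested length already fits.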

Impact

  • Acquire path: reusing a slot for a smaller array no longer triggers GPU realloc
  • Safety Level 1 parity: released slots can now be structurally invalidated to length 0 (poison → shrink to 0), matching CPU's resize!(vec, 0) behavior. Previously CUDA could only poison without structural invalidation due to the memory-free issue
  • CPU–CUDA safety symmetry: Levels 0–3 now check identically across both backends
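The Level 1 release path described above can be sketched as follows; poison_value is a hypothetical stand-in for the pool's actual poison pattern:

```julia
# Poison, then structurally invalidate a released slot, keeping GPU memory.
poison_value(::Type{T}) where {T<:AbstractFloat} = T(NaN)

function _invalidate_released_slot!(vec::CuVector)
    fill!(vec, poison_value(eltype(vec)))  # stale reads become obvious
    _resize_without_shrink!(vec, 0)        # length 0, allocation preserved
    return vec
end
```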

Compatibility

  • Relies on CuArray internals (:dims, .maxsize) — @assert hasfield compile-time guard added (tested with CUDA.jl v5.x)

mgyoo86 added 2 commits March 10, 2026 21:54
CUDA.jl's resize! triggers pool_alloc + copy + pool_free when shrinking
below 25% capacity (n < cap÷4). This is expensive for pool operations:
- Safety invalidation: resize!(vec, 0) on every released slot
- Acquire path: resize!(vec, smaller) when reusing slots

Solution: _resize_without_shrink!(A, n) uses setfield!(:dims) for shrink
(zero-cost, GPU memory preserved via maxsize) and delegates to resize!
for grow. Grow-back after shrink reuses existing allocation since
CUDA.jl sees n ≤ cap from preserved maxsize.

Changes:
- Add _resize_without_shrink! in CUDA ext acquire.jl
- Replace resize! in get_view! acquire path
- Add _resize_without_shrink!(vec, 0) to _invalidate_released_slots!
  (CUDA Level 1 now has structural invalidation matching CPU behavior)
- Update Level 1 safety tests: verify length==0 + poison via re-acquire
- Add _resize_without_shrink! pointer-preservation tests
…d guard

- Update debug.jl header: Level 1 now includes structural invalidation
  via _resize_without_shrink!(vec, 0), not just poisoning
- Fix acquire.jl cache invalidation comment: clarify grow vs shrink
  pointer stability semantics
- Update get_view! docstring to describe _resize_without_shrink! behavior
- Add @assert hasfield(CuArray, :dims) compile-time guard for CUDA.jl
  internal API compatibility
@mgyoo86 mgyoo86 requested a review from Copilot March 11, 2026 05:15

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.57%. Comparing base (7dc30c8) to head (29c110b).
⚠️ Report is 1 commit behind head on master.

❗ There is a different number of reports uploaded between BASE (7dc30c8) and HEAD (29c110b). Click for more details.

HEAD has 3 fewer uploads than BASE:

Flag    BASE (7dc30c8)    HEAD (29c110b)
        6                 3

@@             Coverage Diff             @@
##           master      #28       +/-   ##
===========================================
- Coverage   96.79%   74.57%   -22.22%     
===========================================
  Files          14       14               
  Lines        2620     2616        -4     
===========================================
- Hits         2536     1951      -585     
- Misses         84      665      +581     

see 7 files with indirect coverage changes



Copilot AI left a comment


Pull request overview

Adds a CUDA-specific resize strategy to avoid costly GPU reallocations when pooled CuVectors are shrunk (especially to length 0), restoring the pool’s “no GPU allocation on reuse” behavior and making CUDA safety level 1 structurally invalidate released slots like the CPU backend.

Changes:

  • Introduces _resize_without_shrink! for CuVector shrink operations (dims-only shrink; grow delegates to resize!) and uses it in the CUDA acquire path.
  • Updates CUDA safety level 1 invalidation to shrink released vectors to length 0 after poisoning (without freeing GPU memory).
  • Adjusts CUDA tests to assert length→0 after rewind and adds pointer-preservation tests for _resize_without_shrink!.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File descriptions:

  • ext/AdaptiveArrayPoolsCUDAExt/acquire.jl: Adds _resize_without_shrink! and uses it to avoid GPU realloc on shrink in get_view!.
  • ext/AdaptiveArrayPoolsCUDAExt/debug.jl: Level 1 invalidation now shrinks released vectors to length 0 using _resize_without_shrink! after poisoning.
  • test/cuda/test_cuda_safety.jl: Updates safety tests to reflect structural invalidation (length 0) and verifies poisoning via re-acquire.
  • test/cuda/test_allocation.jl: Adds tests ensuring shrink/grow-back preserves the GPU pointer when within maxsize.


- Replace @assert with @static if + error() for CUDA.jl compat guard
  (@assert can be elided under certain optimization settings; @static if always runs)
- Add ismutable(CuArray) check (setfield! requires mutable type)
- Add n >= 0 ArgumentError guard against negative dimensions
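A guard of the kind this commit describes might look like the following sketch (assumed form, not the exact PR code; ismutabletype is used here because setfield! requires a mutable struct):

```julia
@static if !(hasfield(CuArray, :dims) && ismutabletype(CuArray))
    error("AdaptiveArrayPoolsCUDAExt relies on CuArray internals (:dims, maxsize); " *
          "this CUDA.jl version is incompatible")
end
```

Unlike @assert, this check runs unconditionally when the extension is loaded, so an incompatible CUDA.jl version fails fast instead of corrupting arrays at runtime.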
@mgyoo86 mgyoo86 merged commit 5f143c2 into master Mar 11, 2026
5 of 8 checks passed
@mgyoo86 mgyoo86 deleted the feat/cuda_resize branch March 11, 2026 05:31
