(feat): pool safety dispatch — CUDA path #27

Merged: mgyoo86 merged 6 commits into master from feat/cuda_safety on Mar 11, 2026
mgyoo86 (Member) commented Mar 10, 2026

Summary

Brings the CPU's AdaptiveArrayPool{S} type-parameterized safety system (PR #26) to the CUDA extension.

Key design decisions

  • Poisoning over resize!: resize!(CuVector, 0) calls CUDA.Mem.free() — destroys the pooled allocation. Poisoning (NaN/typemax/true) keeps VRAM alive while making stale data detectable.
  • Float16 bit 7: CPU uses bit 7 for Bit (BitArray internal). CUDA has no Bit equivalent, so bit 7 is repurposed for Float16 — giving it proper lazy first-touch checkpoint behavior.
  • Container recursion: Duplicated from CPU path (documented trade-off). CPU dispatches on pool::AdaptiveArrayPool; CUDA uses a single _validate_cuda_return function to avoid 6+ method definitions.
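The poisoning idea in the first bullet can be sketched on the CPU. This is a minimal illustration, not the extension's code: `poison_value` and `poison!` are hypothetical names, and a plain `Vector` stands in for a `CuVector`. It only shows that filling a buffer with a sentinel leaves its length and backing allocation untouched, which is the property the PR relies on to keep VRAM alive.

```julia
# Hypothetical helpers; a plain Vector stands in for a CuVector here.
# Sentinel per element type: NaN for floats, true for Bool, typemax for other integers.
poison_value(::Type{T}) where {T<:AbstractFloat} = T(NaN)
poison_value(::Type{Bool}) = true
poison_value(::Type{T}) where {T<:Integer} = typemax(T)

# Overwrite the contents in place. Unlike resize!(v, 0), this never
# shrinks or releases the underlying buffer.
poison!(v::AbstractVector) = fill!(v, poison_value(eltype(v)))

buf = Float32[1, 2, 3, 4]
addr = pointer(buf)      # remember where the data lives
poison!(buf)             # stale reads now see NaN
same_buffer = pointer(buf) == addr && length(buf) == 4
```

A stale reader that later touches `buf` sees NaN (or `typemax`, or `true`) rather than plausible-looking old data, while the pool can hand the same allocation out again.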

mgyoo86 added 2 commits March 10, 2026 15:27
… — CUDA path

Port the `{S}` safety level type parameter from CPU `AdaptiveArrayPool{S}` to the
CUDA extension, enabling compile-time dead-code elimination of safety branches on GPU.

CUDA safety levels differ from CPU due to GPU memory semantics:
- S=0 (off): zero overhead, all safety branches eliminated
- S=1 (guard): poisoning (NaN/sentinel fill) + N-way cache invalidation
               (CUDA equivalent of CPU's resize! structural invalidation,
                since resize!(CuVector, 0) frees GPU memory)
- S=2 (full): guard + escape detection via device pointer overlap
- S=3 (debug): full + borrow call-site registry
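
A rough sketch of how a type parameter can carry these levels, assuming a toy `DemoPool{S}` type (the real `CuAdaptiveArrayPool{S}` has more fields). `active_checks` and `make_pool` are hypothetical names; `make_pool` plays the role the commit message assigns to `_make_cuda_pool(s)`, turning a runtime integer into a concrete `{S}` type behind a function barrier.

```julia
# Toy stand-in for a pool whose safety level is a type parameter.
struct DemoPool{S} end

safety_level(::DemoPool{S}) where {S} = S

# S is a compile-time constant inside each specialization, so branches
# that compare against it can be folded away by the compiler.
function active_checks(::DemoPool{S}) where {S}
    checks = Symbol[]
    S >= 1 && push!(checks, :poison)           # guard
    S >= 2 && push!(checks, :escape_check)     # full
    S >= 3 && push!(checks, :borrow_registry)  # debug
    return checks
end

# Function barrier: the runtime value `s` picks one concrete type, and
# everything called with the result is specialized on that type.
function make_pool(s::Int)
    if s == 0
        return DemoPool{0}()
    elseif s == 1
        return DemoPool{1}()
    elseif s == 2
        return DemoPool{2}()
    elseif s == 3
        return DemoPool{3}()
    else
        throw(ArgumentError("safety level must be 0-3, got $s"))
    end
end
```

With `S = 0`, every `S >= k` test is statically false, which is what "all safety branches eliminated" refers to.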

Key changes:
- `CuAdaptiveArrayPool{S}` with borrow tracking fields at all levels
- `_make_cuda_pool(s)` function barrier for runtime→compile-time S
- `_dispatch_pool_scope` union splitting for CUDA pools
- GPU-specific `_invalidate_released_slots!` (poison, no resize!)
- Device pointer overlap escape detection for CuArray
- `set_cuda_safety_level!` for all-device safety level replacement
- S threaded through all rewind paths for compile-time dispatch

Tests: compile-time escape detection, native Level 2/3 macro integration,
borrow callsite tracking, error message formatting, function form
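
The pointer-overlap check named above can be illustrated with host arrays. This is a sketch under assumptions: `ranges_overlap`, `byte_range`, and `escapes` are hypothetical names, and CPU `Array` pointers stand in for device pointers. The extension's own check (`_check_tp_cuda_overlap`) is not reproduced here.

```julia
# Two half-open byte ranges [p, p + n) overlap when each one starts
# before the other ends. Empty ranges never overlap anything.
ranges_overlap(p1::UInt, n1::Integer, p2::UInt, n2::Integer) =
    n1 > 0 && n2 > 0 && p1 < p2 + n2 && p2 < p1 + n1

# Start address and size in bytes of a contiguous array or view.
byte_range(a::AbstractArray) = (UInt(pointer(a)), length(a) * sizeof(eltype(a)))

# A returned array "escapes" if it aliases memory owned by the pool.
function escapes(ret::AbstractArray, pooled::AbstractArray)
    p1, n1 = byte_range(ret)
    p2, n2 = byte_range(pooled)
    return ranges_overlap(p1, n1, p2, n2)
end

pooled = zeros(Float32, 8)
alias  = view(pooled, 3:5)   # shares memory with the pooled buffer
fresh  = zeros(Float32, 8)   # separate allocation
```

Here `escapes(alias, pooled)` is true and `escapes(fresh, pooled)` is false, so a view into pooled memory is caught even though it is a different object from the buffer itself.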
…ation safety, code clarity

- Merge double loop into single loop in _invalidate_released_slots! (poison + cache invalidation)
- Extract _check_tp_cuda_overlap as @noinline with explicit args (eliminates captured-variable closure)
- Fix Dict mutation during iteration in set_cuda_safety_level! via collect(keys(...))
- Extract _transfer_cuda_pool from inner _new closure in _make_cuda_pool
- Document container recursion duplication design trade-off
- Document SubArray/ReshapedArray branches as defensive code for future CUDA.jl changes
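
The `collect(keys(...))` fix can be sketched in isolation. The dictionary layout below (device id mapped to a pool placeholder) and the name `replace_all!` are assumptions for illustration only.

```julia
# Replace every per-device entry. Iterating over a snapshot of the keys
# means the loop never walks the Dict while it is being modified.
function replace_all!(pools::Dict{Int,Symbol}, new_pool::Symbol)
    for dev in collect(keys(pools))   # snapshot first, then mutate
        pools[dev] = new_pool
    end
    return pools
end

pools = Dict(0 => :old_pool, 1 => :old_pool)
replace_all!(pools, :new_pool)
```

The snapshot costs one small allocation per call, which is negligible for a handful of devices.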

Copilot AI left a comment

Pull request overview

Extends AdaptiveArrayPools’ type-parameterized safety dispatch system to the CUDA extension by introducing CuAdaptiveArrayPool{S} (S=0–3) and adding CUDA-specific poisoning/escape-detection/borrow-tracking behavior plus a comprehensive CUDA safety test suite.

Changes:

  • Parameterize CUDA pools as CuAdaptiveArrayPool{S} with _safety_level dispatch and CUDA union-splitting in _dispatch_pool_scope.
  • Add CUDA safety runtime behaviors (poisoning, escape detection, borrow tracking) and update display/stats to include {S} + safety label.
  • Add extensive CUDA safety tests and wire them into the CUDA test runner.
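
The union splitting mentioned for `_dispatch_pool_scope` can be approximated by hand-written `isa` branches. This is only a sketch of the general technique: `SplitPool`, `level_of`, and `dispatch_scope` are hypothetical names, and the extension's actual branch structure is not shown in this PR page.

```julia
struct SplitPool{S} end

level_of(::SplitPool{S}) where {S} = S

# Inside each branch the compiler knows the concrete type of `pool`,
# so the call to `f` can be resolved statically for that level.
function dispatch_scope(f, pool)
    if pool isa SplitPool{0}
        return f(pool)
    elseif pool isa SplitPool{1}
        return f(pool)
    elseif pool isa SplitPool{2}
        return f(pool)
    elseif pool isa SplitPool{3}
        return f(pool)
    else
        return f(pool)   # unknown pool type: ordinary dynamic dispatch
    end
end

result = dispatch_scope(level_of, SplitPool{2}())
```

The branches look redundant, but each one gives the compiler a concrete `SplitPool{S}` to specialize on even when the caller only knows the pool's type at runtime.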

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| test/cuda/test_cuda_safety.jl | New end-to-end tests for CUDA safety levels, poisoning, escape detection, borrow tracking, and display/showerror expectations. |
| test/cuda/runtests.jl | Includes the new CUDA safety test file in the CUDA test suite. |
| src/debug.jl | Updates the fallback _validate_pool_return comment to reflect CUDA extension override. |
| ext/AdaptiveArrayPoolsCUDAExt/AdaptiveArrayPoolsCUDAExt.jl | Includes new CUDA debug.jl safety implementation and exports set_cuda_safety_level!. |
| ext/AdaptiveArrayPoolsCUDAExt/types.jl | Introduces CuAdaptiveArrayPool{S}, _safety_level, _make_cuda_pool, transfer helper, and safety label helper. |
| ext/AdaptiveArrayPoolsCUDAExt/task_local_pool.jl | Updates task-local CUDA pool handling for {S} pools and adds set_cuda_safety_level! to replace pools across devices. |
| ext/AdaptiveArrayPoolsCUDAExt/state.jl | Threads S through CUDA rewind paths to enable compile-time safety dispatch; resets borrow-tracking state on reset/empty. |
| ext/AdaptiveArrayPoolsCUDAExt/macros.jl | Adds CUDA-specific _dispatch_pool_scope union splitting for {S} specialization under @with_pool :cuda. |
| ext/AdaptiveArrayPoolsCUDAExt/debug.jl | New CUDA safety implementation: poisoning on rewind, escape detection via pointer overlap, and borrow callsite registry. |
| ext/AdaptiveArrayPoolsCUDAExt/utils.jl | Updates pool_stats/show output to include {S} and safety label. |

mgyoo86 added 2 commits March 10, 2026 16:39
Level 2 table entry now includes data poisoning for CPU path.
New section explains why CPU adds poisoning at LV2 while CUDA
already poisons at LV1.
codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.79%. Comparing base (055eb1f) to head (302dc5f).
⚠️ Report is 1 commit behind head on master.

@@           Coverage Diff           @@
##           master      #27   +/-   ##
=======================================
  Coverage   96.79%   96.79%           
=======================================
  Files          14       14           
  Lines        2618     2620    +2     
=======================================
+ Hits         2534     2536    +2     
  Misses         84       84           

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/debug.jl | 97.07% <ø> (ø) |
| src/task_local_pool.jl | 96.42% <100.00%> (+0.27%) ⬆️ |

mgyoo86 added 2 commits March 10, 2026 19:13
…safety_level!

Replace the standalone `set_cuda_safety_level!` public API with an
internal `_set_cuda_safety_level_hook!` pattern. `set_safety_level!(level)`
now updates both CPU and all CUDA device pools in a single call.

- Remove `set_cuda_safety_level!` export from CUDA extension
- Add `_set_cuda_safety_level_hook!` no-op stub in base, overridden by ext
- Update all CUDA safety tests to use unified `set_safety_level!`
- Simplify docs/safety.md and add Documenter pages in make.jl
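
One way to picture the hook pattern described in this commit, as a single-file sketch. All names here (`_set_gpu_level_hook!`, `set_level!`, `CPU_LEVEL`, `GPU_LEVELS`) are hypothetical, and the "extension" method is simply defined later in the same file. How the real `_set_cuda_safety_level_hook!` is overridden is not shown in this PR page.

```julia
# --- "base package" side -------------------------------------------
const CPU_LEVEL = Ref(0)

# No-op stub: without the GPU extension loaded, the hook does nothing.
_set_gpu_level_hook!(level) = nothing

# Single public entry point: update the CPU state, then call the hook.
function set_level!(level::Int)
    CPU_LEVEL[] = level
    _set_gpu_level_hook!(level)
    return level
end

# --- "extension" side (assumed to load later) -----------------------
const GPU_LEVELS = Dict(0 => 0, 1 => 0)   # device id => safety level

# A more specific method takes over once the extension is present.
function _set_gpu_level_hook!(level::Int)
    for dev in collect(keys(GPU_LEVELS))
        GPU_LEVELS[dev] = level
    end
    return nothing
end

set_level!(2)
```

The base code never references GPU state directly, so it works unchanged whether or not the extension is loaded.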
@mgyoo86 mgyoo86 merged commit 7dc30c8 into master Mar 11, 2026
11 checks passed
@mgyoo86 mgyoo86 deleted the feat/cuda_safety branch March 11, 2026 03:23
2 participants