… — CUDA path
Port the `{S}` safety level type parameter from CPU `AdaptiveArrayPool{S}` to the
CUDA extension, enabling compile-time dead-code elimination of safety branches on GPU.
CUDA safety levels differ from CPU due to GPU memory semantics:
- S=0 (off): zero overhead, all safety branches eliminated
- S=1 (guard): poisoning (NaN/sentinel fill) + N-way cache invalidation
(CUDA equivalent of CPU's resize! structural invalidation,
since resize!(CuVector, 0) frees GPU memory)
- S=2 (full): guard + escape detection via device pointer overlap
- S=3 (debug): full + borrow call-site registry
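A minimal sketch of how a type-level safety parameter lets the compiler drop safety branches. It assumes a hypothetical `ToyPool{S}` backed by a plain `Vector` and a hypothetical `release!` function; neither is the package's actual definition, and no GPU is involved.

```julia
# Hypothetical stand-in for a pool type; not the package's real struct.
struct ToyPool{S}
    data::Vector{Float64}
end

# The level is read from the type, so it is known at compile time.
safety_level(::ToyPool{S}) where {S} = S

function release!(pool::ToyPool{S}) where {S}
    # S is a compile-time constant in each specialization. For S == 0 the
    # condition is statically false and the branch can be eliminated.
    if S >= 1
        fill!(pool.data, NaN)   # poison stale data instead of freeing it
    end
    return pool
end

p0 = release!(ToyPool{0}([1.0, 2.0]))   # data left untouched
p1 = release!(ToyPool{1}([1.0, 2.0]))   # data poisoned with NaN
```

Inspecting `@code_typed release!(ToyPool{0}([1.0]))` is one way to check whether the branch was actually removed in a given Julia version.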
Key changes:
- `CuAdaptiveArrayPool{S}` with borrow tracking fields at all levels
- `_make_cuda_pool(s)` function barrier for runtime→compile-time S
- `_dispatch_pool_scope` union splitting for CUDA pools
- GPU-specific `_invalidate_released_slots!` (poison, no resize!)
- Device pointer overlap escape detection for CuArray
- `set_cuda_safety_level!` for all-device safety level replacement
- S threaded through all rewind paths for compile-time dispatch
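The function-barrier and union-splitting items above can be sketched as follows. This is an illustration only: `BarrierPool`, `make_barrier_pool`, `scope_body`, and `dispatch_scope` are hypothetical names, and the real `_make_cuda_pool` and `_dispatch_pool_scope` may be structured differently.

```julia
# Hypothetical toy type; the real pool carries buffers and tracking fields.
struct BarrierPool{S} end

# Runtime Int -> concrete type. The return type is a small Union,
# so this function alone is not type-stable.
function make_barrier_pool(s::Int)
    s == 0 && return BarrierPool{0}()
    s == 1 && return BarrierPool{1}()
    s == 2 && return BarrierPool{2}()
    s == 3 && return BarrierPool{3}()
    throw(ArgumentError("safety level must be 0-3, got $s"))
end

# Inner kernel: compiled once per S, with S known statically.
scope_body(::BarrierPool{S}) where {S} = S >= 2 ? :escape_checked : :unchecked

# Manual union splitting: each branch calls the kernel with a concrete type.
function dispatch_scope(pool)
    pool isa BarrierPool{0} && return scope_body(pool)
    pool isa BarrierPool{1} && return scope_body(pool)
    pool isa BarrierPool{2} && return scope_body(pool)
    return scope_body(pool::BarrierPool{3})
end

results = [dispatch_scope(make_barrier_pool(s)) for s in 0:3]
```

The cost of the runtime check is paid once at the barrier; everything called from `scope_body` sees a concrete `S`.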
Tests: compile-time escape detection, native Level 2/3 macro integration,
borrow callsite tracking, error message formatting, function form
…ation safety, code clarity
- Merge double loop into single loop in `_invalidate_released_slots!` (poison + cache invalidation)
- Extract `_check_tp_cuda_overlap` as `@noinline` with explicit args (eliminates captured-variable closure)
- Fix Dict mutation during iteration in `set_cuda_safety_level!` via `collect(keys(...))`
- Extract `_transfer_cuda_pool` from inner `_new` closure in `_make_cuda_pool`
- Document container recursion duplication design trade-off
- Document SubArray/ReshapedArray branches as defensive code for future CUDA.jl changes
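The `collect(keys(...))` fix mentioned in this commit can be illustrated with a small sketch. The dictionary `device_pools` and its string values are hypothetical placeholders for per-device pools; the point is only that the loop iterates over a snapshot of the keys rather than over the live `Dict`.

```julia
# Hypothetical per-device registry: device id => pool (strings stand in for pools).
device_pools = Dict(0 => "pool{0}", 1 => "pool{0}", 2 => "pool{0}")

# collect(keys(...)) materializes the keys into a Vector first, so
# reassigning entries inside the loop does not touch the collection
# that is being iterated.
for dev in collect(keys(device_pools))
    device_pools[dev] = "pool{2}"
end
```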
Pull request overview
Extends AdaptiveArrayPools’ type-parameterized safety dispatch system to the CUDA extension by introducing CuAdaptiveArrayPool{S} (S=0–3) and adding CUDA-specific poisoning/escape-detection/borrow-tracking behavior plus a comprehensive CUDA safety test suite.
Changes:
- Parameterize CUDA pools as `CuAdaptiveArrayPool{S}` with `_safety_level` dispatch and CUDA union-splitting in `_dispatch_pool_scope`.
- Add CUDA safety runtime behaviors (poisoning, escape detection, borrow tracking) and update display/stats to include `{S}` + safety label.
- Add extensive CUDA safety tests and wire them into the CUDA test runner.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/cuda/test_cuda_safety.jl | New end-to-end tests for CUDA safety levels, poisoning, escape detection, borrow tracking, and display/showerror expectations. |
| test/cuda/runtests.jl | Includes the new CUDA safety test file in the CUDA test suite. |
| src/debug.jl | Updates the fallback _validate_pool_return comment to reflect CUDA extension override. |
| ext/AdaptiveArrayPoolsCUDAExt/AdaptiveArrayPoolsCUDAExt.jl | Includes new CUDA debug.jl safety implementation and exports set_cuda_safety_level!. |
| ext/AdaptiveArrayPoolsCUDAExt/types.jl | Introduces CuAdaptiveArrayPool{S}, _safety_level, _make_cuda_pool, transfer helper, and safety label helper. |
| ext/AdaptiveArrayPoolsCUDAExt/task_local_pool.jl | Updates task-local CUDA pool handling for {S} pools and adds set_cuda_safety_level! to replace pools across devices. |
| ext/AdaptiveArrayPoolsCUDAExt/state.jl | Threads S through CUDA rewind paths to enable compile-time safety dispatch; resets borrow-tracking state on reset/empty. |
| ext/AdaptiveArrayPoolsCUDAExt/macros.jl | Adds CUDA-specific _dispatch_pool_scope union splitting for {S} specialization under @with_pool :cuda. |
| ext/AdaptiveArrayPoolsCUDAExt/debug.jl | New CUDA safety implementation: poisoning on rewind, escape detection via pointer overlap, and borrow callsite registry. |
| ext/AdaptiveArrayPoolsCUDAExt/utils.jl | Updates pool_stats/show output to include {S} and safety label. |
Level 2 table entry now includes data poisoning for CPU path. New section explains why CPU adds poisoning at LV2 while CUDA already poisons at LV1.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #27 +/- ##
=======================================
Coverage 96.79% 96.79%
=======================================
Files 14 14
Lines 2618 2620 +2
=======================================
+ Hits 2534 2536 +2
Misses 84 84
…safety_level!
Replace the standalone `set_cuda_safety_level!` public API with an internal `_set_cuda_safety_level_hook!` pattern. `set_safety_level!(level)` now updates both CPU and all CUDA device pools in a single call.
- Remove `set_cuda_safety_level!` export from CUDA extension
- Add `_set_cuda_safety_level_hook!` no-op stub in base, overridden by ext
- Update all CUDA safety tests to use unified `set_safety_level!`
- Simplify docs/safety.md and add Documenter pages in make.jl
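A rough sketch of the hook pattern this commit describes, under stated assumptions: `toy_set_level!`, `_toy_backend_hook!`, `CPU_LEVEL`, and `DEVICE_LEVELS` are hypothetical names, and both "sides" live in one file here, whereas in a real package the second method would be defined inside the extension module. The sketch adds a more specific method rather than overwriting the stub.

```julia
# --- Base package side (hypothetical) ---
const CPU_LEVEL = Ref(0)

# No-op stub: does nothing unless a backend adds a more specific method.
_toy_backend_hook!(level) = nothing

function toy_set_level!(level::Int)
    CPU_LEVEL[] = level
    _toy_backend_hook!(level)   # forwards to the backend if one is loaded
    return level
end

# --- Extension side (hypothetical) ---
const DEVICE_LEVELS = Dict(0 => 0, 1 => 0)   # device id => safety level

# More specific than the untyped stub, so dispatch prefers it for Int levels.
function _toy_backend_hook!(level::Int)
    for dev in collect(keys(DEVICE_LEVELS))
        DEVICE_LEVELS[dev] = level
    end
    return nothing
end

toy_set_level!(2)   # one call updates the CPU level and every device entry
```

The design keeps a single public entry point, and the base package never needs to know whether a GPU backend is loaded.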
mgyoo86 referenced this pull request Mar 11, 2026
Summary
Brings the CPU's `AdaptiveArrayPool{S}` type-parameterized safety system (PR #26) to the CUDA extension.
Key design decisions
- `resize!(CuVector, 0)` calls `CUDA.Mem.free()`, which destroys the pooled allocation. Poisoning (NaN/typemax/true) keeps VRAM alive while making stale data detectable.
- … `Bit` (BitArray internal). CUDA has no `Bit` equivalent, so bit 7 is repurposed for Float16, giving it proper lazy first-touch checkpoint behavior.
- … `pool::AdaptiveArrayPool`; CUDA uses a single `_validate_cuda_return` function to avoid 6+ method definitions.
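The poisoning idea (NaN/typemax/true) can be sketched on ordinary CPU arrays. The helpers `poison_value` and `poison!` are hypothetical and only illustrate picking a sentinel by element type; the extension's actual poisoning code for `CuArray` may differ.

```julia
# Hypothetical sentinel selection by element type.
poison_value(::Type{T}) where {T<:AbstractFloat} = T(NaN)
poison_value(::Type{T}) where {T<:Integer} = typemax(T)
poison_value(::Type{Bool}) = true   # more specific than the Integer method

# Overwrite the buffer in place: the memory stays allocated,
# but any stale read now returns an obviously wrong value.
poison!(buf::AbstractArray) = fill!(buf, poison_value(eltype(buf)))

floats = poison!(zeros(Float32, 3))
ints   = poison!(Int32[1, 2])
flags  = poison!([false, false])
```

Because `fill!` is generic over `AbstractArray`, the same pattern applies to device arrays without resizing or freeing the underlying buffer.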