geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948) #1952
Merged
brendancol merged 2 commits on May 15, 2026
PR Review: geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948)
Blockers (must fix before merge)
Suggestions (should fix, not blocking)
Nits (optional improvements)
What looks good
Checklist
PR Review: geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948) (re-review after rebase)
Re-reviewing branch at
Blockers
Suggestions
Nits
What looks good
Checklist
…rewrite (#1948)

The COG overview loop inside `write_geotiff_gpu` used to allocate a fresh `current.copy()` before rewriting NaN cells back to the nodata sentinel:

```
current = make_overview_gpu(current, ...)
...
nan_mask = cupy.isnan(current)
if bool(nan_mask.any().item()):
    current = current.copy()
    current[nan_mask] = sentinel
```

`make_overview_gpu` returns a freshly allocated cupy buffer at every call site (the 2-D path ends in `cupy.nan*` / `cupy.around(...).astype(...)` / `cropped[::2, ::2].copy()`; the 3-D path ends in `cupy.stack`), so nothing aliased the buffer between the return and the in-place rewrite. The `current.copy()` allocated a second chunk-sized GPU buffer per overview level for no semantic gain.

Replace the two-line rewrite with `cupy.putmask(current, nan_mask, sentinel)` so the existing buffer is mutated in place. Mirrors the in-place sentinel rewrite `_apply_nodata_mask_gpu` adopted in #1934.

Tests cover:

* structural -- the overview branch uses `cupy.putmask` and no longer contains `current = current.copy()`,
* correctness -- a COG write of a float32 raster with an all-sentinel quadrant round-trips through the GPU writer and back to the CPU reader with NaN preserved on every overview level,
* contract -- every overview method in `_block_reduce_2d_gpu` returns a cupy buffer whose `data.ptr` differs from the input, so the in-place mutation is safe.
Rebased onto current main at
Closes #1948.
Summary
The COG overview loop inside `write_geotiff_gpu` used to allocate a fresh `current.copy()` before rewriting NaN cells back to the nodata sentinel (`current = current.copy()` followed by `current[nan_mask] = sentinel`).
`make_overview_gpu` returns a freshly allocated cupy buffer at every call site (the 2-D path ends in `cupy.nan*` / `cupy.around(...).astype(...)` / `cropped[::2, ::2].copy()`; the 3-D path ends in `cupy.stack`), so nothing aliased the buffer between the return and the in-place rewrite. The `current.copy()` allocated a second chunk-sized GPU buffer per overview level for no semantic gain.
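The aliasing argument above can be illustrated with a small sketch. It uses NumPy as a CPU stand-in, on the assumption that cupy follows the same view/copy semantics for strided slices; `decimate_view` and `decimate_fresh` are hypothetical helpers, not functions from the geotiff module.

```python
import numpy as np  # CPU stand-in; assumes cupy mirrors these view/copy semantics


def decimate_view(a):
    # A strided slice is a view: it aliases the caller's buffer.
    return a[::2, ::2]


def decimate_fresh(a):
    # Slice followed by .copy(): a new, independent buffer.
    return a[::2, ::2].copy()


src = np.arange(16, dtype=np.float32).reshape(4, 4)
view = decimate_view(src)
fresh = decimate_fresh(src)

view_aliases = np.shares_memory(view, src)
fresh_aliases = np.shares_memory(fresh, src)

# Mutating the fresh buffer in place leaves the source untouched,
# which is the property the in-place sentinel rewrite relies on.
fresh[0, 0] = -9999.0
src_untouched = bool(src[0, 0] == 0.0)
```

If a reduction path ever returned the bare view instead, an in-place rewrite of the result would also write into the previous level's buffer, which is why the fresh-buffer property matters here.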
The fix replaces the two-line rewrite with `cupy.putmask(current, nan_mask, sentinel)` so the existing buffer is mutated in place. This mirrors the in-place sentinel rewrite `_apply_nodata_mask_gpu` adopted in #1934.

For an 8192x8192 float32 raster with 4 auto-generated overview levels, the extra allocations sum to roughly 21 MB per write; the saving is modest but the pattern aligns with #1934 and removes a redundant device allocation per overview level.
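A minimal sketch of the before/after pattern, again using NumPy as a CPU stand-in (assuming `cupy.putmask` behaves like `numpy.putmask` for a scalar value). The function names and the `SENTINEL` value are hypothetical, and keeping the `any()` guard in the new version is an assumption based on the description of a "two-line rewrite".

```python
import numpy as np  # CPU stand-in for cupy in this sketch

SENTINEL = -9999.0  # hypothetical nodata value, for illustration only


def rewrite_with_copy(current, sentinel):
    # Old pattern: allocate a second buffer, then assign through the mask.
    nan_mask = np.isnan(current)
    if bool(nan_mask.any()):
        current = current.copy()
        current[nan_mask] = sentinel
    return current


def rewrite_in_place(current, sentinel):
    # New pattern: mutate the existing buffer, no extra allocation.
    nan_mask = np.isnan(current)
    if bool(nan_mask.any()):
        np.putmask(current, nan_mask, sentinel)
    return current


level = np.array([[1.0, np.nan], [np.nan, 4.0]], dtype=np.float32)

old_in = level.copy()
old_out = rewrite_with_copy(old_in, SENTINEL)

new_in = level.copy()
new_out = rewrite_in_place(new_in, SENTINEL)

same_values = np.array_equal(old_out, new_out)
old_allocated = old_out is not old_in   # old pattern returned a new buffer
new_reused = new_out is new_in          # new pattern returned the same buffer
```

Both versions produce the same values; only the in-place variant hands back the buffer it was given.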
Test plan
* `test_gpu_writer_overview_loop_uses_putmask_1948` -- structural guard against reintroducing the redundant copy.
* `test_gpu_writer_cog_overview_sentinel_roundtrip_1948` -- COG write -> CPU read preserves the sentinel through every overview level (uses an all-sentinel quadrant so every 2x reduction still hits a fully-sentinel 2x2 block and triggers the rewrite branch).
* `test_gpu_writer_overview_uses_make_overview_gpu_fresh_buffer_1948` -- contract: every overview method in `_block_reduce_2d_gpu` returns a cupy buffer whose `data.ptr` differs from the input.

unchanged.
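One way a structural guard of this kind could be written is sketched below. This is not the PR's test: `OVERVIEW_BRANCH_SRC` is a hypothetical stand-in for the overview branch source (a real test would presumably read it from the writer module), and `calls_attr` / `has_self_copy` are illustrative helpers built on the standard `ast` module.

```python
import ast

# Hypothetical stand-in for the overview branch; only parsed, never executed.
OVERVIEW_BRANCH_SRC = '''
nan_mask = cupy.isnan(current)
if bool(nan_mask.any().item()):
    cupy.putmask(current, nan_mask, sentinel)
'''


def calls_attr(tree, name):
    """True if the tree contains a call of the form <obj>.<name>(...)."""
    return any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr == name
        for n in ast.walk(tree)
    )


def has_self_copy(tree, var):
    """True if the tree contains the statement `var = var.copy()`."""
    for n in ast.walk(tree):
        if not (isinstance(n, ast.Assign) and len(n.targets) == 1):
            continue
        tgt, val = n.targets[0], n.value
        if (
            isinstance(tgt, ast.Name) and tgt.id == var
            and isinstance(val, ast.Call)
            and isinstance(val.func, ast.Attribute)
            and val.func.attr == "copy"
            and isinstance(val.func.value, ast.Name)
            and val.func.value.id == var
        ):
            return True
    return False


tree = ast.parse(OVERVIEW_BRANCH_SRC)
uses_putmask = calls_attr(tree, "putmask")
reintroduced_copy = has_self_copy(tree, "current")
```

Matching on the syntax tree rather than on raw substrings keeps the guard insensitive to whitespace and comments, at the cost of being slightly more verbose.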
Filed under the performance sweep run from `/deep-sweep performance` on the geotiff module.