geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948) by brendancol · Pull Request #1952 · xarray-contrib/xarray-spatial

brendancol · 2026-05-15T15:28:48Z

Closes #1948.

Summary

The COG overview loop inside write_geotiff_gpu used to allocate a
fresh current.copy() before rewriting NaN cells back to the nodata
sentinel:

current = make_overview_gpu(current, ...)
...
nan_mask = cupy.isnan(current)
if bool(nan_mask.any().item()):
    current = current.copy()
    current[nan_mask] = sentinel

make_overview_gpu returns a freshly allocated cupy buffer at every
call site (the 2-D path ends in cupy.nan* / cupy.around(...).astype(...)
/ cropped[::2, ::2].copy(); the 3-D path ends in cupy.stack), so
nothing aliased the buffer between the return and the in-place rewrite.
The current.copy() allocated a second chunk-sized GPU buffer per
overview level for no semantic gain.

The fix replaces the two-line rewrite with
cupy.putmask(current, nan_mask, sentinel) so the existing buffer is
mutated in place. Mirrors the in-place sentinel rewrite
_apply_nodata_mask_gpu adopted in #1934.

For an 8192x8192 float32 raster with 4 auto-generated overview levels,
the extra allocations sum to roughly 21 MB per write; the saving is
modest but the pattern aligns with #1934 and removes a redundant
device allocation per overview level.

Test plan

test_gpu_writer_overview_loop_uses_putmask_1948 -- structural
guard against reintroducing the redundant copy.
test_gpu_writer_cog_overview_sentinel_roundtrip_1948 -- COG
write -> CPU read preserves the sentinel through every overview level
(uses an all-sentinel quadrant so every 2x reduction still hits a
fully-sentinel 2x2 block and triggers the rewrite branch).
test_gpu_writer_overview_uses_make_overview_gpu_fresh_buffer_1948
-- contract: every overview method in _block_reduce_2d_gpu returns
a cupy buffer whose data.ptr differs from the input.
Existing GPU writer + overview + COG suite (137 tests) passes
unchanged.

Filed under the performance sweep run from /deep-sweep performance
on the geotiff module.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

brendancol · 2026-05-15T15:59:08Z

brendancol · 2026-05-15T17:30:30Z

PR Review: geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948) (re-review after rebase)

Re-reviewing branch at 288ff83 after rebase onto current main. Two commits: the original bbec25c putmask swap and the follow-up 288ff83 that hoists the loop-invariant sentinel cast and anchors the test slice on parts.append(. CPU-side tests (test_gpu_writer_overview_loop_uses_putmask_1948) pass locally; the two GPU-gated tests skip without a device.

Blockers

None

Suggestions

None

Nits

None

What looks good

The hoist of rewrite_nodata and sentinel_scalar out of the inner while is safe. make_overview_gpu preserves dtype (the comment cites the four buffer-producing call sites inside _block_reduce_2d_gpu and the 3-D cupy.stack path), and np_dtype is captured once on line 399 from the source arr. The cast and the float/finite gate no longer repeat per level.
The cupy.putmask(current, nan_mask, sentinel_scalar) call is correct against the same buffer make_overview_gpu just returned, with bool(nan_mask.any().item()) still gating the rewrite so the kernel is skipped on overview levels that have no all-sentinel cells.
The source-level test now anchors its window on the next parts.append( instead of a fixed character count, so the guard tracks the loop body even if surrounding code shifts.
The third GPU test (make_overview_gpu_fresh_buffer_1948) covers all five reducer methods (mean, nearest, min, max, median) and asserts the returned device pointer differs from the input. That is the contract the in-place rewrite depends on, and it is now under test.
The round-trip COG test stays green: the bottom-right quadrant stays all-sentinel so each 2x reduction still produces at least one NaN block that exercises the rewrite branch, and both overview levels read back with the sentinel preserved.

Checklist

Rebased cleanly on origin/main
CPU test passes locally (3 passed in 0.94s; two GPU tests skip without a device)
No changes outside _writers/gpu.py and the new regression test
No public API or signature changes
Sentinel-rewrite semantics preserved (float dtype gate, finite nodata gate, dtype-cast sentinel)

…rewrite (#1948) The COG overview loop inside `write_geotiff_gpu` used to allocate a fresh `current.copy()` before rewriting NaN cells back to the nodata sentinel: ``` current = make_overview_gpu(current, ...) ... nan_mask = cupy.isnan(current) if bool(nan_mask.any().item()): current = current.copy() current[nan_mask] = sentinel ``` `make_overview_gpu` returns a freshly allocated cupy buffer at every call site (the 2-D path ends in `cupy.nan*` / `cupy.around(...).astype(...)` / `cropped[::2, ::2].copy()`; the 3-D path ends in `cupy.stack`), so nothing aliased the buffer between the return and the in-place rewrite. The `current.copy()` allocated a second chunk-sized GPU buffer per overview level for no semantic gain. Replace the two-line rewrite with `cupy.putmask(current, nan_mask, sentinel)` so the existing buffer is mutated in place. Mirrors the in-place sentinel rewrite `_apply_nodata_mask_gpu` adopted in #1934. Tests cover: * structural -- the overview branch uses `cupy.putmask` and no longer contains `current = current.copy()`, * correctness -- a COG write of a float32 raster with an all-sentinel quadrant round-trips through the GPU writer and back to the CPU reader with NaN preserved on every overview level, * contract -- every overview method in `_block_reduce_2d_gpu` returns a cupy buffer whose `data.ptr` differs from the input, so the in-place mutation is safe.

brendancol · 2026-05-15T17:53:23Z

Rebased onto current main at 19fd38f to pick up the Python 3.14 CI matrix limit (b1579f8).

Copilot AI review requested due to automatic review settings May 15, 2026 15:28

github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026

Copilot AI reviewed May 15, 2026

View reviewed changes

brendancol force-pushed the deep-sweep-performance-geotiff-2026-05-15-pass2-01 branch from 179d83c to 38d5b2f Compare May 15, 2026 15:46

brendancol added a commit that referenced this pull request May 15, 2026

geotiff: address PR #1952 review nits

136661a

brendancol added a commit that referenced this pull request May 15, 2026

geotiff: address PR #1952 review nits

288ff83

brendancol force-pushed the deep-sweep-performance-geotiff-2026-05-15-pass2-01 branch from 136661a to 288ff83 Compare May 15, 2026 17:29

brendancol added 2 commits May 15, 2026 10:53

geotiff: address PR #1952 review nits

19fd38f

brendancol force-pushed the deep-sweep-performance-geotiff-2026-05-15-pass2-01 branch from 288ff83 to 19fd38f Compare May 15, 2026 17:53

brendancol merged commit add6529 into main May 15, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948)#1952

geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948)#1952
brendancol merged 2 commits into
mainfrom
deep-sweep-performance-geotiff-2026-05-15-pass2-01

brendancol commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

brendancol commented May 15, 2026

Uh oh!

brendancol commented May 15, 2026

Uh oh!

brendancol commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brendancol commented May 15, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

brendancol commented May 15, 2026

PR Review: geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948)

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol commented May 15, 2026

PR Review: geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948) (re-review after rebase)

Blockers

Suggestions

Nits

What looks good

Checklist

Uh oh!

brendancol commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants