Skip to content

geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885)#1895

Merged
brendancol merged 2 commits into
mainfrom
issue-1885
May 15, 2026
Merged

geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885)#1895
brendancol merged 2 commits into
mainfrom
issue-1885

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Step 7 of #1813's multi-PR refactor of xrspatial/geotiff/__init__.py. Pure code motion; no public API change. Biggest single-function move in the sequence.

Moved into a new xrspatial/geotiff/_backends/gpu.py:

  • read_geotiff_gpu — GPU read entry point (~640 lines, the largest single function in __init__.py).
  • _gds_chunk_path_available — predicate gating the GDS direct-to-GPU chunked path.
  • _decode_window_gpu_direct — per-chunk disk→GPU decode used by the GDS path.
  • _read_geotiff_gpu_chunked — lazy Dask+CuPy graph builder for read_geotiff_gpu(chunks=...).
  • _read_geotiff_gpu_chunked_gds — direct-to-GPU specialisation when KvikIO + a chunky tiled local file qualifies.

Function bodies are unchanged except for adjusted relative imports (._X.._X outside _backends, ._gpu_helpers for the already-extracted GPU helpers from #1894). __init__.py re-exports read_geotiff_gpu so from xrspatial.geotiff import read_geotiff_gpu keeps working. The four private helpers are not re-exported because they were only called by read_geotiff_gpu itself; nothing outside the GPU read flow references them.

Test update

One existing test had to update its monkeypatch target: test_gpu_chunks_out_of_core_1876.py patches _decode_window_gpu_direct to spy on chunk-decode calls. The target moves from xrspatial.geotiff to xrspatial.geotiff._backends.gpu so the spy still intercepts the real call site (module-global lookup happens in the defining module). Both spy tests (uses_gds_path_when_available, fallback_when_kvikio_absent) pass after the update.

Circular-import note

The Dask+CuPy chunked-fallback path needs read_geotiff_dask, which still lives in __init__.py. To avoid a circular import, _read_geotiff_gpu_chunked lazy-imports it inside the function body (from .. import read_geotiff_dask). PR #8 (#1886) will move read_geotiff_dask into _backends/dask.py; the lazy import can be promoted to a static one then.

Diff stats

  • __init__.py: 3969 → 2962 (-1007 lines, the single biggest reduction in the series).
  • New _backends/gpu.py: 1047 lines.

Test plan

Closes #1885. Refs #1813.

Step 7 of #1813's multi-PR refactor of __init__.py. Pure code motion;
no public API change.

Moved into a new xrspatial/geotiff/_backends/gpu.py:

- read_geotiff_gpu: GPU read entry point (~640 lines, the largest
  single function in __init__.py).
- _gds_chunk_path_available: predicate gating the GDS direct-to-GPU
  chunked path.
- _decode_window_gpu_direct: per-chunk disk->GPU decode used by the
  GDS path.
- _read_geotiff_gpu_chunked: lazy Dask+CuPy graph builder for
  read_geotiff_gpu(chunks=...).
- _read_geotiff_gpu_chunked_gds: direct-to-GPU specialisation built
  by _read_geotiff_gpu_chunked when KvikIO + a chunky tiled local
  file qualifies.

Function bodies are unchanged except for adjusted relative imports
(``._X`` -> ``.._X`` outside _backends, ``._gpu_helpers`` for the
already-extracted GPU helpers). __init__.py re-exports
read_geotiff_gpu so ``from xrspatial.geotiff import read_geotiff_gpu``
keeps working. The four private helpers are not re-exported because
they were only called by read_geotiff_gpu itself; nothing outside
the GPU module references them.

One known consumer of the private names had to update its monkeypatch
target: test_gpu_chunks_out_of_core_1876.py patches
``_decode_window_gpu_direct`` to spy on chunk-decode calls. The patch
target moves from ``xrspatial.geotiff`` to
``xrspatial.geotiff._backends.gpu`` so the spy still intercepts the
real call site (module-global lookup happens in the defining module).

The Dask+CuPy chunked-fallback path needs ``read_geotiff_dask``,
which still lives in __init__.py. To avoid a circular import,
_read_geotiff_gpu_chunked lazy-imports it inside the function body
(``from .. import read_geotiff_dask``). PR #8 will move
``read_geotiff_dask`` into ``_backends/dask.py`` and the lazy import
can be replaced with a static one then.

Net __init__.py change: 3969 -> 2962 lines (-1007). _backends/gpu.py
is 1047 lines.

Verification: pixel-parity matrix from #1889 (192 cells, all GPU
cells), runtime identity (9), GPU byteswap (#1508), GPU kwarg rename
(#1560), GPU chunks out-of-core (#1876, including the GDS spy
tests), full geotiff suite minus 8 pre-existing main failures
(read_to_array monkeypatch + tile_size_positive_works) -- 2862/2862
pass.

Refs #1813.
Copilot AI review requested due to automatic review settings May 15, 2026 03:48
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the GeoTIFF backend refactor by moving the GPU read entry point and its GPU-chunking helpers out of xrspatial/geotiff/__init__.py into a dedicated backend module while preserving the public read_geotiff_gpu import.

Changes:

  • Adds xrspatial/geotiff/_backends/gpu.py containing read_geotiff_gpu and related private GPU chunk helpers.
  • Re-exports read_geotiff_gpu from xrspatial.geotiff.
  • Updates GPU chunk tests to monkeypatch the helper in its new module.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
xrspatial/geotiff/_backends/gpu.py New home for GPU GeoTIFF read implementation and private chunk/GDS helpers.
xrspatial/geotiff/__init__.py Removes the moved implementation and imports the public GPU read entry point from the backend module.
xrspatial/geotiff/tests/test_gpu_chunks_out_of_core_1876.py Updates monkeypatch targets for moved GPU chunk helper.
Comments suppressed due to low confidence (1)

xrspatial/geotiff/_backends/gpu.py:785

  • The mmap fallback in the direct chunk decoder also omits the LERC masked_fill value when calling gpu_decode_tiles. If KvikIO is present but unusable and this fallback runs, LERC valid-mask pixels keep the decoder's zero fill instead of being restored to nodata, so the later nodata mask cannot match them.
        arr_gpu = gpu_decode_tiles(
            compressed_tiles, tw, th, sub_w, sub_h,
            compression, predictor, file_dtype, samples,
            byte_order=byte_order,
        )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +954 to +961
if band is not None:
if n_bands_out == 0:
if band != 0:
raise IndexError(
f"band={band} requested but file is single-band.")
elif band < 0 or band >= n_bands_out:
raise IndexError(
f"band={band} out of range for {n_bands_out}-band file.")
Comment on lines +760 to +765
arr_gpu = gpu_decode_tiles_from_file(
file_path, sub_offsets, sub_byte_counts,
tw, th, sub_w, sub_h,
compression, predictor, file_dtype, samples,
byte_order=byte_order,
)
Comment thread xrspatial/geotiff/__init__.py Outdated
@@ -76,6 +76,7 @@
_gpu_decode_single_band_tiles,
Drop five GPU-helper re-imports from __init__.py that no longer have
internal callers after ``read_geotiff_gpu`` moved out:

- ``_apply_nodata_mask_gpu``
- ``_apply_orientation_geo_info``
- ``_apply_orientation_gpu``
- ``_gpu_apply_window_band``
- ``_gpu_decode_single_band_tiles``

``_is_gpu_data`` is still imported because ``to_geotiff`` and
``write_geotiff_gpu`` (which stay in __init__.py for now) reference
it to detect cupy-backed inputs.

The two parity-bug comments Copilot flagged on _backends/gpu.py
(boolean band validation missing on the GDS chunked path, LERC
masked_fill dropped on the GDS chunked path) are pre-existing in
the function bodies as they lived in __init__.py before this move.
They will be filed as their own follow-up issue against #1813 so
this PR stays pure code motion.
@brendancol
Copy link
Copy Markdown
Contributor Author

Addressed Copilot's three line comments:

  1. F401 on unused GPU-helper re-imports in __init__.py — fixed in 9275b5b. Dropped _apply_nodata_mask_gpu, _apply_orientation_geo_info, _apply_orientation_gpu, _gpu_apply_window_band, and _gpu_decode_single_band_tiles from the _backends._gpu_helpers re-import block since read_geotiff_gpu (their only caller in __init__.py) has moved out. _is_gpu_data stays because to_geotiff and write_geotiff_gpu still call it.

  2. Bool band rejection missing on the GDS chunked path — pre-existing bug in _read_geotiff_gpu_chunked_gds. Tracked in geotiff: GDS chunked GPU path missing bool-band rejection and LERC masked_fill #1896 as a follow-up so this PR stays pure motion.

  3. LERC masked_fill dropped on the GDS chunked path — also pre-existing, tracked in geotiff: GDS chunked GPU path missing bool-band rejection and LERC masked_fill #1896 alongside the bool-band fix.

Re-running CI on the new commit.

@brendancol brendancol merged commit 8acb1a2 into main May 15, 2026
10 of 11 checks passed
brendancol added a commit that referenced this pull request May 15, 2026
* geotiff: extract read_geotiff_dask to _backends/dask.py (#1886)

Step 8 of #1813's multi-PR refactor of __init__.py. Pure code motion;
no public API change.

Moved into a new xrspatial/geotiff/_backends/dask.py:

- read_geotiff_dask: dask read entry point (~350 lines), with the
  HTTP/fsspec COG metadata prefetch, MinIsWhite nodata-inversion
  detection, window clipping, max_pixels guard, and 50k-task graph
  cap intact.
- _delayed_read_window: the per-chunk reader closure (~100 lines)
  that fans out via dask.delayed.

Function bodies unchanged except for adjusted relative imports
(``._X`` -> ``.._X``). Two helpers still living in __init__.py
(``read_vrt`` and ``_read_geo_info``) are lazy-imported inside the
function bodies to avoid a circular import; later PRs in #1813 will
move them out and the lazy imports can become static.

Promotes the matching lazy import on the other side: _backends/gpu.py's
``from .. import read_geotiff_dask`` (added for the same circular-
import reason in #1895) becomes a static top-of-module
``from .dask import read_geotiff_dask`` now that the target is at a
sibling path.

Test changes
============

- test_dask_no_op_astype_1624.py monkeypatches ``_read_to_array``; the
  target moves from ``xrspatial.geotiff`` to
  ``xrspatial.geotiff._backends.dask`` so the spy still intercepts the
  per-chunk worker. Same pattern as the GPU spy update in #1895.
- Restored the ``_apply_nodata_mask_gpu`` re-export from __init__.py
  (with a ``# noqa: F401`` marker). PR #1895's review-fix commit
  dropped this re-import after concluding it had no internal callers;
  it overlooked five test imports across ``test_nodata_nan_int_1774``
  and ``test_nodata_out_of_range_1581`` that exercise the helper
  directly. Restoring the re-export fixes 6 tests that were failing
  on main since #1895 merged.

Diff stats
==========

- __init__.py: 2962 -> 2504 (-458 lines), plus 1 re-import line.
- New _backends/dask.py: 481 lines.

Verification: pixel-parity matrix from #1889 (192 cells), runtime
identity (9), dask int nodata chunks (#1597), attrs parity (#1548),
dask no-op astype (#1624), gpu chunks out-of-core (#1876), full
geotiff suite minus the 8 pre-existing main failures -- 2862/2862 pass.

Refs #1813.

* geotiff: address Copilot review on _backends/dask.py extraction (#1897)

Two cleanups Copilot flagged:

1. ``_backends/gpu.py`` module docstring still described the
   ``read_geotiff_dask`` import as lazy. The import was promoted to a
   static top-of-module ``from .dask import read_geotiff_dask`` in this
   PR; updated the docstring to reflect the current strategy and note
   why the original lazy form existed.

2. ``__init__.py`` had the ``# noqa: F401`` applied to a multi-name
   import, suppressing lint signal for ``_is_gpu_data`` as well as the
   intentional ``_apply_nodata_mask_gpu`` re-export. Split into two
   imports so the ``noqa`` marks only the re-export line. ``_is_gpu_data``
   keeps full lint coverage; a future genuinely-unused import on it
   would surface normally.
@brendancol brendancol deleted the issue-1885 branch May 15, 2026 04:40
brendancol added a commit that referenced this pull request May 15, 2026
…#1888) (#1899)

* geotiff: extract writers to _writers/, reduce __init__.py to dispatch (#1888)

Final step (10) of #1813's multi-PR refactor of __init__.py. Pure
code motion; no public API change.

Created the _writers/ subpackage with three modules:

- _writers/eager.py: ``to_geotiff`` (public eager writer),
  ``_write_single_tile`` (per-tile worker), and ``_write_vrt_tiled``
  (deprecated vrt_tiled=True path). ~870 lines.
- _writers/gpu.py: ``write_geotiff_gpu`` (nvCOMP-backed GPU writer
  with CPU fallback). ~500 lines.
- _writers/vrt.py: ``write_vrt`` (mosaic XML generator with crs_wkt
  deprecation handling). ~90 lines.

__init__.py re-exports the three public writer names. The two
private helpers tests reach for (``_write_single_tile``,
``_apply_nodata_mask_gpu``) keep their package-level re-exports with
``# noqa: F401`` markers since the test files import them directly.

Final shape of __init__.py: 529 lines (down from 1905 pre-PR and
4902 at the start of #1813 series). It now holds only:

- Module docstring + ``__all__`` (lines 1-112).
- ``_read_geo_info``: metadata-only header parse used by
  ``open_geotiff`` and lazy-imported by ``_backends/dask.py``.
- ``open_geotiff``: the public auto-dispatching read entry point that
  routes between the four read backends and ``read_vrt``.
- ``plot_geotiff``: deprecated wrapper that emits a
  ``DeprecationWarning`` pointing to ``da.xrs.plot()``.

Test updates
============

- test_to_geotiff_gpu_fallback_1674.py monkeypatches
  ``write_geotiff_gpu`` and ``_is_gpu_data`` as resolved by
  ``to_geotiff``. The patch targets move from ``xrspatial.geotiff``
  to ``xrspatial.geotiff._writers.eager`` so the spy still intercepts
  the real call sites. Same module-global-rebind pattern as #1895 and
  #1897.

Net change since the start of #1813
====================================

- __init__.py: 4902 -> 529 lines (-4373; -89%).
- New per-responsibility modules under xrspatial/geotiff/:
  _runtime.py (#1880), _crs.py (#1881), _validation.py (#1882),
  _attrs.py (#1883), _backends/_gpu_helpers.py (#1884),
  _backends/gpu.py (#1885), _backends/dask.py (#1886),
  _backends/vrt.py (#1887), _writers/{eager,gpu,vrt}.py (this PR).
- One new regression-backstop test:
  test_backend_pixel_parity_matrix_1813.py (#1879), 192 parametrised
  cells covering pixel/coord/attrs parity across the four backends
  and four read entry points.

Verification: 2862/2862 of the geotiff test suite pass (the 8
remaining pre-existing main failures from #1517 and #1776 are
unrelated to #1813 and were excluded as deselected throughout).

Closes #1888. Closes #1813.

* geotiff: address Copilot review on _writers extraction (#1899)

Drop the unused module-level and in-function imports Copilot flagged:

eager.py
- Drop ``import math`` (no math.* calls in the file).
- Drop ``RASTER_PIXEL_IS_POINT`` re-import (only RASTER_PIXEL_IS_AREA
  is referenced; the POINT path lives in _writers/_vrt_tiled below
  and uses its own internal constants).
- Drop ``_validate_dtype_cast`` re-import (never called; the writer
  side does not validate dtype casts post-decode).
- Drop ``from .vrt import write_vrt`` re-import; ``_write_vrt_tiled``
  does a lazy ``from .._vrt import write_vrt as _write_vrt_fn`` at
  its single call site so the module-top import was dead.
- Drop the redundant ``import os`` inside ``_write_vrt_tiled``; the
  module-level ``import os`` already covers the os.path / os.makedirs
  calls in that body.

gpu.py
- Drop ``import math`` and ``import os`` (no math.* / os.* references
  in the file body).
- Drop ``GeoTransform`` re-import at module level (only used in
  eager.py's _write_single_tile, which still imports it).
- Drop ``_validate_dtype_cast`` re-import (never called).
- Drop ``from .._writer import write`` re-import; only docstring
  references mention it.
- Drop the in-function ``GeoTransform as _GT`` alias inside the
  ``_writer`` import block (the alias was carried over verbatim from
  __init__.py but the function body never references _GT).
- Drop the in-function ``from .._dtypes import numpy_to_tiff_dtype``
  (also unreferenced in the body).

All 2862 geotiff tests still pass; pyflakes is clean on these files
except for the ``cupy.ndarray`` type annotation in
``write_geotiff_gpu``'s signature, which works fine under
``from __future__ import annotations`` and matched the same surface
the original __init__.py had through earlier #1813 PRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: extract read_geotiff_gpu into _backends/gpu.py (#1813 step 7)

2 participants