geotiff: extract read_vrt and VRT dask helpers to _backends/vrt.py (#1887)#1898
Merged
Conversation
…1887) Step 9 of #1813's multi-PR refactor of __init__.py. Pure code motion; no public API change. Moved into a new xrspatial/geotiff/_backends/vrt.py: - read_vrt: VRT read entry point (~250 lines), with the eager-path CRS/transform/holes attrs population, the per-band nodata selection (#1598), and the integer-sentinel post-decode mask (#1611, #1825) intact. - _vrt_chunk_read: per-chunk delayed reader used by the dask graph. - _read_vrt_chunked: lazy dask dispatch when chunks= is set (#1814), including the parse-time hole detection and 50k-task graph cap. - _MAX_VRT_DASK_CHUNKS constant. The XML parsing, source-path containment, per-source decode, and ``_apply_integer_sentinel_mask`` already live in xrspatial/geotiff/_vrt.py (untouched here); this module orchestrates only the eager / dask / GPU dispatch around them. Function bodies are unchanged except for adjusted relative imports (``._X`` -> ``.._X``). __init__.py re-exports ``read_vrt`` so the public surface is unchanged. Knock-on cleanup ================ ``_backends/dask.py`` had a lazy ``from .. import read_vrt`` (added in #1886) because ``read_vrt`` was still in __init__.py. With ``read_vrt`` now in a sibling backend module, the import is promoted to a static top-of-module ``from .vrt import read_vrt`` and the dispatch site becomes a direct call. Diff stats ========== - __init__.py: 2505 -> 1905 lines (-600). - New _backends/vrt.py: 624 lines. Verification: pixel-parity matrix from #1889 (192 cells), runtime identity (9), VRT-specific tests (lazy chunks #1814, lazy chunks #1798, window validation #1697, backend coverage), full geotiff suite minus the 8 pre-existing main failures -- 2862/2862 pass. Refs #1813.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR is part of the ongoing GeoTIFF refactor (#1813), extracting the VRT read entry point (read_vrt) and its dask chunking helpers out of xrspatial/geotiff/__init__.py into a dedicated backend module, while keeping the public API stable via re-export.
Changes:
- Added
xrspatial/geotiff/_backends/vrt.pycontainingread_vrt,_vrt_chunk_read,_read_vrt_chunked, and_MAX_VRT_DASK_CHUNKS. - Updated
xrspatial/geotiff/_backends/dask.pyto use a static import ofread_vrtfrom the sibling backend module. - Updated
xrspatial/geotiff/__init__.pyto re-exportread_vrtfrom the new backend module and removed the inlined implementation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_backends/vrt.py |
New VRT backend module holding read_vrt and its eager/dask/GPU orchestration. |
xrspatial/geotiff/_backends/dask.py |
Switches VRT dispatch from a lazy .. import read_vrt to static from .vrt import read_vrt. |
xrspatial/geotiff/__init__.py |
Re-exports read_vrt from _backends/vrt.py and removes the previous in-file implementation. |
Comments suppressed due to low confidence (1)
xrspatial/geotiff/_backends/vrt.py:205
- This comment says “lenient mode (the default)”, but
missing_sourcesnow defaults to'raise'(and strict mode can force raising). Please update the wording to reflect that the lenient path is opt-in viamissing_sources='warn', so the comment stays consistent with the function signature and docstring above.
# Surface skipped-source records as ``attrs['vrt_holes']`` so
# callers can detect a partial mosaic by attribute lookup. Under
# lenient mode (the default), the underlying ``_vrt.read_vrt``
# already warned per skipped source but the warning is easy to
# miss in a pipeline; the attr lets downstream code branch on
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+114
to
+117
| ``attrs['vrt_holes']`` from skipped sources; the chunked path does | ||
| not aggregate per-task hole records, so that attribute is not set | ||
| when ``chunks=`` is used. Each worker still emits | ||
| ``GeoTIFFFallbackWarning`` for missing sources. |
| ) | ||
| from .._reader import read_to_array as _read_to_array | ||
| from .._validation import _validate_chunks_arg, _validate_dtype_cast | ||
| from .vrt import read_vrt |
Two docstring corrections Copilot flagged: 1. ``_backends/vrt.py`` ``read_vrt`` docstring claimed the chunked path leaves ``attrs['vrt_holes']`` unset. ``_read_vrt_chunked`` actually populates it via a parse-time ``os.path.exists`` sweep (introduced before this PR, in the original __init__.py code). Updated the docstring to describe the actual best-effort contract and note that decode-time codec failures still surface only as per-task ``GeoTIFFFallbackWarning``. 2. ``_backends/dask.py`` module docstring still described ``read_vrt`` as lazy-imported from __init__.py. After #1898 promoted it to a static ``from .vrt import read_vrt``, only ``_read_geo_info`` remains lazy. Rewrote the wording to match the current import strategy.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Step 9 of #1813's multi-PR refactor of
xrspatial/geotiff/__init__.py. Pure code motion; no public API change.Moved into a new
xrspatial/geotiff/_backends/vrt.py:read_vrt— VRT read entry point (~250 lines), with the eager-path CRS/transform/holes attrs population, the per-band nodata selection (read_vrt(band=N) reports band 0's nodata in attrs and skips masking for band N #1598), and the integer-sentinel post-decode mask (read_vrt(band=None) only masks band 0's nodata sentinel on multi-band integer VRTs #1611, geotiff: refactor read_vrt chunked path to share logic with eager #1825) intact._vrt_chunk_read— per-chunk delayed reader used by the dask graph._read_vrt_chunked— lazy dask dispatch whenchunks=is set (geotiff: read_vrt(chunks=...) materialises the full mosaic before chunking #1814), including the parse-time hole detection and the 50k-task graph cap._MAX_VRT_DASK_CHUNKSconstant.The XML parsing, source-path containment, per-source decode, and
_apply_integer_sentinel_maskalready live inxrspatial/geotiff/_vrt.py(untouched here); this module orchestrates only the eager / dask / GPU dispatch around them.Function bodies are unchanged except for adjusted relative imports (
._X→.._X).__init__.pyre-exportsread_vrtso the public surface is unchanged.Knock-on cleanup
_backends/dask.pyhad a lazyfrom .. import read_vrtadded in #1897 becauseread_vrtwas still in__init__.py. Withread_vrtnow in a sibling backend module, the import is promoted to a static top-of-modulefrom .vrt import read_vrtand the dispatch site becomes a direct call.Diff stats
__init__.py: 2505 → 1905 (-600 lines)._backends/vrt.py: 624 lines.Test plan
open_geotiff(.vrt)vsread_vrt).Closes #1887. Refs #1813.