Skip to content

geotiff: extract read_vrt and VRT dask helpers to _backends/vrt.py (#1887)#1898

Merged
brendancol merged 2 commits into
mainfrom
issue-1887
May 15, 2026
Merged

geotiff: extract read_vrt and VRT dask helpers to _backends/vrt.py (#1887)#1898
brendancol merged 2 commits into
mainfrom
issue-1887

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Step 9 of #1813's multi-PR refactor of xrspatial/geotiff/__init__.py. Pure code motion; no public API change.

Moved into a new xrspatial/geotiff/_backends/vrt.py:

The XML parsing, source-path containment, per-source decode, and _apply_integer_sentinel_mask already live in xrspatial/geotiff/_vrt.py (untouched here); this module orchestrates only the eager / dask / GPU dispatch around them.

Function bodies are unchanged except for adjusted relative imports (._X.._X). __init__.py re-exports read_vrt so the public surface is unchanged.

Knock-on cleanup

_backends/dask.py had a lazy from .. import read_vrt added in #1897 because read_vrt was still in __init__.py. With read_vrt now in a sibling backend module, the import is promoted to a static top-of-module from .vrt import read_vrt and the dispatch site becomes a direct call.

Diff stats

  • __init__.py: 2505 → 1905 (-600 lines).
  • New _backends/vrt.py: 624 lines.

Test plan

Closes #1887. Refs #1813.

…1887)

Step 9 of #1813's multi-PR refactor of __init__.py. Pure code motion;
no public API change.

Moved into a new xrspatial/geotiff/_backends/vrt.py:

- read_vrt: VRT read entry point (~250 lines), with the eager-path
  CRS/transform/holes attrs population, the per-band nodata
  selection (#1598), and the integer-sentinel post-decode mask
  (#1611, #1825) intact.
- _vrt_chunk_read: per-chunk delayed reader used by the dask graph.
- _read_vrt_chunked: lazy dask dispatch when chunks= is set (#1814),
  including the parse-time hole detection and 50k-task graph cap.
- _MAX_VRT_DASK_CHUNKS constant.

The XML parsing, source-path containment, per-source decode, and
``_apply_integer_sentinel_mask`` already live in
xrspatial/geotiff/_vrt.py (untouched here); this module orchestrates
only the eager / dask / GPU dispatch around them.

Function bodies are unchanged except for adjusted relative imports
(``._X`` -> ``.._X``). __init__.py re-exports ``read_vrt`` so the
public surface is unchanged.

Knock-on cleanup
================

``_backends/dask.py`` had a lazy ``from .. import read_vrt`` (added in
#1886) because ``read_vrt`` was still in __init__.py. With ``read_vrt``
now in a sibling backend module, the import is promoted to a static
top-of-module ``from .vrt import read_vrt`` and the dispatch site
becomes a direct call.

Diff stats
==========

- __init__.py: 2505 -> 1905 lines (-600).
- New _backends/vrt.py: 624 lines.

Verification: pixel-parity matrix from #1889 (192 cells), runtime
identity (9), VRT-specific tests (lazy chunks #1814, lazy chunks
#1798, window validation #1697, backend coverage), full geotiff
suite minus the 8 pre-existing main failures -- 2862/2862 pass.

Refs #1813.
Copilot AI review requested due to automatic review settings May 15, 2026 04:22
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is part of the ongoing GeoTIFF refactor (#1813), extracting the VRT read entry point (read_vrt) and its dask chunking helpers out of xrspatial/geotiff/__init__.py into a dedicated backend module, while keeping the public API stable via re-export.

Changes:

  • Added xrspatial/geotiff/_backends/vrt.py containing read_vrt, _vrt_chunk_read, _read_vrt_chunked, and _MAX_VRT_DASK_CHUNKS.
  • Updated xrspatial/geotiff/_backends/dask.py to use a static import of read_vrt from the sibling backend module.
  • Updated xrspatial/geotiff/__init__.py to re-export read_vrt from the new backend module and removed the inlined implementation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_backends/vrt.py New VRT backend module holding read_vrt and its eager/dask/GPU orchestration.
xrspatial/geotiff/_backends/dask.py Switches VRT dispatch from a lazy .. import read_vrt to static from .vrt import read_vrt.
xrspatial/geotiff/__init__.py Re-exports read_vrt from _backends/vrt.py and removes the previous in-file implementation.
Comments suppressed due to low confidence (1)

xrspatial/geotiff/_backends/vrt.py:205

  • This comment says “lenient mode (the default)”, but missing_sources now defaults to 'raise' (and strict mode can force raising). Please update the wording to reflect that the lenient path is opt-in via missing_sources='warn', so the comment stays consistent with the function signature and docstring above.
    # Surface skipped-source records as ``attrs['vrt_holes']`` so
    # callers can detect a partial mosaic by attribute lookup. Under
    # lenient mode (the default), the underlying ``_vrt.read_vrt``
    # already warned per skipped source but the warning is easy to
    # miss in a pipeline; the attr lets downstream code branch on

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_backends/vrt.py Outdated
Comment on lines +114 to +117
``attrs['vrt_holes']`` from skipped sources; the chunked path does
not aggregate per-task hole records, so that attribute is not set
when ``chunks=`` is used. Each worker still emits
``GeoTIFFFallbackWarning`` for missing sources.
)
from .._reader import read_to_array as _read_to_array
from .._validation import _validate_chunks_arg, _validate_dtype_cast
from .vrt import read_vrt
Two docstring corrections Copilot flagged:

1. ``_backends/vrt.py`` ``read_vrt`` docstring claimed the chunked
   path leaves ``attrs['vrt_holes']`` unset. ``_read_vrt_chunked``
   actually populates it via a parse-time ``os.path.exists`` sweep
   (introduced before this PR, in the original __init__.py code).
   Updated the docstring to describe the actual best-effort contract
   and note that decode-time codec failures still surface only as
   per-task ``GeoTIFFFallbackWarning``.

2. ``_backends/dask.py`` module docstring still described ``read_vrt``
   as lazy-imported from __init__.py. After #1898 promoted it to a
   static ``from .vrt import read_vrt``, only ``_read_geo_info``
   remains lazy. Rewrote the wording to match the current import
   strategy.
@brendancol brendancol merged commit 015e350 into main May 15, 2026
10 checks passed
@brendancol brendancol deleted the issue-1887 branch May 15, 2026 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: extract read_vrt and VRT dask helpers into _backends/vrt.py (#1813 step 9)

2 participants