4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,10 @@
- Default internal `_vrt.read_vrt` `missing_sources` to `'raise'` so an unreadable VRT source no longer produces a silent zero-fill hole on integer rasters; pass `missing_sources='warn'` to opt back into the previous lenient behaviour (#1843)
- Deprecate read-side emission of matplotlib colormap-derived attrs (cmap, colormap_rgba) on palette TIFFs; the writer cannot set Photometric=3 so they do not round-trip. Construct ListedColormap from attrs['colormap'] in caller code. These attrs still emit for now but trigger a DeprecationWarning. Removal planned for a future release. (#1984)
- Reject writes whose `attrs['crs']` and `attrs['crs_wkt']` resolve to different CRSes (after pyproj canonicalisation) instead of silently emitting the EPSG and dropping the WKT. The new `ConflictingCRSError` (subclass of `ValueError` and `GeoTIFFAmbiguousMetadataError`) names the offending attrs; pass `crs=` explicitly to override both attrs and bypass the check. Read-back DataArrays carrying both attrs continue to round-trip because the reader's two attrs derive from the same on-disk CRS. (#1987)
- Reject reads whose CRS string cannot be parsed by pyproj instead of emitting it verbatim in `attrs['crs_wkt']` and letting downstream code crash on first use. Raises `UnparseableCRSError` (subclass of `ValueError`); pass `allow_unparseable_crs=True` to `open_geotiff` / `read_geotiff_dask` / `read_geotiff_gpu` / `read_vrt` to keep the legacy behaviour. The existing write-side raise from `_validate_crs_fallback` was retyped from plain `ValueError` to the new `UnparseableCRSError` subclass (no behaviour change). (#1987)
- Reject reads whose affine transform has non-zero rotation/shear terms instead of returning an axis-misaligned grid that downstream xrspatial ops (slope, aspect, hillshade, proximity, zonal) silently compute wrong results on. Raises `RotatedTransformError`; pass `allow_rotated=True` to `open_geotiff` / `read_geotiff_dask` / `read_geotiff_gpu` / `read_vrt` to read the pixel grid without the axis-aligned-grid assumption. (#1987)
- Reject writes whose `coords['y']` or `coords['x']` are not uniformly spaced instead of silently using the first two values as the pixel size and misrepresenting the rest of the axis. Raises `NonUniformCoordsError`; the existing int-dtype sentinel convention from #1969 (used by the no-georef coord fallback) is exempted. (#1987)
- Reject writes whose `attrs['nodata']` disagrees with every concrete entry in `attrs['nodatavals']` instead of silently picking the canonical scalar and dropping the rioxarray tuple. Raises `ConflictingNodataError`; pass `nodata=` explicitly to the writer to override both attrs and bypass the check. `_FillValue` continues to be deprioritised per the existing resolver convention. (#1987)
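
Two of the checks above (rotated transforms on read, non-uniform coords on write) can be illustrated with a small standalone sketch. This is not the xrspatial implementation: the function names, the locally defined exception classes, and the `rel_tol` tolerance are assumptions made for illustration only. The one fixed fact it relies on is the GDAL geotransform ordering `(origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)`.

```python
import math


class RotatedTransformError(ValueError):
    """Illustrative stand-in; the real xrspatial class may differ."""


class NonUniformCoordsError(ValueError):
    """Illustrative stand-in; the real xrspatial class may differ."""


def gdal_to_affine(gt):
    """Reorder a GDAL geotransform into affine (a, b, c, d, e, f) order.

    GDAL stores (origin_x, pixel_width, row_rotation,
                 origin_y, col_rotation, pixel_height).
    """
    c, a, b, f, d, e = gt
    return (a, b, c, d, e, f)


def check_rotated(affine, *, allow_rotated=False):
    """Raise when the rotation/shear terms b or d are non-zero."""
    _, b, _, d, _, _ = affine
    if not allow_rotated and (b != 0 or d != 0):
        raise RotatedTransformError(
            f"rotation/shear terms are non-zero (b={b}, d={d}); "
            "pass allow_rotated=True to read the pixel grid anyway"
        )


def check_uniform(coords, *, rel_tol=1e-6):
    """Raise when consecutive steps differ beyond rel_tol (assumed tolerance)."""
    steps = [hi - lo for lo, hi in zip(coords, coords[1:])]
    if len(steps) < 2:
        return  # zero or one step: nothing to compare against
    first = steps[0]
    if not all(math.isclose(s, first, rel_tol=rel_tol) for s in steps[1:]):
        raise NonUniformCoordsError(
            "coordinate spacing is not uniform; the first step "
            f"({first}) does not describe the whole axis"
        )
```

For example, `check_rotated(gdal_to_affine((100.0, 2.0, 0.0, 500.0, 0.0, -2.0)))` passes silently, while `check_uniform([0.0, 1.0, 3.0])` raises because the second step differs from the first.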


### Version 0.9.9 - 2026-05-05
22 changes: 20 additions & 2 deletions xrspatial/geotiff/__init__.py
@@ -79,6 +79,7 @@
_extent_to_window,
_extract_rich_tags,
_populate_attrs_from_geo_info,
_validate_read_geo_info,
_resolve_nodata_attr,
_set_nodata_attrs,
)
@@ -251,6 +252,8 @@ def open_geotiff(source: str | BinaryIO, *,
max_cloud_bytes=_MAX_CLOUD_BYTES_SENTINEL,
on_gpu_failure: str = _ON_GPU_FAILURE_SENTINEL,
missing_sources: str = _MISSING_SOURCES_SENTINEL,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False,
) -> xr.DataArray:
"""Read a GeoTIFF, COG, or VRT file into an xarray.DataArray.

@@ -441,7 +444,10 @@ def open_geotiff(source: str | BinaryIO, *,
vrt_kwargs['missing_sources'] = missing_sources
return read_vrt(source, dtype=dtype, window=window, band=band,
name=name, chunks=chunks, gpu=gpu,
max_pixels=max_pixels, **vrt_kwargs)
max_pixels=max_pixels,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
**vrt_kwargs)

# File-like buffers don't support the GPU or dask code paths because
# those re-open the source by path from worker tasks or device-side
@@ -466,14 +472,18 @@ def open_geotiff(source: str | BinaryIO, *,
window=window, band=band,
name=name, chunks=chunks,
max_pixels=max_pixels,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
**gpu_kwargs)

# Dask path (CPU)
if chunks is not None:
return read_geotiff_dask(source, dtype=dtype, chunks=chunks,
overview_level=overview_level,
window=window, band=band,
max_pixels=max_pixels, name=name)
max_pixels=max_pixels, name=name,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs)

kwargs = {}
if max_pixels is not None:
@@ -514,6 +524,14 @@ def open_geotiff(source: str | BinaryIO, *,
import os
name = os.path.splitext(os.path.basename(source))[0]

# Issue #1987 ambiguous-metadata checks. Run before attrs population
# so a rejected file does not leak a partly-populated attrs dict.
_validate_read_geo_info(
geo_info, window=window,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)

attrs = {}
_populate_attrs_from_geo_info(attrs, geo_info, window=window)

51 changes: 51 additions & 0 deletions xrspatial/geotiff/_attrs.py
@@ -467,6 +467,57 @@ def _set_nodata_attrs(attrs: dict, nodata, *, array_dtype) -> None:
attrs['masked_nodata'] = bool(np.dtype(array_dtype).kind == 'f')


def _validate_read_geo_info(
geo_info,
*,
window=None,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False,
) -> None:
"""Run issue #1987 read-side ambiguous-metadata checks against ``geo_info``.

Centralised helper so the eager numpy, dask, GPU, and VRT read
paths run the same checks before constructing the returned
DataArray. Forwards ``allow_rotated`` / ``allow_unparseable_crs``
to the registered checks (``_check_read_rotated_transform`` and
``_check_read_unparseable_crs`` today; sibling checks attach via
the registry).

Raises whichever ``GeoTIFFAmbiguousMetadataError`` subclass a
registered check picks. The hook is a no-op when no check is
registered, so callers can use this helper unconditionally without
coupling each backend to the current check list.

Note: the transform tuple built here is always axis-aligned
(``b == 0`` / ``d == 0``) because ``_transform_tuple_from_pixel_geometry``
only carries origin + pixel size, and the upstream TIFF reader
rejects rotated ``ModelTransformationTag`` entries with
``NotImplementedError`` in ``_geotags._extract_transform_and_georef``
before we reach this helper. The rotated-transform check therefore
fires only on the VRT path, which builds its context from the GDAL
``geo_transform`` via ``_gdal_geotransform_to_affine_tuple``.
"""
from ._validation import validate_read_metadata
transform_for_check = (
_transform_tuple_from_pixel_geometry(
geo_info.transform.origin_x,
geo_info.transform.origin_y,
geo_info.transform.pixel_width,
geo_info.transform.pixel_height,
window=window,
)
if (geo_info.transform is not None
and getattr(geo_info, 'has_georef', True))
else None
)
validate_read_metadata({
'allow_rotated': allow_rotated,
'allow_unparseable_crs': allow_unparseable_crs,
'transform': transform_for_check,
'crs_wkt': geo_info.crs_wkt,
})


def _populate_attrs_from_geo_info(attrs: dict, geo_info, *, window=None) -> None:
"""Populate ``attrs`` with all GeoTIFF metadata from ``geo_info``.

17 changes: 15 additions & 2 deletions xrspatial/geotiff/_backends/dask.py
@@ -17,7 +17,11 @@
import numpy as np
import xarray as xr

from .._attrs import _populate_attrs_from_geo_info, _set_nodata_attrs
from .._attrs import (
_populate_attrs_from_geo_info,
_set_nodata_attrs,
_validate_read_geo_info,
)
from .._coords import (
coords_from_geo_info as _coords_from_geo_info,
geo_to_coords as _geo_to_coords,
@@ -34,7 +38,9 @@ def read_geotiff_dask(source: str, *,
band: int | None = None,
name: str | None = None,
chunks: int | tuple = 512,
max_pixels: int | None = None) -> xr.DataArray:
max_pixels: int | None = None,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False) -> xr.DataArray:
"""Read a GeoTIFF as a dask-backed DataArray for out-of-core processing.

Each chunk is loaded lazily via windowed reads.
@@ -294,6 +300,13 @@ def read_geotiff_dask(source: str, *,
import os
name = os.path.splitext(os.path.basename(source))[0]

# Issue #1987 ambiguous-metadata checks.
_validate_read_geo_info(
geo_info, window=window,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)

attrs = {}
_populate_attrs_from_geo_info(attrs, geo_info, window=window)
# ``masked_nodata`` reflects the declared dask graph dtype: a float
39 changes: 36 additions & 3 deletions xrspatial/geotiff/_backends/gpu.py
@@ -20,7 +20,11 @@
import numpy as np
import xarray as xr

from .._attrs import _populate_attrs_from_geo_info, _set_nodata_attrs
from .._attrs import (
_populate_attrs_from_geo_info,
_set_nodata_attrs,
_validate_read_geo_info,
)
from .._coords import (
coords_from_geo_info as _coords_from_geo_info,
)
@@ -80,6 +84,8 @@ def read_geotiff_gpu(source: str, *,
chunks: int | tuple | None = None,
max_pixels: int | None = None,
on_gpu_failure: str = _ON_GPU_FAILURE_SENTINEL,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False,
gpu: str = _GPU_DEPRECATED_SENTINEL,
) -> xr.DataArray:
"""Read a GeoTIFF with GPU-accelerated decompression via Numba CUDA.
@@ -233,6 +239,8 @@ def read_geotiff_gpu(source: str, *,
source, dtype=dtype, chunks=chunks,
overview_level=overview_level, window=window, band=band,
name=name, max_pixels=max_pixels,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)

from .._reader import (
@@ -382,6 +390,11 @@ def read_geotiff_gpu(source: str, *,
if name is None:
import os
name = os.path.splitext(os.path.basename(source))[0]
_validate_read_geo_info(
geo_info, window=window,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)
attrs = {}
_populate_attrs_from_geo_info(attrs, geo_info, window=window)
# Apply nodata mask + record sentinel so the GPU read agrees
@@ -710,6 +723,12 @@ def _read_once():
import os
name = os.path.splitext(os.path.basename(source))[0]

_validate_read_geo_info(
geo_info, window=window,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)

attrs = {}
_populate_attrs_from_geo_info(attrs, geo_info, window=window)
# ``attrs['nodata']`` + ``attrs['masked_nodata']`` reflect the
@@ -881,7 +900,9 @@ def _decode_window_gpu_direct(file_path, all_offsets, all_byte_counts,


def _read_geotiff_gpu_chunked(source, *, dtype, chunks, overview_level,
window, band, name, max_pixels):
window, band, name, max_pixels,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False):
"""Lazy Dask+CuPy backend for ``read_geotiff_gpu(chunks=...)``.

Two paths produce the same shape of dask graph:
@@ -941,6 +962,8 @@ def _read_geotiff_gpu_chunked(source, *, dtype, chunks, overview_level,
src_path, ifd, geo_info, header,
dtype=dtype, chunks=chunks, window=window, band=band,
name=name, max_pixels=max_pixels,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)
except Exception:
# GDS qualification failed; fall back to the CPU path. The
@@ -952,6 +975,8 @@ def _read_geotiff_gpu_chunked(source, *, dtype, chunks, overview_level,
source, dtype=dtype, chunks=chunks,
overview_level=overview_level, window=window, band=band,
max_pixels=max_pixels, name=name,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)

cpu_dask_arr = cpu_da.data
@@ -973,7 +998,9 @@ def _upload(block):

def _read_geotiff_gpu_chunked_gds(source, ifd, geo_info, header, *,
dtype, chunks, window, band, name,
max_pixels):
max_pixels,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False):
"""Build a Dask+CuPy graph that decodes each chunk disk->GPU.

Caller must have verified that the source qualifies via
@@ -1159,6 +1186,12 @@ def _chunk_task(meta, r0, c0, r1, c1):
else:
dims = ['y', 'x']

_validate_read_geo_info(
geo_info, window=window,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
)

attrs = {}
_populate_attrs_from_geo_info(attrs, geo_info, window=window)
# ``masked_nodata`` reflects the declared dask graph dtype; mirrors
62 changes: 59 additions & 3 deletions xrspatial/geotiff/_backends/vrt.py
@@ -37,7 +37,10 @@ def read_vrt(source: str, *,
chunks: int | tuple | None = None,
gpu: bool = False,
max_pixels: int | None = None,
missing_sources: str = 'raise') -> xr.DataArray:
missing_sources: str = 'raise',
allow_rotated: bool = False,
allow_unparseable_crs: bool = False,
band_nodata: str | None = None) -> xr.DataArray:
"""Read a GDAL Virtual Raster Table (.vrt) into an xarray.DataArray.

The VRT's source GeoTIFFs are read via windowed reads and assembled
@@ -158,11 +161,43 @@ def read_vrt(source: str, *,
dtype=dtype,
max_pixels=max_pixels,
missing_sources=missing_sources,
allow_rotated=allow_rotated,
allow_unparseable_crs=allow_unparseable_crs,
band_nodata=band_nodata,
)

# Issue #1987 ambiguous-metadata checks for the eager VRT path. Parse
# the VRT XML up front and validate before ``_read_vrt_internal``
# touches any pixel data, so a rejected file does not first
# materialise the full mosaic into host memory. The parsed
# ``VRTDataset`` is threaded into the internal reader via ``parsed=``
# so we don't double-parse the XML.
import os as _os
from .._validation import (
validate_read_metadata,
_gdal_geotransform_to_affine_tuple,
)
from .._vrt import parse_vrt as _parse_vrt, _read_vrt_xml
_xml_str = _read_vrt_xml(source)
_vrt_dir = _os.path.dirname(_os.path.abspath(source))
_parsed_vrt = _parse_vrt(_xml_str, _vrt_dir)
validate_read_metadata({
'allow_rotated': allow_rotated,
'allow_unparseable_crs': allow_unparseable_crs,
'transform': _gdal_geotransform_to_affine_tuple(
_parsed_vrt.geo_transform
),
'crs_wkt': _parsed_vrt.crs_wkt,
'band_nodata': band_nodata,
'band_nodata_values': (
[b.nodata for b in _parsed_vrt.bands]
if _parsed_vrt.bands else None
),
})

arr, vrt = _read_vrt_internal(
source, window=window, band=band, max_pixels=max_pixels,
missing_sources=missing_sources,
missing_sources=missing_sources, parsed=_parsed_vrt,
)

if name is None:
@@ -339,7 +374,10 @@ def _vrt_chunk_read(source, r0, c0, r1, c1, *,


def _read_vrt_chunked(source, *, window, band, name, chunks, gpu, dtype,
max_pixels, missing_sources):
max_pixels, missing_sources,
allow_rotated: bool = False,
allow_unparseable_crs: bool = False,
band_nodata: str | None = None):
"""Lazy ``read_vrt`` dispatch when ``chunks=`` is set (issue #1814).

Parses the VRT XML once to recover the extent, CRS, GeoTransform,
@@ -383,6 +421,24 @@ def _read_vrt_chunked(source, *, window, band, name, chunks, gpu, dtype,
vrt_dir = _os.path.dirname(_os.path.abspath(source))
vrt = parse_vrt(xml_str, vrt_dir)

# Issue #1987 ambiguous-metadata checks on the chunked VRT path. Run
# before the band-count validator below so a rejected file does not
# produce side effects.
from .._validation import (
validate_read_metadata,
_gdal_geotransform_to_affine_tuple,
)
validate_read_metadata({
'allow_rotated': allow_rotated,
'allow_unparseable_crs': allow_unparseable_crs,
'transform': _gdal_geotransform_to_affine_tuple(vrt.geo_transform),
'crs_wkt': vrt.crs_wkt,
'band_nodata': band_nodata,
'band_nodata_values': (
[b.nodata for b in vrt.bands] if vrt.bands else None
),
})

# Validate ``band`` against the parsed band count, matching the
# internal reader's contract so the failure mode is the same whether
# the user reads eagerly or chunked.
5 changes: 3 additions & 2 deletions xrspatial/geotiff/_crs.py
@@ -15,6 +15,7 @@
import numbers
import warnings

from ._errors import UnparseableCRSError
from ._runtime import GeoTIFFFallbackWarning, _geotiff_strict_mode


@@ -165,15 +166,15 @@ def _validate_crs_fallback(
return
if allow_unparseable_crs:
return
raise ValueError(
raise UnparseableCRSError(
"crs is not an EPSG code, is not a WKT string "
"(no PROJCS / GEOGCS / PROJCRS / GEOGCRS root), and could not "
f"be parsed: got {wkt_fallback!r}. Writing it verbatim to "
"GTCitationGeoKey would produce a file most GeoTIFF readers "
"cannot interpret. Pass an EPSG int (recommended), a real "
"WKT string, install pyproj so EPSG / PROJ tokens can be "
"resolved, or pass allow_unparseable_crs=True to keep the "
"pre-#1929 citation-only behaviour."
"pre-#1929 citation-only behaviour. See issue #1987."
)

