
geotiff: extract_geo_info float-only nodata parse loses uint64 max / int64 max sentinels #1847

@brendancol


Summary

The reader's GDAL_NODATA parsing in _geotags.extract_geo_info (and in the related helpers _reader._resolve_masked_fill and _reader._sparse_fill_value) parses the nodata string with float() unconditionally. For 64-bit integer rasters whose declared sentinel is uint64 max (2**64 - 1) or int64 max (2**63 - 1), the float64 round-trip rounds the sentinel up to 2**64 or 2**63 respectively, one past the dtype's maximum. The downstream integer-mask gate info.min <= int(nodata) <= info.max then rejects the value, and the sentinel pixel survives as a literal valid integer instead of being masked to NaN.
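The rejection can be reproduced without the reader at all. This is a minimal sketch, assuming the gate has the `info.min <= int(nodata) <= info.max` shape quoted above with `info = np.iinfo(dtype)`; `float_first_gate` is a hypothetical name, not the reader's actual function.

```python
import numpy as np


def float_first_gate(text: str, dtype) -> bool:
    """Hypothetical stand-in for the current parse-then-gate sequence.

    Parses the GDAL_NODATA string with float() unconditionally, then applies
    the integer range check quoted in the issue.
    """
    nodata = float(text)  # float-only parse: integers above 2**53 may round
    info = np.iinfo(dtype)
    return info.min <= int(nodata) <= info.max


sentinel = str(2**64 - 1)
print(float(sentinel) == 2.0**64)             # True: rounded up to 2**64
print(int(float(sentinel)) - (2**64 - 1))     # 1: one past uint64 max
print(float_first_gate(sentinel, np.uint64))  # False: sentinel is not masked
print(float_first_gate("65535", np.uint16))   # True: small sentinels are fine
```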

Repro:

```python
import os
import tempfile

import numpy as np
import xarray as xr
import xrspatial.geotiff as g

with tempfile.TemporaryDirectory() as d:
    arr = np.full((16, 16), 100, dtype=np.uint64)
    arr[0, 0] = 2**64 - 1  # uint64 max as sentinel
    da = xr.DataArray(arr, dims=['y', 'x'])
    p = os.path.join(d, 't.tif')
    g.to_geotiff(da, p, nodata=2**64 - 1)
    out = g.open_geotiff(p)
    print(out.dtype, out.values[0, 0])  # uint64 18446744073709551615
    # expected: float64 NaN, after sentinel-to-NaN promotion
```

The same failure occurs for np.int64 with nodata = 2**63 - 1. Neither value is exactly representable in float64, which represents every integer exactly only up to 2**53; the nearest float64 is 2**64 or 2**63, exactly one above the dtype maximum, so the dtype-range check fails.
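A quick check of which integer extremes survive the float64 round trip, using only NumPy and Python built-ins. float64 carries a 53-bit significand, so every integer up to 2**53 in magnitude is exact, and -2**63 survives because it is a power of two. The helper name `float64_roundtrip_error` is illustrative.

```python
import numpy as np


def float64_roundtrip_error(value: int) -> int:
    """Return int(float(value)) - value; 0 means float64 holds value exactly."""
    return int(float(value)) - value


# Check each integer dtype's extremes against a float64 round trip.
for dtype in (np.uint16, np.int32, np.uint32, np.int64, np.uint64):
    info = np.iinfo(dtype)
    for label, value in (("min", info.min), ("max", info.max)):
        err = float64_roundtrip_error(value)
        status = "exact" if err == 0 else f"off by {err}"
        print(f"{np.dtype(dtype).name:>6} {label}: {status}")
```

Only uint64 max and int64 max come back off by 1; every other extreme in the loop, including int64 min, round-trips exactly, which is why the uint16 and int64-min cases in the test plan are expected to be unaffected.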

Affected sites:

  • xrspatial/geotiff/_geotags.py:619 (extract_geo_info — main GDAL_NODATA parse)
  • xrspatial/geotiff/_reader.py:1166 (_resolve_masked_fill — LERC fill)
  • xrspatial/geotiff/_reader.py:1299 (_sparse_fill_value — SPARSE_OK fill)

This is the same class of bug that PR #1833 fixed in _vrt._parse_band_nodata for the VRT XML parse path: try int() first, and fall back to float() only for NaN/Inf/scientific/fractional strings. That fix did not extend to the TIFF source-of-truth parser, so:

  • Direct open_geotiff(uint64.tif) with nodata=uint64 max → sentinel lost.
  • write_vrt([uint64.tif]) stringifies the float-parsed sentinel into XML as 1.8446744073709552e+19, which the VRT reader subsequently rejects as out-of-range.
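The write_vrt failure can be traced with plain Python. This sketch assumes only what is described above: the sentinel reaches XML as the string form of the float-parsed value, and the reader tries int() before float(). No xrspatial code is called.

```python
import numpy as np

sentinel = 2**64 - 1

# A float-parsed sentinel stringifies in exponent form when written to XML.
xml_text = str(float(sentinel))
print(xml_text)  # 1.8446744073709552e+19

# An int-first reader cannot apply int() to that text, so it falls back to
# float(), which yields 2**64: one past uint64 max.
try:
    value = int(xml_text)
except ValueError:
    value = float(xml_text)
info = np.iinfo(np.uint64)
print(info.min <= int(value) <= info.max)  # False: rejected as out of range

# Written as an integer string instead, the sentinel survives exactly.
print(int(str(sentinel)) == sentinel)  # True
```

The int-first parse on the reading side cannot recover precision that was already lost when the value was stringified, so the TIFF-side parse has to keep the integer exact before write_vrt sees it.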

Categories: Cat 2 (NaN propagation: sentinel pixel survives unmasked) + Cat 5 (backend inconsistency: VRT XML path handles 64-bit sentinels via _parse_band_nodata but the TIFF path does not).

Fix sketch

Lift the int-first parse from _vrt._parse_band_nodata into a shared helper and reuse it across the three TIFF-side sites. The helper takes the raw string and the target dtype; it tries int(text), falls back to float(text) for NaN/Inf/fractional/scientific, and returns the parsed value at full Python int precision when possible.
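A minimal sketch of such a helper, under stated assumptions: `parse_nodata` is a hypothetical name, and how the real helper uses the dtype is not specified in this issue. Here the dtype only decides whether int precision matters (integer dtypes try int() first; floating-point dtypes go straight to float()). Stripping NUL terminators is also an assumption about the raw ASCII tag text.

```python
from typing import Union

import numpy as np


def parse_nodata(text: str, dtype) -> Union[int, float]:
    """Hypothetical shared helper: parse a GDAL_NODATA string for ``dtype``.

    For integer dtypes, int() is tried first so 64-bit sentinels such as
    2**64 - 1 keep full precision. float() is the fallback for NaN, Inf,
    fractional, and scientific-notation strings, and is used directly for
    floating-point dtypes.
    """
    # Assumed cleanup: surrounding whitespace and NUL terminators.
    cleaned = text.strip().strip("\x00").strip()
    if np.issubdtype(np.dtype(dtype), np.integer):
        try:
            return int(cleaned)
        except ValueError:
            pass  # e.g. "nan", "inf", "-9999.0", "1e3"
    return float(cleaned)


print(parse_nodata("18446744073709551615", np.uint64))  # 18446744073709551615
print(parse_nodata("nan", np.float32))                   # nan
print(parse_nodata("-9999.0", np.int16))                 # -9999.0
```

Range validation is deliberately left to the caller, so the existing info.min <= int(nodata) <= info.max gate can stay as-is and now sees the exact sentinel.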

Test plan

  • uint64 nodata=2**64 - 1: NaN-mask fires, attrs['nodata'] surfaces 2**64 - 1 (int)
  • int64 nodata=2**63 - 1: NaN-mask fires, attrs['nodata'] surfaces 2**63 - 1
  • int64 nodata=-2**63 (representable): no regression
  • uint16 nodata=65535 (fits float64): no regression
  • uint64 source → write_vrt → read_vrt: sentinel round-trips
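The "NaN-mask fires" items above can be pictured with a small stand-alone mask step. This is a hypothetical sketch, not the reader's code: `mask_sentinel` is an illustrative name, and the promotion rule (a float64 copy with NaN at sentinel pixels once the range gate passes) is an assumption based on the expected output in the repro.

```python
import numpy as np


def mask_sentinel(arr: np.ndarray, nodata) -> np.ndarray:
    """Hypothetical sketch: mask an integer raster's nodata pixels to NaN.

    Returns a float64 copy with sentinel pixels set to NaN when ``nodata``
    passes the integer range gate; otherwise returns ``arr`` unchanged.
    Only integer rasters are handled here.
    """
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError("this sketch only handles integer rasters")
    if isinstance(nodata, float) and not nodata.is_integer():
        return arr  # NaN, Inf, or a fraction cannot match an integer pixel
    value = int(nodata)
    info = np.iinfo(arr.dtype)
    if not info.min <= value <= info.max:
        return arr  # the range gate rejects the sentinel: nothing is masked
    mask = arr == arr.dtype.type(value)
    out = arr.astype(np.float64)
    out[mask] = np.nan
    return out


a = np.full((4, 4), 100, dtype=np.uint64)
a[0, 0] = 2**64 - 1
print(mask_sentinel(a, 2**64 - 1).dtype)         # float64
print(mask_sentinel(a, float(2**64 - 1)).dtype)  # uint64
```

With an exact int sentinel the corner pixel becomes NaN; passing float(2**64 - 1), which is what the float-only parse produces, leaves the array as uint64 with the sentinel intact.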
