
feat(raster-gdal): add parse_outdb_source helper for the GDAL format driver#812

Merged
paleolimbot merged 1 commit into apache:main from james-willis:jw/raster-outdb-uri-parser on May 5, 2026

@james-willis
Contributor

@james-willis james-willis commented May 4, 2026

Summary

Adds parse_outdb_source(uri) -> (Cow<str>, u32) in rust/sedona-raster-gdal/src/source_uri.rs. It is a helper internal to the GDAL format driver: when an outdb band's outdb_format dispatches to GDAL, the loader calls it to pick the source band out of outdb_uri. Two URI forms are recognised; anything else either defaults to band 1 or is rejected:

  • SedonaDB convention (GDAL driver only): <uri>#band=N → (uri without fragment, N).
    rsplit_once('#') is used, so a trailing #band=N wins over an earlier #anchor in the URI.
  • GDAL native subdataset URI: HDF5:"x.h5":/var, NETCDF:"file.nc":var, GTIFF_DIR:N:foo.tif, … → (uri verbatim, 1). We don't try to interpret the GDAL grammar; we just pass it through.
  • Plain URI, no fragment, or a non-band= fragment → (uri, 1).
  • Malformed #band= fragment (non-numeric, zero, negative, > u32::MAX) — returns an Execution error. The user explicitly asked for a band; we refuse to silently fall back to band 1.

Both common cases return Cow::Borrowed without allocating: an input with no fragment is borrowed whole, and when a #band=N fragment is present only the prefix before the '#' is borrowed.
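
A minimal std-only sketch of the rules above, for readers who want to see them in code. It is not the code merged in source_uri.rs. Because the malformed case returns an error, the sketch wraps the (Cow<str>, u32) pair in a Result, and a plain String stands in for the Execution error the real helper returns (an assumption; the error messages are made up).

```rust
use std::borrow::Cow;

/// Sketch of the GDAL-driver URI rules. A String error stands in for the
/// DataFusion Execution error used by the real helper (assumption).
fn parse_outdb_source(uri: &str) -> Result<(Cow<'_, str>, u32), String> {
    // Split at the LAST '#', so "x.tif#anchor#band=3" yields fragment "band=3".
    if let Some((prefix, fragment)) = uri.rsplit_once('#') {
        if let Some(band) = fragment.strip_prefix("band=") {
            // u32 parsing rejects non-numeric, negative and > u32::MAX values,
            // so an explicit but malformed request is never silently ignored.
            let n = band
                .parse::<u32>()
                .map_err(|_| format!("invalid #band= value in '{uri}': '{band}'"))?;
            if n == 0 {
                return Err(format!("band index in '{uri}' is 1-based, got 0"));
            }
            // Borrow only the prefix before the '#'; nothing is allocated.
            return Ok((Cow::Borrowed(prefix), n));
        }
    }
    // Plain URIs, non-band fragments and GDAL subdataset URIs pass through
    // verbatim (still borrowed), selecting band 1.
    Ok((Cow::Borrowed(uri), 1))
}
```

For example, parse_outdb_source("x.tif#anchor#band=2") gives ("x.tif#anchor", 2). Both success paths return slices of the input, which is what keeps them Cow::Borrowed.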

Scope

This is not a SedonaDB-wide URI convention. The schema treats outdb_uri as opaque, and RS_BandPath and other format-agnostic surfaces continue to do so. The #band=N shape only has meaning inside the GDAL format driver — other format drivers (Zarr, MDArray, …) handle their own URIs however they want, and unrelated callers like RS_BandPath are untouched.
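
To make the scope concrete, here is a hypothetical dispatch sketch. OutDbFormat, describe_source and band_path are illustrative names, not SedonaDB APIs, and gdal_source is a compact copy of the GDAL-only rules from the summary. The only point is that the #band=N split happens inside the GDAL branch, while another driver and a format-agnostic path (in the spirit of RS_BandPath) see outdb_uri unchanged.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for the values outdb_format can dispatch on.
#[derive(Debug, Clone, Copy)]
enum OutDbFormat {
    Gdal,
    Zarr,
}

// Compact copy of the GDAL-only rules: a trailing #band=N is split off,
// anything else passes through with band 1; a malformed #band= is an error.
fn gdal_source(uri: &str) -> Result<(Cow<'_, str>, u32), String> {
    match uri.rsplit_once('#') {
        Some((prefix, frag)) if frag.starts_with("band=") => {
            let n = frag["band=".len()..]
                .parse::<u32>()
                .map_err(|e| format!("bad #band= in '{uri}': {e}"))?;
            if n == 0 {
                return Err(format!("bad #band= in '{uri}': bands are 1-based"));
            }
            Ok((Cow::Borrowed(prefix), n))
        }
        _ => Ok((Cow::Borrowed(uri), 1)),
    }
}

// Only the GDAL branch interprets the URI; the other driver gets it verbatim.
fn describe_source(format: OutDbFormat, uri: &str) -> Result<String, String> {
    match format {
        OutDbFormat::Gdal => {
            let (path, band) = gdal_source(uri)?;
            Ok(format!("gdal: open {path}, read band {band}"))
        }
        OutDbFormat::Zarr => Ok(format!("zarr: pass {uri} through unchanged")),
    }
}

// A format-agnostic surface treats outdb_uri as opaque: no parsing at all.
fn band_path(uri: &str) -> &str {
    uri
}
```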

Why encode the source band index inside outdb_uri (for GDAL)?

#787 introduced an outdb_band_id: u32 column alongside outdb_url. The follow-up work on N-D rasters needs the GDAL driver to also support GDAL native subdataset URIs (HDF5 groups, multi-page GeoTIFFs, NetCDF variables). Those URIs already carry the source-selector inline, so a typed sibling column would be redundant and couldn't represent them. Inside the GDAL driver, folding both forms (#band=N for plain rasters, native subdataset syntax for everything else) into one parser keeps the driver's URI handling in one place.
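
For a plain raster, moving from the #787 layout (a URI plus an outdb_band_id column) to the in-URI form amounts to appending a fragment. The helper below is hypothetical and not part of this PR or the follow-up; it assumes outdb_band_id already holds a 1-based GDAL band number. It targets plain rasters only, since subdataset URIs carry their own selector.

```rust
/// Hypothetical migration helper: fold a legacy (uri, outdb_band_id) pair
/// into the "<uri>#band=N" form read by the GDAL driver.
/// Assumes band_id is already 1-based, as GDAL band numbers are.
fn to_gdal_outdb_uri(uri: &str, band_id: u32) -> Result<String, String> {
    if band_id == 0 {
        // The parser rejects #band=0, so refuse to produce it.
        return Err(format!("band index for '{uri}' must be >= 1"));
    }
    // Always append, even for band 1: the parser splits at the last '#',
    // so an explicit trailing fragment can never be confused with an
    // earlier anchor that happens to look like "#band=...".
    Ok(format!("{uri}#band={band_id}"))
}
```

Appending unconditionally keeps the encoding unambiguous: a URI such as x.tif#band=5 whose fragment is a literal anchor becomes x.tif#band=5#band=1, which the trailing-fragment rule reads as band 1.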

Purely additive

No existing call sites are rewired. The first caller will appear in a follow-up PR that ports the GDAL loader to read the source band selector from outdb_uri instead of a separate outdb_band_id column.

Test plan

  • cargo test -p sedona-raster-gdal — 43 unit tests + 1 doctest pass
  • cargo build -p sedona-schema -p sedona-raster -p sedona-raster-gdal -p sedona-raster-functions -p sedona-testing
  • cargo test -p sedona-schema -p sedona-raster -p sedona-raster-gdal -p sedona-raster-functions -p sedona-testing — 315 unit tests + 4 doctests, 0 failures
  • cargo clippy -p sedona-schema -p sedona-raster -p sedona-raster-gdal -p sedona-raster-functions -p sedona-testing --all-targets -- -D warnings — clean
  • cargo fmt --all --check — clean
  • pre-commit run --files rust/sedona-raster-gdal/src/source_uri.rs rust/sedona-raster-gdal/src/lib.rs — clean

The unit tests in source_uri::tests cover: default band 1; explicit band=N; malformed #band= (zero, negative, overflow, non-numeric) returning a clear Execution error; URL query strings preserved before fragment; local paths; HDF5/NETCDF/GTIFF_DIR subdataset passthrough; trailing #band=N winning over earlier #anchor; empty URI; Cow::Borrowed invariants.

@james-willis james-willis marked this pull request as draft May 4, 2026 20:30
@github-actions github-actions Bot requested a review from prantogg May 4, 2026 20:30
@james-willis james-willis changed the title feat(raster-gdal): add parse_outdb_source for outdb URI parsing feat(raster-gdal): add parse_outdb_source helper for the GDAL format driver May 4, 2026
@james-willis james-willis marked this pull request as ready for review May 4, 2026 22:20
@james-willis james-willis force-pushed the jw/raster-outdb-uri-parser branch from dd61091 to bf5ba23 Compare May 4, 2026 23:07
Member

@paleolimbot paleolimbot left a comment

A few optional suggestions...thank you!

Comment thread rust/sedona-raster-gdal/src/lib.rs Outdated
Comment thread rust/sedona-raster-gdal/src/source_uri.rs Outdated
Comment thread rust/sedona-raster-gdal/src/source_uri.rs Outdated
@james-willis james-willis force-pushed the jw/raster-outdb-uri-parser branch from bf5ba23 to 9b5c8dc Compare May 5, 2026 17:08
Member

@paleolimbot paleolimbot left a comment

Optional nit (feel free to merge whenever after an approve when you're happy with it 🙂 )

Comment thread rust/sedona-raster-gdal/src/source_uri.rs Outdated
Add a crate-private parse_outdb_source helper that splits a SedonaDB
outdb URI into the underlying URI plus a 1-based source band index.
Two URI shapes are accepted, both private to the GDAL format driver:

- '<uri>#band=N' — SedonaDB convention for selecting band N.
- GDAL native subdataset URI ('HDF5:"x.h5":/var', 'GTIFF_DIR:N:foo.tif',
  ...) — passed through verbatim, defaulting to band 1.

Plain URIs default to band 1. Malformed '#band=' fragments (non-numeric,
zero, negative, > u32::MAX) return a clear Execution error.

Format-agnostic surfaces (incl. RS_BandPath) treat outdb_uri as opaque;
the parser is dispatched only when outdb_format routes to the GDAL
driver.
@james-willis james-willis force-pushed the jw/raster-outdb-uri-parser branch from 9b5c8dc to 8e8a6cf Compare May 5, 2026 18:13
@james-willis james-willis requested a review from paleolimbot May 5, 2026 19:38
@paleolimbot paleolimbot merged commit f020db4 into apache:main May 5, 2026
17 checks passed
2 participants