refactor(rust/sedona-spatial-join): Support wraparound rectangles in EvaluatedGeometryArray #799

Merged
paleolimbot merged 32 commits into apache:main from paleolimbot:bounding-for-geog
May 5, 2026

@paleolimbot (Member) commented Apr 28, 2026

This PR adds support for wraparound bounds in the EvaluatedGeometryArray. It introduces a new struct, Bounds2D, which is just a set of f32 bounds with wraparound support. The main consequence of this was removing the bounding box from the GeometrySummary and reusing the bounds from the evaluated array when constructing the GeoStatistics. This solves the issue of geographies having possibly bogus GeoStatistics, although it does introduce the possibility that we trigger some not-yet-implemented behaviour in the out-of-core join (fixing that may require further eliminating Rect usage in the kdb and rtree partitioners).

Closes #782.

@github-actions github-actions Bot requested a review from zhangfengcdt April 28, 2026 18:38
Comment on lines +115 to +118
/// The method can be called multiple times to insert data in batches before finalizing. The values
/// in rects are ordered (xmin, ymin, xmax, ymax).
pub fn push_build(&mut self, rects: &[(f32, f32, f32, f32)]) -> Result<()> {
// Re-interpreting rects as a flat f32 array (xmin, ymin, xmax, ymax)
Member Author

cc @pwrliang to make sure I didn't muck this up...the evaluated geometry array is no longer using Rect so I updated this to just use f32s. I don't think the GPU tests actually run here but this does build.

Contributor

The changes look good to me. I also pulled the branch and ran the CI on it https://github.com/wherobots/sedona-db-gpu-tester/actions/runs/25140655727/job/73694467245

Member Author

Thanks!
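
For readers following this thread, here is a minimal sketch of what "re-interpreting rects as a flat f32 array" amounts to, written as a plain copy. `flatten_rects` is a hypothetical helper introduced only for illustration and is not part of sedona-libgpuspatial; the real `push_build` may pass the memory through without copying.

```rust
/// Hypothetical helper (not part of sedona-libgpuspatial): copy rectangles
/// stored as (xmin, ymin, xmax, ymax) tuples into one flat f32 buffer.
fn flatten_rects(rects: &[(f32, f32, f32, f32)]) -> Vec<f32> {
    let mut flat = Vec::with_capacity(rects.len() * 4);
    for &(xmin, ymin, xmax, ymax) in rects {
        // Keep the documented order so element 4 * i is the xmin of rect i.
        flat.extend_from_slice(&[xmin, ymin, xmax, ymax]);
    }
    flat
}
```

An explicit copy like this does not depend on the in-memory layout of tuples, which Rust leaves unspecified, at the cost of one extra allocation.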

let rect = Rect::new(coord!(x: min_x, y: min_y), coord!(x: max_x, y: max_y));
rect_vec.push(Some(rect));
if let Some((min_x, min_y, max_x, max_y)) = maybe_bounds {
rect_vec.push(Bounds2D::new((min_x, max_x), (min_y, max_y)));
Member Author

This is the main point of the PR...eliminate the expansion of potentially very small feature bounds into the entire width of the earth.
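
To make the "entire width of the earth" problem concrete, the sketch below compares plain min/max longitude bounds with a wraparound interval for a small feature that straddles the antimeridian. Both functions are hypothetical illustrations, and the convention that `lo > hi` means "wraps" is an assumption of this sketch rather than a statement about Bounds2D internals.

```rust
/// Width in degrees of a longitude interval (lo, hi). When lo > hi the
/// interval is treated as crossing the antimeridian (assumed convention).
fn lon_width(lo: f32, hi: f32) -> f32 {
    if lo <= hi {
        hi - lo
    } else {
        (180.0 - lo) + (hi + 180.0)
    }
}

/// Naive bounds: plain min/max over the longitudes of a feature.
fn naive_lon_bounds(lons: &[f32]) -> (f32, f32) {
    let lo = lons.iter().copied().fold(f32::INFINITY, f32::min);
    let hi = lons.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    (lo, hi)
}
```

For a feature with longitudes 179.5 and -179.5, the naive bounds are (-179.5, 179.5), which is 359 degrees wide, while the wraparound interval (179.5, -179.5) is 1 degree wide.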

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests were copied via LLM from the sedona-spatial-join crate, taking the random geometry parameters from the Python tests. I needed this to debug a deduplication issue but it is good to have regardless.

Comment on lines +215 to +216
for (idx, rect_opt) in rects.iter().enumerate() {
if let Some(rect) = rect_opt {
native_rects.push(*rect);
} else {
for (idx, rect) in rects.iter().enumerate() {
if rect.is_empty() {
native_rects.push(empty_rect);
} else {
let (x, y) = rect.clone().into_inner();
native_rects.push((x.0, y.0, x.1, y.1));
Member Author

cc @pwrliang...did I reinterpret this correctly?

Comment on lines +189 to 202
for (idx, rect) in rects.iter().enumerate() {
    let (left, right) = rect.split(&self.wraparound.unwrap_or(Interval::empty()));
    if !left.is_empty() {
        let (x, y) = left.into_inner();
        let data_idx = rtree_builder.add(x.0, y.0, x.1, y.1);
        batch_pos_vec[data_idx as usize] = (batch_idx as i32, idx as i32);
    }

    if !right.is_empty() {
        let (x, y) = right.into_inner();
        let data_idx = rtree_builder.add(x.0, y.0, x.1, y.1);
        batch_pos_vec[data_idx as usize] = (batch_idx as i32, idx as i32);
    }
}
Member Author

This is the main change in the spatial index: we can now insert two rectangles into the index instead of one.
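
A self-contained sketch of the "two rectangles per row" idea follows. `Rect4`, `split_wraparound` and `build_toy_index` are hypothetical stand-ins and not the crate's types; in particular, the exact semantics of the real `split()` (including how it treats empty bounds) are not reproduced here.

```rust
/// Hypothetical rectangle whose x interval may wrap (xmin > xmax).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect4 {
    xmin: f32,
    ymin: f32,
    xmax: f32,
    ymax: f32,
}

/// Split a possibly wrapping rectangle into at most two ordinary rectangles
/// clipped to the wraparound range [wrap_lo, wrap_hi].
fn split_wraparound(r: Rect4, wrap_lo: f32, wrap_hi: f32) -> (Option<Rect4>, Option<Rect4>) {
    if r.xmin <= r.xmax {
        (Some(r), None)
    } else {
        let left = Rect4 { xmin: wrap_lo, xmax: r.xmax, ..r };
        let right = Rect4 { xmin: r.xmin, xmax: wrap_hi, ..r };
        (Some(left), Some(right))
    }
}

/// Toy index build: `boxes[i]` was produced by row `positions[i]`, so one
/// row can own two boxes.
fn build_toy_index(rows: &[Rect4], wrap: (f32, f32)) -> (Vec<Rect4>, Vec<usize>) {
    let mut boxes = Vec::new();
    let mut positions = Vec::new();
    for (row, r) in rows.iter().enumerate() {
        let (a, b) = split_wraparound(*r, wrap.0, wrap.1);
        for piece in a.into_iter().chain(b) {
            boxes.push(piece);
            positions.push(row);
        }
    }
    (boxes, positions)
}
```

Because one row can now map to two index entries, a query that touches both pieces returns the same row twice, which is why the probe side has to deduplicate by position.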

Comment on lines +541 to +581
// using several boxes.
candidates.sort_unstable();
candidates.dedup();
// using several boxes (e.g., for antimeridian-crossing geometries).
// First dedup by data_idx (fast), then dedup by position (handles wraparound case).
if self.inner.wraparound.is_some() {
candidates.sort_unstable();
candidates.dedup();

// Dedup by position: when a geometry spans the antimeridian, it may be indexed
// as two separate boxes with different data_idx values that map to the same position.
let mut seen_positions: std::collections::HashSet<(i32, i32)> =
std::collections::HashSet::new();
candidates.retain(|data_idx| {
let pos = self.inner.data_id_to_batch_pos[*data_idx as usize];
seen_positions.insert(pos)
});
}
Member Author

...and when we query it we have to deduplicate the results. The deduplication happened already before this PR but the DWithin tests exposed that it didn't handle the multiple source rectangles properly.

We may want to solve this performance issue before merging this PR (a HashSet is almost certainly slowing us down here).
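
One possible HashSet-free alternative, sketched under assumptions: candidates are `u32` ids and `id_to_pos` plays the role of `data_id_to_batch_pos`. `dedup_by_position` is a hypothetical name, and whether this is actually faster than the HashSet version has not been measured here.

```rust
/// Deduplicate candidate ids so that each (batch, row) position appears once.
/// Note that this reorders `candidates` by position.
fn dedup_by_position(candidates: &mut Vec<u32>, id_to_pos: &[(i32, i32)]) {
    // Sort by position (ties broken by id) so equal positions become adjacent.
    candidates.sort_unstable_by_key(|&id| (id_to_pos[id as usize], id));
    // Drop every id whose position equals that of the previously kept id.
    candidates.dedup_by_key(|id| id_to_pos[*id as usize]);
}
```

This folds the id-level and position-level deduplication into a single sort, since duplicate ids necessarily share a position.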

Comment on lines +24 to +32
/// A float32 bounding box with wraparound support
///
/// This struct is conceptually similar to the Rect<f32> but explicitly supports
/// wraparound x intervals to ensure raw xmin and xmax values are not misused.
#[derive(Debug, Clone, PartialEq)]
pub struct Bounds2D {
    x: (f32, f32),
    y: (f32, f32),
}
Member Author

This is the main utility supporting the EvaluatedGeometryArray. I played with making the IntervalTrait generic but it was a bit of a pain and I was keen to keep this scoped.
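
To illustrate why raw xmin and xmax values are easy to misuse once x can wrap, here is a sketch of a wraparound-aware intersection test. `WrapBounds` is a hypothetical type with the same field shape as the struct above; its methods and its "inverted y means empty" convention are assumptions of this sketch, not the Bounds2D API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct WrapBounds {
    x: (f32, f32), // (lo, hi); lo > hi means the interval crosses the antimeridian
    y: (f32, f32), // (lo, hi); never wraps
}

impl WrapBounds {
    fn new(x: (f32, f32), y: (f32, f32)) -> Self {
        Self { x, y }
    }

    /// Assumed convention for this sketch: an inverted y interval marks "empty".
    fn is_empty(&self) -> bool {
        self.y.0 > self.y.1
    }

    fn x_wraps(&self) -> bool {
        self.x.0 > self.x.1
    }

    fn intersects(&self, other: &Self) -> bool {
        if self.is_empty() || other.is_empty() {
            return false;
        }
        let y_overlap = self.y.0 <= other.y.1 && other.y.0 <= self.y.1;
        let x_overlap = match (self.x_wraps(), other.x_wraps()) {
            (false, false) => self.x.0 <= other.x.1 && other.x.0 <= self.x.1,
            // Two wrapping intervals both contain the antimeridian.
            (true, true) => true,
            // A wrapping interval covers everything up to hi and from lo onward.
            (true, false) => other.x.0 <= self.x.1 || other.x.1 >= self.x.0,
            (false, true) => self.x.0 <= other.x.1 || self.x.1 >= other.x.0,
        };
        x_overlap && y_overlap
    }
}
```

A naive `xmin <= other.xmax && other.xmin <= xmax` check would report no overlap between (170, -170) and (-175, -172), even though the second interval lies entirely inside the first.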

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors Sedona’s Rust spatial-join stack to represent per-geometry bounds with wraparound (antimeridian) support via a new Bounds2D type, and threads those bounds through spatial indexing and statistics so geography joins don’t produce misleading GeoStatistics.

Changes:

  • Introduces Bounds2D (wraparound-aware f32 bounds) and replaces Option<Rect<f32>> rectangles throughout EvaluatedGeometryArray and related spill/partition/index code.
  • Updates statistics collection to ingest per-row bounding boxes directly (removing bbox from GeometrySummary, renaming analyze_geometry to analyze_wkb).
  • Adds geography integration tests focused on antimeridian-crossing geometries; updates GPU plumbing to use the new bounds representation.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| rust/sedona-testing/src/datagen.rs | Updates tests to use analyze_wkb after analyze refactor. |
| rust/sedona-testing/src/benchmark_util.rs | Updates benchmark tests to use analyze_wkb. |
| rust/sedona-spatial-join/src/utils/bounds.rs | Adds new wraparound-aware Bounds2D type + unit tests. |
| rust/sedona-spatial-join/src/utils.rs | Exposes the new bounds module. |
| rust/sedona-spatial-join/src/stream.rs | Uses per-row bounds when updating probe-side GeoStatistics. |
| rust/sedona-spatial-join/src/partitioning/util.rs | Removes geo_rect_to_bbox helper (Rect no longer the primary representation). |
| rust/sedona-spatial-join/src/partitioning/stream_repartitioner.rs | Switches repartitioning assignment and per-slot stats to use Bounds2D/BoundingBox. |
| rust/sedona-spatial-join/src/operand_evaluator.rs | Replaces Rect<f32> with Bounds2D in evaluated arrays; distance expansion now uses interval expansion. |
| rust/sedona-spatial-join/src/index/partitioned_index_provider.rs | Updates stats construction in tests to use bounds-aware accumulator updates. |
| rust/sedona-spatial-join/src/index/default_spatial_index_builder.rs | Adds wraparound insertion support by splitting wraparound bounds into multiple rectangles. |
| rust/sedona-spatial-join/src/index/default_spatial_index.rs | Adds wraparound-aware probing by splitting probe rectangles; dedups candidates across split inserts. |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Builds build-side stats and bbox samples from evaluated bounds. |
| rust/sedona-spatial-join/src/evaluated_batch/spill.rs | Serializes/deserializes Bounds2D instead of optional Rect. |
| rust/sedona-spatial-join-gpu/src/join_provider.rs | Aligns GPU evaluated-array factory with Bounds2D bounds computation. |
| rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index_builder.rs | Switches GPU index build input rectangles to raw 4-float tuples. |
| rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index.rs | Switches GPU probe rectangles to raw 4-float tuples. |
| rust/sedona-spatial-join-gpu/Cargo.toml | Updates deps to rely on sedona-geometry rather than geo-index/geo-types in this crate. |
| rust/sedona-spatial-join-geography/tests/spatial_join_integration.rs | Adds integration tests for antimeridian-crossing geography joins. |
| rust/sedona-spatial-join-geography/src/spatial_index_builder.rs | Configures spatial index builder with wraparound bounds (-180..180). |
| rust/sedona-spatial-join-geography/src/join_provider.rs | Produces wraparound-aware bounds in evaluated arrays via Bounds2D. |
| rust/sedona-spatial-join-geography/Cargo.toml | Adds dev-deps needed for new integration tests. |
| rust/sedona-geometry/src/interval.rs | Makes WraparoundInterval::split() public for wraparound decomposition. |
| rust/sedona-geometry/src/analyze.rs | Renames/changes analysis API (analyze_wkb) and removes bbox from GeometrySummary. |
| rust/sedona-functions/src/st_analyze_agg.rs | Refactors Analyze accumulator to accept explicit bbox (including wraparound) when updating stats. |
| c/sedona-libgpuspatial/src/lib.rs | Changes GPU rect input type to tuples and updates tests accordingly. |
| Cargo.lock | Reflects dependency graph updates from the refactor and added tests. |

Comment thread rust/sedona-spatial-join/src/utils/bounds.rs Outdated
Comment thread rust/sedona-spatial-join/src/index/default_spatial_index.rs
Comment thread rust/sedona-spatial-join/src/index/default_spatial_index_builder.rs
Comment on lines +603 to 622
for rect in batch.geom_array.rects() {
let partition = if rect.is_empty() {
// Round-robin empty geometries through regular partitions to avoid
// overloading a single slot when the build side is mostly empty.
let p = SpatialPartition::Regular(cnt);
cnt = (cnt + 1) % num_regular_partitions;
p
} else {
partitioner.partition_no_multi(&BoundingBox::xy(rect.x(), rect.y()))?
};
assignments.push(partition);
}
}
PartitionedSide::ProbeSide => {
for rect_opt in batch.geom_array.rects() {
let partition = match rect_opt {
Some(rect) => partitioner.partition(&geo_rect_to_bbox(rect))?,
None => SpatialPartition::None,
for rect in batch.geom_array.rects() {
let partition = if rect.is_empty() {
SpatialPartition::None
} else {
partitioner.partition(&BoundingBox::xy(rect.x(), rect.y()))?
};
Copilot AI Apr 29, 2026

assign_rows() now passes BoundingBox::xy(rect.x(), rect.y()) directly into the partitioner. For wraparound x-intervals this will currently error in the partitioning utilities (e.g., bbox_to_f32_rect returns an Execution error when bbox.x().is_wraparound()). This means repartitioning/out-of-core paths can fail for geography antimeridian-crossing geometries. Consider making assign_rows() wraparound-aware (e.g., split bounds using the configured absolute wraparound interval and union the partition results / force Multi), or ensure geography joins avoid partitioners that require non-wraparound rectangles until wraparound partitioning is implemented.

Member Author

This is valid...if a partition ends up with wraparound bounds, the out-of-core join will currently error. I opened #800 to track.
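
For context, the "split, then union or force Multi" suggestion from the review comment could look roughly like the sketch below. Everything in it is hypothetical: the `Partition` enum, the equal-width longitude slabs and the function names are invented for illustration, and this is not a description of what #800 will implement.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Partition {
    Regular(u32),
    Multi,
}

/// Toy partitioner: `n` equal-width longitude slabs over [-180, 180].
fn slab(lon: f32, n: u32) -> u32 {
    let t = ((lon + 180.0) / 360.0 * n as f32) as u32;
    t.min(n - 1)
}

/// Partition for an ordinary (non-wrapping) longitude range.
fn partition_range(lo: f32, hi: f32, n: u32) -> Partition {
    let (a, b) = (slab(lo, n), slab(hi, n));
    if a == b {
        Partition::Regular(a)
    } else {
        Partition::Multi
    }
}

/// Wraparound-aware assignment: split a wrapping x interval at the
/// antimeridian and combine the results of the two pieces.
fn assign(x: (f32, f32), n: u32) -> Partition {
    if x.0 <= x.1 {
        partition_range(x.0, x.1, n)
    } else {
        let left = partition_range(-180.0, x.1, n);
        let right = partition_range(x.0, 180.0, n);
        if left == right {
            left
        } else {
            Partition::Multi
        }
    }
}
```

With more than one slab, the two pieces of a wrapping interval sit at opposite ends of the longitude range, so this scheme effectively sends every antimeridian-crossing row to `Multi`.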

Comment thread rust/sedona-spatial-join/src/index/default_spatial_index.rs Outdated
Comment thread c/sedona-libgpuspatial/src/lib.rs
Co-authored-by: Copilot <copilot@github.com>
@paleolimbot paleolimbot requested a review from Copilot April 29, 2026 14:55
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zhangfengcdt
Member

Can we also add tests for regions around the poles, hemisphere-spanning polygons, or full-globe geographies?

@paleolimbot paleolimbot marked this pull request as ready for review April 29, 2026 17:56
@paleolimbot paleolimbot requested a review from Copilot April 29, 2026 17:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 2 comments.


Comment thread c/sedona-libgpuspatial/src/lib.rs
Comment thread rust/sedona-geometry/src/analyze.rs Outdated
paleolimbot and others added 4 commits April 29, 2026 14:04
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
@paleolimbot
Member Author

> Can we also add tests for regions around the poles, hemisphere-spanning polygons, or full-globe geographies?

I added a few of these (and it should be easy to add more as we come across more difficult scenarios!)

Member

@zhangfengcdt left a comment

LGTM

@paleolimbot paleolimbot merged commit ba06a25 into apache:main May 5, 2026
21 checks passed
@paleolimbot paleolimbot deleted the bounding-for-geog branch May 5, 2026 22:27
Development

Successfully merging this pull request may close these issues.

rust/sedona-spatial-join: Add proper support for wraparound rectangles in geography join

4 participants