Spark 4.1: Implement SupportsReportOrdering DSv2 API by anuragmantri · Pull Request #16750 · apache/iceberg

anuragmantri · 2026-06-10T00:14:43Z

This PR depends on #14948

This PR implements the Spark DSv2 SupportsReportOrdering API so that Iceberg can report a table's sort order to Spark. This enables sort elimination on partitioned tables when the table has a defined sort order and its files were written respecting that order.

Sort order reporting can be enabled with:

SET spark.sql.iceberg.planning.preserve-data-ordering = true; (default false)

Implementation summary:

SortOrderAnalyzer validates two conditions before SparkPartitioningAwareScan.outputOrdering() reports ordering to Spark:
- all files carry the current sort order ID
- each grouping key maps to exactly one task group (bin-packing must not split partitions)
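The two conditions above can be sketched as a single pass over the planned task groups. This is a minimal illustration, not the PR's SortOrderAnalyzer: `FileInfo`, `TaskGroup`, and `canReportOrdering` are hypothetical stand-ins, and the grouping key is reduced to a string for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are not Iceberg classes.
final class OrderingCheckSketch {

  /** A data file reduced to the one field the check needs. */
  static final class FileInfo {
    final String path;
    final int sortOrderId;

    FileInfo(String path, int sortOrderId) {
      this.path = path;
      this.sortOrderId = sortOrderId;
    }
  }

  /** The files one Spark task will read, tagged with their grouping key. */
  static final class TaskGroup {
    final String groupingKey;
    final List<FileInfo> files;

    TaskGroup(String groupingKey, List<FileInfo> files) {
      this.groupingKey = groupingKey;
      this.files = files;
    }
  }

  /** Returns true only if both conditions hold for every task group. */
  static boolean canReportOrdering(List<TaskGroup> groups, int currentSortOrderId) {
    Set<String> seenKeys = new HashSet<>();
    for (TaskGroup group : groups) {
      // Condition 2: a grouping key seen twice means a partition was split.
      if (!seenKeys.add(group.groupingKey)) {
        return false;
      }
      // Condition 1: every file must carry the current sort order ID.
      for (FileInfo file : group.files) {
        if (file.sortOrderId != currentSortOrderId) {
          return false;
        }
      }
    }
    return true;
  }
}
```

If either condition fails, the sketch returns false, which corresponds to reporting no ordering so that Spark plans its own sort as usual.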
Merging Sorted Files: When ordering is reported, rows from multiple sorted files within a partition are combined with a k-way merge by MergingSortedRowDataReader, which is added in a separate PR (Spark 4.1: Add MergingSortedRowDataReader for k-way merge of sorted files #14948). The plumbing for the merging reader (SparkRowReaderFactory, SparkBatch) is included here.
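For readers unfamiliar with the technique, a k-way merge over already-sorted inputs can be sketched with a min-heap that holds one cursor per input. This is a generic illustration, not the code from #14948: `KWayMergeSketch` and `Cursor` are hypothetical names, and the sketch collects the result into a list, whereas a reader would presumably emit rows lazily.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge sketch; names here are hypothetical, not from the PR.
final class KWayMergeSketch {

  /** Heap entry: the current head element of one input plus the rest of that input. */
  private static final class Cursor<T> {
    final T head;
    final Iterator<T> rest;

    Cursor(T head, Iterator<T> rest) {
      this.head = head;
      this.rest = rest;
    }
  }

  /** Merges inputs that are each sorted by {@code order} into one sorted list. */
  static <T> List<T> merge(List<? extends Iterable<T>> sortedInputs, Comparator<? super T> order) {
    Comparator<Cursor<T>> byHead = (a, b) -> order.compare(a.head, b.head);
    PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);

    // Seed the heap with the first element of every non-empty input.
    for (Iterable<T> input : sortedInputs) {
      Iterator<T> it = input.iterator();
      if (it.hasNext()) {
        heap.add(new Cursor<>(it.next(), it));
      }
    }

    List<T> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      // The smallest head across all inputs is the next output element.
      Cursor<T> smallest = heap.poll();
      merged.add(smallest.head);
      // Advance that input and put its next element back on the heap.
      if (smallest.rest.hasNext()) {
        heap.add(new Cursor<>(smallest.rest.next(), smallest.rest));
      }
    }
    return merged;
  }
}
```

The heap never holds more than one entry per input, so merging k files costs O(log k) per row regardless of file size.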

Constraints:

- When preserve-data-ordering is enabled, bin-packing of large partitions is disabled. All files within a partition are placed into a single Spark task. This is a known limitation of the current KeyGroupedPartitioning approach and is expected to be addressed in SPARK-56241.
- Vectorized reads are disabled for partitions with more than one file since k-way merge is row-based.
- This implementation only reports sort order if files are sorted in the current table sort order.
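The vectorization constraint amounts to a per-partition choice of read path. The sketch below is an assumption about how that decision could be expressed, not the PR's logic: `ReadModeSketch`, `ReadMode`, and `choose` are hypothetical names, and the flags stand in for whatever state the scan actually consults.

```java
// Hypothetical decision sketch; the names and flags are illustrative assumptions.
final class ReadModeSketch {

  enum ReadMode { VECTORIZED, ROW_BASED, ROW_BASED_MERGE }

  static ReadMode choose(boolean orderingReported, boolean vectorizationEnabled, int filesInPartition) {
    // Several sorted files must be merged row by row to keep the reported order.
    if (orderingReported && filesInPartition > 1) {
      return ReadMode.ROW_BASED_MERGE;
    }
    // A single file is already in order, so the usual read path applies.
    return vectorizationEnabled ? ReadMode.VECTORIZED : ReadMode.ROW_BASED;
  }
}
```

Under these assumptions, a partition with a single file keeps vectorized reads even when ordering is reported, because no merge is needed.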

Depends on #14948 for MergingSortedRowDataReader.

AI Usage: I used Claude Opus 4.6 for code generation and writing tests. I manually reviewed the generated code.

Spark 4.1: Implement SupportsReportOrdering DSv2 API

a417b12

github-actions Bot added the spark label Jun 10, 2026

anuragmantri requested review from RussellSpitzer, aokolnychyi and huaxingao June 10, 2026 00:24

anuragmantri mentioned this pull request Jun 10, 2026

Spark 4.1: Add MergingSortedRowDataReader for k-way merge of sorted files #14948

Open

anuragmantri wants to merge 1 commit into apache:main from anuragmantri:supports-report-ordering-plumbing-v2
