Aggregations Support `Partitioning::Range` by gene-bordegaray · Pull Request #23239 · apache/datafusion

gene-bordegaray · 2026-06-29T13:35:46Z

Which issue does this PR close?

Closes #23191: Allow Range partitioning to satisfy grouped aggregation requirements.
Related discussion: #23184 (Support co-partitioned range inner equi joins), #23236 (Replace / rename HashPartitioned distribution as KeyPartitioned).

Rationale for this change

Range partitioning can satisfy an aggregate's hash-partitioning requirement: rows with equal group keys already land in the same partition, even though the partitioning is not hash-based.

This is the first unary-operator implementation coming out of the range partitioning discussion, landing before any broader public API changes around HashPartitioned / KeyPartitioned.
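A minimal sketch of why this holds, assuming a single-column range partitioning on `a` with the split points `(10), (20), (30)` that appear in the `DataSourceExec` plan quoted in the review thread (3 split points, 4 partitions), and assuming a value equal to a split point belongs to the higher partition. `range_partition` and `groups_colocated` are hypothetical helpers for illustration, not DataFusion APIs. Grouping by `a` or by `(a, b)` keeps every group inside one partition; grouping by `b` alone does not, which is why only range columns that are also group keys can count.

```rust
use std::collections::HashMap;

/// Hypothetical single-column range router. `split_points` are sorted
/// ascending; N split points define N + 1 partitions.
/// Assumption: a value equal to a split point goes to the higher partition.
fn range_partition(value: i64, split_points: &[i64]) -> usize {
    // Number of split points <= value == index of the owning partition.
    split_points.partition_point(|&s| s <= value)
}

/// Routes (a, b) rows by `a` only, then reports whether every distinct
/// group key (as extracted by `group_key`) stays within a single partition.
fn groups_colocated<K, F>(rows: &[(i64, i64)], split_points: &[i64], group_key: F) -> bool
where
    K: std::hash::Hash + Eq,
    F: Fn(&(i64, i64)) -> K,
{
    let mut owner: HashMap<K, usize> = HashMap::new();
    for row in rows {
        let partition = range_partition(row.0, split_points);
        // Remember the first partition each group was seen in.
        let first = *owner.entry(group_key(row)).or_insert(partition);
        if first != partition {
            // Same group split across partitions: a repartition would be needed.
            return false;
        }
    }
    true
}
```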

What changes are included in this PR?

Let compatible range partitioning satisfy aggregate hash distribution requirements in EnforceDistribution
Keep this logic private to aggregate planning for now, so the public Distribution enum variants do not change until more operators are supported
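A simplified sketch of the kind of check described above, assuming range columns and group keys can be compared as plain column names (the real rule works on physical expressions and equivalence properties). The trimmed-down `Partitioning` and `Distribution` enums, `needs_hash_repartition`, and the `is_aggregate` flag are hypothetical stand-ins, not DataFusion's definitions; `range_satisfies_aggregate_distribution` only borrows the variable name from the diff hunk quoted later in the thread. The rule accepts range partitioning when every range column is a group key: Range on `a` satisfies `GROUP BY a, b`, but Range on `(a, b)` does not satisfy `GROUP BY a`.

```rust
/// Hypothetical, trimmed-down stand-ins; columns are plain names.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash(Vec<String>, usize),
    /// Range partitioning on the given columns into N partitions.
    Range(Vec<String>, usize),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    UnspecifiedDistribution,
    HashPartitioned(Vec<String>),
}

/// Hypothetical aggregate-only rule: range partitioning satisfies a
/// `HashPartitioned(group_keys)` requirement when every range column is a
/// group key, because equal group keys then imply equal range keys.
fn range_satisfies_aggregate_distribution(input: &Partitioning, required: &Distribution) -> bool {
    match (input, required) {
        (Partitioning::Range(range_cols, _), Distribution::HashPartitioned(group_keys)) => {
            !range_cols.is_empty() && range_cols.iter().all(|c| group_keys.contains(c))
        }
        _ => false,
    }
}

/// Returns true when a hash repartition would still be inserted.
fn needs_hash_repartition(input: &Partitioning, required: &Distribution, is_aggregate: bool) -> bool {
    // Simplified stand-in for the existing check: exact hash-key match.
    let is_satisfied = match (input, required) {
        (_, Distribution::UnspecifiedDistribution) => true,
        (Partitioning::Hash(keys, _), Distribution::HashPartitioned(req)) => keys == req,
        _ => false,
    };
    // The range path is consulted only for aggregates, so the public
    // `Distribution` type itself does not change.
    let range_ok = is_aggregate && range_satisfies_aggregate_distribution(input, required);
    !(is_satisfied || range_ok)
}
```

Gating on the operator keeps the change local, at the cost of a special case that would move elsewhere once the requirement is formally renamed.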

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. Range-partitioned aggregate plans can now avoid hash repartitioning.

gene-bordegaray · 2026-06-29T13:59:51Z

    plan.is::<RepartitionExec>()
 }

+/// Temporary check while `HashPartitioned` is being migrated to `KeyPartitioned`


Not sure these are worth sharing in the crate if we are going to remove them eventually.

gene-bordegaray · 2026-06-29T14:00:32Z

cc: @2010YOUY01 @gabotechs @stuhood

gabotechs · 2026-06-29T15:20:46Z

+    AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[]
+      AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[]
+        DataSourceExec: file_groups={4 groups: [[p0], [p1], [p2], [p3]]}, projection=[a, b, c, d, e], output_partitioning=Range([a@0 ASC], [(10), (20), (30)], 4), file_type=parquet


🤔 I think we don't need the Partial there, right? We should be fine with just a FinalPartitioned aggregation.

This happens in a later optimizer rule that collapses Partial -> Final pairs where appropriate: datafusion/physical-optimizer/src/combine_partial_final_agg.rs

That is why it shows up correctly in the SLT tests 👍
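A simplified model of that rewrite, as we understand it: a FinalPartitioned aggregate sitting directly on a Partial aggregate with the same grouping, with no repartition in between, can run as a single SinglePartitioned stage. The `Plan` and `AggMode` types and `combine_partial_final` are hypothetical; the real rule in combine_partial_final_agg.rs operates on `AggregateExec` and checks more conditions (for example, matching aggregate expressions and filters), so treat this as a sketch of the plan shape only.

```rust
/// Hypothetical aggregate modes mirroring the names in the plans above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum AggMode {
    Partial,
    Final,
    FinalPartitioned,
    Single,
    SinglePartitioned,
}

/// Hypothetical, minimal physical plan tree.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Aggregate { mode: AggMode, group_by: Vec<String>, input: Box<Plan> },
    DataSource { name: String },
}

/// Collapses Final/FinalPartitioned directly over a matching Partial into
/// Single/SinglePartitioned (simplified condition: same group-by columns).
fn combine_partial_final(plan: Plan) -> Plan {
    match plan {
        Plan::Aggregate { mode, group_by, input } => {
            // Rewrite bottom-up so nested aggregates are handled first.
            let input = combine_partial_final(*input);
            let combined_mode = match mode {
                AggMode::Final => Some(AggMode::Single),
                AggMode::FinalPartitioned => Some(AggMode::SinglePartitioned),
                _ => None,
            };
            match (combined_mode, input) {
                (
                    Some(new_mode),
                    Plan::Aggregate { mode: AggMode::Partial, group_by: inner_group_by, input: inner_input },
                ) if inner_group_by == group_by => {
                    // Drop the Partial stage and keep its input.
                    Plan::Aggregate { mode: new_mode, group_by, input: inner_input }
                }
                (_, input) => Plan::Aggregate { mode, group_by, input: Box::new(input) },
            }
        }
        leaf => leaf,
    }
}
```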

2010YOUY01

Thank you! The high-level shape of the PR LGTM. I just need a bit more time to understand what EnforceDistribution is doing before finishing the review — that code looks a little intimidating 😅

2010YOUY01 · 2026-06-30T02:05:46Z

+    AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[]
+      AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[]
+        DataSourceExec: file_groups={4 groups: [[p0], [p1], [p2], [p3]]}, projection=[a, b, c, d, e], output_partitioning=Range([a@0 ASC], [(10), (20), (30)], 4), file_type=parquet


Is it possible to move the unit-test coverage added in this file to end-to-end SLT tests instead?

It seems the same test goal can still be achieved at the SLT level, and those SQL tests should be more stable across optimizer refactors.

For example, whether the initial physical plan uses a two-stage aggregation or a single-stage aggregation feels implementation-specific. A future refactor might legitimately change that plan shape, which would require updating these unit tests. At that point, it may be harder to recover the original intent of each assertion, and some coverage could accidentally be lost during the refactor, whereas SLT behavior won't change much even after aggressive refactors.
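The property these tests protect can be phrased on query results rather than on plan shape. As a hedged illustration (the helper names are hypothetical, not DataFusion APIs, and an actual SLT file would express this as SQL plus expected rows), running `COUNT(*) ... GROUP BY a` independently inside each partition and concatenating the results matches the global answer only when equal keys are co-located. That holds for range partitioning on `a` regardless of whether the plan has one aggregate stage or two, and it fails for a round-robin layout.

```rust
use std::collections::BTreeMap;

/// COUNT(*) GROUP BY a computed independently in each partition and
/// concatenated, i.e. what an aggregate produces with no repartition.
fn per_partition_counts(partitions: &[Vec<i64>]) -> Vec<(i64, usize)> {
    let mut out = Vec::new();
    for part in partitions {
        let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
        for &a in part {
            *counts.entry(a).or_insert(0) += 1;
        }
        out.extend(counts);
    }
    // Sort so the result can be compared with the global answer.
    out.sort();
    out
}

/// Reference answer: COUNT(*) GROUP BY a over all rows at once.
fn global_counts(partitions: &[Vec<i64>]) -> Vec<(i64, usize)> {
    let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
    for &a in partitions.iter().flatten() {
        *counts.entry(a).or_insert(0) += 1;
    }
    counts.into_iter().collect()
}
```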

2010YOUY01 · 2026-06-30T02:12:10Z

-            .is_satisfied()
-        {
+            .is_satisfied();
+        let range_satisfies_aggregate_distribution =


I feel this PR already treats HashPartitioned as KeyPartitioned logically, but the formal renaming is left to a future PR 🤔

If that's the case, should we implement this logic directly in range_partitioning.satisfaction()? Otherwise we will still have to do it after the formal renaming.
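One way to read this suggestion, sketched under loud assumptions: if the requirement is understood as "key partitioned" (rows with equal keys share a partition), the range case can live inside the partitioning's own satisfaction check instead of behind an aggregate-only path. The enum and the `satisfies_key_partitioned` method are hypothetical and are not the signature of DataFusion's `satisfaction`; the sketch applies one subset rule to both hash and range partitioning, while the real check may be stricter.

```rust
/// Hypothetical, simplified partitioning model; columns are plain names.
#[allow(dead_code)]
enum Partitioning {
    Hash(Vec<String>, usize),
    Range(Vec<String>, usize),
    RoundRobin(usize),
}

impl Partitioning {
    /// True when rows with equal `keys` are guaranteed to share a partition,
    /// for any operator and with no aggregate-specific flag.
    /// Simplified: ignores equivalence classes and single-partition inputs.
    fn satisfies_key_partitioned(&self, keys: &[String]) -> bool {
        // Every partitioning column must be one of the required keys.
        let subset_of_keys =
            |cols: &[String]| !cols.is_empty() && cols.iter().all(|c| keys.contains(c));
        match self {
            Partitioning::Hash(cols, _) | Partitioning::Range(cols, _) => {
                subset_of_keys(cols.as_slice())
            }
            // Round-robin placement ignores values entirely.
            Partitioning::RoundRobin(_) => false,
        }
    }
}
```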

satisfy aggregation hash distribution with range

ee5e61f

github-actions bot added the physical-expr (Changes to the physical-expr crates), optimizer (Optimizer rules), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels Jun 29, 2026

gene-bordegaray changed the title from "Support range partitioned aggregations" to "Aggregations Support `Partitioning::Range`" Jun 29, 2026

gene-bordegaray commented Jun 29, 2026

gene-bordegaray marked this pull request as ready for review June 29, 2026 14:00

This was referenced Jun 29, 2026

Support co-partitioned range inner equi joins #23184 (Open)
Replace / rename HashPartitioned distribution as KeyPartitioned #23236 (Open)

gabotechs reviewed Jun 29, 2026

2010YOUY01 reviewed Jun 30, 2026

gene-bordegaray wants to merge 1 commit into apache:main from gene-bordegaray:gene.bordegaray/2026/06/range-partitioned-aggregations-main

3 participants

Uh oh!

Conversation

gene-bordegaray commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

gene-bordegaray Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray commented Jun 29, 2026

Uh oh!

gabotechs Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

2010YOUY01 Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

2010YOUY01 left a comment

Choose a reason for hiding this comment

Uh oh!

2010YOUY01 Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

2010YOUY01 Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gene-bordegaray commented Jun 29, 2026 •

edited

Loading