feat: physical execution for range partitioning by saadtajwar · Pull Request #23231 · apache/datafusion

saadtajwar · 2026-06-29T02:05:17Z

Which issue does this PR close?

Closes #23137 (Support physical execution of Range repartitioning)

Rationale for this change

Range repartitioning was already planned and serialized into physical plans, but RepartitionExec could not execute it. This PR completes the core execution path so rows in an input batch are routed to the correct output partition based on range split points and the ordering defined on the partitioning scheme.

What changes are included in this PR?

This PR adds a Range variant to BatchPartitioner that evaluates the ordering expressions on each input batch, compares each row's key against split points using compare_rows (respecting ASC/DESC and null ordering), and assigns row indices to output partitions via binary search. The partitioned row indices are then materialized into sub-batches using the same partition_grouped_take path as hash repartitioning. pull_from_input is wired to construct a range partitioner for Partitioning::Range, replacing the previous not_impl_err! at execution time.
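The routing step described above can be sketched with only the Rust standard library. This is a simplified illustration under stated assumptions, not the PR's code. Keys are nullable i64 columns. SortOpts is a hypothetical stand-in that mirrors the two fields of arrow's SortOptions. compare_keys plays a role similar to compare_rows. Split points must be sorted under the same ordering. A key equal to a split point is assumed to go to the higher partition; DataFusion's actual boundary convention may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for arrow's SortOptions (same two fields).
#[derive(Clone, Copy, Debug)]
pub struct SortOpts {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Compares two nullable values under one column's sort options.
/// nulls_first applies independently of descending.
fn compare_value(a: Option<i64>, b: Option<i64>, opts: SortOpts) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            let ord = x.cmp(&y);
            if opts.descending { ord.reverse() } else { ord }
        }
    }
}

/// Lexicographic comparison of two multi-column keys.
fn compare_keys(a: &[Option<i64>], b: &[Option<i64>], opts: &[SortOpts]) -> Ordering {
    for ((x, y), o) in a.iter().zip(b).zip(opts) {
        let ord = compare_value(*x, *y, *o);
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Routes every row to an output partition and returns per-partition row indices.
/// `columns` holds the evaluated ordering expressions (column-major);
/// n split points produce n + 1 partitions.
pub fn range_partition(
    columns: &[Vec<Option<i64>>],
    split_points: &[Vec<Option<i64>>],
    opts: &[SortOpts],
) -> Vec<Vec<u32>> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    let mut indices: Vec<Vec<u32>> = vec![Vec::new(); split_points.len() + 1];
    let mut key: Vec<Option<i64>> = Vec::with_capacity(columns.len());
    for row in 0..num_rows {
        // Build this row's key from the evaluated ordering columns.
        key.clear();
        key.extend(columns.iter().map(|c| c[row]));
        // Binary search: the number of split points <= key is the target partition.
        let partition = split_points
            .partition_point(|sp| compare_keys(sp, &key, opts) != Ordering::Greater);
        indices[partition].push(row as u32);
    }
    indices
}
```

In the PR, the per-partition index lists are then materialized into sub-batches through partition_grouped_take. This sketch stops at the index lists.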

Optimizer-related paths remain intentionally unimplemented and are tracked in #23230: projection pushdown through RepartitionExec (try_swapping_with_projection), sort pushdown (try_pushdown_sort), and changing partition counts via repartitioned().

Are these changes tested?

Yes!

Are there any user-facing changes?

No public API changes

saadtajwar · 2026-06-29T02:07:16Z

+                    )?;
+
+                    indices.iter_mut().for_each(|v| v.clear());
+                    let sort_options: Vec<SortOptions> =


I could create this at construction time to avoid re-creating on every invocation of partition_iter?
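The construction-time option can be sketched as follows. All names are hypothetical (RangePartitioner, partition_batch, SortOpts), and the sketch uses a single non-null i64 key for brevity. The sort options are derived once in new. The per-partition index buffers are cleared on each batch rather than reallocated, mirroring the indices.iter_mut().for_each(|v| v.clear()) line in the diff.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified sort options for a single key column.
#[derive(Clone, Copy, Debug)]
pub struct SortOpts {
    pub descending: bool,
}

/// Hypothetical range partitioner for one non-null i64 key column.
pub struct RangePartitioner {
    split_points: Vec<i64>,
    // Derived once in `new` instead of on every `partition_batch` call.
    sort_options: SortOpts,
    // One index buffer per output partition; cleared (not reallocated) per batch.
    indices: Vec<Vec<u32>>,
}

impl RangePartitioner {
    pub fn new(split_points: Vec<i64>, descending: bool) -> Self {
        let num_partitions = split_points.len() + 1;
        Self {
            split_points,
            sort_options: SortOpts { descending },
            indices: vec![Vec::new(); num_partitions],
        }
    }

    /// Returns, for each output partition, the row indices routed to it.
    pub fn partition_batch(&mut self, keys: &[i64]) -> &[Vec<u32>] {
        self.indices.iter_mut().for_each(|v| v.clear());
        let opts = self.sort_options;
        for (row, key) in keys.iter().enumerate() {
            // Count split points that sort at or before `key` under `opts`.
            let partition = self.split_points.partition_point(|sp| {
                let ord = sp.cmp(key);
                let ord = if opts.descending { ord.reverse() } else { ord };
                ord != Ordering::Greater
            });
            self.indices[partition].push(row as u32);
        }
        &self.indices
    }
}
```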

saadtajwar · 2026-06-29T02:07:48Z

+    /// This function takes the `arrays` associated with the evaluated expressions for the ordering, split points and sort options, and indices array
+    /// Then for every row, creates the "row key" based on the given ordering for the range, and binary searches through the split points to find the appropriate partition index
+    /// That partition index is associated with the array in `indices`, which is given the row index, meaning that the row is sent to the partition at that index


This comment might not be the best right now lol - I'll try to sleep on it and maybe come back tomorrow to edit it into something more descriptive. Open to any suggestions :)

saadtajwar · 2026-06-29T02:08:16Z

            Partitioning::Range(_) => {
                // Range partitioning optimizer propagation is tracked in
-                // https://github.com/apache/datafusion/issues/22395
+                // https://github.com/apache/datafusion/issues/23230


I intentionally left these unimplemented and created #23230 to track them, if that's OK!

saadtajwar · 2026-06-29T02:11:36Z

cc @gene-bordegaray ! 🎉

saadtajwar · 2026-06-29T02:19:48Z

-                return not_impl_err!(
-                    "Range partitioning execution is not implemented by RepartitionExec"
-                );
+            Partitioning::Range(range_partitioning) => {


Should we be delegating to BatchPartitioner's try_new method for these?
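One reading of this question is to route every construction through a single fallible constructor so that validation lives in one place. The sketch below uses simplified, hypothetical Partitioning and BatchPartitioner enums. The real Partitioning::Range also carries an ordering, and the real types have more variants. The strictly-increasing check is an assumed invariant that the binary search during routing relies on.

```rust
/// Hypothetical, simplified partitioning scheme.
pub enum Partitioning {
    RoundRobinBatch(usize),
    Range { split_points: Vec<i64> },
}

/// Hypothetical partitioner state built from a Partitioning.
#[derive(Debug, PartialEq)]
pub enum BatchPartitioner {
    RoundRobin { num_partitions: usize, next: usize },
    Range { split_points: Vec<i64> },
}

impl BatchPartitioner {
    /// Single construction point: every caller gets the same validation.
    pub fn try_new(partitioning: &Partitioning) -> Result<Self, String> {
        match partitioning {
            Partitioning::RoundRobinBatch(n) if *n > 0 => {
                Ok(Self::RoundRobin { num_partitions: *n, next: 0 })
            }
            Partitioning::RoundRobinBatch(_) => Err("need at least one partition".into()),
            Partitioning::Range { split_points } => {
                // Binary search during routing assumes sorted, distinct split points.
                if split_points.windows(2).any(|w| w[0] >= w[1]) {
                    return Err("split points must be strictly increasing".into());
                }
                Ok(Self::Range { split_points: split_points.clone() })
            }
        }
    }

    pub fn num_partitions(&self) -> usize {
        match self {
            Self::RoundRobin { num_partitions, .. } => *num_partitions,
            Self::Range { split_points } => split_points.len() + 1,
        }
    }
}
```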

gene-bordegaray · 2026-06-30T12:56:26Z

Thank you for the work @saadtajwar , I think this will be very useful in upcoming efforts 😄

Before really diving into this, we should step back and plan how repartitioning will work at a high level before getting into the nitty gritty. Per the discussion in #23236, it seems that we will be working toward deprecating HashPartitioned and moving to a KeyPartitioned distribution variant.

So essentially we are going to have operators that require a KeyPartitioned distribution, with two options to achieve it: repartition via Hash or repartition via Range. It is unclear to me what the best way to make this decision is, and whether / how we can recognize when to use one or the other. Should this be something that users specify as a config? Is there some way to detect this? Should we only repartition to range if it is to a superset of the current range partitioning (example: data partitioned on day -> repartition to hour)?
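The day -> hour example can be phrased as a check on split points. A finer range partitioning can be produced inside the existing partitions, without moving rows between them, when every existing boundary is also a boundary of the new scheme. This is a hypothetical sketch, not a DataFusion API. It assumes single-column i64 keys, sorted and deduplicated split points, and that a key equal to a split point belongs to the higher partition.

```rust
/// True if every boundary of `coarse` is also a boundary of `fine`,
/// i.e. each fine partition lies inside exactly one coarse partition.
pub fn is_refinement(coarse: &[i64], fine: &[i64]) -> bool {
    coarse.iter().all(|p| fine.binary_search(p).is_ok())
}

/// For each fine partition, the coarse partition that contains it.
/// Only meaningful when `is_refinement(coarse, fine)` holds.
pub fn parent_partitions(coarse: &[i64], fine: &[i64]) -> Vec<usize> {
    (0..=fine.len())
        .map(|i| {
            if i == 0 {
                // The first fine partition is unbounded below, like the first coarse one.
                0
            } else {
                // Locate the fine partition's lower bound among the coarse split points.
                coarse.partition_point(|c| *c <= fine[i - 1])
            }
        })
        .collect()
}
```

With day boundaries expressed in hours (coarse = [24, 48]), the half-day split points [12, 24, 36, 48, 60] refine them, while [12, 30, 48] do not because 24 is missing.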

These are some things I would like to discuss with others before we decide to implement anything regarding repartitioning (as of now we just preserve it from a DataSourceExec).

cc: @alamb @gabotechs @stuhood @2010YOUY01

saadtajwar · 2026-06-30T13:34:33Z

Hey @gene-bordegaray - that makes sense, thanks! I just posted some thoughts in #23236 just to help us keep the discussion centralized in one spot - looking forward to working on this all together!

asolimando · 2026-06-30T14:48:24Z

> So essentially we are going to have operators that require a KeyPartitioned distribution with two options to achieve this. Repartition via Hash or repartition via Range. [...] Should we only repartition to range if it is to a superset of the current range partitioning (example: data partitioned on day -> repartition to hour)?

Filters involving ranges (at least BETWEEN, <=, <, >, >=, involving literals at first, but possibly we can do something smart for columns and more complex expressions too) could benefit from range partitioning too, as it would allow partition pruning of entire partitions without evaluation.
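To make the pruning idea concrete, here is a sketch under these assumptions: a single ascending i64 partition key, sorted split points where partition i covers [split_points[i-1], split_points[i]), and a hypothetical helper that lists the partitions a BETWEEN lo AND hi filter can touch. Every other partition can be skipped without evaluating the filter. Because BETWEEN never matches NULL, pruning stays correct wherever nulls are placed.

```rust
/// Index of the partition a non-null key is routed to: the number of split points <= key.
fn partition_of(split_points: &[i64], key: i64) -> usize {
    split_points.partition_point(|sp| *sp <= key)
}

/// Partitions that may contain rows matching `col BETWEEN lo AND hi` (inclusive bounds).
pub fn partitions_for_between(split_points: &[i64], lo: i64, hi: i64) -> Vec<usize> {
    if lo > hi {
        // An empty interval matches nothing, so every partition can be pruned.
        return Vec::new();
    }
    (partition_of(split_points, lo)..=partition_of(split_points, hi)).collect()
}
```

For split points [10, 20, 30], a filter BETWEEN 12 AND 25 only needs partitions 1 and 2.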

stuhood · 2026-06-30T16:45:52Z

> It is unclear to me exactly the best way to make this decision and if / how we can recognize to use one or the other. Should this be something that users specify as a config? Is there some way to detect this? Should we only repartition to range if it is to a superset of the current range partitioning (example: data partitioned on day -> repartition to hour)?

From briefly looking around, I only see a few cases where a logical optimizer might want to request Range rather than Hash (things like non-equi joins, re-organizing data for output that preserves partitioning, global window functions.)

> Filters involving ranges (at least BETWEEN, <=, <, >, >=, involving literals at first, but possibly we can do something smart for columns and more complex expressions too) could benefit from range partitioning too, as it would allow partition pruning of entire partitions without evaluation.

They benefit from existing Range partitioning, but it probably wouldn't make sense to repartition data using Range for that purpose: Hash will get you better balance more cheaply (rather than via binary search), and then each partition can directly evaluate the filter.
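A rough illustration of the balance argument follows. It assumes a simple deterministic hash, the SplitMix64 finalizer, which is not DataFusion's actual hash function. It also uses deliberately poorly chosen split points. Hashing costs O(1) per row and spreads sequential keys evenly. Range routing costs a binary search per row, and its balance depends entirely on how well the split points match the data.

```rust
/// SplitMix64 finalizer, used here only as a cheap deterministic hash.
fn mix(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Rows per partition under hash partitioning: constant work per row.
pub fn hash_counts(keys: &[i64], num_partitions: usize) -> Vec<usize> {
    let mut counts = vec![0; num_partitions];
    for &k in keys {
        counts[(mix(k as u64) % num_partitions as u64) as usize] += 1;
    }
    counts
}

/// Rows per partition under range partitioning: one binary search per row.
pub fn range_counts(keys: &[i64], split_points: &[i64]) -> Vec<usize> {
    let mut counts = vec![0; split_points.len() + 1];
    for k in keys {
        counts[split_points.partition_point(|sp| sp <= k)] += 1;
    }
    counts
}
```

For keys 0..1000 and split points [10, 20, 30], range partitioning yields partition sizes [10, 10, 10, 970]. Hash partitioning into 4 partitions stays close to 250 rows each.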

> Before really diving into this we should step back and plan how repartitioning will work from a high level first before diving into the nitty gritty.

But the choice to introduce Range partitioning would be a logical decision, right? So, while I agree that changing logical optimizers to request Range would take a lot of thought and design, implementing the physical side (this PR) doesn't seem to be blocked on that? Or are you concerned that the API might still shift, or that it won't have enough test-coverage?

saadtajwar · 2026-06-30T22:41:36Z

Agree with @stuhood on the above, especially on this point. I'm still trying to understand the Distribution options and when range partitioning would be chosen as the distribution scheme. But as long as the current Range partitioning API (ordering, split points, etc.) doesn't change, especially for the physical executors, it makes sense to me that the physical execution implemented here would stay the same.

> But the choice to introduce Range partitioning would be a logical decision, right? So, while I agree that changing logical optimizers to request Range would take a lot of thought and design, implementing the physical side (this PR) doesn't seem to be blocked on that? Or are you concerned that the API might still shift, or that it won't have enough test-coverage?

gene-bordegaray · 2026-07-01T12:51:23Z

> But the choice to introduce Range partitioning would be a logical decision, right? So, while I agree that changing logical optimizers to request Range would take a lot of thought and design, implementing the physical side (this PR) doesn't seem to be blocked on that? Or are you concerned that the API might still shift, or that it won't have enough test-coverage?

@stuhood I am most concerned with implementing physical layer behavior before we have a real use for it that we can represent. What would the use case for being able to repartition on range be right now? Do you have a use case where you would like to physically insert a repartition on range? Maybe this is a good place to start the conversation on where and how this should be decided 🤔

alamb · 2026-07-01T17:47:36Z

> @stuhood I am most concerned with implementing physical layer behavior before having a real use for it that we can represent. What would the use case of being able to repartition on range right now be? Do you have a use case where you would like to physically insert a repartition on range? maybe this is a good place to start the conversation on where and how this should be decided 🤔

The main use case we have at the moment for range partitioning is when the input source data is already range partitioned. The point of the work in this epic is for DataFusion to know about that (pre-existing) partitioning and take advantage of it.

I think you are talking about having the optimizer decide to repartition data into ranges (e.g. when it wants to add more parallelism to the plan). That would probably need to be a cost-based decision based on statistics (like value distributions) that we don't yet have in DataFusion (and maybe never will have).
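Here is one way such statistics could be used. A common approach picks split points as quantiles of a sample of the key column. The helper below is hypothetical and not a DataFusion API. It also shows why value-distribution statistics matter: a heavily skewed sample collapses to fewer distinct split points, so some partitions end up empty or oversized.

```rust
/// Picks up to `num_partitions - 1` split points as evenly spaced quantiles of `sample`.
/// Duplicate quantiles are removed, so skewed data can yield fewer partitions.
pub fn split_points_from_sample(mut sample: Vec<i64>, num_partitions: usize) -> Vec<i64> {
    if num_partitions <= 1 || sample.is_empty() {
        return Vec::new();
    }
    sample.sort_unstable();
    let mut splits: Vec<i64> = (1..num_partitions)
        // i < num_partitions, so the index is always < sample.len().
        .map(|i| sample[i * sample.len() / num_partitions])
        .collect();
    splits.dedup();
    splits
}
```

A uniform sample 0..100 split four ways yields [25, 50, 75]. A sample in which 90 of 100 values equal 1 yields only [1], so at most two non-empty partitions remain.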

alamb · 2026-07-01T17:48:06Z

TLDR is I agree with @stuhood

> They benefit from existing Range partitioning, but it probably wouldn't make sense to repartition data using Range for that purpose: Hash will get you better balance more cheaply (rather than via binary search), and then each partition can directly evaluate the filter.

👍

stuhood · 2026-07-01T18:51:18Z

> But the choice to introduce Range partitioning would be a logical decision, right? So, while I agree that changing logical optimizers to request Range would take a lot of thought and design, implementing the physical side (this PR) doesn't seem to be blocked on that? Or are you concerned that the API might still shift, or that it won't have enough test-coverage?

> @stuhood I am most concerned with implementing physical layer behavior before having a real use for it that we can represent. What would the use case of being able to repartition on range right now be? Do you have a use case where you would like to physically insert a repartition on range? maybe this is a good place to start the conversation on where and how this should be decided 🤔

Understood.

Yea, I don't feel strongly about it either way... but I don't really love the idea of leaving in todo!()s that would panic if actually used. Even if the DF repository itself does not contain optimizer rules which introduce Range, I could imagine consuming repos attempting to introduce it, only to have it fail? But perhaps the DF project would prefer those folks to then come and talk about their use cases, to see whether they could be upstreamed.

alamb · 2026-07-01T19:02:57Z

> Yea, I don't feel strongly about it either way... but I don't really love the idea of leaving in todo!()s that would panic if actually used. Even if the DF repository itself does not contain optimizer rules which introduce Range, I could imagine consuming repos attempting to introduce it, only to have it fail? But perhaps the DF project would prefer those folks to then come and talk about their use cases, to see whether they could be upstreamed.

I agree panics are not great; returning a NotImplemented error would be better.
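Here is a sketch of the difference, using hypothetical types. PlanError stands in for DataFusion's error type, and repartitioned is a simplified stand-in for the optimizer hook mentioned in the PR description. Returning an error lets a caller recover instead of hitting a panic from todo!(). That caller could be an optimizer rule in a downstream crate.

```rust
/// Hypothetical error type standing in for DataFusion's error enum.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

/// Hypothetical, simplified partitioning scheme.
#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    RoundRobinBatch(usize),
    Range { split_points: Vec<i64> },
}

/// Hypothetical optimizer hook that changes the partition count.
pub fn repartitioned(p: &Partitioning, target: usize) -> Result<Partitioning, PlanError> {
    match p {
        Partitioning::RoundRobinBatch(_) => Ok(Partitioning::RoundRobinBatch(target)),
        // An error instead of `todo!()`: callers can recover rather than panic.
        Partitioning::Range { .. } => Err(PlanError::NotImplemented(
            "changing the partition count of Range partitioning".to_string(),
        )),
    }
}
```

A caller can then keep the original scheme with repartitioned(&p, 8).unwrap_or_else(|_| p.clone()).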

gene-bordegaray · 2026-07-01T19:15:45Z

I agree panics are not good, and we should keep the existing not_impl_err! calls. But for this specific PR, are we agreeing that it is best not to introduce this until DF can explicitly make this decision (e.g. with a CBO, as mentioned), or until a user needs it, so that we better understand the use case before implementing?

saadtajwar and others added 8 commits June 24, 2026 11:34

f083b3a feat: physical execution of range repartitioning
15850ba adding indices for range repr
04e0d5e first pass on putting rows into partition indexes
cec03e4 binary search done
a78b198 added testing
86c7e5d comment change
3286de6 another comment change for partition_grouped_take
30dbdeb Merge branch 'main' into saadt/range-repartition-physical-exec

github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 29, 2026


saadtajwar mentioned this pull request Jun 29, 2026 in Support physical execution of Range repartitioning #23137 (Open)


saadtajwar mentioned this pull request Jun 30, 2026 in Replace / rename HashPartitioned distribution as KeyPartitioned #23236 (Closed)
