Allow Partitioning::Range to satisfy window Distribution::KeyPartitioned requirements #23289

Description

@gene-bordegaray

Related:

Is your feature request related to a problem or challenge?

Window operators such as WindowAggExec and BoundedWindowAggExec require Distribution::KeyPartitioned on their PARTITION BY keys, but Partitioning::Range does not yet generally satisfy this requirement through Partitioning::satisfaction.

As a result, DataFusion can insert an unnecessary repartition even when the input is already range partitioned in a way that satisfies the required distribution.

Example:

BoundedWindowAggExec: PARTITION BY [a] ORDER BY [...]
  Input: Partitioning::Range(ordering=[a ASC], split_points=[...])

Rows with the same a value are colocated in one range partition, so the window does not need a hash repartition just to satisfy the KeyPartitioned distribution.
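
The colocation argument can be illustrated with a minimal sketch. The function `range_partition_index` is hypothetical and is not part of DataFusion; it assumes a single integer key, ascending split points, and that a key equal to a split point belongs to the partition on its right. Under those assumptions the partition index depends only on the key value, so two rows with the same `a` can never land in different partitions.

```rust
/// Hypothetical helper (not a DataFusion API): returns the index of the range
/// partition that `key` falls into, given ascending `split_points`.
/// Assumption: a key equal to a split point goes to the partition on its right.
fn range_partition_index(split_points: &[i64], key: i64) -> usize {
    // Counts how many split points are <= key; this relies on the slice being sorted.
    split_points.partition_point(|&p| p <= key)
}

/// Returns true when every pair of rows with equal keys maps to the same partition.
/// This always holds for the function above, because the index is a pure function
/// of the key value.
fn equal_keys_are_colocated(split_points: &[i64], keys: &[i64]) -> bool {
    keys.iter().all(|&k| {
        let expected = range_partition_index(split_points, k);
        keys.iter()
            .filter(|&&other| other == k)
            .all(|&other| range_partition_index(split_points, other) == expected)
    })
}
```

For example, with split points `[10, 20]` there are three partitions, and every row with `a = 10` is assigned to partition 1 regardless of where it appears in the input.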

Describe the solution you'd like

Allow compatible Partitioning::Range inputs to satisfy the partitioning requirements of window operators.

This should include:

  • WindowAggExec::required_input_distribution
  • BoundedWindowAggExec::required_input_distribution
  • Exact range satisfaction, such as Range([a]) satisfying PARTITION BY a
  • Subset range satisfaction, such as Range([a]) satisfying PARTITION BY a, b, when subset satisfaction is enabled
  • Fallback to hash repartitioning when the range key is incompatible
  • A suite of slt tests in range_partitioning.slt that covers core behavior across positive and negative cases and settings like:
    • datafusion.optimizer.preserve_file_partitions
    • datafusion.execution.target_partitions
    • datafusion.optimizer.subset_repartition_threshold
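
The exact, subset, and fallback cases listed above could be sketched roughly as follows. This is an illustration only, not the proposed implementation: `RangeSatisfaction`, `range_satisfies_window`, and the `allow_subset` flag are hypothetical names, and keys are modeled as plain column-name strings rather than physical expressions. The sketch assumes that a range partitioning colocates rows with equal PARTITION BY values only when every range key also appears among the PARTITION BY keys.

```rust
/// Hypothetical result type (not a DataFusion API).
#[derive(Debug, PartialEq, Eq)]
enum RangeSatisfaction {
    /// Range keys and PARTITION BY keys are the same set.
    Exact,
    /// Range keys are a strict subset of the PARTITION BY keys.
    Subset,
    /// Incompatible: the planner would fall back to hash repartitioning.
    NotSatisfied,
}

/// Hypothetical check: can an input range partitioned on `range_keys` satisfy
/// a window with `partition_by` keys? `allow_subset` stands in for whatever
/// setting enables subset satisfaction.
fn range_satisfies_window(
    range_keys: &[&str],
    partition_by: &[&str],
    allow_subset: bool,
) -> RangeSatisfaction {
    // No range keys means there is no colocation guarantee to rely on.
    if range_keys.is_empty() {
        return RangeSatisfaction::NotSatisfied;
    }
    // Every range key must be a PARTITION BY key; otherwise rows that share
    // the PARTITION BY values can be split across range partitions.
    if !range_keys.iter().all(|k| partition_by.contains(k)) {
        return RangeSatisfaction::NotSatisfied;
    }
    // All PARTITION BY keys are also range keys: exact match.
    if partition_by.iter().all(|k| range_keys.contains(k)) {
        return RangeSatisfaction::Exact;
    }
    // Strict subset: only accepted when subset satisfaction is enabled.
    if allow_subset {
        RangeSatisfaction::Subset
    } else {
        RangeSatisfaction::NotSatisfied
    }
}
```

Under these assumptions, `range_satisfies_window(&["a"], &["a", "b"], true)` reports `Subset`, while `range_satisfies_window(&["a", "b"], &["a"], true)` reports `NotSatisfied`, because rows sharing `a` may differ in `b` and so may sit in different range partitions.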

This should be a private satisfaction implementation. General Partitioning::satisfaction for Partitioning::Range will be implemented once more operators have been covered, to reduce the blast radius of optimizer changes.

Additional context
