Allow Partitioning::Range to satisfy PartitionedTopK Distribution::KeyPartitioned requirements #23290

Description

@gene-bordegaray

Related:

Is your feature request related to a problem or challenge?

Partitioned TopK operations like PartitionedTopKExec require Distribution::KeyPartitioned, but Partitioning::Range does not yet generally satisfy this through Partitioning::satisfaction.

This means DataFusion can insert an unnecessary repartition even when the input is already range partitioned in a way that satisfies the required distribution.

Example:

PartitionedTopKExec: partition=[a], order=[...]
  Input: Partitioning::Range(ordering=[a ASC], split_points=[...])

Rows with the same a value are colocated in one range partition, so PartitionedTopK does not need a hash repartition just to satisfy the KeyPartitioned distribution.
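
To illustrate the colocation argument, here is a minimal standalone sketch that does not use DataFusion's types. The helper `range_partition_index` is hypothetical, and the boundary rule (a value equal to a split point goes to the partition on its right) is an assumption made only for this example. Whatever rule is used, equal values of a always map to the same partition index.

```rust
/// Hypothetical helper, not part of DataFusion: returns the index of the
/// range partition that holds `value`, given ascending `split_points`.
/// Assumed boundary rule: a value equal to a split point belongs to the
/// partition on its right.
fn range_partition_index(value: i64, split_points: &[i64]) -> usize {
    // Counts how many split points are <= value, which is the partition index.
    split_points.partition_point(|&p| p <= value)
}

fn main() {
    let split_points = [10, 20, 30];
    // Duplicate values of `a` always resolve to the same partition.
    for a in [3, 10, 10, 17, 25, 25, 42] {
        println!("a={a} -> partition {}", range_partition_index(a, &split_points));
    }
}
```

Because the partition index is a pure function of a, all rows of one TopK group land in a single partition, which is what Distribution::KeyPartitioned on a requires.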

Describe the solution you'd like

Allow compatible Partitioning::Range inputs to satisfy partition requirements for PartitionedTopKExec.

This should include:

  • PartitionedTopKExec::required_input_distribution
  • Exact range satisfaction, such as Range([a]) satisfying partition=[a]
  • Subset range satisfaction, such as Range([a]) satisfying partition=[a, b], when subset satisfaction is enabled
  • Fallback to hash repartitioning when the range key is incompatible
  • A suite of slt tests in range_partitioning.slt that test core behavior across positive and negative cases and settings like:
    • datafusion.optimizer.preserve_file_partitions
    • datafusion.execution.target_partitions
    • datafusion.optimizer.subset_repartition_threshold
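
The exact, subset, and fallback cases listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RangeSatisfaction` and `range_satisfies_key_partitioned` are hypothetical names, keys are modeled as plain column-name strings assumed to be distinct, and the `allow_subset` flag stands in for whatever configuration enables subset satisfaction. It is not the actual DataFusion implementation.

```rust
/// Hypothetical result type for this sketch, not DataFusion's API.
#[derive(Debug, PartialEq)]
enum RangeSatisfaction {
    /// Range keys and required partition keys are the same set.
    Exact,
    /// Range keys are a strict subset of the required partition keys.
    Subset,
    /// Incompatible: the planner would fall back to a hash repartition.
    NotSatisfied,
}

/// Hypothetical check: does a range partitioning on `range_keys` colocate
/// all rows that share the same values of `required_keys`?
/// Assumes both slices contain distinct column names.
fn range_satisfies_key_partitioned(
    range_keys: &[&str],
    required_keys: &[&str],
    allow_subset: bool,
) -> RangeSatisfaction {
    // A range key outside the required keys can split a group across
    // partitions, e.g. Range([a, b]) for partition=[a].
    let covered = !range_keys.is_empty()
        && range_keys.iter().all(|k| required_keys.contains(k));
    if !covered {
        return RangeSatisfaction::NotSatisfied;
    }
    if range_keys.len() == required_keys.len() {
        RangeSatisfaction::Exact
    } else if allow_subset {
        RangeSatisfaction::Subset
    } else {
        RangeSatisfaction::NotSatisfied
    }
}

fn main() {
    // Range([a]) against partition=[a]
    println!("{:?}", range_satisfies_key_partitioned(&["a"], &["a"], false));
    // Range([a]) against partition=[a, b] with subset satisfaction enabled
    println!("{:?}", range_satisfies_key_partitioned(&["a"], &["a", "b"], true));
    // Range([c]) against partition=[a, b]: incompatible key
    println!("{:?}", range_satisfies_key_partitioned(&["c"], &["a", "b"], true));
}
```

The subset case is sound because rows that agree on (a, b) necessarily agree on a, so a range partitioning on a alone already keeps each (a, b) group in one partition.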

This should be a private satisfaction implementation. General Partitioning::satisfaction for Partitioning::Range will be implemented once more operators have been covered, to reduce the blast radius of optimizer changes.

Additional context
