Replace / rename HashPartitioned distribution as KeyPartitioned #23236

Description

@gene-bordegaray

Is your feature request related to a problem or challenge?

Distribution::HashPartitioned is documented as requiring rows with equal key values to land in the same partition. While working through range partitioning, @2010YOUY01 and @gabotechs pointed out that this is really a key-partitioning contract, not a requirement that the input be specifically hash partitioned.

This name has historically caused confusion and misuse, and the problem keeps coming up as range partitioning support expands. The key point is that range partitioning can satisfy some single-input key-partitioning requirements without the input being hash partitioned.
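To make the contract concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's types; `hash_partition`, `range_partition`, and `colocates_equal_keys` are hypothetical helpers written for this illustration. It checks the only property the requirement cares about: rows with equal keys end up in the same partition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: assign a key to a partition by hashing it.
fn hash_partition(key: i64, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Hypothetical helper: assign a key to a partition by range.
/// `split_points` must be sorted ascending; the result is the number of
/// split points that are less than or equal to `key`.
fn range_partition(key: i64, split_points: &[i64]) -> usize {
    split_points.partition_point(|&bound| bound <= key)
}

/// Checks the key-partitioning contract for a stream of rows:
/// no key value may appear in more than one partition.
/// `assign` receives (row index, key) and returns a partition index.
fn colocates_equal_keys(keys: &[i64], assign: impl Fn(usize, i64) -> usize) -> bool {
    let mut seen: HashMap<i64, usize> = HashMap::new();
    keys.iter().enumerate().all(|(row, &key)| {
        let partition = assign(row, key);
        // First sighting records the partition; later sightings must match it.
        *seen.entry(key).or_insert(partition) == partition
    })
}
```

Under these assumptions, both `|_, k| hash_partition(k, 4)` and `|_, k| range_partition(k, &[5, 20])` pass the check for any input, because each is a function of the key alone. A round-robin assignment such as `|row, _| row % 2` fails it as soon as the same key appears on two rows that map to different partitions.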

Describe the solution you'd like

Clarify the API direction for this distribution requirement. Options discussed include:

  • keep HashPartitioned but document it as historical naming for key partitioning (I am not a fan of this one)
  • migrate to a KeyPartitioned name and have both Partitioning::Hash and compatible Partitioning::Range satisfy this
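The second option could look roughly like the sketch below. These enums are simplified stand-ins invented for this illustration, not DataFusion's actual `Partitioning` or `Distribution` definitions: keys are plain column names, range bounds are `i64`, and "compatible" is reduced to exact key equality, which is almost certainly stricter than what a real implementation would want.

```rust
/// Simplified, hypothetical stand-in for an operator's output partitioning.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash { keys: Vec<String>, partitions: usize },
    Range { keys: Vec<String>, split_points: Vec<i64> },
    RoundRobin { partitions: usize },
}

/// Simplified, hypothetical stand-in for a per-input distribution requirement.
#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    Unspecified,
    SinglePartition,
    /// Rows with equal values for these keys must share a partition.
    KeyPartitioned(Vec<String>),
}

impl Partitioning {
    fn partition_count(&self) -> usize {
        match self {
            Partitioning::Hash { partitions, .. } => *partitions,
            // n split points divide the key space into n + 1 ranges.
            Partitioning::Range { split_points, .. } => split_points.len() + 1,
            Partitioning::RoundRobin { partitions } => *partitions,
        }
    }

    /// Assumption: "compatible" means the partitioning keys equal the
    /// required keys exactly. A single partition trivially satisfies any
    /// key requirement because every row is already colocated.
    fn satisfies(&self, required: &Distribution) -> bool {
        match required {
            Distribution::Unspecified => true,
            Distribution::SinglePartition => self.partition_count() == 1,
            Distribution::KeyPartitioned(required_keys) => {
                if self.partition_count() == 1 {
                    return true;
                }
                match self {
                    Partitioning::Hash { keys, .. }
                    | Partitioning::Range { keys, .. } => keys == required_keys,
                    Partitioning::RoundRobin { .. } => false,
                }
            }
        }
    }
}
```

The point of the sketch is that the requirement names the property (keys are colocated) rather than the mechanism, so `Hash` and `Range` share one match arm.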

This issue is only about the per-input distribution requirement. Multi-input / join co-partitioning should be handled separately.
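A small sketch of why the multi-input case is a separate problem, again using hypothetical helpers rather than DataFusion code: two inputs can each be key partitioned on their own and still disagree about which partition index holds a given key.

```rust
/// Hypothetical helper: assign a key to a partition by range.
/// `split_points` must be sorted ascending.
fn range_partition(key: i64, split_points: &[i64]) -> usize {
    split_points.partition_point(|&bound| bound <= key)
}

/// Returns true when every key maps to the same partition index on both
/// sides, which is what a partition-wise join would need.
fn co_partitioned(keys: &[i64], left_splits: &[i64], right_splits: &[i64]) -> bool {
    keys.iter()
        .all(|&k| range_partition(k, left_splits) == range_partition(k, right_splits))
}
```

With split points `[10]` on the left and `[100]` on the right, each side colocates equal keys, yet key `50` sits in partition 1 on the left and partition 0 on the right, so `co_partitioned(&[5, 50, 500], &[10], &[100])` is false. Identical split points on both sides make it true.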

NOTE: I would prefer to replace the public HashPartitioned variant only after range partitioning can satisfy the HashPartitioned requirements of both aggregations (a unary operator) and joins (a multi-input operator). That way we work out most of the kinks and nuances of this replacement before making a large public API change.

Additional context

Epic: #22395

Related PRs / discussion:
