[SPARK-55551][SQL] Improve BroadcastHashJoinExec output partitioning#54335
[SPARK-55551][SQL] Improve BroadcastHashJoinExec output partitioning#54335peter-toth wants to merge 2 commits intoapache:masterfrom
BroadcastHashJoinExec output partitioning#54335Conversation
| HashPartitioning(Seq(l3), 1)))), | ||
| right = DummySparkPlan()) | ||
| expected = PartitioningCollection(Seq( | ||
| PartitioningCollection(Seq( |
There was a problem hiding this comment.
Keeping a PartitioningCollection in a PartitioningCollection has no benefit.
| @@ -96,28 +97,23 @@ case class BroadcastHashJoinExec private( | |||
| } | |||
|
|
|||
| // Expands the given partitioning collection recursively. | |||
There was a problem hiding this comment.
Could you revise this method description a little more according to your code change? The method is slightly changed to handle Partitioning instead of PartitionCollection.
There was a problem hiding this comment.
Indeed. I fixed the comments in 3d75c6b.
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
Outdated
Show resolved
Hide resolved
| case other => other | ||
| val expandedPartitioning = expandOutputPartitioning(streamedPlan.outputPartitioning) | ||
| expandedPartitioning match { | ||
| case Nil => UnknownPartitioning(streamedPlan.outputPartitioning.numPartitions) |
There was a problem hiding this comment.
This logic looks new to me. Is this an improvement?
There was a problem hiding this comment.
Thanks for pointing this out. I added a comment in 3d75c6b. This could only happen if there was an empty PartitioningCollection in streamedPlan.outputPartitioning. UnknownPartitioning is always a valid alternative as it satisfies only distributions with the lowest requirements.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
The refactoring itself looks reasonable to me. I have a few comments.
c67158d to
3d75c6b
Compare
|
Thanks @dongjoon-hyun for taking a look. Cc @cloud-fan as well. |
What changes were proposed in this pull request?
This is a minor refector of
BroadcastHashJoinExec.outputPartitioningto:Partitioning with Expressioninstead ofHashPartitioningLike.Why are the changes needed?
Code cleanup and add support for future partitionings that implement
Expressionbut notHashPartitioningLike. (LikeKeyedPartitioningis in #54330.)Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.