Skip to content

Poc: expose group hash feedback from GroupValues#23229

Draft
Rachelint wants to merge 10 commits into
apache:mainfrom
Rachelint:fuse-aggr-repart-poc-2
Draft

Poc: expose group hash feedback from GroupValues#23229
Rachelint wants to merge 10 commits into
apache:mainfrom
Rachelint:fuse-aggr-repart-poc-2

Conversation

@Rachelint

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Part of hash aggregate repartition POC.

Rationale for this change

This POC prepares GroupValues for partition-aware partial aggregate output. The partial aggregate output path needs to know which input rows created new groups, and the hash for those rows, so it can later route newly-created group ids to target partitions without relying on a separate repartition + coalesce pipeline.

What changes are included in this PR?

  • Extend GroupValues::intern to fill per-row hashes and optionally return the input rows that created new groups.
  • Keep hash calculation inside GroupValues implementations.
  • Wire reusable hash/new-group-row buffers through aggregate hash table and row-hash paths.
  • Update existing call sites and benchmarks to the new intern signature.

Are these changes tested?

Yes:

  • cargo fmt --all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test -p datafusion-physical-plan group_values --lib

@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 29, 2026
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-physical-plan v54.0.0 (current)
       Built [  75.939s] (current)
     Parsing datafusion-physical-plan v54.0.0 (current)
      Parsed [   0.134s] (current)
    Building datafusion-physical-plan v54.0.0 (baseline)
       Built [  39.856s] (baseline)
     Parsing datafusion-physical-plan v54.0.0 (baseline)
      Parsed [   0.137s] (baseline)
    Checking datafusion-physical-plan v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.587s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure trait_method_parameter_count_changed: pub trait method parameter count changed ---

Description:
A trait method now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-item-signature
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/trait_method_parameter_count_changed.ron

Failed in:
  GroupValues::intern now takes 4 instead of 2 parameters, in file /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/aggregates/group_values/mod.rs:102

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [ 118.031s] datafusion-physical-plan

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 29, 2026
@Rachelint Rachelint force-pushed the fuse-aggr-repart-poc-2 branch from f501156 to 03c7079 Compare June 29, 2026 01:29
@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jun 29, 2026
@Rachelint Rachelint marked this pull request as draft June 29, 2026 05:37
@Rachelint Rachelint force-pushed the fuse-aggr-repart-poc-2 branch from 8e3137d to 0e09f71 Compare June 29, 2026 08:45
@github-actions github-actions Bot removed the physical-expr Changes to the physical-expr crates label Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant