[Bench] u32 group indices#20801

Closed
Dandandan wants to merge 1 commit into apache:main from Dandandan:claude/optimize-aggregations-u32-5Vv7z
@Dandandan
Contributor

Introduce GroupIndex type alias (u32) for group indices in aggregations, replacing the previous usize. This halves the memory per group index on 64-bit platforms (4 bytes vs 8 bytes), improving cache utilization during hash aggregation. A u32 supports up to ~4 billion groups, which is more than sufficient in practice since accumulator state memory would be exhausted long before reaching that limit.

Changes:

  • Add GroupIndex = u32 type alias in datafusion-expr-common
  • Update GroupsAccumulator trait methods to use &[GroupIndex]
  • Update GroupValues::intern() to produce Vec<GroupIndex>
  • Update all accumulate helpers (NullState, accumulate_indices, etc.)
  • Update all GroupsAccumulator and GroupValues implementations
  • Update FFI layer to use RVec<u32> for group indices
  • Update call sites in row_hash, recursive_query, and order tracking
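
To make the memory argument concrete, here is a minimal sketch of the idea, not the code from this PR. The alias name `GroupIndex` comes from the description above; the helpers `to_group_index` and `index_bytes` are hypothetical names introduced only for illustration. It compares the size of a batch of indices stored as `u32` and as `usize`, and shows a checked conversion that refuses group counts beyond `u32::MAX`.

```rust
use std::mem::size_of;

/// Stand-in for the alias described above (the real one would live in
/// `datafusion-expr-common`; this copy is for illustration only).
type GroupIndex = u32;

/// Hypothetical helper: convert a `usize` group number into a `GroupIndex`,
/// returning `None` if it does not fit in 32 bits.
fn to_group_index(i: usize) -> Option<GroupIndex> {
    GroupIndex::try_from(i).ok()
}

/// Hypothetical helper: bytes occupied by `n` indices of type `T`.
fn index_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}

fn main() {
    // 8192 is an assumed batch size, chosen only for the example.
    let batch = 8192;
    println!("GroupIndex: {} bytes per batch", index_bytes::<GroupIndex>(batch));
    // `usize` is 8 bytes on 64-bit targets, so this is twice as large there.
    println!("usize:      {} bytes per batch", index_bytes::<usize>(batch));

    // Small group numbers convert losslessly.
    assert_eq!(to_group_index(42), Some(42));
}
```

Whether to panic, return an error, or fall back to a wider type when the conversion fails is a design choice this sketch leaves open.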

https://claude.ai/code/session_011CtJW17mfcKqhc5SYCs7JZ

@github-actions github-actions bot added the logical-expr (Logical plan and expressions), core (Core DataFusion crate), functions (Changes to functions implementation), ffi (Changes to the ffi crate), physical-plan (Changes to the physical-plan crate), and spark labels Mar 8, 2026
@Dandandan
Contributor Author

Run benchmarks

@Dandandan Dandandan changed the title from "Use u32 group indices for aggregations to reduce memory usage" to "[Bench] u32 group indices f" Mar 8, 2026
@Dandandan Dandandan changed the title from "[Bench] u32 group indices f" to "[Bench] u32 group indices" Mar 8, 2026
@alamb-ghbot

🤖 ./gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing claude/optimize-aggregations-u32-5Vv7z (accb52f) to 92078d9 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and claude_optimize-aggregations-u32-5Vv7z
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ claude_optimize-aggregations-u32-5Vv7z ┃       Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0 │  2177.31 ms │                             2228.66 ms │    no change │
│ QQuery 1 │   756.65 ms │                              860.13 ms │ 1.14x slower │
│ QQuery 2 │  1615.14 ms │                             1735.07 ms │ 1.07x slower │
│ QQuery 3 │  1004.98 ms │                             1043.56 ms │    no change │
│ QQuery 4 │  2106.99 ms │                             2113.28 ms │    no change │
│ QQuery 5 │ 26650.05 ms │                            26157.24 ms │    no change │
│ QQuery 6 │  3634.04 ms │                             3817.63 ms │ 1.05x slower │
│ QQuery 7 │  2484.80 ms │                             2409.37 ms │    no change │
└──────────┴─────────────┴────────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                     │ 40429.97ms │
│ Total Time (claude_optimize-aggregations-u32-5Vv7z)   │ 40364.96ms │
│ Average Time (HEAD)                                   │  5053.75ms │
│ Average Time (claude_optimize-aggregations-u32-5Vv7z) │  5045.62ms │
│ Queries Faster                                        │          0 │
│ Queries Slower                                        │          3 │
│ Queries with No Change                                │          5 │
│ Queries with Failure                                  │          0 │
└───────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ claude_optimize-aggregations-u32-5Vv7z ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.55 ms │                                2.59 ms │     no change │
│ QQuery 1  │    46.81 ms │                               46.28 ms │     no change │
│ QQuery 2  │   151.92 ms │                              148.73 ms │     no change │
│ QQuery 3  │   155.82 ms │                              157.02 ms │     no change │
│ QQuery 4  │   961.14 ms │                              954.40 ms │     no change │
│ QQuery 5  │  1178.05 ms │                             1223.42 ms │     no change │
│ QQuery 6  │     6.26 ms │                                6.82 ms │  1.09x slower │
│ QQuery 7  │    50.70 ms │                               53.96 ms │  1.06x slower │
│ QQuery 8  │  1332.28 ms │                             1347.87 ms │     no change │
│ QQuery 9  │  1703.69 ms │                             1765.90 ms │     no change │
│ QQuery 10 │   313.38 ms │                              333.34 ms │  1.06x slower │
│ QQuery 11 │   351.89 ms │                              383.82 ms │  1.09x slower │
│ QQuery 12 │  1117.82 ms │                             1148.01 ms │     no change │
│ QQuery 13 │  1739.09 ms │                             1826.77 ms │  1.05x slower │
│ QQuery 14 │  1103.14 ms │                             1162.27 ms │  1.05x slower │
│ QQuery 15 │  1100.72 ms │                             1114.98 ms │     no change │
│ QQuery 16 │  2284.11 ms │                             2345.55 ms │     no change │
│ QQuery 17 │  2260.32 ms │                             2324.02 ms │     no change │
│ QQuery 18 │  4371.24 ms │                             4485.12 ms │     no change │
│ QQuery 19 │   115.24 ms │                              120.97 ms │     no change │
│ QQuery 20 │  1688.17 ms │                             1792.74 ms │  1.06x slower │
│ QQuery 21 │  1924.29 ms │                             2061.46 ms │  1.07x slower │
│ QQuery 22 │  3357.49 ms │                             3550.88 ms │  1.06x slower │
│ QQuery 23 │ 10828.24 ms │                            11467.30 ms │  1.06x slower │
│ QQuery 24 │   177.50 ms │                              185.68 ms │     no change │
│ QQuery 25 │   408.09 ms │                              431.90 ms │  1.06x slower │
│ QQuery 26 │   184.70 ms │                              196.32 ms │  1.06x slower │
│ QQuery 27 │  2451.25 ms │                             2603.71 ms │  1.06x slower │
│ QQuery 28 │ 21602.52 ms │                            23100.32 ms │  1.07x slower │
│ QQuery 29 │   956.28 ms │                              959.86 ms │     no change │
│ QQuery 30 │  1172.33 ms │                             1193.06 ms │     no change │
│ QQuery 31 │  1242.88 ms │                             1260.20 ms │     no change │
│ QQuery 32 │  4119.72 ms │                             3790.19 ms │ +1.09x faster │
│ QQuery 33 │  4978.08 ms │                             5170.86 ms │     no change │
│ QQuery 34 │  5874.89 ms │                             5776.21 ms │     no change │
│ QQuery 35 │  1087.81 ms │                             1084.90 ms │     no change │
│ QQuery 36 │   181.34 ms │                              184.35 ms │     no change │
│ QQuery 37 │    69.35 ms │                               72.42 ms │     no change │
│ QQuery 38 │   108.99 ms │                              116.39 ms │  1.07x slower │
│ QQuery 39 │   318.49 ms │                              342.67 ms │  1.08x slower │
│ QQuery 40 │    40.22 ms │                               41.56 ms │     no change │
│ QQuery 41 │    33.92 ms │                               34.20 ms │     no change │
│ QQuery 42 │    30.88 ms │                               31.10 ms │     no change │
└───────────┴─────────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                     │ 83183.62ms │
│ Total Time (claude_optimize-aggregations-u32-5Vv7z)   │ 86400.13ms │
│ Average Time (HEAD)                                   │  1934.50ms │
│ Average Time (claude_optimize-aggregations-u32-5Vv7z) │  2009.31ms │
│ Queries Faster                                        │          1 │
│ Queries Slower                                        │         16 │
│ Queries with No Change                                │         26 │
│ Queries with Failure                                  │          0 │
└───────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ claude_optimize-aggregations-u32-5Vv7z ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 102.40 ms │                              102.26 ms │     no change │
│ QQuery 2  │  30.54 ms │                               31.09 ms │     no change │
│ QQuery 3  │  34.49 ms │                               35.38 ms │     no change │
│ QQuery 4  │  31.28 ms │                               30.31 ms │     no change │
│ QQuery 5  │  82.05 ms │                               81.91 ms │     no change │
│ QQuery 6  │  20.04 ms │                               20.41 ms │     no change │
│ QQuery 7  │ 147.24 ms │                              142.23 ms │     no change │
│ QQuery 8  │  39.30 ms │                               39.45 ms │     no change │
│ QQuery 9  │ 101.12 ms │                               98.91 ms │     no change │
│ QQuery 10 │  62.82 ms │                               62.76 ms │     no change │
│ QQuery 11 │  18.73 ms │                               18.50 ms │     no change │
│ QQuery 12 │  54.00 ms │                               53.21 ms │     no change │
│ QQuery 13 │  48.14 ms │                               48.32 ms │     no change │
│ QQuery 14 │  14.60 ms │                               14.62 ms │     no change │
│ QQuery 15 │  29.32 ms │                               29.15 ms │     no change │
│ QQuery 16 │  26.87 ms │                               26.79 ms │     no change │
│ QQuery 17 │ 142.73 ms │                              138.86 ms │     no change │
│ QQuery 18 │ 262.86 ms │                              266.19 ms │     no change │
│ QQuery 19 │  41.14 ms │                               41.90 ms │     no change │
│ QQuery 20 │  57.76 ms │                               52.87 ms │ +1.09x faster │
│ QQuery 21 │ 192.01 ms │                              190.21 ms │     no change │
│ QQuery 22 │  22.29 ms │                               22.28 ms │     no change │
└───────────┴───────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                     ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                     │ 1561.74ms │
│ Total Time (claude_optimize-aggregations-u32-5Vv7z)   │ 1547.64ms │
│ Average Time (HEAD)                                   │   70.99ms │
│ Average Time (claude_optimize-aggregations-u32-5Vv7z) │   70.35ms │
│ Queries Faster                                        │         1 │
│ Queries Slower                                        │         0 │
│ Queries with No Change                                │        21 │
│ Queries with Failure                                  │         0 │
└───────────────────────────────────────────────────────┴───────────┘
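
For readers checking the Change column by hand, the following is a rough sketch of how such labels could be derived from the two timings. It is not the comparison script's actual code: the function name `classify` and the 5% noise threshold are assumptions, chosen because they reproduce the labels in the tables above.

```rust
/// Assumed noise threshold: differences under 5% are reported as "no change".
const THRESHOLD: f64 = 0.05;

/// Hypothetical helper: label a query from its HEAD and branch timings (ms).
fn classify(head_ms: f64, branch_ms: f64) -> String {
    let ratio = branch_ms / head_ms;
    if ratio >= 1.0 + THRESHOLD {
        // The branch took longer than HEAD.
        format!("{:.2}x slower", ratio)
    } else if 1.0 / ratio >= 1.0 + THRESHOLD {
        // The branch took less time than HEAD.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // Timings copied from the tables above.
    println!("{}", classify(756.65, 860.13)); // clickbench_extended, query 1
    println!("{}", classify(4119.72, 3790.19)); // clickbench_partitioned, query 32
    println!("{}", classify(2177.31, 2228.66)); // clickbench_extended, query 0
}
```

Under these assumptions the three calls yield "1.14x slower", "+1.09x faster", and "no change", matching the corresponding rows.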

@Dandandan Dandandan closed this Mar 9, 2026