@Tushar7012
Contributor

  • Use values.views() instead of values.iter() for direct u128 access
  • Use is_valid(i) for efficient null checking via validity bitmap
  • Avoid dereferencing overhead for inline strings
  • No additional memory overhead in Entry struct

Closes #19961

Which issue does this PR close?

Closes #19961

Rationale for this change

ArrowBytesViewMap used values.iter(), which wraps every value in an Option and adds overhead when iterating over byte view arrays. For ClickBench query 5, more than 50% of CPU time was spent in the intern operation.

This PR optimizes the hot path in insert_if_new_inner by using direct view access methods that avoid the iteration overhead.

What changes are included in this PR?

  • Replace values.iter() with values.views(): Access the raw &[u128] view buffer directly instead of creating Option wrappers for each value
  • Use is_valid(i) for null checking: Check validity via the bitmap instead of pattern matching on Option
  • Direct index-based access: Use values.value(i) only when the value is needed, avoiding unnecessary dereferencing
  • No additional memory overhead: Per maintainer feedback, we don't store the u128 view in the Entry struct
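
The access pattern in the list above can be sketched without the arrow crate. This is a simplified, std-only model and not the PR's code: `ViewColumn`, its fields, and `total_valid_len` are hypothetical stand-ins for the `views()` slice and the `is_valid(i)` check on a byte view array. It assumes the Arrow view layout, in which the low 32 bits of each little-endian `u128` view hold the byte length.

```rust
/// Hypothetical stand-in for a byte view array: one u128 per row plus an
/// optional validity bitmap (bit i set means row i is non-null).
struct ViewColumn {
    views: Vec<u128>,
    validity: Option<Vec<u8>>,
}

impl ViewColumn {
    /// Mirrors the idea of `views()`: borrow the raw view slice.
    fn views(&self) -> &[u128] {
        &self.views
    }

    /// Mirrors the idea of `is_valid(i)`: one bit test, no Option per row.
    fn is_valid(&self, i: usize) -> bool {
        match &self.validity {
            None => true,
            Some(bits) => bits[i / 8] & (1 << (i % 8)) != 0,
        }
    }
}

/// The byte length is stored in the low 32 bits of the view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// Walk rows by index, skip nulls, and read only the u128 views.
fn total_valid_len(col: &ViewColumn) -> u64 {
    let mut total = 0u64;
    for (i, &view) in col.views().iter().enumerate() {
        if !col.is_valid(i) {
            continue;
        }
        total += u64::from(view_len(view));
    }
    total
}
```

For example, with views whose lengths are 3, 5, and 20 and a bitmap of `0b0000_0101`, row 1 is null and `total_valid_len` returns 23.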

Are these changes tested?

Yes, all existing tests pass:

  • binary_view_map::tests::string_view_set_empty
  • binary_view_map::tests::string_view_set_one_null
  • binary_view_map::tests::string_view_set_many_null
  • binary_view_map::tests::test_string_view_set_basic
  • binary_view_map::tests::test_string_set_non_utf8
  • binary_view_map::tests::test_binary_set
  • binary_view_map::tests::test_map
  • binary_view_map::tests::test_string_set_memory_usage

Are there any user-facing changes?

No, this is an internal performance optimization with no changes to public APIs.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jan 24, 2026
@Tushar7012
Contributor Author

Hi @Dandandan

This PR implements the optimization you suggested in #19961.

Changes made:

  • Use values.views() instead of values.iter() for direct &[u128] access
  • Use is_valid(i) for efficient null checking via validity bitmap
  • No additional memory overhead in Entry struct (per your feedback!)

All 8 existing tests pass. Ready for review when you have time!

Could you also approve the CI workflows to run? Thanks!

@Dandandan
Contributor

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (2739149) to 05802e2 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

let value: &[u8] = value.as_ref();
// Get the input value - Arrow's value() method is already optimized
// to handle inline strings efficiently
let input_value: &[u8] = values.value(i).as_ref();
Contributor

@Dandandan Dandandan Jan 24, 2026

This part is slow for inlined bytes (u128) because it converts them back to &[u8], the same as iter() does, so it will not really speed things up. We can also move it inside the equality check.

We should avoid it if we know either:

  • all views are inlined (e.g. no buffer)
  • this view is inlined (length <= 12)
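
A minimal sketch of the two checks above, using only the standard library. It assumes the Arrow view layout: bytes 0..4 of the little-endian `u128` hold the length, and values of at most 12 bytes store their data in the remaining bytes with zero padding. The helper names (`make_inline_view`, `is_inline`, `all_inline`) are hypothetical and are not part of the arrow crate or this PR.

```rust
/// Values of at most this many bytes carry their data inside the view.
const MAX_INLINE_LEN: u32 = 12;

/// Build an inline view: length in bytes 0..4, data in bytes 4..4+len,
/// remaining bytes zero. Returns None if the value is too long to inline.
fn make_inline_view(data: &[u8]) -> Option<u128> {
    if data.len() > MAX_INLINE_LEN as usize {
        return None;
    }
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    Some(u128::from_le_bytes(bytes))
}

/// A view is inline when its length field is at most 12.
fn is_inline(view: u128) -> bool {
    (view as u32) <= MAX_INLINE_LEN
}

/// True when every view in the batch is inline, so no data buffer is needed.
fn all_inline(views: &[u128]) -> bool {
    views.iter().all(|&v| is_inline(v))
}
```

Because the padding is zero, two inline values are equal exactly when their `u128` views are equal, so a single integer comparison can replace the conversion back to `&[u8]`.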


v == value
// Compare stored value with input value
let stored_value = self.builder.get_value(header.view_idx);
Contributor

@Dandandan Dandandan Jan 24, 2026

This is slow for inlined values - we should avoid self.builder.get_value

Contributor

I suppose we either have to avoid the Builder API (store views/buffers/nulls here ourselves) or add some methods to it (e.g. .views(), append_inline_view, etc.) to make it much faster.

} else {
// no existing value, make a new one.
// Only dereference bytes here when we actually need to insert
let value: &[u8] = values.value(i).as_ref();
Contributor

We should avoid this for inlined values. It is also the same as input_value above, so the conversion now happens twice.

let value: &[u8] = values.value(i).as_ref();
let payload = make_payload_fn(Some(value));

let inner_view_idx = self.builder.len();
Contributor

We can optimize adding a new view by only adjusting the buffer index of the existing view instead of creating a new view from scratch, which is slow.
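
One way to read this suggestion, sketched with the standard library only. It assumes the Arrow layout for values longer than 12 bytes: length in bits 0..32, a 4-byte prefix in bits 32..64, buffer index in bits 64..96, and offset in bits 96..128. `make_long_view` and `repoint_view` are hypothetical helpers, not arrow crate APIs, and the sketch also rewrites the offset on the assumption that the bytes land at a different position in the destination buffer.

```rust
/// Build a non-inline view from scratch (the slower path): this reads the
/// data to extract the 4-byte prefix.
fn make_long_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(data.len() > 12, "short values are stored inline instead");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..8].copy_from_slice(&data[..4]);
    bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
    bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    u128::from_le_bytes(bytes)
}

/// Reuse an existing non-inline view: keep the low 64 bits (length and
/// prefix) and overwrite only the buffer index and offset fields.
fn repoint_view(view: u128, buffer_index: u32, offset: u32) -> u128 {
    let len_and_prefix = view & u128::from(u64::MAX);
    len_and_prefix | (u128::from(buffer_index) << 64) | (u128::from(offset) << 96)
}

/// Read back the buffer index field (bits 64..96).
fn buffer_index(view: u128) -> u32 {
    (view >> 64) as u32
}

/// Read back the offset field (bits 96..128).
fn offset(view: u128) -> u32 {
    (view >> 96) as u32
}
```

`repoint_view` never touches the value bytes, which is the point of the suggestion: the length and prefix are already correct in the input view.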

@Dandandan
Contributor

Thanks @Tushar7012 for your PR. I left some notes on what needs to be changed.

@Tushar7012 Tushar7012 force-pushed the optimize-arrow-bytes-view-map branch from 2739149 to 45135f8 Compare January 24, 2026 15:12
@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2309.17 ms │                    2303.64 ms │     no change │
│ QQuery 1 │   957.66 ms │                     942.00 ms │     no change │
│ QQuery 2 │  1924.99 ms │                    1814.37 ms │ +1.06x faster │
│ QQuery 3 │  1031.12 ms │                    1036.51 ms │     no change │
│ QQuery 4 │  2349.66 ms │                    2118.53 ms │ +1.11x faster │
│ QQuery 5 │ 27769.06 ms │                   27703.46 ms │     no change │
│ QQuery 6 │  4033.38 ms │                    3966.05 ms │     no change │
│ QQuery 7 │  2697.97 ms │                    2678.68 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 43073.01ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 42563.24ms │
│ Average Time (HEAD)                          │  5384.13ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5320.41ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.92 ms │                       1.94 ms │     no change │
│ QQuery 1  │    50.61 ms │                      50.65 ms │     no change │
│ QQuery 2  │   132.22 ms │                     133.30 ms │     no change │
│ QQuery 3  │   152.05 ms │                     156.42 ms │     no change │
│ QQuery 4  │  1001.26 ms │                    1082.45 ms │  1.08x slower │
│ QQuery 5  │  1317.90 ms │                    1346.29 ms │     no change │
│ QQuery 6  │     1.84 ms │                       1.88 ms │     no change │
│ QQuery 7  │    54.46 ms │                      54.25 ms │     no change │
│ QQuery 8  │  1380.58 ms │                    1464.61 ms │  1.06x slower │
│ QQuery 9  │  1727.45 ms │                    1842.90 ms │  1.07x slower │
│ QQuery 10 │   345.16 ms │                     346.12 ms │     no change │
│ QQuery 11 │   391.97 ms │                     392.36 ms │     no change │
│ QQuery 12 │  1237.79 ms │                    1265.35 ms │     no change │
│ QQuery 13 │  1912.04 ms │                    1950.24 ms │     no change │
│ QQuery 14 │  1229.43 ms │                    1260.85 ms │     no change │
│ QQuery 15 │  1192.52 ms │                    1264.49 ms │  1.06x slower │
│ QQuery 16 │  2445.11 ms │                    2573.95 ms │  1.05x slower │
│ QQuery 17 │  2475.03 ms │                    2554.38 ms │     no change │
│ QQuery 18 │  5104.78 ms │                    4752.67 ms │ +1.07x faster │
│ QQuery 19 │   121.03 ms │                     119.39 ms │     no change │
│ QQuery 20 │  1973.45 ms │                    1825.64 ms │ +1.08x faster │
│ QQuery 21 │  2227.20 ms │                    2133.50 ms │     no change │
│ QQuery 22 │  3749.15 ms │                    3655.73 ms │     no change │
│ QQuery 23 │ 12336.75 ms │                   12055.95 ms │     no change │
│ QQuery 24 │   213.86 ms │                     219.51 ms │     no change │
│ QQuery 25 │   474.80 ms │                     461.60 ms │     no change │
│ QQuery 26 │   207.51 ms │                     209.79 ms │     no change │
│ QQuery 27 │  2694.84 ms │                    2608.22 ms │     no change │
│ QQuery 28 │ 23396.30 ms │                   23349.87 ms │     no change │
│ QQuery 29 │   996.62 ms │                     976.69 ms │     no change │
│ QQuery 30 │  1300.65 ms │                    1268.85 ms │     no change │
│ QQuery 31 │  1328.79 ms │                    1311.05 ms │     no change │
│ QQuery 32 │  4369.44 ms │                    4420.28 ms │     no change │
│ QQuery 33 │  5497.42 ms │                    5367.07 ms │     no change │
│ QQuery 34 │  5569.41 ms │                    5354.07 ms │     no change │
│ QQuery 35 │  1926.65 ms │                    1869.70 ms │     no change │
│ QQuery 36 │    66.62 ms │                      68.31 ms │     no change │
│ QQuery 37 │    46.91 ms │                      45.62 ms │     no change │
│ QQuery 38 │    67.00 ms │                      67.30 ms │     no change │
│ QQuery 39 │   103.74 ms │                     101.27 ms │     no change │
│ QQuery 40 │    25.84 ms │                      26.96 ms │     no change │
│ QQuery 41 │    23.38 ms │                      24.16 ms │     no change │
│ QQuery 42 │    20.13 ms │                      20.62 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 90891.60ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 90056.22ms │
│ Average Time (HEAD)                          │  2113.76ms │
│ Average Time (optimize-arrow-bytes-view-map) │  2094.33ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          5 │
│ Queries with No Change                       │         36 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 135.07 ms │                     105.11 ms │ +1.29x faster │
│ QQuery 2  │  33.62 ms │                      33.68 ms │     no change │
│ QQuery 3  │  37.50 ms │                      39.82 ms │  1.06x slower │
│ QQuery 4  │  30.52 ms │                      30.28 ms │     no change │
│ QQuery 5  │  91.48 ms │                      89.03 ms │     no change │
│ QQuery 6  │  20.66 ms │                      20.79 ms │     no change │
│ QQuery 7  │ 158.84 ms │                     159.48 ms │     no change │
│ QQuery 8  │  43.32 ms │                      40.77 ms │ +1.06x faster │
│ QQuery 9  │  98.25 ms │                     105.38 ms │  1.07x slower │
│ QQuery 10 │  70.89 ms │                      67.03 ms │ +1.06x faster │
│ QQuery 11 │  18.62 ms │                      19.56 ms │  1.05x slower │
│ QQuery 12 │  51.81 ms │                      52.31 ms │     no change │
│ QQuery 13 │  50.44 ms │                      49.68 ms │     no change │
│ QQuery 14 │  14.75 ms │                      15.08 ms │     no change │
│ QQuery 15 │  30.18 ms │                      30.34 ms │     no change │
│ QQuery 16 │  27.73 ms │                      31.74 ms │  1.14x slower │
│ QQuery 17 │ 143.06 ms │                     144.02 ms │     no change │
│ QQuery 18 │ 281.47 ms │                     283.03 ms │     no change │
│ QQuery 19 │  41.04 ms │                      41.37 ms │     no change │
│ QQuery 20 │  56.14 ms │                      53.74 ms │     no change │
│ QQuery 21 │ 187.34 ms │                     192.52 ms │     no change │
│ QQuery 22 │  22.73 ms │                      22.50 ms │     no change │
└───────────┴───────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1645.47ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1627.26ms │
│ Average Time (HEAD)                          │   74.79ms │
│ Average Time (optimize-arrow-bytes-view-map) │   73.97ms │
│ Queries Faster                               │         3 │
│ Queries Slower                               │         4 │
│ Queries with No Change                       │        15 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

@Tushar7012 Tushar7012 force-pushed the optimize-arrow-bytes-view-map branch from 45135f8 to 4010560 Compare January 24, 2026 15:55
@Tushar7012 Tushar7012 force-pushed the optimize-arrow-bytes-view-map branch from 4010560 to 51768bf Compare January 24, 2026 16:02
@Dandandan
Contributor

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (51768bf) to e5e7636 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

return header.view == view_u128;
}

// For larger strings: first compare the 4-byte prefix (bytes 4-7 of u128)
Contributor

I think those other paths, while OK, will probably not improve things much, as header.hash != hash will already filter out nearly 100% of the false positives.

Contributor Author

Thanks for the insight! That makes sense - the hash comparison is doing most of the heavy lifting. I'll keep the code as-is since it's still slightly better and doesn't add complexity.
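
The ordering of checks discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code under review: the `Entry` fields and `entry_matches` are hypothetical, the views follow the Arrow layout (length in bits 0..32, prefix in bits 32..64 for values longer than 12 bytes), and the value bytes are supplied through closures so they are only loaded when every cheaper check has passed.

```rust
/// Hypothetical hash-table entry: the stored hash plus the stored view.
struct Entry {
    hash: u64,
    view: u128,
}

/// Decide whether `entry` holds the same value as the probed (hash, view).
fn entry_matches<'a>(
    entry: &Entry,
    hash: u64,
    view: u128,
    stored_bytes: impl FnOnce() -> &'a [u8],
    input_bytes: impl FnOnce() -> &'a [u8],
) -> bool {
    // 1. Cheapest filter: different hashes cannot be the same value.
    if entry.hash != hash {
        return false;
    }
    // 2. Inline values (at most 12 bytes) are fully described by the view.
    if (view as u32) <= 12 {
        return entry.view == view;
    }
    // 3. Larger values: length and 4-byte prefix sit in the low 64 bits.
    if entry.view as u64 != view as u64 {
        return false;
    }
    // 4. Only now dereference the data buffers and compare all bytes.
    stored_bytes() == input_bytes()
}
```

As noted in the review, step 1 already rejects almost all non-matching entries, so steps 2 and 3 mostly matter for avoiding the byte comparison on true matches of inline values.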

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2282.80 ms │                    2219.33 ms │     no change │
│ QQuery 1 │   961.54 ms │                     726.56 ms │ +1.32x faster │
│ QQuery 2 │  1850.89 ms │                    1515.96 ms │ +1.22x faster │
│ QQuery 3 │  1033.76 ms │                    1038.79 ms │     no change │
│ QQuery 4 │  2170.02 ms │                    2205.61 ms │     no change │
│ QQuery 5 │ 28120.24 ms │                   28106.55 ms │     no change │
│ QQuery 6 │  4044.65 ms │                    4008.67 ms │     no change │
│ QQuery 7 │  2689.97 ms │                    2661.63 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 43153.88ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 42483.11ms │
│ Average Time (HEAD)                          │  5394.23ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5310.39ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.91 ms │                       1.91 ms │     no change │
│ QQuery 1  │    49.29 ms │                      50.23 ms │     no change │
│ QQuery 2  │   133.90 ms │                     131.61 ms │     no change │
│ QQuery 3  │   151.70 ms │                     147.90 ms │     no change │
│ QQuery 4  │   985.12 ms │                    1019.27 ms │     no change │
│ QQuery 5  │  1359.15 ms │                    1338.51 ms │     no change │
│ QQuery 6  │     1.92 ms │                       1.85 ms │     no change │
│ QQuery 7  │    54.03 ms │                      53.02 ms │     no change │
│ QQuery 8  │  1361.19 ms │                    1385.04 ms │     no change │
│ QQuery 9  │  1722.94 ms │                    1693.62 ms │     no change │
│ QQuery 10 │   338.36 ms │                     335.32 ms │     no change │
│ QQuery 11 │   384.82 ms │                     389.75 ms │     no change │
│ QQuery 12 │  1222.01 ms │                    1319.55 ms │  1.08x slower │
│ QQuery 13 │  1902.59 ms │                    1943.15 ms │     no change │
│ QQuery 14 │  1184.53 ms │                    1247.96 ms │  1.05x slower │
│ QQuery 15 │  1153.50 ms │                    1165.11 ms │     no change │
│ QQuery 16 │  2400.41 ms │                    2459.66 ms │     no change │
│ QQuery 17 │  2384.45 ms │                    2470.13 ms │     no change │
│ QQuery 18 │  5437.86 ms │                    4767.72 ms │ +1.14x faster │
│ QQuery 19 │   123.03 ms │                     120.41 ms │     no change │
│ QQuery 20 │  1962.51 ms │                    1911.78 ms │     no change │
│ QQuery 21 │  2225.66 ms │                    2193.16 ms │     no change │
│ QQuery 22 │  8035.21 ms │                    3769.75 ms │ +2.13x faster │
│ QQuery 23 │ 26720.94 ms │                   12171.65 ms │ +2.20x faster │
│ QQuery 24 │   220.52 ms │                     206.32 ms │ +1.07x faster │
│ QQuery 25 │   467.23 ms │                     473.78 ms │     no change │
│ QQuery 26 │   225.96 ms │                     222.37 ms │     no change │
│ QQuery 27 │  2701.22 ms │                    2645.19 ms │     no change │
│ QQuery 28 │ 23870.48 ms │                   23200.69 ms │     no change │
│ QQuery 29 │   967.78 ms │                     981.32 ms │     no change │
│ QQuery 30 │  1256.40 ms │                    1240.59 ms │     no change │
│ QQuery 31 │  1338.09 ms │                    1342.61 ms │     no change │
│ QQuery 32 │  4510.17 ms │                    4126.65 ms │ +1.09x faster │
│ QQuery 33 │  5423.92 ms │                    5700.79 ms │  1.05x slower │
│ QQuery 34 │  5652.67 ms │                    6206.14 ms │  1.10x slower │
│ QQuery 35 │  1842.79 ms │                    1867.50 ms │     no change │
│ QQuery 36 │    66.00 ms │                      68.36 ms │     no change │
│ QQuery 37 │    44.12 ms │                      46.05 ms │     no change │
│ QQuery 38 │    68.01 ms │                      66.30 ms │     no change │
│ QQuery 39 │   105.52 ms │                     102.88 ms │     no change │
│ QQuery 40 │    26.64 ms │                      26.51 ms │     no change │
│ QQuery 41 │    22.98 ms │                      23.92 ms │     no change │
│ QQuery 42 │    20.11 ms │                      19.10 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 110127.66ms │
│ Total Time (optimize-arrow-bytes-view-map)   │  90655.16ms │
│ Average Time (HEAD)                          │   2561.11ms │
│ Average Time (optimize-arrow-bytes-view-map) │   2108.26ms │
│ Queries Faster                               │           5 │
│ Queries Slower                               │           4 │
│ Queries with No Change                       │          34 │
│ Queries with Failure                         │           0 │
└──────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 101.74 ms │                     102.03 ms │     no change │
│ QQuery 2  │  33.61 ms │                      33.32 ms │     no change │
│ QQuery 3  │  35.05 ms │                      39.13 ms │  1.12x slower │
│ QQuery 4  │  30.85 ms │                      30.56 ms │     no change │
│ QQuery 5  │  89.18 ms │                      89.29 ms │     no change │
│ QQuery 6  │  20.80 ms │                      20.55 ms │     no change │
│ QQuery 7  │ 155.10 ms │                     151.92 ms │     no change │
│ QQuery 8  │  42.88 ms │                      41.59 ms │     no change │
│ QQuery 9  │ 109.43 ms │                     103.29 ms │ +1.06x faster │
│ QQuery 10 │  65.67 ms │                      68.76 ms │     no change │
│ QQuery 11 │  18.89 ms │                      20.19 ms │  1.07x slower │
│ QQuery 12 │  51.96 ms │                      52.43 ms │     no change │
│ QQuery 13 │  49.08 ms │                      49.60 ms │     no change │
│ QQuery 14 │  14.86 ms │                      14.93 ms │     no change │
│ QQuery 15 │  29.55 ms │                      29.87 ms │     no change │
│ QQuery 16 │  28.63 ms │                      27.96 ms │     no change │
│ QQuery 17 │ 143.14 ms │                     137.48 ms │     no change │
│ QQuery 18 │ 279.11 ms │                     274.91 ms │     no change │
│ QQuery 19 │  40.78 ms │                      41.51 ms │     no change │
│ QQuery 20 │  56.29 ms │                      55.84 ms │     no change │
│ QQuery 21 │ 188.12 ms │                     186.79 ms │     no change │
│ QQuery 22 │  25.37 ms │                      22.19 ms │ +1.14x faster │
└───────────┴───────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1610.08ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1594.14ms │
│ Average Time (HEAD)                          │   73.19ms │
│ Average Time (optimize-arrow-bytes-view-map) │   72.46ms │
│ Queries Faster                               │         2 │
│ Queries Slower                               │         2 │
│ Queries with No Change                       │        18 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

@Dandandan
Contributor

│ QQuery 1 │   961.54 ms │                     726.56 ms │ +1.32x faster │
│ QQuery 2 │  1850.89 ms │                    1515.96 ms │ +1.22x faster │

Starting to look good! (I think we can go further when specializing more for inline cases).
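
For reading the quoted rows, the Change column appears to be the ratio of the two timings. The sketch below reproduces that arithmetic; `change_label` is a hypothetical helper, not the benchmark script itself, and it leaves out whatever threshold the script uses to report "no change".

```rust
/// Format the ratio between two timings the way the Change column shows it,
/// assuming the value is the slower timing divided by the faster one.
fn change_label(head_ms: f64, branch_ms: f64) -> String {
    if branch_ms < head_ms {
        format!("+{:.2}x faster", head_ms / branch_ms)
    } else {
        format!("{:.2}x slower", branch_ms / head_ms)
    }
}
```

With the two rows quoted above, `change_label(961.54, 726.56)` gives "+1.32x faster" and `change_label(1850.89, 1515.96)` gives "+1.22x faster", matching the table.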

@Dandandan
Contributor

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (51768bf) to e5e7636 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2239.59 ms │                    2222.17 ms │     no change │
│ QQuery 1 │  1060.41 ms │                     755.06 ms │ +1.40x faster │
│ QQuery 2 │  1863.00 ms │                    1547.04 ms │ +1.20x faster │
│ QQuery 3 │  1049.20 ms │                    1009.75 ms │     no change │
│ QQuery 4 │  2131.65 ms │                    2125.03 ms │     no change │
│ QQuery 5 │ 28019.05 ms │                   28100.10 ms │     no change │
│ QQuery 6 │  4035.52 ms │                    4015.97 ms │     no change │
│ QQuery 7 │  2475.53 ms │                    2505.04 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 42873.94ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 42280.16ms │
│ Average Time (HEAD)                          │  5359.24ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5285.02ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.92 ms │                       1.93 ms │     no change │
│ QQuery 1  │    50.61 ms │                      51.78 ms │     no change │
│ QQuery 2  │   131.49 ms │                     133.72 ms │     no change │
│ QQuery 3  │   152.56 ms │                     150.58 ms │     no change │
│ QQuery 4  │   976.67 ms │                     975.12 ms │     no change │
│ QQuery 5  │  1297.55 ms │                    1321.50 ms │     no change │
│ QQuery 6  │     1.84 ms │                       1.87 ms │     no change │
│ QQuery 7  │    53.60 ms │                      53.89 ms │     no change │
│ QQuery 8  │  1344.36 ms │                    1345.56 ms │     no change │
│ QQuery 9  │  1691.81 ms │                    1705.73 ms │     no change │
│ QQuery 10 │   335.67 ms │                     340.63 ms │     no change │
│ QQuery 11 │   381.56 ms │                     390.63 ms │     no change │
│ QQuery 12 │  1234.25 ms │                    1309.99 ms │  1.06x slower │
│ QQuery 13 │  1902.22 ms │                    1926.63 ms │     no change │
│ QQuery 14 │  1198.18 ms │                    1215.71 ms │     no change │
│ QQuery 15 │  1138.81 ms │                    1129.16 ms │     no change │
│ QQuery 16 │  2374.72 ms │                    2410.36 ms │     no change │
│ QQuery 17 │  2363.88 ms │                    2398.59 ms │     no change │
│ QQuery 18 │  5448.14 ms │                    4641.28 ms │ +1.17x faster │
│ QQuery 19 │   119.12 ms │                     119.58 ms │     no change │
│ QQuery 20 │  1912.21 ms │                    1893.55 ms │     no change │
│ QQuery 21 │  2205.77 ms │                    2205.96 ms │     no change │
│ QQuery 22 │  3781.29 ms │                    3733.94 ms │     no change │
│ QQuery 23 │ 14899.55 ms │                   12082.87 ms │ +1.23x faster │
│ QQuery 24 │   219.00 ms │                     216.48 ms │     no change │
│ QQuery 25 │   465.54 ms │                     471.73 ms │     no change │
│ QQuery 26 │   224.75 ms │                     201.38 ms │ +1.12x faster │
│ QQuery 27 │  2670.96 ms │                    2652.12 ms │     no change │
│ QQuery 28 │ 23296.97 ms │                   23076.25 ms │     no change │
│ QQuery 29 │  1051.38 ms │                     975.00 ms │ +1.08x faster │
│ QQuery 30 │  1283.44 ms │                    1249.21 ms │     no change │
│ QQuery 31 │  1321.24 ms │                    1300.31 ms │     no change │
│ QQuery 32 │  4546.01 ms │                    4178.70 ms │ +1.09x faster │
│ QQuery 33 │  5421.09 ms │                    5514.08 ms │     no change │
│ QQuery 34 │  5748.05 ms │                    5800.66 ms │     no change │
│ QQuery 35 │  1846.22 ms │                    1810.44 ms │     no change │
│ QQuery 36 │    68.07 ms │                      68.67 ms │     no change │
│ QQuery 37 │    46.19 ms │                      45.92 ms │     no change │
│ QQuery 38 │    66.58 ms │                      67.73 ms │     no change │
│ QQuery 39 │   101.60 ms │                     100.10 ms │     no change │
│ QQuery 40 │    25.46 ms │                      26.98 ms │  1.06x slower │
│ QQuery 41 │    24.55 ms │                      24.33 ms │     no change │
│ QQuery 42 │    19.20 ms │                      19.97 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 93444.08ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 89340.64ms │
│ Average Time (HEAD)                          │  2173.12ms │
│ Average Time (optimize-arrow-bytes-view-map) │  2077.69ms │
│ Queries Faster                               │          5 │
│ Queries Slower                               │          2 │
│ Queries with No Change                       │         36 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 103.53 ms │                     101.64 ms │     no change │
│ QQuery 2  │  33.02 ms │                      32.24 ms │     no change │
│ QQuery 3  │  39.51 ms │                      37.52 ms │ +1.05x faster │
│ QQuery 4  │  31.26 ms │                      31.49 ms │     no change │
│ QQuery 5  │  87.66 ms │                      91.88 ms │     no change │
│ QQuery 6  │  20.55 ms │                      20.43 ms │     no change │
│ QQuery 7  │ 154.18 ms │                     154.28 ms │     no change │
│ QQuery 8  │  41.07 ms │                      38.65 ms │ +1.06x faster │
│ QQuery 9  │  98.74 ms │                      99.45 ms │     no change │
│ QQuery 10 │  68.63 ms │                      66.23 ms │     no change │
│ QQuery 11 │  18.27 ms │                      18.63 ms │     no change │
│ QQuery 12 │  50.44 ms │                      51.62 ms │     no change │
│ QQuery 13 │  49.12 ms │                      47.24 ms │     no change │
│ QQuery 14 │  15.11 ms │                      17.66 ms │  1.17x slower │
│ QQuery 15 │  29.88 ms │                      30.76 ms │     no change │
│ QQuery 16 │  27.75 ms │                      28.84 ms │     no change │
│ QQuery 17 │ 139.77 ms │                     142.35 ms │     no change │
│ QQuery 18 │ 279.11 ms │                     277.62 ms │     no change │
│ QQuery 19 │  41.03 ms │                      40.30 ms │     no change │
│ QQuery 20 │  55.55 ms │                      54.86 ms │     no change │
│ QQuery 21 │ 188.18 ms │                     192.54 ms │     no change │
│ QQuery 22 │  22.66 ms │                      22.52 ms │     no change │
└───────────┴───────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1595.04ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1598.73ms │
│ Average Time (HEAD)                          │   72.50ms │
│ Average Time (optimize-arrow-bytes-view-map) │   72.67ms │
│ Queries Faster                               │         2 │
│ Queries Slower                               │         1 │
│ Queries with No Change                       │        19 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

@Dandandan
Contributor

Dandandan commented Jan 25, 2026

The speedup of those queries (q1/q2 in the extended benchmarks) seems reproducible... if we save some of the builder overhead, I think it will also speed up the other cases.

@Tushar7012
Contributor Author

Thanks for reviewing the benchmarks! Great to hear the speedup is reproducible.

Regarding the builder overhead optimization for other cases - would you like me to explore that in a follow-up PR, or should we address it in this one before merging?

Let me know if there's anything else needed for approval!

@Dandandan
Contributor

> Thanks for reviewing the benchmarks! Great to hear the speedup is reproducible.
>
> Regarding the builder overhead optimization for other cases - would you like me to explore that in a follow-up PR, or should we address it in this one before merging?
>
> Let me know if there's anything else needed for approval!

I think it would be better to avoid the extra 16 bytes by either:

  • storing only the views in the struct, removing the builder usage, and updating the original view. We no longer need any offset, so we end up saving 8 bytes per value; or
  • removing the u128 view from the struct and keeping the reference. I think the builder interface needs to expose the views to support this.

Copilot AI review requested due to automatic review settings January 25, 2026 17:13
@Tushar7012
Contributor Author

Thanks for the detailed suggestions!

To make sure I understand correctly:

Option 1: Remove the builder usage entirely and store only the u128 views directly in the Entry struct, updating the original view in-place. This eliminates the offset field and saves 8 bytes per value.

Option 2: Keep the current struct without the u128 view and instead expose views from the builder interface to access them directly by reference.

I'm leaning towards Option 1 as it seems more self-contained, but I want to confirm - for Option 1, would we be modifying the source GenericByteViewBuilder directly, or maintaining our own separate view storage?

Let me know which approach you'd prefer and I'll implement it!

Contributor

Copilot AI left a comment

Pull request overview

Optimizes ArrowBytesViewMap’s hot insertion/lookup path for StringViewArray/BinaryViewArray by iterating over raw u128 views and using validity-bit checks to avoid per-element Option/dereference overhead.

Changes:

  • Iterate over values.views() and use values.is_valid(i) for null handling.
  • Add fast-path equality for inlined (≤12B) values via direct u128 view comparison.
  • Add prefix-based precheck for non-inlined values to reduce full byte dereferencing/comparison.

Comment on lines 408 to 411
/// The original u128 view for fast comparison of inline strings (<=12 bytes)
/// and prefix comparison for larger strings
view: u128,

Copilot AI Jan 25, 2026

PR description claims "No additional memory overhead in Entry struct", but this change adds view: u128 to Entry, increasing per-entry size (and thus memory usage / map_size) by at least 16 bytes plus padding. If avoiding additional overhead is a requirement, consider storing only the needed metadata (e.g., len: u32 + prefix: u32) or deriving these from the builder’s stored view via view_idx; otherwise, update the PR description to match the implementation.
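
The size difference described above can be checked with a minimal sketch. The three structs below are hypothetical stand-ins (the field names `hash` and `view_idx` are assumptions for illustration, not the real `Entry` definition); they only show how adding a `u128` compares with adding a `len: u32` plus `prefix: u32` pair.

```rust
use std::mem::size_of;

// Hypothetical entry layouts, for illustration only. The real `Entry`
// in ArrowBytesViewMap may have different fields.
#[allow(dead_code)]
struct EntryWithoutView {
    hash: u64,
    view_idx: usize,
}

#[allow(dead_code)]
struct EntryWithView {
    hash: u64,
    view_idx: usize,
    view: u128, // full 16-byte view stored per entry
}

#[allow(dead_code)]
struct EntryWithLenAndPrefix {
    hash: u64,
    view_idx: usize,
    len: u32,    // only the length part of the view
    prefix: u32, // only the 4-byte prefix part of the view
}

fn main() {
    // Exact sizes depend on the target's alignment rules, so they are
    // printed rather than hardcoded.
    println!("without view:   {} bytes", size_of::<EntryWithoutView>());
    println!("with u128 view: {} bytes", size_of::<EntryWithView>());
    println!("len + prefix:   {} bytes", size_of::<EntryWithLenAndPrefix>());
}
```

Storing the full view costs at least 16 extra bytes per entry, while the `len` + `prefix` variant costs at most 8 in this layout.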

Comment on lines 293 to 302
// For larger strings: first compare the 4-byte prefix (bytes 4-7 of u128)
// The prefix is stored in the next 4 bytes after length
// Only dereference full bytes if prefixes match
let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
let input_prefix = ((view_u128 >> 32) & 0xFFFFFFFF) as u32;
if stored_prefix != input_prefix {
return false;
}

// Prefix matched - must compare full bytes

Copilot AI Jan 25, 2026

The non-inline comparison path skips a cheap length equality check before doing prefix/full-byte comparisons. Since both input and stored views encode length in the low 32 bits, checking header.view as u32 == len (or equivalent) first would avoid calling builder.get_value(...) / values.value(i) when lengths differ but hashes (and possibly prefixes) match, improving the hot path.
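
The ordering suggested above (length, then inline fast path, then prefix, then full bytes) can be sketched with the standard library alone. `make_view` and `compare_views` are hypothetical helpers written for this illustration, not functions from arrow-rs or DataFusion; they assume the Arrow view layout in which bytes 0..4 hold the length, values of at most 12 bytes are stored inline with zero padding, and longer values store a 4-byte prefix followed by a buffer index and an offset.

```rust
/// Build a 16-byte view for `data` (hypothetical helper, std only).
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= 12 {
        // Inline value: data follows the length, remaining bytes stay zero.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long value: 4-byte prefix, then buffer index and offset.
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Returns `Some(answer)` when equality can be decided from the views alone,
/// or `None` when the full bytes must be compared.
fn compare_views(stored: u128, input: u128) -> Option<bool> {
    let len = input as u32;
    // 1. Different lengths can never be equal.
    if stored as u32 != len {
        return Some(false);
    }
    // 2. Inline values are fully described by the view.
    if len <= 12 {
        return Some(stored == input);
    }
    // 3. Long values: a differing prefix rules out equality.
    if (stored >> 32) as u32 != (input >> 32) as u32 {
        return Some(false);
    }
    // 4. Length and prefix match: the caller must compare full bytes.
    None
}

fn main() {
    let a = make_view(b"abcdefghijklmnop", 0, 0);
    let b = make_view(b"abcdefghijklmnopq", 0, 0);
    println!("{:?}", compare_views(a, b));
}
```

Only the `None` case requires dereferencing the underlying buffers, which is the expensive step the review comment wants to avoid.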

Suggested change
- // For larger strings: first compare the 4-byte prefix (bytes 4-7 of u128)
- // The prefix is stored in the next 4 bytes after length
- // Only dereference full bytes if prefixes match
- let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
- let input_prefix = ((view_u128 >> 32) & 0xFFFFFFFF) as u32;
- if stored_prefix != input_prefix {
- return false;
- }
- // Prefix matched - must compare full bytes
+ // For larger strings: first compare length using the low 32 bits.
+ // If lengths differ, the values cannot be equal, so we can skip
+ // prefix and full-byte comparisons and avoid dereferencing bytes.
+ if (header.view as u32) != len {
+ return false;
+ }
+ // Next compare the 4-byte prefix (bytes 4-7 of u128).
+ // The prefix is stored in the next 4 bytes after length.
+ // Only dereference full bytes if prefixes match.
+ let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
+ let input_prefix = ((view_u128 >> 32) & 0xFFFFFFFF) as u32;
+ if stored_prefix != input_prefix {
+ return false;
+ }
+ // Prefix and length matched - now compare full bytes.

Comment on lines 296 to 297
let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
let input_prefix = ((view_u128 >> 32) & 0xFFFFFFFF) as u32;

Copilot AI Jan 25, 2026

Prefix extraction via manual bit shifts ((view >> 32) & 0xFFFFFFFF) duplicates knowledge of the ByteView layout and is harder to maintain. There is already a pattern elsewhere to extract the prefix using Arrow helpers (e.g. GenericByteViewArray::<B>::inline_value(&view, 4) in physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:271-275), which avoids hardcoding masks/shifts and documents intent more clearly.
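
For reference, the shift-based extraction and a byte-slice-based extraction read the same four bytes. The sketch below uses only the standard library; `prefix_via_shift` and `prefix_via_bytes` are hypothetical names for this illustration, and it does not use the Arrow helper mentioned in the comment.

```rust
/// Prefix taken from bits 32..64 of the view with a shift and a cast.
fn prefix_via_shift(view: u128) -> u32 {
    (view >> 32) as u32
}

/// Prefix taken from bytes 4..8 of the little-endian representation.
fn prefix_via_bytes(view: u128) -> u32 {
    let bytes = view.to_le_bytes();
    u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]])
}

fn main() {
    let view: u128 = 0x0123_4567_89AB_CDEF_0011_2233_4455_6677;
    // Both functions read the same 32 bits of the view.
    println!("{:#x}", prefix_via_shift(view));
    println!("{:#x}", prefix_via_bytes(view));
}
```

The byte-slice form spells out which bytes are read, at the cost of a few more lines; the shift form is shorter but hardcodes the layout.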

Suggested change
- let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
- let input_prefix = ((view_u128 >> 32) & 0xFFFFFFFF) as u32;
+ let stored_bytes = header.view.to_le_bytes();
+ let stored_prefix =
+ u32::from_le_bytes(stored_bytes[4..8].try_into().unwrap());
+ let input_bytes = view_u128.to_le_bytes();
+ let input_prefix =
+ u32::from_le_bytes(input_bytes[4..8].try_into().unwrap());

Comment on lines 286 to 296
// Fast path: for inline strings (<=12 bytes), the entire value
// is stored in the u128 view, so we can compare directly
// This avoids the expensive conversion back to bytes
if len <= 12 {
return header.view == view_u128;
}

// For larger strings: first compare the 4-byte prefix (bytes 4-7 of u128)
// The prefix is stored in the next 4 bytes after length
// Only dereference full bytes if prefixes match
let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;

Copilot AI Jan 25, 2026

Several new comments describe behavior in terms of "strings" (e.g. "inline strings" / "larger strings"), but this function is generic over ByteViewType and is used for both Utf8View and BinaryView. Consider adjusting wording to "values"/"byte sequences" to avoid confusion for the binary case.

/// Completed buffers containing string data
completed: Vec<Buffer>,
/// Tracks null values (true = null)
nulls: Vec<bool>,
Contributor

@Dandandan
Contributor

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (b9f7d68) to 03c2751 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@Dandandan
Contributor

> Benchmarks look good! The optimization shows improvements especially on clickbench:
>
>   • clickbench_extended: 3 queries faster, 0 slower
>   • clickbench_partitioned: 4 queries faster (Query 23 is 1.96x faster!), 2 minor regressions
>   • tpch: neutral
>
> Removed builder_idx from Entry and replaced GenericByteViewBuilder with direct buffer management as suggested.

Some of those unfortunately are noise, but the clickbench_extended are real improvements!

Per reviewer feedback, replaced Vec<bool> with BooleanBufferBuilder for tracking null values. This uses bit-packed storage (1 bit per value) instead of byte-per-value, reducing memory usage by 8x for the null bitmap. Also fixed clippy warnings for mem_replace_with_default.
@Tushar7012
Contributor Author

Fixed! Replaced Vec<bool> with BooleanBufferBuilder as suggested.

Changes:

  • Now using bit-packed storage (1 bit per value vs 1 byte) for 8x memory efficiency
  • Fixed clippy warning for mem_replace_with_default
  • All tests passing locally
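
The 8x figure follows from storing one bit per value instead of one byte. As a rough illustration only, the sketch below uses a hypothetical `BitPackedNulls` type built on the standard library; it is not the arrow-rs `BooleanBufferBuilder`, just a minimal model of the same idea.

```rust
/// Hypothetical bit-packed null tracker (std only), one bit per value.
struct BitPackedNulls {
    bytes: Vec<u8>,
    len: usize,
}

impl BitPackedNulls {
    fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    /// Append one flag, growing the backing storage by a byte every 8 values.
    fn push(&mut self, is_null: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if is_null {
            let last = self.bytes.len() - 1;
            self.bytes[last] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Read the flag at position `i`.
    fn get(&self, i: usize) -> bool {
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    /// Number of bytes used to store all flags.
    fn byte_len(&self) -> usize {
        self.bytes.len()
    }
}

fn main() {
    let mut packed = BitPackedNulls::new();
    let mut plain: Vec<bool> = Vec::new();
    for i in 0..1000 {
        packed.push(i % 3 == 0);
        plain.push(i % 3 == 0);
    }
    // Vec<bool> stores one byte per element; the packed form stores one bit.
    println!("Vec<bool>: {} bytes", plain.len() * std::mem::size_of::<bool>());
    println!("packed:    {} bytes", packed.byte_len());
    println!("flag 3 is null: {}", packed.get(3));
}
```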

Ready for review!

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2346.89 ms │                    2330.85 ms │     no change │
│ QQuery 1 │  1104.51 ms │                     839.85 ms │ +1.32x faster │
│ QQuery 2 │  2081.87 ms │                    1736.59 ms │ +1.20x faster │
│ QQuery 3 │  1030.06 ms │                    1061.87 ms │     no change │
│ QQuery 4 │  2150.38 ms │                    2145.66 ms │     no change │
│ QQuery 5 │ 28236.66 ms │                   28334.07 ms │     no change │
│ QQuery 6 │  4010.36 ms │                    4003.28 ms │     no change │
│ QQuery 7 │  2674.81 ms │                    2621.57 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 43635.54ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 43073.74ms │
│ Average Time (HEAD)                          │  5454.44ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5384.22ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.96 ms │                       2.00 ms │     no change │
│ QQuery 1  │    52.18 ms │                      51.90 ms │     no change │
│ QQuery 2  │   136.45 ms │                     133.17 ms │     no change │
│ QQuery 3  │   156.18 ms │                     154.41 ms │     no change │
│ QQuery 4  │  1053.25 ms │                    1032.37 ms │     no change │
│ QQuery 5  │  1442.27 ms │                    1289.93 ms │ +1.12x faster │
│ QQuery 6  │     1.82 ms │                       1.84 ms │     no change │
│ QQuery 7  │    55.06 ms │                      54.37 ms │     no change │
│ QQuery 8  │  1436.79 ms │                    1368.88 ms │     no change │
│ QQuery 9  │  1888.46 ms │                    1799.30 ms │     no change │
│ QQuery 10 │   349.25 ms │                     347.13 ms │     no change │
│ QQuery 11 │   401.30 ms │                     403.47 ms │     no change │
│ QQuery 12 │  1301.55 ms │                    1215.71 ms │ +1.07x faster │
│ QQuery 13 │  1986.23 ms │                    1961.97 ms │     no change │
│ QQuery 14 │  1277.51 ms │                    1221.46 ms │     no change │
│ QQuery 15 │  1234.01 ms │                    1150.76 ms │ +1.07x faster │
│ QQuery 16 │  2457.97 ms │                    2455.01 ms │     no change │
│ QQuery 17 │  2449.91 ms │                    2414.40 ms │     no change │
│ QQuery 18 │  5097.69 ms │                    4655.39 ms │ +1.10x faster │
│ QQuery 19 │   122.54 ms │                     121.91 ms │     no change │
│ QQuery 20 │  1970.22 ms │                    1887.90 ms │     no change │
│ QQuery 21 │  2252.35 ms │                    2204.58 ms │     no change │
│ QQuery 22 │  3894.04 ms │                    3773.37 ms │     no change │
│ QQuery 23 │ 17731.74 ms │                   12109.55 ms │ +1.46x faster │
│ QQuery 24 │   223.30 ms │                     207.71 ms │ +1.08x faster │
│ QQuery 25 │   479.65 ms │                     480.86 ms │     no change │
│ QQuery 26 │   216.68 ms │                     220.93 ms │     no change │
│ QQuery 27 │  2752.85 ms │                    2622.15 ms │     no change │
│ QQuery 28 │ 23621.38 ms │                   23156.22 ms │     no change │
│ QQuery 29 │   981.13 ms │                     957.81 ms │     no change │
│ QQuery 30 │  1311.98 ms │                    1236.55 ms │ +1.06x faster │
│ QQuery 31 │  1407.43 ms │                    1319.76 ms │ +1.07x faster │
│ QQuery 32 │  4263.44 ms │                    4418.15 ms │     no change │
│ QQuery 33 │  5569.80 ms │                    5344.86 ms │     no change │
│ QQuery 34 │  5862.32 ms │                    5601.23 ms │     no change │
│ QQuery 35 │  1902.78 ms │                    1847.03 ms │     no change │
│ QQuery 36 │    68.43 ms │                      68.40 ms │     no change │
│ QQuery 37 │    46.99 ms │                      44.74 ms │     no change │
│ QQuery 38 │    67.77 ms │                      65.25 ms │     no change │
│ QQuery 39 │   101.39 ms │                     102.42 ms │     no change │
│ QQuery 40 │    27.39 ms │                      26.32 ms │     no change │
│ QQuery 41 │    24.54 ms │                      23.33 ms │     no change │
│ QQuery 42 │    20.63 ms │                      20.40 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 97700.58ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 89574.89ms │
│ Average Time (HEAD)                          │  2272.11ms │
│ Average Time (optimize-arrow-bytes-view-map) │  2083.14ms │
│ Queries Faster                               │          8 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │         35 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 101.64 ms │                     102.28 ms │     no change │
│ QQuery 2  │  33.32 ms │                      32.37 ms │     no change │
│ QQuery 3  │  40.53 ms │                      39.01 ms │     no change │
│ QQuery 4  │  31.20 ms │                      30.58 ms │     no change │
│ QQuery 5  │  89.57 ms │                      90.67 ms │     no change │
│ QQuery 6  │  20.84 ms │                      20.58 ms │     no change │
│ QQuery 7  │ 156.74 ms │                     159.66 ms │     no change │
│ QQuery 8  │  40.83 ms │                      40.29 ms │     no change │
│ QQuery 9  │ 105.71 ms │                     104.92 ms │     no change │
│ QQuery 10 │  71.23 ms │                      66.86 ms │ +1.07x faster │
│ QQuery 11 │  19.75 ms │                      19.03 ms │     no change │
│ QQuery 12 │  52.92 ms │                      57.00 ms │  1.08x slower │
│ QQuery 13 │  48.39 ms │                      50.18 ms │     no change │
│ QQuery 14 │  15.35 ms │                      15.37 ms │     no change │
│ QQuery 15 │  30.20 ms │                      29.80 ms │     no change │
│ QQuery 16 │  28.63 ms │                      29.07 ms │     no change │
│ QQuery 17 │ 140.71 ms │                     144.94 ms │     no change │
│ QQuery 18 │ 277.06 ms │                     285.45 ms │     no change │
│ QQuery 19 │  39.45 ms │                      39.77 ms │     no change │
│ QQuery 20 │  54.49 ms │                      56.82 ms │     no change │
│ QQuery 21 │ 192.36 ms │                     192.38 ms │     no change │
│ QQuery 22 │  22.53 ms │                      22.63 ms │     no change │
└───────────┴───────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1613.46ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1629.66ms │
│ Average Time (HEAD)                          │   73.34ms │
│ Average Time (optimize-arrow-bytes-view-map) │   74.08ms │
│ Queries Faster                               │         1 │
│ Queries Slower                               │         1 │
│ Queries with No Change                       │        20 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

@Dandandan
Contributor

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (32bd8b1) to 4d63f8c diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@Dandandan
Contributor

│ QQuery 5 │ 1442.27 ms │ 1289.93 ms │ +1.12x faster │

Nice, starts to show up as improvement for normal clickbench as well.
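
As a sanity check on the quoted row, the "x faster" figure appears consistent with the ratio of the HEAD time to the branch time, rounded to two decimals. That reading is an assumption based on the numbers in the tables, not on the benchmark script itself; `speedup` is a hypothetical helper.

```rust
/// Ratio of baseline time to candidate time; values above 1.0 mean faster.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // clickbench_partitioned Query 5 from the run quoted above.
    println!("{:.2}x", speedup(1442.27, 1289.93));
    // clickbench_partitioned Query 23 from the same run.
    println!("{:.2}x", speedup(17731.74, 12109.55));
}
```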

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2400.96 ms │                    2394.14 ms │     no change │
│ QQuery 1 │  1102.58 ms │                     823.46 ms │ +1.34x faster │
│ QQuery 2 │  2041.47 ms │                    1747.07 ms │ +1.17x faster │
│ QQuery 3 │  1040.49 ms │                    1075.75 ms │     no change │
│ QQuery 4 │  2214.04 ms │                    2249.16 ms │     no change │
│ QQuery 5 │ 28633.51 ms │                   28921.74 ms │     no change │
│ QQuery 6 │  3970.57 ms │                    4019.04 ms │     no change │
│ QQuery 7 │  2853.64 ms │                    2739.25 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 44257.26ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 43969.60ms │
│ Average Time (HEAD)                          │  5532.16ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5496.20ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.01 ms │                       2.27 ms │  1.13x slower │
│ QQuery 1  │    53.18 ms │                      52.34 ms │     no change │
│ QQuery 2  │   136.17 ms │                     138.30 ms │     no change │
│ QQuery 3  │   147.91 ms │                     153.13 ms │     no change │
│ QQuery 4  │  1067.16 ms │                    1112.78 ms │     no change │
│ QQuery 5  │  1379.32 ms │                    1422.00 ms │     no change │
│ QQuery 6  │     1.87 ms │                       2.18 ms │  1.17x slower │
│ QQuery 7  │    55.35 ms │                      62.27 ms │  1.12x slower │
│ QQuery 8  │  1435.80 ms │                    1478.93 ms │     no change │
│ QQuery 9  │  1816.86 ms │                    1997.56 ms │  1.10x slower │
│ QQuery 10 │   350.87 ms │                     372.86 ms │  1.06x slower │
│ QQuery 11 │   396.64 ms │                     427.74 ms │  1.08x slower │
│ QQuery 12 │  1308.21 ms │                    1352.22 ms │     no change │
│ QQuery 13 │  1976.23 ms │                    2013.46 ms │     no change │
│ QQuery 14 │  1229.45 ms │                    1286.49 ms │     no change │
│ QQuery 15 │  1250.89 ms │                    1281.45 ms │     no change │
│ QQuery 16 │  2548.49 ms │                    2624.97 ms │     no change │
│ QQuery 17 │  2449.72 ms │                    2637.87 ms │  1.08x slower │
│ QQuery 18 │  5319.25 ms │                    5084.79 ms │     no change │
│ QQuery 19 │   125.65 ms │                     124.80 ms │     no change │
│ QQuery 20 │  1913.13 ms │                    1980.85 ms │     no change │
│ QQuery 21 │  2197.25 ms │                    2262.60 ms │     no change │
│ QQuery 22 │  3735.66 ms │                    3686.64 ms │     no change │
│ QQuery 23 │ 16159.93 ms │                   12126.12 ms │ +1.33x faster │
│ QQuery 24 │   213.31 ms │                     214.23 ms │     no change │
│ QQuery 25 │   473.76 ms │                     463.12 ms │     no change │
│ QQuery 26 │   215.06 ms │                     216.58 ms │     no change │
│ QQuery 27 │  2714.92 ms │                    2628.79 ms │     no change │
│ QQuery 28 │ 24657.73 ms │                   24531.40 ms │     no change │
│ QQuery 29 │   979.66 ms │                     973.64 ms │     no change │
│ QQuery 30 │  1304.77 ms │                    1263.80 ms │     no change │
│ QQuery 31 │  1354.75 ms │                    1314.49 ms │     no change │
│ QQuery 32 │  4752.06 ms │                    4394.89 ms │ +1.08x faster │
│ QQuery 33 │  5745.41 ms │                    5762.86 ms │     no change │
│ QQuery 34 │  6435.49 ms │                    5763.88 ms │ +1.12x faster │
│ QQuery 35 │  2125.22 ms │                    1987.99 ms │ +1.07x faster │
│ QQuery 36 │    73.02 ms │                      70.75 ms │     no change │
│ QQuery 37 │    49.71 ms │                      47.19 ms │ +1.05x faster │
│ QQuery 38 │    70.38 ms │                      68.33 ms │     no change │
│ QQuery 39 │   110.40 ms │                     107.67 ms │     no change │
│ QQuery 40 │    29.37 ms │                      26.95 ms │ +1.09x faster │
│ QQuery 41 │    25.68 ms │                      24.74 ms │     no change │
│ QQuery 42 │    22.86 ms │                      20.40 ms │ +1.12x faster │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 98410.56ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 93566.29ms │
│ Average Time (HEAD)                          │  2288.62ms │
│ Average Time (optimize-arrow-bytes-view-map) │  2175.96ms │
│ Queries Faster                               │          7 │
│ Queries Slower                               │          7 │
│ Queries with No Change                       │         29 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 103.31 ms │                     102.99 ms │     no change │
│ QQuery 2  │  33.78 ms │                      32.55 ms │     no change │
│ QQuery 3  │  41.19 ms │                      39.93 ms │     no change │
│ QQuery 4  │  31.17 ms │                      30.32 ms │     no change │
│ QQuery 5  │  90.04 ms │                      91.70 ms │     no change │
│ QQuery 6  │  20.86 ms │                      20.85 ms │     no change │
│ QQuery 7  │ 161.09 ms │                     161.54 ms │     no change │
│ QQuery 8  │  46.65 ms │                      39.71 ms │ +1.17x faster │
│ QQuery 9  │ 122.57 ms │                     105.57 ms │ +1.16x faster │
│ QQuery 10 │  81.65 ms │                      68.10 ms │ +1.20x faster │
│ QQuery 11 │  20.63 ms │                      18.66 ms │ +1.11x faster │
│ QQuery 12 │  53.19 ms │                      51.07 ms │     no change │
│ QQuery 13 │  49.25 ms │                      49.37 ms │     no change │
│ QQuery 14 │  15.23 ms │                      15.04 ms │     no change │
│ QQuery 15 │  31.06 ms │                      29.94 ms │     no change │
│ QQuery 16 │  28.67 ms │                      28.29 ms │     no change │
│ QQuery 17 │ 145.52 ms │                     142.70 ms │     no change │
│ QQuery 18 │ 287.58 ms │                     279.83 ms │     no change │
│ QQuery 19 │  39.38 ms │                      40.41 ms │     no change │
│ QQuery 20 │  58.07 ms │                      55.71 ms │     no change │
│ QQuery 21 │ 193.10 ms │                     188.30 ms │     no change │
│ QQuery 22 │  22.84 ms │                      22.00 ms │     no change │
└───────────┴───────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1676.83ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1614.58ms │
│ Average Time (HEAD)                          │   76.22ms │
│ Average Time (optimize-arrow-bytes-view-map) │   73.39ms │
│ Queries Faster                               │         4 │
│ Queries Slower                               │         0 │
│ Queries with No Change                       │        18 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

// get the value as bytes
let value: &[u8] = value.as_ref();
// Extract length from the view (first 4 bytes of u128 in little-endian)
let len = (view_u128 & 0xFFFFFFFF) as u32;
Contributor

view_u128 as u32 does the same thing

}

// For larger strings: first compare the 4-byte prefix
let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
Contributor

& 0xFFFFFFFF is not needed


// For larger strings: first compare the 4-byte prefix
let stored_prefix = ((header.view >> 32) & 0xFFFFFFFF) as u32;
let input_prefix = ((view_u128 >> 32) & 0xFFFFFFFF) as u32;
Contributor

& 0xFFFFFFFF is not needed
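
These review notes rely on the fact that an `as u32` cast in Rust truncates to the low 32 bits, so masking first changes nothing. A minimal sketch (the function names are hypothetical and exist only for this comparison):

```rust
/// Length with the explicit mask, as in the original code.
fn len_with_mask(view: u128) -> u32 {
    (view & 0xFFFF_FFFF) as u32
}

/// Length with the cast alone; `as u32` keeps only the low 32 bits.
fn len_with_cast(view: u128) -> u32 {
    view as u32
}

/// Prefix with the explicit mask, as in the original code.
fn prefix_with_mask(view: u128) -> u32 {
    ((view >> 32) & 0xFFFF_FFFF) as u32
}

/// Prefix with the shift and cast alone.
fn prefix_with_cast(view: u128) -> u32 {
    (view >> 32) as u32
}

fn main() {
    let view = u128::MAX - 12345;
    println!("{}", len_with_mask(view) == len_with_cast(view));
    println!("{}", prefix_with_mask(view) == prefix_with_cast(view));
}
```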

Contributor

@Dandandan Dandandan left a comment

Two small suggestions, but I think we're good to go then.

@github-actions github-actions bot added the catalog Related to the catalog crate label Jan 26, 2026
let meta_fetch_concurrency =
ctx.config_options().execution.meta_fetch_concurrency;
let file_list = stream::iter(file_list).flatten_unordered(meta_fetch_concurrency);
let file_list = stream::iter(self.table_paths.iter())
Contributor

Did you commit this by accident?

Contributor Author

@Tushar7012 Tushar7012 Jan 26, 2026

Yes, you're right - this was committed by accident. I've reverted the table.rs changes in commit 9302c3b. These changes belong in a separate PR for parallelizing list_files_for_scan. Thanks for catching it.

@github-actions github-actions bot removed the catalog Related to the catalog crate label Jan 26, 2026
@Dandandan Dandandan added this pull request to the merge queue Jan 26, 2026
@Dandandan
Contributor

Thank you @Tushar7012 !

Merged via the queue into apache:main with commit cda6425 Jan 26, 2026
32 checks passed

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize ArrowBytesViewMap

3 participants