Introduce count{l,r}_{zero,one} for batch_bool #1269

Merged
serge-sans-paille merged 1 commit into xtensor-stack:master from onalante-ebay:batch_countl_zero
Mar 6, 2026

Conversation

@onalante-ebay
Contributor

@onalante-ebay onalante-ebay commented Mar 4, 2026

In #1236, it was mentioned that variable-sized bit groups for certain
batch_bool reductions would be slightly more efficient than extracting
a proper bitmask. To achieve this, the xsimd API is extended with the
functions xsimd::count{l,r}_{zero,one}, and count is revised to
allow per-platform kernels. The default implementations for each
function simply apply the corresponding scalar operation (for which
__cpp_lib_bitops == 201907L is partially backported) on
batch_bool::mask. This is specialized for NEON(64) by instead
applying the scalar operation to the narrowed batch, then scaling the
result by the bit group size.
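
To illustrate the idea, here is a minimal scalar sketch of counting over a mask in which each lane owns a group of bits rather than a single bit. It is not the code from this PR: the `sketch` namespace, the helper names, and the 4-bit group width used in the usage note are assumptions made for illustration, and "scaling" is interpreted here as dividing the bit count by the group width.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch
{
    // Portable count of trailing zero bits; returns 64 for x == 0,
    // which matches std::countr_zero on a 64-bit unsigned value.
    inline int countr_zero64(std::uint64_t x) noexcept
    {
        if (x == 0)
            return 64;
        int n = 0;
        while ((x & 1u) == 0)
        {
            x >>= 1;
            ++n;
        }
        return n;
    }

    // Pack `lanes` booleans into a mask where every lane owns `group_bits`
    // adjacent bits (all set or all clear). group_bits == 1 gives a
    // one-bit-per-lane ("proper") bitmask.
    // Precondition: 1 <= group_bits < 64 and lanes * group_bits <= 64.
    inline std::uint64_t make_grouped_mask(const bool* values, std::size_t lanes,
                                           unsigned group_bits) noexcept
    {
        const std::uint64_t group = (std::uint64_t(1) << group_bits) - 1;
        std::uint64_t mask = 0;
        for (std::size_t i = 0; i < lanes; ++i)
            if (values[i])
                mask |= group << (i * group_bits);
        return mask;
    }

    // Number of false lanes before the first true lane: the bit count is
    // rescaled to lanes by dividing by the group width.
    inline std::size_t countr_zero_lanes(std::uint64_t mask, std::size_t lanes,
                                         unsigned group_bits) noexcept
    {
        const std::size_t n = static_cast<std::size_t>(countr_zero64(mask)) / group_bits;
        return n < lanes ? n : lanes; // clamp for the all-false case
    }
} // namespace sketch
```

With 16 lanes and lanes 2 and 4 set, a one-bit-per-lane mask and a 4-bits-per-lane mask both yield 2 from `countr_zero_lanes`; only the divisor changes.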

@serge-sans-paille
Contributor

I'm fine with the overall approach, but I think it means those operations should live in the kernel namespace with the appropriate dispatch, as we do for other operations.

Please ping me once you reach a green CI, and thanks for working on this 🙇

@onalante-ebay
Contributor Author

onalante-ebay commented Mar 4, 2026

Right, the public xsimd::count{l,r}_{zero,one} functions call kernel::count{l,r}_{zero,one} as is done for other operations. Have I made a mistake with the implementation?
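
For readers unfamiliar with the pattern, the public-function-calls-kernel arrangement can be sketched as tag dispatch on an architecture type. This is a self-contained illustration, not xsimd's code: the `sketch` namespace, the `generic`/`neon64`/`sse2` tags, `bool_mask`, and the fixed 4-bit group width are all hypothetical stand-ins.

```cpp
#include <cstdint>

namespace sketch
{
    // Hypothetical architecture tags; specialized tags derive from `generic`
    // so that an architecture without its own kernel falls back to the default.
    struct generic {};
    struct neon64 : generic {};
    struct sse2 : generic {};

    // Hypothetical stand-in for a boolean batch: just the raw mask bits.
    template <class A>
    struct bool_mask
    {
        std::uint64_t bits;
    };

    namespace kernel
    {
        // Portable population count (clears the lowest set bit each step).
        inline int popcount64(std::uint64_t x) noexcept
        {
            int n = 0;
            while (x != 0)
            {
                x &= x - 1;
                ++n;
            }
            return n;
        }

        // Default kernel: assumes one bit per lane.
        template <class A>
        int count(bool_mask<A> const& m, generic const&) noexcept
        {
            return popcount64(m.bits);
        }

        // Specialized kernel: assumes 4 bits per lane (illustrative width).
        template <class A>
        int count(bool_mask<A> const& m, neon64 const&) noexcept
        {
            return popcount64(m.bits) / 4;
        }
    } // namespace kernel

    // Public entry point: forwards to the kernel selected by the tag type.
    template <class A>
    int count(bool_mask<A> const& m) noexcept
    {
        return kernel::count(m, A {});
    }
} // namespace sketch
```

Overload resolution prefers the exact `neon64 const&` match over the derived-to-base conversion, so `sse2` masks reach the default kernel while `neon64` masks reach the specialized one.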

@onalante-ebay onalante-ebay force-pushed the batch_countl_zero branch 2 times, most recently from 9cf4926 to 8e73330 on March 4, 2026 16:09

@DiamonDinoia
Contributor

I would check for __cpp_lib_bitops and if it fails provide a custom popcount.

@onalante-ebay
Contributor Author

onalante-ebay commented Mar 4, 2026

Done. I was concerned that just trusting __cpp_lib_bitops might be problematic, since libstdc++ < 13 would return some bit operations' results as the argument type rather than int (e.g. bit_width) [1]. Thankfully, this does not appear to apply to the count operations.

Footnotes

  1. Though this worry is admittedly overblown, since the result could simply be cast to int.
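
The feature-test approach discussed above might look roughly like the following. This is a hedged sketch rather than the code merged in this PR: the `SKETCH_HAS_STD_BITOPS` macro and the `sketch::` wrappers are hypothetical names, and only `popcount` and `countl_zero` are shown.

```cpp
#include <cstdint>

// <version> is where feature-test macros are defined when it is available.
#if defined(__has_include)
#if __has_include(<version>)
#include <version>
#endif
#endif

#if defined(__cpp_lib_bitops) && __cpp_lib_bitops >= 201907L
#include <bit>
#define SKETCH_HAS_STD_BITOPS 1
#else
#define SKETCH_HAS_STD_BITOPS 0
#endif

namespace sketch
{
    // Number of set bits; uses std::popcount when the library provides it.
    inline int popcount(std::uint64_t x) noexcept
    {
#if SKETCH_HAS_STD_BITOPS
        return std::popcount(x);
#else
        int n = 0;
        while (x != 0)
        {
            x &= x - 1; // clear the lowest set bit
            ++n;
        }
        return n;
#endif
    }

    // Number of zero bits above the highest set bit; 64 when x == 0.
    inline int countl_zero(std::uint64_t x) noexcept
    {
#if SKETCH_HAS_STD_BITOPS
        return std::countl_zero(x);
#else
        int n = 0;
        for (std::uint64_t bit = std::uint64_t(1) << 63;
             bit != 0 && (x & bit) == 0; bit >>= 1)
            ++n;
        return n;
#endif
    }
} // namespace sketch
```

Both branches return `int`, so callers see the same signature whichever path is compiled.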

@onalante-ebay
Contributor Author

@serge-sans-paille CI is passing.

@onalante-ebay onalante-ebay force-pushed the batch_countl_zero branch 2 times, most recently from 06dac6b to a3765aa on March 5, 2026 16:49

@serge-sans-paille
Contributor

LGTM! Please just squash the history and we're good.

Thanks a lot for your effort and... a question, if you don't mind: in which context are you using xsimd, and what for?

@onalante-ebay
Contributor Author

Squashed. Sorry, I am not at liberty to discuss the context at this time.

@serge-sans-paille
Contributor

That's totally fine!
Thanks for this cool PR.

@serge-sans-paille serge-sans-paille merged commit 923b986 into xtensor-stack:master Mar 6, 2026
70 checks passed

3 participants