Skip to content

fix(neon): unsigned bitwise shifts are never called#1266

Merged
serge-sans-paille merged 4 commits intoxtensor-stack:masterfrom
AntoinePrv:neon-rshift
Mar 6, 2026
Merged

fix(neon): unsigned bitwise shifts are never called#1266
serge-sans-paille merged 4 commits intoxtensor-stack:masterfrom
AntoinePrv:neon-rshift

Conversation

@AntoinePrv
Copy link
Contributor

@AntoinePrv AntoinePrv commented Mar 4, 2026

Shift public API expects the same type for data and shifts:

XSIMD_INLINE batch<T, A> bitwise_lshift(batch<T, A> const& x, batch<T, A> const& shift) noexcept

However, the neon implementations for unsigned data take the shifts as signed (likely because the underlying intrinsict use signed shifts).

XSIMD_INLINE batch<T, A> bitwise_lshift(batch<T, A> const& lhs, batch<as_signed_integer_t<T>, A> const& rhs, requires_arch<neon>) noexcept

As a result the neon overload is never selected, falling back on common scalar implementation.

Example

#include <cstdint>
#include <xsimd/xsimd.hpp>

auto f(xsimd::batch<std::uint16_t> const& a, xsimd::batch<std::uint16_t> const& b)
{
    return xsimd::bitwise_rshift(a, b);
}

Before:

ldr     q0, [x0]
ldr     q1, [x1]
ushll.4s        v2, v0, #0
ushll2.4s       v0, v0, #0
ushll.4s        v3, v1, #0
movi.2d v4, #0000000000000000
usubw2.4s       v1, v4, v1
ushl.4s v0, v0, v1
neg.4s  v1, v3
ushl.4s v1, v2, v1
uzp1.8h v0, v1, v0
ret

After:

ldr     q0, [x0]
ldr     q1, [x1]
neg.8h  v1, v1
ushl.8h v0, v0, v1
ret

I also grep with -O0 and the assembly was previously pointing to common as a requires_arch in the third parameter.

@AntoinePrv
Copy link
Contributor Author

Note on 64 bits

The fact that bitwise_lshift for 64 bits is in neon and bitwise_rshift for 64 bit sits alone in neon64 also seems suspicious.

Copy link
Contributor

@serge-sans-paille serge-sans-paille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with asserts added if you feel like doing so

@serge-sans-paille
Copy link
Contributor

Note on 64 bits

The fact that bitwise_lshift for 64 bits is in neon and bitwise_rshift for 64 bit sits alone in neon64 also seems suspicious.

What puzzles me is... how is

vshlq_u64(lhs, vnegq_s64(rhs));

an implementation of a right shift

@serge-sans-paille
Copy link
Contributor

Note on 64 bits

The fact that bitwise_lshift for 64 bits is in neon and bitwise_rshift for 64 bit sits alone in neon64 also seems suspicious.

What puzzles me is... how is

vshlq_u64(lhs, vnegq_s64(rhs));

an implementation of a right shift

And the answer is because left shift seems to support shifting right if the operand is negative, fine.
And it's in neon64 because vnegq_s64 requires neon64. But we could emulate this on neon32 appropriately, I'll do that.

@serge-sans-paille
Copy link
Contributor

And the answer is because left shift seems to support shifting right if the operand is negative, fine. And it's in neon64 because vnegq_s64 requires neon64. But we could emulate this on neon32 appropriately, I'll do that.

see #1268

@AntoinePrv
Copy link
Contributor Author

@serge-sans-paille all good now

}

template <class A, class T>
XSIMD_INLINE bool all_positive(batch<T, A> const& b) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just xsimd::all(b >= 0)

Copy link
Contributor Author

@AntoinePrv AntoinePrv Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not work because kernel::all on which it depends is declared below.
Would also require compare utilities to be declared.

This is not the only case where I find it hard to reuse other functions, with possibly circular include loops.
With all the SFINAE, forward declaring all of them is not very maintainable.
(In C++17 we could get rid of all SFINAE in favor of if constexpr and have simple signatures that we can easily forward declare).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the C++17 simplification. But that's for another day. meanwhile you could at least store to a buffer and call std::all_of ?

Copy link
Contributor Author

@AntoinePrv AntoinePrv Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serge-sans-paille I'm happy to do the changes, but not sure it will be for the best.
What is the objective? Less assertion code? No extra visible symbol (all_positive)?

If we want to get rid of all_positive then we would need at least this in all functions. Possibly more with #ifdef NDEBUG to avoid warnings given [[maybe_unused]] is C++17.

std::array<T, batch<T, A>> tmp = {};
rhs.store(tmp.data());
assert(std::all_of(tmp.begin(), tmp.end(), [](auto x) { return x >= 0; }));

Or do you mean to use std::all_of inside all_positive?

@serge-sans-paille
Copy link
Contributor

Thanks!

@serge-sans-paille serge-sans-paille merged commit b8f7ebe into xtensor-stack:master Mar 6, 2026
69 of 70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants