
Refactor cpuid #1251

Open
AntoinePrv wants to merge 16 commits into xtensor-stack:master from AntoinePrv:cpuid

@AntoinePrv
Contributor

This is a no-addition, non-breaking refactor that makes the existing CPU id features first-class citizens, as part of #1245.

  • Two individual and simple classes for parsing XCR0 and CPUID
  • The classes/header are safe on all platforms, avoiding
  • supported_arch keeps the same structure and currently merges the use of both classes, but both classes could be combined into a user-friendly x86_cpu_features.

Contributor

@serge-sans-paille serge-sans-paille left a comment

Some early review, a few open questions but nothing looks bad here. Way to go!

{
inline void get_cpuid(int reg[4], int level, int count = 0) noexcept;

inline std::uint32_t get_xcr0_low() noexcept;
Contributor

it would be good to somehow ensure that this type is the same as ref_t

Contributor

And this could be a private static method for better encapsulation right?

constexpr bool fma4() const noexcept
{
return utils::bit_is_set<16>(m_regs.reg8[2]);
}
Contributor

Open question: should we have a macro that makes the generation of those field cleaner to the eye?

Contributor Author

Either way. I am personally not big on macros. I find it easier to read this way (it could also be made to fit on one line), and find & replace is quite enough if a change is required.


namespace detail
{
inline void get_cpuid(int reg[4], int level, int count) noexcept
Contributor

Same here: could be private for better encapsulation.

Contributor Author

@AntoinePrv Jan 30, 2026

On the contrary, I was wondering whether to make it part of the public API.
It's a well-defined function, unlikely to change, and it could be useful for users to query features we do not expose (I'm not sure we'll go as far as covering 100% of cpuid).
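
To make the "public API" idea concrete, here is a minimal sketch of what a user-facing call could look like. It is not the code from this PR: the `sketch` namespace, the `cpuid_reg_t` alias as written here, and the `has_rdrand` helper are assumptions for illustration. It relies only on `__cpuidex` (MSVC) and `__cpuid_count` from `<cpuid.h>` (GCC/Clang), and returns all zeros on other targets.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define SKETCH_CPUID_MSVC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_CPUID_GNUC 1
#endif

namespace sketch
{
    // EAX, EBX, ECX, EDX for one (leaf, subleaf) query.
    using cpuid_reg_t = std::array<std::uint32_t, 4>;

    // Hypothetical public entry point: all bits are zero off x86.
    inline cpuid_reg_t get_cpuid(std::uint32_t leaf, std::uint32_t subleaf = 0) noexcept
    {
        cpuid_reg_t reg = {};
#if defined(SKETCH_CPUID_MSVC)
        int raw[4] = {};
        __cpuidex(raw, static_cast<int>(leaf), static_cast<int>(subleaf));
        for (std::size_t i = 0; i < 4; ++i)
        {
            reg[i] = static_cast<std::uint32_t>(raw[i]);
        }
#elif defined(SKETCH_CPUID_GNUC)
        __cpuid_count(leaf, subleaf, reg[0], reg[1], reg[2], reg[3]);
#else
        (void)leaf;
        (void)subleaf;
#endif
        return reg;
    }

    // Example of a feature a user might query on their own:
    // RDRAND is reported in CPUID leaf 1, ECX bit 30.
    inline bool has_rdrand() noexcept
    {
        return ((get_cpuid(0x1)[2] >> 30) & 1u) != 0;
    }
}
```

A caller would write `sketch::get_cpuid(0x7, 0x1)` to read a sub-leaf directly, without waiting for a dedicated accessor.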

* The full license is in the file LICENSE, distributed with this software. *
****************************************************************************/

#include "../config/xsimd_inline.hpp"
Contributor

why do you need this?

Contributor Author

It is so that headers include what they use (otherwise I get nasty errors in my editor).
But this one did not belong here; it belongs in xsimd_register.hpp, where XSIMD_INLINE is used.

Contributor Author

@AntoinePrv left a comment

Comments on changes I may make relative to the previous implementation.

Comment on lines +260 to +261
#elif defined(__INTEL_COMPILER)
__cpuid(reg.data(), level);
Contributor Author

Should we change to use inline ASM for intel compiler? Missing the count option here.

Contributor

I assume so, feel free to give it a try!

#elif defined(_MSC_VER) && _MSC_VER >= 1400
return static_cast<xcr0_reg_t>(_xgetbv(0));

#elif defined(__GNUC__)
Contributor Author

What about __clang__ and __INTEL_COMPILER? Should we reproduce the get_cpuid pattern?

Member

@JohanMabille Feb 19, 2026

Yes, I think so. Also, IIRC, __clang__ comes with __GNUC__ defined too, and the Intel compiler defines __GNUC__ or _MSC_VER depending on the platform. So we may simplify this logic (but first we need to verify that my assumption is right).

Contributor

It's true that clang defines __GNUC__. And it happens to be true for icx too, see https://godbolt.org/z/56j57vhnT
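
If that holds, the XCR0 read could collapse to two compiler branches plus a fallback. The following is only a sketch under that assumption, not the PR's implementation: the `sketch` namespace and the `osxsave_enabled` helper are made up for illustration. It guards `xgetbv` with the OSXSAVE flag (CPUID leaf 1, ECX bit 27), since executing `xgetbv` without it faults.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <immintrin.h> // _xgetbv
#include <intrin.h> // __cpuid
#define SKETCH_XCR0_MSVC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h> // __get_cpuid
#define SKETCH_XCR0_GNUC 1
#endif

namespace sketch
{
    // True when the OS has enabled XSAVE/XGETBV (CPUID.1:ECX bit 27).
    inline bool osxsave_enabled() noexcept
    {
#if defined(SKETCH_XCR0_MSVC)
        int reg[4] = {};
        __cpuid(reg, 1);
        return ((static_cast<std::uint32_t>(reg[2]) >> 27) & 1u) != 0;
#elif defined(SKETCH_XCR0_GNUC)
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        {
            return false;
        }
        return ((ecx >> 27) & 1u) != 0;
#else
        return false;
#endif
    }

    // Low 32 bits of XCR0, or 0 when it cannot be read safely.
    inline std::uint32_t get_xcr0_low() noexcept
    {
        if (!osxsave_enabled())
        {
            return 0;
        }
#if defined(SKETCH_XCR0_MSVC)
        // Covers MSVC, clang-cl, and Intel compilers in MSVC mode.
        return static_cast<std::uint32_t>(_xgetbv(0));
#elif defined(SKETCH_XCR0_GNUC)
        // Covers GCC, Clang, and Intel compilers in GNU mode.
        std::uint32_t eax = 0, edx = 0;
        __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
        (void)edx; // the high half is not needed here
        return eax;
#else
        return 0;
#endif
    }
}
```

Returning 0 when OSXSAVE is off keeps the function callable everywhere, at the cost of not distinguishing "nothing enabled" from "could not read".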

@AntoinePrv AntoinePrv marked this pull request as draft February 2, 2026 17:18
@AntoinePrv
Contributor Author

@serge-sans-paille what do you think of this?

I wanted to add an x86_cpu_features class that mixed xcr0, cpuid, as well as future opinionated features (e.g. override avx2 detection from cpuid if the specific implementation is known not to be performant).
However, today that felt very redundant; perhaps in the future, if we simplify the logic of supported_arch.

@AntoinePrv AntoinePrv marked this pull request as ready for review February 4, 2026 10:13

@serge-sans-paille
Contributor

serge-sans-paille commented Feb 20, 2026

@AntoinePrv please ping me when you need a review!

@AntoinePrv
Contributor Author

@serge-sans-paille this is ready. This is intended to make a clean abstraction for CPUID.
In follow-up PRs, I will:

  • Add some sort of similar abstraction for arm64
  • Propose a generic user-facing interface.

constexpr static x86_xcr0 safe_default() noexcept
{
reg_t low = {};
low = utils::set_bit<static_cast<reg_t>(bit::sse)>(low);
Contributor

Could be a call to make_mask directly, or (better?) use an overload of set_bit with no parameter.

{
// Check all SSE, AVX, and AVX512 bits even though AVX512 must
// imply AVX and SSE
return bit_is_set<bit::sse, bit::avx, bit::zmm_hi256>(m_low);
Contributor

It's a bit strange to have a function named bit_is_set that takes several bits as parameters. Maybe all_bits_set<...>?

/**
* Read the CpuId registers from the CPU if on the correct architecture.
*
* This is only safe to call if bit 18 of CR4.OSXSAVE has been set.
Contributor

Instead of a comment, you could make this an assert

regs.reg1 = detail::get_cpuid(0x1);
regs.reg7 = detail::get_cpuid(0x7);
regs.reg7a = detail::get_cpuid(0x7, 0x1);
regs.reg8 = detail::get_cpuid(0x80000001);
Contributor

Open question: could we avoid some get_cpuid calls in some situations? For instance, reg8 is only used for fma4, and reg7 for avx2 or later, which means we could skip filling reg7 if we don't have avx, and reg8 if we have fma3.
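
A sketch of that idea, assuming (as stated above) that reg7 only matters once AVX is present and reg8 only matters for FMA4. The `read_lazily` function and the injectable `query` callable are hypothetical, introduced so the call pattern can be exercised without real hardware; the struct mirrors the field names in the diff. AVX is CPUID.1:ECX bit 28 and FMA3 is CPUID.1:ECX bit 12.

```cpp
#include <array>
#include <cstdint>

namespace sketch
{
    using cpuid_reg_t = std::array<std::uint32_t, 4>; // EAX, EBX, ECX, EDX

    struct cpuid_regs
    {
        cpuid_reg_t reg1 = {};
        cpuid_reg_t reg7 = {};
        cpuid_reg_t reg7a = {};
        cpuid_reg_t reg8 = {};
    };

    // query(leaf, subleaf) stands in for detail::get_cpuid.
    template <typename Query>
    cpuid_regs read_lazily(Query&& query)
    {
        cpuid_regs regs;
        regs.reg1 = query(0x1u, 0x0u);

        const bool avx = ((regs.reg1[2] >> 28) & 1u) != 0;
        const bool fma3 = ((regs.reg1[2] >> 12) & 1u) != 0;

        // Leaf 7 is only consulted for AVX2 and later in this sketch.
        if (avx)
        {
            regs.reg7 = query(0x7u, 0x0u);
            regs.reg7a = query(0x7u, 0x1u);
        }
        // Extended leaf 0x80000001 is only consulted for FMA4.
        if (!fma3)
        {
            regs.reg8 = query(0x80000001u, 0x0u);
        }
        return regs;
    }
}
```

One trade-off to keep in mind: a CPU that has both FMA3 and FMA4 would report no FMA4 with this shortcut, which is only acceptable if FMA3 is always preferred.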

#if !XSIMD_TARGET_X86
(void)level;
(void)count;
return {}; // All bits to zero
Contributor

You could avoid this return and let the return reg at the end do the job. I wouldn't mind having:

#if XSIMD_TARGET_X86
// current implementation
#else
inline cpuid_reg_t get_cpuid(int, int) noexcept
{
return {}; // All bits to zero
}
#endif

auto get_cpuid = [](int reg[4], int level, int count = 0) noexcept
{
// Safe on all platforms, we simply be false
Contributor

That English sentence is... difficult to understand.


#if defined(_MSC_VER)
__cpuidex(reg, level, count);
sse2 = cpuid.sse2() && xcr0.sse_enabled();
Contributor

Some random thoughts: it looks like we actually want supported_arch to have a data member that matches the data structure of cpuid, and a similar layout for xcr0, so that we could just write something like *this = cpuid & xcr0 and implement this very efficiently.
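
A toy version of that layout idea, to show what `cpuid & xcr0` could mean in practice. Everything here is hypothetical (`feature_mask`, `arch_bit`, `from_xcr0`, and the four-architecture layout); it is a sketch of the technique, not a proposal for the actual `supported_arch` layout. The XCR0 masks used are 0x02 (SSE state), 0x06 (SSE + AVX state) and 0xE6 (SSE + AVX + the three AVX-512 state components).

```cpp
#include <cstdint>

namespace sketch
{
    // One bit per architecture, shared by both sources of information.
    enum arch_bit : std::uint32_t
    {
        sse2 = 1u << 0,
        avx = 1u << 1,
        avx2 = 1u << 2,
        avx512f = 1u << 3
    };

    struct feature_mask
    {
        std::uint32_t bits;
    };

    constexpr feature_mask operator&(feature_mask a, feature_mask b) noexcept
    {
        return feature_mask { a.bits & b.bits };
    }

    constexpr bool has(feature_mask m, arch_bit b) noexcept
    {
        return (m.bits & b) != 0u;
    }

    // Expand the OS-enabled state components into the shared layout:
    // each architecture bit is set when the register state it needs is enabled.
    constexpr feature_mask from_xcr0(std::uint32_t xcr0_low) noexcept
    {
        return feature_mask {
            ((xcr0_low & 0x02u) == 0x02u ? sse2 : 0u)
            | ((xcr0_low & 0x06u) == 0x06u ? (avx | avx2) : 0u)
            | ((xcr0_low & 0xE6u) == 0xE6u ? avx512f : 0u)
        };
    }
}
```

Given a `feature_mask` built from cpuid in the same layout, the supported set is then a single AND: `supported = hardware & from_xcr0(xcr0_low)`.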

{
template <typename I>
constexpr I make_bit_mask(I bit)
{
Contributor

You could static_assert that bit < 8 * sizeof(I), although I guess the compiler would issue a warning in that case.
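
Since a `static_assert` needs a compile-time value, the check only works if the bit index becomes a template parameter. A minimal sketch under that assumption follows; the signature differs from the `make_bit_mask(I bit)` shown in the diff and is hypothetical.

```cpp
#include <climits>
#include <cstdint>

namespace sketch
{
    // Bit index as a template parameter so it can be range-checked at compile time.
    template <typename I, unsigned Bit>
    constexpr I make_bit_mask() noexcept
    {
        static_assert(Bit < sizeof(I) * CHAR_BIT, "bit index out of range for the mask type");
        return static_cast<I>(static_cast<I>(1) << Bit);
    }
}
```

With this form, `sketch::make_bit_mask<std::uint8_t, 8>()` fails to compile with a readable message instead of relying on a shift-count warning.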
