Conversation
serge-sans-paille
left a comment
Some early review, a few open questions but nothing looks bad here. Way to go!
| { | ||
| inline void get_cpuid(int reg[4], int level, int count = 0) noexcept; | ||
|
|
||
| inline std::uint32_t get_xcr0_low() noexcept; |
it would be good to somehow ensure that this type is the same as ref_t
And this could be a private static method for better encapsulation right?
| constexpr bool fma4() const noexcept | ||
| { | ||
| return utils::bit_is_set<16>(m_regs.reg8[2]); | ||
| } |
Open question: should we have a macro that makes the generation of those fields cleaner to the eye?
Either way. I am personally not big on macros. I find it easier to read this way (it could also be made to fit on one line), and find-and-replace is quite enough if a change is required.
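For comparison, here is a minimal sketch of both styles side by side. The macro name `HYPOTHETICAL_CPUID_FLAG` and the `cpuid_view` struct are made up for illustration and are not part of the PR; only the bit positions (SSE2 at CPUID.1:EDX bit 26, FMA4 at bit 16 of ECX for leaf 0x80000001) come from the CPUID layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro, for illustration only: expands to one accessor.
#define HYPOTHETICAL_CPUID_FLAG(name, reg, idx, bit_pos) \
    constexpr bool name() const noexcept                 \
    {                                                    \
        return (((reg)[idx] >> (bit_pos)) & 1u) != 0u;   \
    }

// Hypothetical register holder: eax, ebx, ecx, edx for two leaves.
struct cpuid_view
{
    std::uint32_t reg1[4]; // leaf 0x1
    std::uint32_t reg8[4]; // leaf 0x80000001

    // Macro-generated accessor: CPUID.1:EDX bit 26 is SSE2.
    HYPOTHETICAL_CPUID_FLAG(sse2, reg1, 3, 26)

    // Hand-written accessor: bit 16 of ECX for leaf 0x80000001 is FMA4.
    constexpr bool fma4() const noexcept
    {
        return ((reg8[2] >> 16) & 1u) != 0u;
    }
};
```

The macro saves three lines per field, at the cost of hiding the function signature from grep and from most editors.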
|
|
||
| namespace detail | ||
| { | ||
| inline void get_cpuid(int reg[4], int level, int count) noexcept |
Same here: could be private for better encapsulation.
On the contrary, I was wondering whether to make it part of public API.
It's a well-defined function, unlikely to change, and it could be useful for users to query features we do not expose (I'm not sure we'll go as far as covering 100% of cpuid).
| * The full license is in the file LICENSE, distributed with this software. * | ||
| ****************************************************************************/ | ||
|
|
||
| #include "../config/xsimd_inline.hpp" |
why do you need this?
It's so that headers include what they use (otherwise I get nasty errors in my editor).
But this one did not belong here; it belongs in xsimd_register.hpp, where XSIMD_INLINE is used.
AntoinePrv
left a comment
Comments on changes I may make relative to the previous implementation.
| #elif defined(__INTEL_COMPILER) | ||
| __cpuid(reg.data(), level); |
Should we switch to inline ASM for the Intel compiler? The count option is missing here.
I assume so, feel free to give it a try!
| #elif defined(_MSC_VER) && _MSC_VER >= 1400 | ||
| return static_cast<xcr0_reg_t>(_xgetbv(0)); | ||
|
|
||
| #elif defined(__GNUC__) |
What about __clang__ and __INTEL_COMPILER? Should we reproduce the get_cpuid pattern?
Yes, I think so. Also, IIRC, __clang__ comes with __GNUC__ defined too, and the Intel compiler defines __GNUC__ or _MSC_VER depending on the platform. So we may simplify this logic (but first we need to verify that my assumption is right).
It's true that clang defines __GNUC__. And it happens to be true for icx too, see https://godbolt.org/z/56j57vhnT
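If that holds, the dispatch could collapse to two branches. A minimal sketch of the shape, assuming only the `_MSC_VER` and `__GNUC__` checks are needed; the function name `detected_compiler_family` is made up for illustration and does not appear in the PR:

```cpp
#include <cassert>

// Hypothetical helper: reports which branch of the simplified dispatch
// would be taken. clang and icx land in the __GNUC__ branch on Linux;
// compilers that emulate MSVC land in the _MSC_VER branch.
inline const char* detected_compiler_family() noexcept
{
#if defined(_MSC_VER)
    return "msvc-like"; // would use the _xgetbv / __cpuidex intrinsics
#elif defined(__GNUC__)
    return "gnu-like"; // would use GNU-style inline assembly
#else
    return "unknown"; // would fall back to reporting no features
#endif
}
```

The `_MSC_VER` check comes first so that a compiler defining both macros takes the intrinsic path.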
|
@serge-sans-paille what do you think of this? I wanted to add an |
@AntoinePrv please ping me when you need a review!
@serge-sans-paille this is ready. This is intended to make a clean abstraction for CPUID.
| constexpr static x86_xcr0 safe_default() noexcept | ||
| { | ||
| reg_t low = {}; | ||
| low = utils::set_bit<static_cast<reg_t>(bit::sse)>(low); |
Could be a call to make_mask directly, or (better?) use an overload of set_bit with no parameter.
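A minimal sketch of what the parameterless overload might look like. The signatures below are assumptions for illustration (the PR's `utils::set_bit` takes its bit as a template argument, but its exact declaration is not shown in this thread):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mask with a single bit set.
template <typename I>
constexpr I make_bit_mask(unsigned bit) noexcept
{
    return static_cast<I>(I { 1 } << bit);
}

// Existing style: set a bit in a value that already exists.
template <unsigned Bit, typename I>
constexpr I set_bit(I value) noexcept
{
    return static_cast<I>(value | make_bit_mask<I>(Bit));
}

// Suggested overload: no parameter, starts from zero.
template <unsigned Bit, typename I = std::uint32_t>
constexpr I set_bit() noexcept
{
    return make_bit_mask<I>(Bit);
}

static_assert(set_bit<1>() == 0x2u, "bit 1 alone gives mask 0b10");
```

With this overload, `reg_t low = {}; low = set_bit<...>(low);` would shrink to a single initialization.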
| { | ||
| // Check all SSE, AVX, and AVX512 bits even though AVX512 must | ||
| // imply AVX and SSE | ||
| return bit_is_set<bit::sse, bit::avx, bit::zmm_hi256>(m_low); |
It's a bit strange to have a function named bit_is_set that takes several bits as parameters. Maybe all_bits_set<...>?
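A minimal sketch of the renamed helper, written so that it also compiles as C++11. The enum `xcr0_bit` and the helper `mask_of` are made-up names for illustration; the bit positions follow the XCR0 layout (SSE is bit 1, AVX is bit 2, ZMM_Hi256 is bit 6).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum mirroring the XCR0 bit positions used above.
enum class xcr0_bit : std::uint32_t
{
    sse = 1,
    avx = 2,
    zmm_hi256 = 6
};

// Folds a pack of bit positions into one mask at compile time.
template <xcr0_bit... Bits>
struct mask_of;

template <>
struct mask_of<>
{
    static constexpr std::uint32_t value = 0;
};

template <xcr0_bit B, xcr0_bit... Rest>
struct mask_of<B, Rest...>
{
    static constexpr std::uint32_t value
        = (std::uint32_t { 1 } << static_cast<std::uint32_t>(B)) | mask_of<Rest...>::value;
};

// True only if every requested bit is set in reg.
template <xcr0_bit... Bits>
constexpr bool all_bits_set(std::uint32_t reg) noexcept
{
    return (reg & mask_of<Bits...>::value) == mask_of<Bits...>::value;
}
```

A call such as `all_bits_set<xcr0_bit::sse, xcr0_bit::avx>(m_low)` then reads the way it behaves.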
| /** | ||
| * Read the CpuId registers from the CPU if on the correct architecture. | ||
| * | ||
| * This is only safe to call if bit 18 of CR4.OSXSAVE has been set. |
Instead of a comment, you could make this an assert
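One way to sketch that assert: user code cannot read CR4 directly, but CPUID leaf 1 reports the OSXSAVE state in ECX bit 27. The function names and the function-pointer indirection below are assumptions made so the example stays portable; they are not the PR's API.

```cpp
#include <cassert>
#include <cstdint>

// CPUID.1:ECX bit 27 mirrors CR4.OSXSAVE.
constexpr bool osxsave_enabled(std::uint32_t leaf1_ecx) noexcept
{
    return ((leaf1_ecx >> 27) & 1u) != 0u;
}

// Hypothetical reader type; a real implementation would wrap XGETBV.
using xcr0_reader = std::uint32_t (*)();

// Reads XCR0 only after asserting that doing so is safe.
inline std::uint32_t checked_xcr0_low(std::uint32_t leaf1_ecx, xcr0_reader read_xcr0) noexcept
{
    assert(osxsave_enabled(leaf1_ecx) && "XGETBV requires CR4.OSXSAVE to be set");
    return read_xcr0();
}

// Stand-in reader for the example: pretends x87, SSE and AVX state are enabled.
inline std::uint32_t fake_xcr0() noexcept
{
    return 0x7u;
}
```

The assert only fires in debug builds, so release builds would still rely on the caller honoring the precondition.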
| regs.reg1 = detail::get_cpuid(0x1); | ||
| regs.reg7 = detail::get_cpuid(0x7); | ||
| regs.reg7a = detail::get_cpuid(0x7, 0x1); | ||
| regs.reg8 = detail::get_cpuid(0x80000001); |
Open question: could we avoid some get_cpuid calls in some situations? For instance, reg8 is only used for fma4, and reg7 for avx2 or later, which means we could skip filling reg7 if we don't have avx, and skip the fma4 query if we have fma3.
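A rough sketch of the conditional reads, with the cpuid call injected as a function pointer so the logic can be exercised without real hardware. All names here are hypothetical. Whether skipping the fma4 query on fma3 machines is acceptable remains the open question, since a CPU may report both.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using cpuid_reg_t = std::array<std::uint32_t, 4>; // eax, ebx, ecx, edx
using cpuid_fn = cpuid_reg_t (*)(std::uint32_t leaf, std::uint32_t subleaf);

struct cpuid_regs
{
    cpuid_reg_t reg1 {};
    cpuid_reg_t reg7 {};
    cpuid_reg_t reg8 {};
};

// Queries leaf 1 first, then only the leaves its flags make relevant.
inline cpuid_regs read_needed_leaves(cpuid_fn cpuid)
{
    cpuid_regs regs;
    regs.reg1 = cpuid(0x1u, 0u);
    const bool has_avx = ((regs.reg1[2] >> 28) & 1u) != 0u; // CPUID.1:ECX bit 28
    const bool has_fma3 = ((regs.reg1[2] >> 12) & 1u) != 0u; // CPUID.1:ECX bit 12
    if (has_avx)
    {
        regs.reg7 = cpuid(0x7u, 0u); // avx2 and later live here
    }
    if (!has_fma3)
    {
        regs.reg8 = cpuid(0x80000001u, 0u); // only needed for fma4
    }
    return regs;
}

// Stand-in for the example: a CPU with neither AVX nor FMA3, counting calls.
static int fake_calls = 0;
static cpuid_reg_t fake_cpuid_without_avx(std::uint32_t, std::uint32_t)
{
    ++fake_calls;
    return cpuid_reg_t {};
}
```

On the fake CPU above, leaf 7 is never queried, which is the saving being discussed.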
| #if !XSIMD_TARGET_X86 | ||
| (void)level; | ||
| (void)count; | ||
| return {}; // All bits to zero |
You could avoid this return and let the return reg at the end do the job. I wouldn't mind having:
#if XSIMD_TARGET_X86
// current implementation
#else
inline cpuid_reg_t get_cpuid(int, int) noexcept
{
    return {}; // All bits to zero
}
#endif
|
|
||
| auto get_cpuid = [](int reg[4], int level, int count = 0) noexcept | ||
| { | ||
| // Safe on all platforms, we simply be false |
That English sentence is... difficult to understand.
|
|
||
| #if defined(_MSC_VER) | ||
| __cpuidex(reg, level, count); | ||
| sse2 = cpuid.sse2() && xcr0.sse_enabled(); |
Some random thoughts: it looks like we actually want supported_arch to have a data member that matches the data structure of cpuid, and a similar layout for xcr0, so that we could just write something like *this = cpuid & xcr0 and implement this very efficiently.
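To make the idea concrete, a toy sketch in which both sources are normalized to the same bit layout, so that combining them is a single AND. The `feature_mask` type and its three flags are invented for illustration; the real cpuid and xcr0 registers do not share a layout, so a translation step would be needed first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical common layout: one bit per feature, shared by both sources.
struct feature_mask
{
    std::uint32_t bits;

    enum : std::uint32_t
    {
        sse2 = 1u << 0,
        avx = 1u << 1,
        avx512f = 1u << 2
    };

    constexpr bool has(std::uint32_t feature) const noexcept
    {
        return (bits & feature) == feature;
    }
};

// A feature is usable only if the CPU reports it and the OS enables its state.
constexpr feature_mask operator&(feature_mask cpu, feature_mask os) noexcept
{
    return feature_mask { cpu.bits & os.bits };
}
```

With such a layout, a line like `sse2 = cpuid.sse2() && xcr0.sse_enabled();` would no longer need to be repeated for each feature.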
| { | ||
| template <typename I> | ||
| constexpr I make_bit_mask(I bit) | ||
| { |
You could static_assert that bit < 8 * sizeof(I), although I guess the compiler would issue a warning in that case.
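One caveat: a function parameter cannot appear in a static_assert, so the check only works if the bit index becomes a template argument. A minimal sketch under that assumption, which changes the signature relative to the make_bit_mask shown in the diff:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

// Variant of make_bit_mask with the bit index as a template argument,
// so that an out-of-range index fails at compile time.
template <typename I, unsigned Bit>
constexpr I make_bit_mask() noexcept
{
    static_assert(std::is_unsigned<I>::value, "mask type must be unsigned");
    static_assert(Bit < CHAR_BIT * sizeof(I), "bit index out of range for the mask type");
    return static_cast<I>(I { 1 } << Bit);
}

static_assert(make_bit_mask<std::uint32_t, 0>() == 0x1u, "lowest bit");
```

Here `make_bit_mask<std::uint8_t, 8>()` would be rejected by the second static_assert instead of shifting out of range.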
This is a no-addition, non-breaking refactor that makes the existing CPU id features first-class citizens, as part of #1245.
supported_arch keeps the same structure and merges the use of both classes at the moment, but both classes could be combined into a user-friendly x86_cpu_features.