s390x: use llvm intrinsics instead of simd_fmin/fmax #2058
folkertdev merged 3 commits into rust-lang:main from
|
r? @sayantn
rustbot has assigned @sayantn.
|
52be653 to 41871cf
|
In general, this makes sense to me. Not sure I understand the macro logic either, so I'll leave that to @folkertdev. I'm wondering whether there is anything we need to do to restrict usage of the low-level LLVM intrinsics to z14 and higher ( |
|
The code already uses a bunch of What I would have expected is that these operations have target_feature requirements which ensure that the HW instruction exists. Otherwise, there'll be some extra runtime overhead, won't there? But that does not seem to be the case, only the tests are gated on the "enhancements-1" feature. |
|
The macros are inherited from the The design of s390x vector functions is different from e.g. x86 in that the functions in C are polymorphic. There is one |
Okay... what does that mean for worrying about vector-enhancements-1 and the fallback emulation? |
|
The only option I see is to From what I understand it is quite common in practice to compile s390x for a particular CPU, but anything compiled with the baseline would use the fallback. |
|
Is that something we have to do by hand? None of the existing intrinsics seem to do anything like it. |
|
In most cases we use the |
|
Ah I didn't realize So what is the fallback strategy? I could make it something like

    if cfg!(target_feature = "vector-enhancements-1") {
        vfminsb(a, b, const { 0 })
    } else {
        // fallback according to docs: !(b<a)?a:b
        simd_select(simd_not(simd_lt(b, a)), a, b)
    }

but that I don't think we can do the fallback by hand. We could just set |
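For readers following along, the documented fallback expression `!(b<a)?a:b` can be modelled on scalars to see how it treats NaN and signed zeros. This is only a sketch: `fallback_min` is a hypothetical name, it works on a single `f32` rather than a vector, and it makes no claim about what the hardware instruction returns in the same cases.

```rust
/// Hypothetical scalar model of the fallback expression `!(b < a) ? a : b`.
fn fallback_min(a: f32, b: f32) -> f32 {
    if !(b < a) { a } else { b }
}

fn main() {
    // Ordinary values: the smaller operand is returned.
    assert_eq!(fallback_min(1.0, 2.0), 1.0);
    assert_eq!(fallback_min(2.0, 1.0), 1.0);

    // Any comparison with NaN is false, so `a` is returned either way.
    assert!(fallback_min(f32::NAN, 1.0).is_nan());
    assert_eq!(fallback_min(1.0, f32::NAN), 1.0);

    // `-0.0 < 0.0` is false, so the sign of the result depends on operand order.
    assert_eq!(fallback_min(0.0, -0.0).to_bits(), 0.0f32.to_bits());
    assert_eq!(fallback_min(-0.0, 0.0).to_bits(), (-0.0f32).to_bits());
}
```

In this model the result for NaN and for zeros of opposite sign depends on which operand comes first, so the expression is not symmetric in `a` and `b`.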
|
hmm, right. I don't think we can do better than that unfortunately. With the others inlining helps us out so that we get codegen with the features of the surrounding function. Maybe we can add an intrinsic that explicitly has the fallback implementation on older hardware to LLVM? |
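To make the limitation concrete: `cfg!(target_feature = ...)` expands to a plain `true` or `false` when the crate containing it is compiled, so it reflects the features of that build rather than those of whichever function the code is later inlined into. A minimal sketch, where `selected_min_path` is a hypothetical helper that only reports which branch was chosen:

```rust
/// Hypothetical helper: reports which branch a `cfg!`-based dispatch selects.
/// `cfg!` becomes a boolean literal at compile time, so the choice is fixed
/// for the whole build of this crate.
fn selected_min_path() -> &'static str {
    if cfg!(target_feature = "vector-enhancements-1") {
        "hardware instruction"
    } else {
        "fallback"
    }
}

fn main() {
    // On a build without the feature enabled this prints "fallback".
    println!("{}", selected_min_path());
}
```

Under that assumption, a library built for the baseline would keep taking the fallback branch even when called from code compiled for a newer CPU.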
|
An s390x-specific intrinsic? No other target needs that, is this target truly so cursed? IMO this should be handled on the LLVM side -- if the intent is to provide the intrinsic also on z13, then |
|
Ah, sorry, I meant "we" as in the people in this thread, specifically @uweigand as (also) an s390x LLVM maintainer. So yes, my suggestion is also to add this to LLVM, either by making the existing LLVM intrinsic have a fallback implementation, or by implementing a variant that has the fallback. |
|
I looked into what is being tested here and I am confused. Should the assert_instr tests show up in the CI logs somewhere? I can't see any trace of them. |
|
Not obvious, but the assembly tests only run in the "release" CI jobs, not the "dev" ones (those only run the manual correctness tests). It does kind of make sense in that the assembly would not look as expected in "dev" mode, but we've had bugs in the past where the implementation was actually incorrect in dev mode. Anyway, the test does run with the "release" job: |
|
I had to add manual tests to trigger this: |
|
Due to the fallback issue, we can't actually use the However, we can still avoid the use of Rust's |
|
r? @folkertdev |
This comment was marked as resolved.
|
(Trying) |
The best we can do for now. @uweigand do you have thoughts on how to fix this in LLVM?
Replied in the linked issue to consolidate discussion there. |
    [0, !0, !0, !0]
}

// f32 is the tricky case for max/min as that needs a fallback on z13
Ah yes. On z13 the f32 case would cause a compile-time error in the C version. The tricky case (available but emulated) is f64 on z13.
Based on the discussion at rust-lang/rust#153395 (comment). Also see #2060 -- that problem is not actually fixed, for the reasons explained there, but at least we become independent of what Rust's portable intrinsics do (and in particular we get the signed zero guarantee, which is not currently documented for simd_fmin/fmax).
I wrote this code entirely by pattern matching, I have no idea if it makes any sense.^^ The s390x folder here is an impenetrable undocumented macro soup so I don't even know what is happening on the Rust side.
Cc @uweigand @folkertdev
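As an illustration of what a signed zero guarantee for a minimum operation could mean, here is a scalar sketch. `min_signed_zero` is a hypothetical helper written for this example; its NaN handling (return the other operand) is an assumption made for the sketch and is not taken from the LLVM intrinsic or the hardware documentation.

```rust
/// Hypothetical scalar minimum that orders -0.0 below +0.0.
/// Assumption for this sketch: if exactly one operand is NaN, the other is returned.
fn min_signed_zero(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        return b;
    }
    if b.is_nan() {
        return a;
    }
    if a == b {
        // Equal values include the pair (+0.0, -0.0); prefer the negative sign.
        if a.is_sign_negative() { a } else { b }
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    // The result is -0.0 regardless of operand order.
    assert_eq!(min_signed_zero(0.0, -0.0).to_bits(), (-0.0f32).to_bits());
    assert_eq!(min_signed_zero(-0.0, 0.0).to_bits(), (-0.0f32).to_bits());
    assert_eq!(min_signed_zero(1.0, 2.0), 1.0);
    assert_eq!(min_signed_zero(f32::NAN, 1.0), 1.0);
}
```

The property worth noting is that the result for zeros of opposite sign no longer depends on operand order, unlike a plain comparison-and-select.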