Skip to content

s390x: use llvm intrinsics instead of simd_fmin/fmax#2058

Merged
folkertdev merged 3 commits intorust-lang:mainfrom
RalfJung:s390x-minmax
Mar 11, 2026
Merged

s390x: use llvm intrinsics instead of simd_fmin/fmax#2058
folkertdev merged 3 commits intorust-lang:mainfrom
RalfJung:s390x-minmax

Conversation

@RalfJung
Copy link
Member

@RalfJung RalfJung commented Mar 10, 2026

Based on the discussion at rust-lang/rust#153395 (comment). Also see #2060 -- that problem is not actually fixed, for the reasons explained there, but at least we become independent of what Rust's portable intrinsics do (and in particular we get the signed zero guarantee, which is not currently documented for simd_fmin/fmax).

I wrote this code entirely by pattern matching, I have no idea if it makes any sense.^^ The s390x folder here is an impenetrable undocumented macro soup so I don't even know what is happening on the Rust side.

Cc @uweigand @folkertdev

@rustbot
Copy link
Collaborator

rustbot commented Mar 10, 2026

r? @sayantn

rustbot has assigned @sayantn.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @Amanieu, @folkertdev, @sayantn
  • @Amanieu, @folkertdev, @sayantn expanded to Amanieu, folkertdev, sayantn
  • Random selection from Amanieu, folkertdev, sayantn

@RalfJung RalfJung force-pushed the s390x-minmax branch 3 times, most recently from 52be653 to 41871cf Compare March 10, 2026 16:52
@uweigand
Copy link

In general, this makes sense to me. Not sure I understand the macro logic either - I'll leave that to @folkertdev .

I'm wondering whether there is anything we need to do to restrict usage of the low-level LLVM intrinsics to z14 and higher (vector-enhancements-1 facility)? Without the facility, use of these intrinsics will cause an internal error in LLVM - it would be preferable to either restrict use of the high-level intrinsics as well, or else emulate them.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 10, 2026

The code already uses a bunch of llvm.s390.* intrinsics, apparently without any special care for the instruction set version. Some of those have their tests guarded by the "enhancements-1" check as well, e.g. nearbyint_v4f32. Are min/max somehow different here?

What I would have expected is that these operations have target_feature requirements which ensure that the HW instruction exists. Otherwise, there'll be some extra runtime overhead, won't there? But that does not seem to be the case, only the tests are gated on the "enhancements-1" feature.

@folkertdev
Copy link
Contributor

The macros are inherited from the powerpc implementation. Ultimately I think we should move more towards a generator-based approach like the aarch64 backend. Also because right now it's really hard to see whether a particular combination of arguments is actually implemented. It would be nice if we could generate this somehow from e.g. C headers (the hexagon backend recently started doing this).

The design of s390x vector functions is different from e.g. x86 in that the functions in C are polymorphic. There is one vec_min that works on vector int, vector float, vector double, ect. We implement this using traits, and we can't guard a specific type/implementation with a target feature.

@RalfJung
Copy link
Member Author

The design of s390x vector functions is different from e.g. x86 in that the functions in C are polymorphic. There is one vec_min that works on vector int, vector float, vector double, ect. We implement this using traits, and we can't guard a specific type/implementation with a target feature.

Okay... what does that mean for worrying about vector-enhancements-1 and the fallback emulation?

@folkertdev
Copy link
Contributor

The only option I see is to cfg on whether the target feature is present at build time, and pick the fallback if not. We're in core, so we could not even perform runtime detection if we wanted to.

From what i understand it is quite common in practice to compile s390x for a particular CPU, but anything compiled with the baseline would use the fallback.

@RalfJung
Copy link
Member Author

Is that something we have to do by hand? None of the existing intrinsics seem to do anything like it.

@folkertdev
Copy link
Contributor

In most cases we use the intrinsics::simd which should automatically provide a fallback. Similarly nearbyint and roundeven are portable llvm intrinsics. So this is the first time we can't make it LLVM's problem I think.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 10, 2026

Ah I didn't realize nearbyint_v4f32 is not an llvm.s390 import.

So what is the fallback strategy? I could make it something like

if cfg!(target_feature = "vector-enhancements-1") {
  vfminsb(a, b, const { 0 })
} else {
  // fallback according to docs: !(b<a)?a:b
  simd_select(simd_not(simd_lt(b, a)), a, b)
}

but that cfg! is evaluated when the standard library is built, so the sysroot we distribute would not even contain the vfminsb path. I don't think we have a way of selecting something based on the target features that are available at codegen time?

I don't think we can do the fallback by hand. We could just set #[target_feature(enable = "vector-enhancements-1")], making it UB to invoke that particular overload without the given target feature...

@folkertdev
Copy link
Contributor

hmm, right. I don't think we can do better than that unfortunately. With the others inlining helps us out so that we get codegen with the features of the surrounding function. Maybe we can add an intrinsic that explicitly has the fallback implementation on older hardware to LLVM?

@RalfJung
Copy link
Member Author

An s390x-specific intrinsic? No other targets needs that, is this target truly so cursed?

IMO this should be handled on the LLVM side -- if the intent is to provide the intrinsic also on z13, then llvm.s390.blah should work on z13.

@folkertdev
Copy link
Contributor

folkertdev commented Mar 10, 2026

Ah, sorry, I meant "we" as in the people in this thread, specifically @uweigand as (also) a s390x LLVM maintainer.

So yes my suggestion is also to add this to LLVM, either by making the existing LLVM intrinsic have a fallback implementation, or implementing a variant that has the fallback.

@RalfJung
Copy link
Member Author

I looked into what is being tested here and I am confused. Should the assert_instr tests show up in the CI logs somewhere? I can't see any trace of them.

@folkertdev
Copy link
Contributor

Not obvious, but the assembly tests only run in the "release" CI jobs, not the "dev" ones (those only run the manual correctness tests). It does kind of make sense in that the assembly would not look as expected in "dev" mode, but we've had bugs in the past where the implementation was actually incorrect in dev mode.

anyway, the test does run with the "release" job:

https://triage.rust-lang.org/gha-logs/rust-lang/stdarch/66535751792?pr=2058#L2026-03-10T21:50:32.0189142Z

@RalfJung
Copy link
Member Author

I had to add manual tests to trigger this:

rustc-LLVM ERROR: Cannot select: intrinsic %llvm.s390.vfmaxsb
error: could not compile `core_arch` (lib test)

@RalfJung RalfJung changed the title s390x: use llvm.s390 intrinsics instead of simd_fmin/fmax s390x: use llvm intrinsics instead of simd_fmin/fmax Mar 11, 2026
@RalfJung
Copy link
Member Author

Due to the fallback issue, we can't actually use the llvm.s390.* intrinsics. That means we can't fix the wrong semantics. I opened #2060 to track that.

However, we can still avoid the use of Rust's simd_fmin/fmax intrinsics and directly invoke llvm.minnum. IMO that makes sense since it lets us adjust simd_fmin/fmax to be consistent with the scalar version.

@RalfJung
Copy link
Member Author

r? @folkertdev

@rustbot

This comment was marked as resolved.

@jieyouxu
Copy link
Member

(Trying)
r? folkertdev

@rustbot rustbot assigned folkertdev and unassigned sayantn Mar 11, 2026
Copy link
Contributor

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best we can do for now. @uweigand do you have thoughts on how to fix this in LLVM?

View changes since this review

@folkertdev folkertdev added this pull request to the merge queue Mar 11, 2026
Merged via the queue into rust-lang:main with commit 98e033d Mar 11, 2026
77 checks passed
@uweigand
Copy link

The best we can do for now. @uweigand do you have thoughts on how to fix this in LLVM?

Replied in the linked issue to consolidate discussion there.

[0, !0, !0, !0]
}

// f32 is the tricky case for max/min as that needs a fallback on z13
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uweigand so... this comment is wrong then?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. On z13 the f32 case would cause a compile-time error in the C version. The tricky case (available but emulated) is f64 on z13.

@RalfJung RalfJung deleted the s390x-minmax branch March 11, 2026 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants