s390x: use llvm intrinsics instead of simd_fmin/fmax by RalfJung · Pull Request #2058 · rust-lang/stdarch

RalfJung · 2026-03-10T16:44:50Z

Based on the discussion at rust-lang/rust#153395 (comment). Also see #2060 -- that problem is not actually fixed, for the reasons explained there, but at least we become independent of what Rust's portable intrinsics do (and in particular we get the signed zero guarantee, which is not currently documented for simd_fmin/fmax).

I wrote this code entirely by pattern matching, I have no idea if it makes any sense.^^ The s390x folder here is an impenetrable undocumented macro soup so I don't even know what is happening on the Rust side.

Cc @uweigand @folkertdev

rustbot · 2026-03-10T16:44:56Z

r? @sayantn

rustbot has assigned @sayantn.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: @Amanieu, @folkertdev, @sayantn
@Amanieu, @folkertdev, @sayantn expanded to Amanieu, folkertdev, sayantn
Random selection from Amanieu, folkertdev, sayantn

uweigand · 2026-03-10T17:05:55Z

In general, this makes sense to me. Not sure I understand the macro logic either - I'll leave that to @folkertdev .

I'm wondering whether there is anything we need to do to restrict usage of the low-level LLVM intrinsics to z14 and higher (vector-enhancements-1 facility)? Without the facility, use of these intrinsics will cause an internal error in LLVM - it would be preferable to either restrict use of the high-level intrinsics as well, or else emulate them.

RalfJung · 2026-03-10T17:16:23Z

The code already uses a bunch of llvm.s390.* intrinsics, apparently without any special care for the instruction set version. Some of those have their tests guarded by the "enhancements-1" check as well, e.g. nearbyint_v4f32. Are min/max somehow different here?

What I would have expected is that these operations have target_feature requirements which ensure that the HW instruction exists. Otherwise, there'll be some extra runtime overhead, won't there? But that does not seem to be the case, only the tests are gated on the "enhancements-1" feature.

folkertdev · 2026-03-10T17:25:45Z

The macros are inherited from the powerpc implementation. Ultimately I think we should move more towards a generator-based approach like the aarch64 backend. Also because right now it's really hard to see whether a particular combination of arguments is actually implemented. It would be nice if we could generate this somehow from e.g. C headers (the hexagon backend recently started doing this).

The design of s390x vector functions is different from e.g. x86 in that the functions in C are polymorphic. There is one vec_min that works on vector int, vector float, vector double, ect. We implement this using traits, and we can't guard a specific type/implementation with a target feature.

RalfJung · 2026-03-10T17:40:24Z

The design of s390x vector functions is different from e.g. x86 in that the functions in C are polymorphic. There is one vec_min that works on vector int, vector float, vector double, ect. We implement this using traits, and we can't guard a specific type/implementation with a target feature.

Okay... what does that mean for worrying about vector-enhancements-1 and the fallback emulation?

folkertdev · 2026-03-10T17:47:32Z

The only option I see is to cfg on whether the target feature is present at build time, and pick the fallback if not. We're in core, so we could not even perform runtime detection if we wanted to.

From what i understand it is quite common in practice to compile s390x for a particular CPU, but anything compiled with the baseline would use the fallback.

RalfJung · 2026-03-10T17:54:41Z

Is that something we have to do by hand? None of the existing intrinsics seem to do anything like it.

folkertdev · 2026-03-10T18:10:13Z

In most cases we use the intrinsics::simd which should automatically provide a fallback. Similarly nearbyint and roundeven are portable llvm intrinsics. So this is the first time we can't make it LLVM's problem I think.

RalfJung · 2026-03-10T21:46:06Z

Ah I didn't realize nearbyint_v4f32 is not an llvm.s390 import.

So what is the fallback strategy? I could make it something like

if cfg!(target_feature = "vector-enhancements-1") {
  vfminsb(a, b, const { 0 })
} else {
  // fallback according to docs: !(b<a)?a:b
  simd_select(simd_not(simd_lt(b, a)), a, b)
}

but that cfg! is evaluated when the standard library is built, so the sysroot we distribute would not even contain the vfminsb path. I don't think we have a way of selecting something based on the target features that are available at codegen time?

I don't think we can do the fallback by hand. We could just set #[target_feature(enable = "vector-enhancements-1")], making it UB to invoke that particular overload without the given target feature...

folkertdev · 2026-03-10T22:02:40Z

hmm, right. I don't think we can do better than that unfortunately. With the others inlining helps us out so that we get codegen with the features of the surrounding function. Maybe we can add an intrinsic that explicitly has the fallback implementation on older hardware to LLVM?

RalfJung · 2026-03-10T22:11:48Z

An s390x-specific intrinsic? No other targets needs that, is this target truly so cursed?

IMO this should be handled on the LLVM side -- if the intent is to provide the intrinsic also on z13, then llvm.s390.blah should work on z13.

folkertdev · 2026-03-10T22:15:28Z

Ah, sorry, I meant "we" as in the people in this thread, specifically @uweigand as (also) a s390x LLVM maintainer.

So yes my suggestion is also to add this to LLVM, either by making the existing LLVM intrinsic have a fallback implementation, or implementing a variant that has the fallback.

RalfJung · 2026-03-10T22:20:13Z

I looked into what is being tested here and I am confused. Should the assert_instr tests show up in the CI logs somewhere? I can't see any trace of them.

folkertdev · 2026-03-10T22:27:40Z

Not obvious, but the assembly tests only run in the "release" CI jobs, not the "dev" ones (those only run the manual correctness tests). It does kind of make sense in that the assembly would not look as expected in "dev" mode, but we've had bugs in the past where the implementation was actually incorrect in dev mode.

anyway, the test does run with the "release" job:

https://triage.rust-lang.org/gha-logs/rust-lang/stdarch/66535751792?pr=2058#L2026-03-10T21:50:32.0189142Z

RalfJung · 2026-03-10T22:37:07Z

I had to add manual tests to trigger this:

rustc-LLVM ERROR: Cannot select: intrinsic %llvm.s390.vfmaxsb
error: could not compile `core_arch` (lib test)

RalfJung · 2026-03-11T07:58:53Z

Due to the fallback issue, we can't actually use the llvm.s390.* intrinsics. That means we can't fix the wrong semantics. I opened #2060 to track that.

However, we can still avoid the use of Rust's simd_fmin/fmax intrinsics and directly invoke llvm.minnum. IMO that makes sense since it lets us adjust simd_fmin/fmax to be consistent with the scalar version.

RalfJung · 2026-03-11T07:59:59Z

r? @folkertdev

jieyouxu · 2026-03-11T08:48:13Z

(Trying)
r? folkertdev

folkertdev

The best we can do for now. @uweigand do you have thoughts on how to fix this in LLVM?

View changes since this review

uweigand · 2026-03-11T14:09:44Z

The best we can do for now. @uweigand do you have thoughts on how to fix this in LLVM?

Replied in the linked issue to consolidate discussion there.

RalfJung · 2026-03-11T16:33:35Z

crates/core_arch/src/s390x/vector.rs

        [0, !0, !0, !0]
    }

+    // f32 is the tricky case for max/min as that needs a fallback on z13


@uweigand so... this comment is wrong then?

Ah yes. On z13 the f32 case would cause a compile-time error in the C version. The tricky case (available but emulated) is f64 on z13.

rustbot assigned sayantn Mar 10, 2026

RalfJung force-pushed the s390x-minmax branch 3 times, most recently from 52be653 to 41871cf Compare March 10, 2026 16:52

s390x: use llvm.s390 intrinsics instead of simd_fmin/fmax

6efd06e

RalfJung force-pushed the s390x-minmax branch from 41871cf to 6efd06e Compare March 10, 2026 21:47

add f32 min/max tests

bfc3662

RalfJung force-pushed the s390x-minmax branch from 73fb0ec to bfc3662 Compare March 10, 2026 22:29

RalfJung mentioned this pull request Mar 11, 2026

s390x: add f32 min/max tests #2059

Closed

RalfJung changed the title ~~s390x: use llvm.s390 intrinsics instead of simd_fmin/fmax~~ s390x: use llvm intrinsics instead of simd_fmin/fmax Mar 11, 2026

RalfJung force-pushed the s390x-minmax branch from abdd350 to 01026d8 Compare March 11, 2026 07:56

RalfJung force-pushed the s390x-minmax branch from 01026d8 to c95fe96 Compare March 11, 2026 07:59

This comment was marked as resolved.

Sign in to view

go back to portable LLVM intrinsic to avoid fallback trouble

454668b

RalfJung force-pushed the s390x-minmax branch from c95fe96 to 454668b Compare March 11, 2026 08:05

rustbot assigned folkertdev and unassigned sayantn Mar 11, 2026

folkertdev approved these changes Mar 11, 2026

View reviewed changes

folkertdev added this pull request to the merge queue Mar 11, 2026

Merged via the queue into rust-lang:main with commit 98e033d Mar 11, 2026
77 checks passed

RalfJung commented Mar 11, 2026

View reviewed changes

RalfJung deleted the s390x-minmax branch March 11, 2026 17:31

RalfJung mentioned this pull request Mar 11, 2026

s390x: add f64 tests for vec_min #2061

Merged

Conversation

RalfJung commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rustbot commented Mar 10, 2026

Uh oh!

uweigand commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

folkertdev commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 10, 2026

Uh oh!

folkertdev commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 10, 2026

Uh oh!

folkertdev commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

folkertdev commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 10, 2026

Uh oh!

folkertdev commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RalfJung commented Mar 10, 2026

Uh oh!

folkertdev commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 10, 2026

Uh oh!

RalfJung commented Mar 11, 2026

Uh oh!

RalfJung commented Mar 11, 2026

Uh oh!

This comment was marked as resolved.

jieyouxu commented Mar 11, 2026

Uh oh!

folkertdev left a comment • edited by rustbot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

uweigand commented Mar 11, 2026

Uh oh!

RalfJung Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

uweigand Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

RalfJung commented Mar 10, 2026 •

edited

Loading

RalfJung commented Mar 10, 2026 •

edited

Loading

RalfJung commented Mar 10, 2026 •

edited

Loading

folkertdev commented Mar 10, 2026 •

edited

Loading

folkertdev left a comment •

edited by rustbot

Loading