
codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer. #155343

Merged
rust-bors[bot] merged 2 commits into rust-lang:main from dianqk:indirect-by-ref
Apr 22, 2026

@dianqk
Member

@dianqk dianqk commented Apr 15, 2026

Fixes #155241.

When a value is passed via an indirect pointer, it needs to be copied to a new alloca first. On x86_64-unknown-linux-gnu, Thing is such a case:

#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

pub fn foo() {
    let thing = Thing(0, 0, 0);
    bar(thing);
    assert_eq!(thing.0, 0);
}

#[inline(never)]
#[unsafe(no_mangle)]
pub fn bar(mut thing: Thing) {
    thing.0 = 1;
}

Before thing is passed to bar, it needs to be copied to an alloca, and a pointer to that alloca is what bar receives:

%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
call void @bar(ptr %0)
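
The effect of that copy can be modeled in plain Rust. This is only a sketch: `bar_lowered`, `call_without_copy` and `call_with_copy` are hypothetical names, and the `&mut Thing` parameter stands in for the hidden pointer the ABI uses. It shows what the caller would observe if the copy were skipped, compared with copying into a fresh slot first.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Thing(usize, usize, usize);

// Models the callee after ABI lowering: it receives a pointer to the
// argument's storage and is free to mutate that storage.
fn bar_lowered(thing: &mut Thing) {
    thing.0 = 1;
}

// Models skipping the copy: the callee gets a pointer to the caller's own value.
fn call_without_copy(thing: &mut Thing) {
    bar_lowered(thing);
}

// Models the rule from the description: copy into a fresh slot
// (the "alloca" plus memcpy) and pass a pointer to that slot instead.
fn call_with_copy(thing: &Thing) {
    let mut slot = *thing;
    bar_lowered(&mut slot);
}

fn main() {
    let mut a = Thing(0, 0, 0);
    call_without_copy(&mut a);
    // The callee's write leaked into the caller's value.
    assert_eq!(a, Thing(1, 0, 0));

    let b = Thing(0, 0, 0);
    call_with_copy(&b);
    // The caller's value is untouched.
    assert_eq!(b, Thing(0, 0, 0));
}
```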

This patch applies the same rule to untupled arguments as well.

#![feature(fn_traits)]

#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

#[inline(never)]
#[unsafe(no_mangle)]
pub fn foo() {
    let thing = (Thing(0, 0, 0),);
    (|mut thing: Thing| {
        thing.0 = 1;
    }).call(thing);
    assert_eq!(thing.0.0, 0);
}

For this case, this patch changes from

; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %thing)

to

%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %0)

However, the same rule cannot be applied to tail calls. Copying there would be unsound, because the caller's stack frame is overwritten by the callee's stack frame. Fortunately, #151143 has already handled this special case, so we must not copy again.

No copy is needed for by-move arguments, because the argument is passed to the callee "in-place".

No copy is needed for by-val arguments either, because the attribute implies that a hidden copy of the pointee is made between the caller and the callee.

NOTE: The patch uses a trick for tail calls: their arguments are always passed as by-move. For ordinary by-move arguments codegen may still choose to copy to an alloca, but tail calls require that no copy is made (MUST-by-move).
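
The rules above can be condensed into a small decision function. This is a simplified sketch, not the compiler's code: `PassMode`, `ArgAction`, `arg_action` and the `is_tail_call` flag are hypothetical names chosen for illustration, and the model always skips the copy for by-move even though codegen would be allowed to make one for non-tail calls.

```rust
/// How an argument reaches the callee (hypothetical, simplified model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PassMode {
    /// Passed directly; no hidden pointer is involved.
    Direct,
    /// Passed as a pointer with the by-val attribute, which implies a
    /// hidden copy of the pointee between caller and callee.
    IndirectByVal,
    /// Passed as a plain hidden pointer to storage provided by the caller.
    Indirect,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ArgAction {
    PassAsIs,
    CopyToAlloca,
}

/// Decide whether the caller has to copy the argument before the call.
fn arg_action(mode: PassMode, by_move: bool, is_tail_call: bool) -> ArgAction {
    // Tail call arguments are treated as by-move: they already live in the
    // caller's argument slots and must not be copied again.
    let by_move = by_move || is_tail_call;
    match mode {
        PassMode::Indirect if !by_move => ArgAction::CopyToAlloca,
        _ => ArgAction::PassAsIs,
    }
}

fn main() {
    assert_eq!(arg_action(PassMode::Indirect, false, false), ArgAction::CopyToAlloca);
    assert_eq!(arg_action(PassMode::Indirect, true, false), ArgAction::PassAsIs);
    assert_eq!(arg_action(PassMode::IndirectByVal, false, false), ArgAction::PassAsIs);
    assert_eq!(arg_action(PassMode::Indirect, false, true), ArgAction::PassAsIs);
    assert_eq!(arg_action(PassMode::Direct, false, false), ArgAction::PassAsIs);
}
```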

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 15, 2026
self.codegen_argument(
bx,
op,
// Pass by-move for an explicit tail call, which has been handled above as well.
Member Author

@dianqk dianqk Apr 15, 2026

This feels a bit awkward; perhaps the tail call handling can be moved into codegen_argument as well.

Member

The issue has nothing to do with tail calls, does it...?

Member Author

Yes. #151143 works well.

Member

But this PR is changing something about tail calls...?

Maybe it's just the comment that is confusing me. Why is it okay to always do by-move for tail call arguments? Cc @folkertdev

Member Author

Because #151143 already does a similar thing, we don't need to copy again. IIUC, tail calls are special: the same arguments in the caller must be passed to the callee. An extra copy is UB.

Member

#151143 says something about

> Therefore we store the argument for the callee in the corresponding caller's slot.

So I guess there is code elsewhere establishing an invariant that the argument here is the equivalent slot and we can pass a pointer to it. The comment should explain that and point at that code.

// CHECK: call void @test_simd(<4 x i32> <i32 2, i32 4, i32 6, i32 8>
test_simd(const { Simd::<i32, 4>([2, 4, 6, 8]) });

// CHECK: call void @test_simd_unaligned(%"minisimd::PackedSimd<i32, 3>" %1
Member Author

@dianqk dianqk Apr 15, 2026

A redundant memcpy is skipped here.

@dianqk dianqk force-pushed the indirect-by-ref branch 2 times, most recently from 5123815 to 9ec6be7 Compare April 16, 2026 13:22
@dianqk dianqk marked this pull request as ready for review April 16, 2026 13:24
@rustbot
Collaborator

rustbot commented Apr 16, 2026

RalfJung is not on the review rotation at the moment.
They may take a while to respond.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 16, 2026
@dianqk dianqk changed the title [DRAFT] codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer. codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer. Apr 16, 2026
self.codegen_argument(
bx,
location,
false,
Member

@RalfJung RalfJung Apr 20, 2026

Suggested change
- false,
+ /* by_move */ false,

@RalfJung
Member

This makes sense to me at a high level, but I don't know the surrounding code so reviewing the exact details of the conditionals introduced here would take me more time than I can spare right now -- sorry. I hope we can find another reviewer.

maybe
r? @nikic

@rustbot rustbot assigned nikic and unassigned RalfJung Apr 20, 2026
@nikic
Contributor

nikic commented Apr 20, 2026

The PR description could use more words...

@bors try @rust-timer queue

rust-bors Bot pushed a commit that referenced this pull request Apr 20, 2026
codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer.
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 20, 2026
@rust-bors
Contributor

rust-bors Bot commented Apr 20, 2026

☀️ Try build successful (CI)
Build commit: 6c71f51 (6c71f515200b0bbf6b3f6336c134d6d7ebe97f72, parent: c28e3037785af39226f5751294ed1c6cf4698e10)

@rust-timer
Collaborator

Finished benchmarking commit (6c71f51): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This perf run didn't have relevant results for this metric.

Max RSS (memory usage)

Results (secondary -2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.0% | [-2.0%, -2.0%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (primary 2.5%, secondary 2.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.5% | [2.2%, 2.8%] | 3 |
| Regressions ❌ (secondary) | 2.6% | [2.6%, 2.6%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.5% | [2.2%, 2.8%] | 3 |

Binary size

Results (secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Bootstrap: 491.853s -> 492.621s (0.16%)
Artifact size: 394.39 MiB -> 394.38 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 21, 2026
@dianqk
Member Author

dianqk commented Apr 21, 2026

> The PR description could use more words...

Done. I also fixed the comment about tail calls.

//! Arguments passed indirectly via a hidden pointer must be copied to an alloca,
//! except for by-val or by-move.
//@ compile-flags: -Cno-prepopulate-passes -Copt-level=3
//@ only-x86_64-unknown-linux-gnu
Contributor

@nikic nikic Apr 21, 2026

Can this be made a minicore test instead?

// temporaries, then copy back to the caller's argument slots.
// Finally, we pass the caller's argument slots as arguments.
//
// To do that, the argument must be MUST-by-move value.
Contributor

@nikic nikic Apr 21, 2026

What does "MUST-by-move" mean, as opposed to just by-move?

Member Author

I mean codegen allows copying for by-move, but not for MUST-by-move.
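
To make the tail call constraint concrete, here is a rough model of the scheme quoted in this thread (evaluate into temporaries, copy back to the caller's argument slots, pass those slots on). It is a sketch under assumptions: `ArgSlots` and both functions are hypothetical, and the swapped call `become f(b, a)` is just an illustrative case, not taken from the patch or its tests.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Thing(usize, usize, usize);

/// Hypothetical model of the caller's incoming argument slots for `f(a, b)`.
#[derive(Debug, PartialEq)]
struct ArgSlots {
    a: Thing,
    b: Thing,
}

/// Models lowering `become f(b, a)`: evaluate into temporaries, copy back
/// into the caller's own slots, then hand those slots to the callee as-is.
fn lower_tail_call_swapped(slots: &mut ArgSlots) {
    let tmp_a = slots.b; // new value for slot `a`
    let tmp_b = slots.a; // new value for slot `b`
    slots.a = tmp_a;
    slots.b = tmp_b;
    // The callee now receives pointers to `slots.a` and `slots.b` directly.
    // Copying them to a fresh alloca at this point would hand the callee
    // storage in a frame that the tail call is about to overwrite.
}

/// Writing straight into the slots without temporaries clobbers `a`
/// before it has been read.
fn lower_without_temporaries(slots: &mut ArgSlots) {
    slots.a = slots.b;
    slots.b = slots.a; // reads the already overwritten slot
}

fn main() {
    let x = Thing(1, 1, 1);
    let y = Thing(2, 2, 2);

    let mut ok = ArgSlots { a: x, b: y };
    lower_tail_call_swapped(&mut ok);
    assert_eq!(ok, ArgSlots { a: y, b: x });

    let mut bad = ArgSlots { a: x, b: y };
    lower_without_temporaries(&mut bad);
    assert_eq!(bad, ArgSlots { a: y, b: y });
}
```

In this model, "MUST-by-move" corresponds to the last step of `lower_tail_call_swapped`: once the values sit in the caller's slots, no further copy may be inserted.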

@nikic
Contributor

nikic commented Apr 22, 2026

@bors r+

@rust-bors
Contributor

rust-bors Bot commented Apr 22, 2026

📌 Commit 10d8329 has been approved by nikic

It is now in the queue for this repository.

@rust-bors rust-bors Bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 22, 2026
@rust-bors rust-bors Bot added merged-by-bors This PR was explicitly merged by bors. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 22, 2026
@rust-bors
Contributor

rust-bors Bot commented Apr 22, 2026

☀️ Test successful - CI
Approved by: nikic
Duration: 3h 28m 21s
Pushing f676c20 to main...

@rust-bors rust-bors Bot merged commit f676c20 into rust-lang:main Apr 22, 2026
12 checks passed
@rustbot rustbot added this to the 1.97.0 milestone Apr 22, 2026
@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing cf1817b (parent) -> f676c20 (this PR)

Test differences

Show 14 test diffs

Stage 1

  • [codegen] tests/codegen-llvm/indirect-bycopy-bymove-byval.rs#i686-linux: [missing] -> pass (J2)
  • [codegen] tests/codegen-llvm/indirect-bycopy-bymove-byval.rs#i686-windows: [missing] -> pass (J2)
  • [codegen] tests/codegen-llvm/indirect-bycopy-bymove-byval.rs#x64-linux: [missing] -> pass (J2)
  • [ui] tests/ui/moves/bycopy_untupled.rs#noopt: [missing] -> pass (J3)
  • [ui] tests/ui/moves/bycopy_untupled.rs#opt: [missing] -> pass (J3)

Stage 2

  • [codegen] tests/codegen-llvm/indirect-bycopy-bymove-byval.rs#i686-linux: [missing] -> pass (J0)
  • [codegen] tests/codegen-llvm/indirect-bycopy-bymove-byval.rs#i686-windows: [missing] -> pass (J0)
  • [codegen] tests/codegen-llvm/indirect-bycopy-bymove-byval.rs#x64-linux: [missing] -> pass (J0)
  • [ui] tests/ui/moves/bycopy_untupled.rs#noopt: [missing] -> pass (J1)
  • [ui] tests/ui/moves/bycopy_untupled.rs#opt: [missing] -> pass (J1)

Additionally, 4 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard f676c20edd32321e9fa2781543d8970109707e30 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-linux: 2h 41m -> 1h 43m (-35.9%)
  2. dist-x86_64-msvc-alt: 2h 8m -> 2h 49m (+31.7%)
  3. test-various: 2h 10m -> 1h 29m (-31.6%)
  4. dist-i686-mingw: 2h 10m -> 2h 47m (+28.2%)
  5. i686-msvc-2: 2h 16m -> 1h 45m (-22.3%)
  6. i686-msvc-1: 2h 22m -> 2h 54m (+22.2%)
  7. x86_64-gnu-llvm-21-3: 1h 31m -> 1h 50m (+21.7%)
  8. x86_64-mingw-2: 2h 39m -> 2h 6m (-21.0%)
  9. pr-check-2: 33m 35s -> 40m 28s (+20.5%)
  10. x86_64-gnu-llvm-21: 1h -> 1h 12m (+19.8%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (f676c20): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This perf run didn't have relevant results for this metric.

Max RSS (memory usage)

This perf run didn't have relevant results for this metric.

Cycles

Results (primary 1.9%, secondary 2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.9% | [1.7%, 2.2%] | 2 |
| Regressions ❌ (secondary) | 2.0% | [2.0%, 2.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.9% | [1.7%, 2.2%] | 2 |

Binary size

This perf run didn't have relevant results for this metric.

Bootstrap: 491.458s -> 492.288s (0.17%)
Artifact size: 394.40 MiB -> 394.44 MiB (0.01%)

Labels

merged-by-bors This PR was explicitly merged by bors. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Development

Successfully merging this pull request may close these issues.

release-mode heap corruption with non-generic FnOnce(Vec<usize>)

5 participants