gensyms: Don't use PROVIDE() in linker script #39

Closed
aszlig wants to merge 1 commit into master from ldscript-fix
aszlig (Contributor) commented May 18, 2026

A while ago we had a report of an integration test failure/timeout on aarch64 on OpenSUSE, but we didn't see that failure on Nix(OS).

Recently, this has also turned up on x86_64-linux, and @mweinelt bisected the issue to a recent binutils update in nixpkgs. I went from there and bisected binutils itself, which pointed to the commit that introduced the failure:

x86-64: Estimate output section layout before sizing dynamic sections

When sizing dynamic sections, elf_x86_64_scan_relocs converts GOTPCREL relocations to R_X86_64_PC32, R_X86_64_32S or R_X86_64_32 for local symbols. But at that time, since the output section layout is unknown, the local symbol values can't be determined. Later linker issues an error if the converted relocation overflows when resolving relocations against these local symbols. Update the x86-64 ELF linker to estimate output section layout before sizing dynamic sections and use the preliminary output section layout info to skip the GOTPCREL relocation conversion if the converted relocation overflows.

Since the output section layout is now computed before bfd_elf_size_dynamic_sections, my guess is that the PROVIDE directives we're using are now turned into no-ops. From the documentation:

In some cases, it is desirable for a linker script to define a symbol only if it is referenced and is not defined by any object included in the link. For example, traditional linkers defined the symbol etext. However, ANSI C requires that the user be able to use etext as a function name without encountering an error. The PROVIDE keyword may be used to define a symbol, such as etext, only if it is referenced but not defined.

We only list the symbols in the version script and otherwise look them up at run time via dlsym(), so my guess is that this no longer counts as a reference.

I also looked at recent Hydra builds of ip2unix, and the integration test failure on OpenSUSE might be the same one we've been hitting since mid-2025.

What I'm not sure about, however, is why I used PROVIDE in the first place instead of a plain assignment, especially now that I've read through the full documentation. I guess back then I only skimmed the documentation and made assumptions about what PROVIDE does.

So while I can't fully confirm that my reasoning above is what actually happened, I'm pretty sure that we shouldn't use PROVIDE here. Fully confirming it would require a lot of research, so I'm stopping here: in any case, direct assignment should fix the issue.
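
To illustrate the difference, here is a minimal sketch of the two forms in GNU ld script syntax. The names example_alias and example_impl are hypothetical placeholders, not the symbols gensyms actually emits, and the two statements are alternatives rather than lines meant to sit in the same script.

```
/* Hypothetical names, for illustration only. */

/* Conditional: example_alias is defined only if an input object
   references it and none defines it. */
PROVIDE(example_alias = example_impl);

/* Unconditional: example_alias is always defined. */
example_alias = example_impl;
```

With the second form, the definition does not depend on whether the linker considers the symbol referenced at the point where the script is evaluated.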

Fixes: #37

A while ago we had a report[1] of an integration test failure/timeout
on aarch64 on OpenSUSE, but we didn't see that failure on Nix(OS).

Recently, this has also turned up on x86_64-linux, and @mweinelt
bisected the issue to a recent binutils update in nixpkgs[2]. I went
from there and bisected binutils itself, which pointed to the commit[3]
that introduced the failure:

  x86-64: Estimate output section layout before sizing dynamic sections

  When sizing dynamic sections, elf_x86_64_scan_relocs converts GOTPCREL
  relocations to R_X86_64_PC32, R_X86_64_32S or R_X86_64_32 for local
  symbols.  But at that time, since the output section layout is
  unknown, the local symbol values can't be determined.  Later linker
  issues an error if the converted relocation overflows when resolving
  relocations against these local symbols.  Update the x86-64 ELF linker
  to estimate output section layout before sizing dynamic sections and
  use the preliminary output section layout info to skip the GOTPCREL
  relocation conversion if the converted relocation overflows.

Since the output section layout is now computed before
bfd_elf_size_dynamic_sections, my guess is that the PROVIDE directives
we're using are now turned into no-ops. From the documentation[4]:

  In some cases, it is desirable for a linker script to define a symbol
  only if it is referenced and is not defined by any object included in
  the link. For example, traditional linkers defined the symbol ‘etext’.
  However, ANSI C requires that the user be able to use ‘etext’ as a
  function name without encountering an error. The PROVIDE keyword may
  be used to define a symbol, such as ‘etext’, only if it is referenced
  but not defined.

We only list the symbols in the version script and otherwise look them
up at run time via dlsym(), so my guess is that this no longer counts
as a reference.

I also looked at recent Hydra builds of ip2unix, and the integration
test failure on OpenSUSE might be the same one we've been hitting since
mid-2025.

What I'm not sure about, however, is why I used PROVIDE in the first
place instead of a plain assignment, especially now that I've read
through the full documentation. I guess back then I only skimmed the
documentation and made assumptions about what PROVIDE does.

So while I can't fully confirm that my reasoning above is what actually
happened, I'm pretty sure that we shouldn't use PROVIDE here. Fully
confirming it would require a lot of research, so I'm stopping here: in
any case, direct assignment should fix the issue.

[1]: #37
[2]: NixOS/nixpkgs@e2a6058
[3]: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=73ab3b9825d232f0f3a4ad811e88697f9b9ab162
[4]: https://sourceware.org/binutils/docs/ld/PROVIDE.html

Signed-off-by: aszlig <aszlig@nix.build>
@aszlig aszlig closed this in a6a9640 May 18, 2026
@aszlig aszlig deleted the ldscript-fix branch May 19, 2026 22:34
Linked issue: Integration test timeout on aarch64