Conversation
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4 |
|
New job on instance
|
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4 |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4 |
|
New job on instance
|
Errors are quite similar to the ones observed in #1314, many of these: |
|
An updated hooks file with a fix for PyTorch has been ingested (EESSI/software-layer-scripts#172); let's try again.
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-aws-eu-south for:arch=x86_64/amd/zen5 |
|
New job on instance
|
|
New job on instance
|
|
New job on instance
|
|
The neoverse v1 build ran out of memory: |
|
I've modified bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/neoverse_v1 |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/generic |
|
New job on instance
|
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/neoverse_v1 |
|
New job on instance
|
|
No more memory issues for the neoverse v1 build, but too many failing tests: |
@Flamefire do you perhaps have any clue why these are failing on Neoverse V1? (could send the full log to you if that's useful) |
|
Meanwhile, let's also check how it goes on generic and n1:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/generic |
|
New job on instance
|
|
New job on instance
|
@bedroge Yes, tar it up and I'll take a look.
Those were tricky. Maybe I recognize the issue from something I've seen before. |
Thanks a lot! I'm attaching the log. |
|
OK thanks, I checked what's going on:
Known, can be ignored.
Small tolerance issue.
Known issue on ARM; 3/5 failures now skipped/xfailed upstream.
Fails when MKLDNN is missing.
Caused by OpenBLAS; fixed since OpenBLAS has been upgraded to 0.3.30.
Weird timing issue.
I can open a PR for EasyBuild to skip the affected tests in test_cpu_repro, inductor/test_cpu_select_algorithm, and test_linalg. That brings the 73 failures down to 7, which would work. |
|
Added patches to my still-open PR that fixes some other issues in that easyconfig: easybuilders/easybuild-easyconfigs#25492
Test report coming up; I hope all is still green. |
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/neoverse_v1 |
|
New job on instance
|
|
@bedroge Looks like we have a winner |
Awesome, thanks a lot @Flamefire! |
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx |
|
New job on instance
|
|
New job on instance
|
|
The a64fx build also ran out of memory, trying again with an updated hooks file...
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx |
|
New job on instance
|