
drop support for compute capability <= 7.0 for newer cuDNN versions#170

Merged
casparvl merged 3 commits into EESSI:main from bedroge:cudnn915_cc70 on Mar 10, 2026

Conversation


@bedroge bedroge commented Feb 27, 2026

This one is a little trickier than CUDA itself, as the list of supported compute capabilities in the docs (https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html) doesn't really match what running cuobjdump on the binaries shows. Also, there seem to be some gaps in the matrix, and I wonder whether that's really correct.

So for now I've chosen an easier approach by just checking if we're building with a newer cuDNN and compute capability <= 7.0, and in that case I do the same thing as what @casparvl implemented for CUDA. In order to check if cuDNN is used as dependency, I've generalized Caspar's get_cuda_version into a get_dependency_software_version function.
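The generalization described above could look roughly like the following. This is a minimal sketch, not the actual eb_hooks.py implementation: the easyconfig is assumed to be a dict-like object whose `dependencies`/`builddependencies` entries carry `name` and `version` keys, and all names here are illustrative.

```python
def get_dependency_software_version(ec, dep_name):
    """Return the version of dep_name if it appears among the (build)
    dependencies of easyconfig ec, or None if it does not.
    Sketch only; the real hook may access dependencies differently."""
    for dep in ec.get('dependencies', []) + ec.get('builddependencies', []):
        if dep['name'] == dep_name:
            return dep['version']
    return None

# Illustrative easyconfig-like structure
ec = {
    'name': 'SomeApp', 'version': '1.0',
    'dependencies': [
        {'name': 'CUDA', 'version': '12.9.1'},
        {'name': 'cuDNN', 'version': '9.15.0.57'},
    ],
}
cudnn_version = get_dependency_software_version(ec, 'cuDNN')
```

With a helper like this, the CUDA-specific check becomes one call with `'CUDA'` as argument and the cuDNN check another with `'cuDNN'`, instead of two near-duplicate functions.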

Tested this locally with EESSI-extend and the cuDNN from EESSI/software-layer#1410 on a V100 (CC 7.0) and RTX PRO 6000 (CC 12.0f), and got the expected result: on the RTX PRO 6000 I get a full cuDNN installation, while for the V100 I get the following output during the build:

WARNING: Requested a CUDA Compute Capability (['7.0']) that is not supported by the cuDNN version (9.15.0.57) used by this software. Switching to
'--module-only --force' and injecting an LmodError into the modulefile. You can override this behaviour by setting the
EESSI_OVERRIDE_CUDA_CC_CUDNN_CHECK environment variable.

and a module file that has:

if (not os.getenv("EESSI_IGNORE_CUDNN_9_15_0_57_CC_7_0")) then LmodError("EasyConfigs using cuDNN 9.15.0.57 or older are not supported for (all) requested Compute Capabilities: ['7.0'].\n") end
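The guard line above follows a fixed pattern, so generating it is mostly string formatting. The sketch below shows how such a line could be built for a single compute capability; the variable names and the single-CC assumption are illustrative, not taken from the actual hook.

```python
# Hypothetical sketch: build the Lua guard line injected into the modulefile.
cudnn_version = '9.15.0.57'
ccs = ['7.0']  # assuming one requested compute capability for simplicity

# Environment variable that lets users bypass the error, e.g.
# EESSI_IGNORE_CUDNN_9_15_0_57_CC_7_0
env_var = 'EESSI_IGNORE_CUDNN_%s_CC_%s' % (
    cudnn_version.replace('.', '_'), ccs[0].replace('.', '_'))

lua_guard = (
    'if (not os.getenv("%s")) then LmodError('
    '"EasyConfigs using cuDNN %s or older are not supported for (all) '
    'requested Compute Capabilities: %s.\\n") end' % (env_var, cudnn_version, ccs)
)
```

In EasyBuild, a line like this could be appended to the generated module via the `modluafooter` easyconfig parameter (or an equivalent hook mechanism).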


bedroge commented Feb 27, 2026

Ultimately we could make the same kind of lookup table as for CUDA. Initially I started working on it:

# The documentation at e.g. https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html and
# what cuobjdump shows on cuDNN libraries do not fully match. The support matrix below may be too inclusive,
# so if you find that a specific combination is not supported in practice, please remove it from the matrix.
CUDNN_SUPPORTED_CCS = {
    '8.8.0': [],
    '9.15.0': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.15.1': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.16.0': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.17.0': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.17.1': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.18.0': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.18.1': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
    '9.19.0': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
}

but it's a lot of work, and as mentioned, it's not really clear what is and isn't supported. We could also consider a simpler lookup table with just the min+max supported CCs per X.Y.Z version? But then again, https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html says that 12.1 is not supported, while the binaries do seem to indicate that it is, so it's all rather confusing and unclear...
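For what it's worth, consuming such a table is straightforward once it exists. A possible check, assuming a `CUDNN_SUPPORTED_CCS` dict keyed on `X.Y.Z` cuDNN versions as in the snippet above (the function name and the version-truncation logic are assumptions, not the actual hook code):

```python
# Trimmed-down copy of the lookup table from the snippet above (sketch).
CUDNN_SUPPORTED_CCS = {
    '9.15.0': ['75', '80', '86', '89', '90', '100', '103', '120', '121'],
}

def unsupported_ccs(cudnn_version, requested_ccs):
    """Return the requested compute capabilities that the table does not list
    for this cuDNN version. requested_ccs use the '7.0' notation, while the
    table uses the compact '70' form, so dots are stripped for the lookup."""
    # Reduce a full version like '9.15.0.57' to the table key '9.15.0'
    key = '.'.join(cudnn_version.split('.')[:3])
    supported = CUDNN_SUPPORTED_CCS.get(key, [])
    return [cc for cc in requested_ccs if cc.replace('.', '') not in supported]
```

An unknown cuDNN version yields an empty `supported` list here, i.e. everything is flagged as unsupported; one could just as well default to allowing everything for unknown versions, which is exactly the kind of policy choice this PR avoids for now.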

@casparvl

My 2 cents:

  1. Go for a lookup table. If you only specify a min and max version, the implicit assumption is that all intermediate versions are supported - which does not seem to be the case (i.e. 11.X almost certainly isn't, since that's not supported in CUDA 12 - see the CUDA lookup table)
  2. If you create a lookup table, and if the docs contradict what the binaries show, assume the binaries to be correct. If the binaries say there is no X.Y support, there is no X.Y code in the binary - so there can't be support. If the binary says there is X.Y code in the binary, that might not be a hard guarantee that the full cuDNN API is supported for that architecture - but the only way to find out is to assume the support is there, install it, and see how this works in practice. If we skip installations for targets that do turn out to be supported, we'd never find out otherwise.
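Following point 2, a table entry can be derived from the binaries themselves by listing the embedded `sm_XY` targets with cuobjdump. The parsing step is trivial; the sketch below runs on illustrative sample output (the real command would be something like `cuobjdump --list-elf libcudnn_something.so`, and the exact output format should be verified against your cuobjdump version).

```python
import re

def ccs_from_cuobjdump(output):
    """Extract the unique sm_XY compute capability targets that cuobjdump
    reports, sorted numerically. Parsing sketch; sample input is made up."""
    return sorted(set(re.findall(r'sm_(\d+)', output)), key=int)

# Illustrative sample of cuobjdump --list-elf style output (not real data)
sample = """\
ELF file    1: example.1.sm_75.cubin
ELF file    2: example.2.sm_80.cubin
ELF file    3: example.3.sm_90.cubin
"""
ccs = ccs_from_cuobjdump(sample)
```

Running this over each cuDNN release and diffing the result against the docs would make the "trust the binaries" policy mechanical rather than manual.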


bedroge commented Feb 27, 2026

I just feel like a lookup table is a lot of work to set up and to maintain, while (according to the docs) the supported CCs don't change that often. Also, wouldn't the sanity check still catch unsupported CCs, as it did for CC 7.0 in EESSI/software-layer#1410? So whenever we run into this, we can mark those as unsupported in the hooks (and if necessary, change the if statement to something else if there are going to be too many combinations)?


casparvl commented Mar 9, 2026

Hm, I don't think it's too bad to maintain - but admittedly it may be easier for CUDA than for cuDNN, since we can just query the list from nvcc. Looking at your PR again, it should correctly generate fake modules for cuDNN versions that are too new to support CC 7.0.

The fact that it doesn't do so for CC 11.0 may be a minor detail, since the CUDA sanity check will then indeed report that this is also invalid. The only downside of not including that case (and maybe also an upper limit) right away is that when sites install this with EESSI-extend and have 11.0 configured as their CC, they'll hit the CUDA sanity check - and may not fully understand why it fails (while the error message printed by the module is much more informative, as it is more specific).

Anyway, I'm also ok with leaving that out for now. If you can have a look at my (minor) review comment, I'll see if I can test the PR locally - and merge it if it works as expected.

@casparvl casparvl left a comment

Testing based on this feature branch:

[casparl@tcn471 software-layer-scripts]$ eb --hooks eb_hooks.py cuDNN-9.15.0.57-CUDA-12.9.1.eb --accept-eula-for=cuDNN
...
== Running pre-fetch hook...

WARNING: Requested a CUDA Compute Capability (['7.0']) that is not supported by the cuDNN version (9.15.0.57) used by this software. Switching to '--module-only --force' and injecting an LmodError into
the modulefile. You can override this behaviour by setting the EESSI_OVERRIDE_CUDA_CC_CUDNN_CHECK environment variable.

== Updated build option 'module-only' to 'True'
== Updated build option 'force' to 'True'
...
== Setting EESSI_IGNORE_CUDNN_9_15_0_57_CC_7_0 in initial environment
  >> generating module file @ /home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/modules/all/cuDNN/9.15.0.57-CUDA-12.9.1.lua
== Running post-module hook...
== Restored original build option 'module_only' to False
== Restored original build option 'force' to False
== Removing EESSI_IGNORE_CUDNN_9_15_0_57_CC_7_0 in initial environment
...
== Summary:
   * [SUCCESS] cuDNN/9.15.0.57-CUDA-12.9.1

That looks good.

So does this:

$ module load cuDNN/9.15.0.57-CUDA-12.9.1
Lmod has detected the following error:  EasyConfigs using cuDNN 9.15.0.57 or newer are not supported for (all) requested Compute Capabilities: ['7.0'].

While processing the following module(s):
    Module fullname              Module Filename
    ---------------              ---------------
    cuDNN/9.15.0.57-CUDA-12.9.1  /home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/modules/all/cuDNN/9.15.0.57-CUDA-12.9.1.lua

It also allows suppressing the module error, as intended:

$ EESSI_IGNORE_CUDNN_9_15_0_57_CC_7_0=1 module load cuDNN/9.15.0.57-CUDA-12.9.1
[casparl@tcn471 software-layer-scripts]$

And finally, running with

$  EESSI_OVERRIDE_CUDA_CC_CUDNN_CHECK=1 eb --hooks eb_hooks.py cuDNN-9.15.0.57-CUDA-12.9.1.eb --accept-eula-for=cuDNN --rebuild

We can indeed suppress the check and do a full install (I won't paste output here, we all know what a successful EB installation looks like).

LGTM!

@casparvl

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2


eessi-bot-aws bot commented Mar 10, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.03/pr_170/138001

date job status comment
Mar 10 09:47:10 UTC 2026 submitted job id 138001 awaits release by job manager
Mar 10 09:47:33 UTC 2026 released job awaits launch by Slurm scheduler
Mar 10 11:26:04 UTC 2026 running job 138001 is running
Mar 10 11:27:23 UTC 2026 finished
😁 SUCCESS
Details
✅ job output file slurm-138001.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-17731419420.tar.zst, size: 0 MiB (26445 bytes)
entries: 1
modules under 2025.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen2
2025.06/init/easybuild/eb_hooks.py
Mar 10 11:27:23 UTC 2026 test result
😁 SUCCESS
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen2+default
P: latency: 1.31 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen2+default
P: latency: 2.04 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen2+default
P: latency: 0.17 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen2+default
P: bandwidth: 8003.59 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-138001.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Mar 10 12:55:12 UTC 2026 uploaded transfer of eessi-2025.06-software-linux-x86_64-amd-zen2-17731419420.tar.zst to S3 bucket succeeded

@casparvl

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2


eessi-bot-aws bot commented Mar 10, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2023.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.03/pr_170/138002

date job status comment
Mar 10 09:47:18 UTC 2026 submitted job id 138002 awaits release by job manager
Mar 10 09:47:31 UTC 2026 released job awaits launch by Slurm scheduler
Mar 10 11:26:02 UTC 2026 running job 138002 is running
Mar 10 11:29:49 UTC 2026 finished
😁 SUCCESS
Details
✅ job output file slurm-138002.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-17731419390.tar.zst, size: 0 MiB (26440 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/amd/zen2/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Mar 10 11:29:49 UTC 2026 test result
😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86-64-zen2+default
P: perf: 265.926 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86-64-zen2+default
P: perf: 450.596 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86-64-zen2+default
P: latency: 2.81 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86-64-zen2+default
P: latency: 2.94 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86-64-zen2+default
P: latency: 6.03 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86-64-zen2+default
P: latency: 5.74 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86-64-zen2+default
P: latency: 0.77 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86-64-zen2+default
P: latency: 0.73 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86-64-zen2+default
P: bandwidth: 6473.16 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86-64-zen2+default
P: bandwidth: 6463.41 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-138002.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Mar 10 12:55:21 UTC 2026 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-17731419390.tar.zst to S3 bucket succeeded

@casparvl casparvl added bot:deploy 2025.06-software.eessi.io 2025.06 version of software.eessi.io labels Mar 10, 2026
@casparvl casparvl merged commit 2eb7da3 into EESSI:main Mar 10, 2026
78 of 83 checks passed
@bedroge bedroge deleted the cudnn915_cc70 branch March 10, 2026 18:31