Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.
Describe the bug
When RHEL 9 worker nodes run EUS kernels, nvidia-driver-daemonset/nvidia-driver-ctr fails to install the following packages:
- kernel-headers
- kernel-devel
Example failure:
Installing Linux kernel headers...
+ echo 'Installing Linux kernel headers...'
+ dnf -q -y --releasever=9.6 install kernel-headers-5.14.0-570.112.1.el9_6.x86_64 kernel-devel-5.14.0-570.112.1.el9_6.x86_64
Error: Unable to find a match: kernel-headers-5.14.0-570.112.1.el9_6.x86_64 kernel-devel-5.14.0-570.112.1.el9_6.x86_64
I believe this is because on RHEL 9 the above packages reside in the AppStream RPM repos, but the driver container does not enable the rhel-9-for-x86_64-appstream-eus-rpms repo.
Only the rhel-9-for-x86_64-baseos-eus-rpms repo gets enabled:
dnf config-manager --set-enabled rhel-9-for-$DRIVER_ARCH-baseos-eus-rpms || true
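One possible direction for a fix is to enable the AppStream EUS repo alongside BaseOS. The following is only a sketch, not the actual script: the helper names `eus_repo_ids` and `enable_eus_repos` are hypothetical, and the AppStream repo ID is assumed to follow the same naming pattern as the BaseOS one quoted above.

```shell
# Hypothetical helper: print the EUS repo IDs for a given architecture.
# Assumes the AppStream ID mirrors the BaseOS ID pattern from the script.
eus_repo_ids() {
    arch="$1"
    for channel in baseos appstream; do
        echo "rhel-9-for-${arch}-${channel}-eus-rpms"
    done
}

# Hypothetical helper: enable each repo, tolerating failures with
# "|| true" in the same way the existing BaseOS line does.
enable_eus_repos() {
    for repo in $(eus_repo_ids "$1"); do
        dnf config-manager --set-enabled "$repo" || true
    done
}
```

Under these assumptions the script would call `enable_eus_repos "$DRIVER_ARCH"` in place of the single BaseOS line.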
To Reproduce
Install NVIDIA GPU operator and driver on RHEL 9 EUS worker nodes.
sh-5.1# uname -r
5.14.0-570.112.1.el9_6.x86_64
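To narrow down whether the missing repo is the cause, it may help to ask dnf directly whether the two packages resolve once a given repo is enabled. This is a rough diagnostic sketch; `kernel_pkg_specs` and `check_kernel_pkgs` are hypothetical helper names, and the package names are taken from the failure log above.

```shell
# Build the package specs the installer asks for (names from the failure log).
kernel_pkg_specs() {
    kver="$1"
    echo "kernel-headers-${kver} kernel-devel-${kver}"
}

# Hypothetical check: list the packages with one extra repo enabled.
# $1 = releasever (e.g. 9.6), $2 = repo ID, $3 = kernel version.
# dnf is expected to exit non-zero when nothing matches.
check_kernel_pkgs() {
    # Word splitting of the spec list is intentional here.
    dnf -q --releasever="$1" --enablerepo="$2" list --available $(kernel_pkg_specs "$3")
}
```

For example, on an affected node: `check_kernel_pkgs 9.6 rhel-9-for-x86_64-appstream-eus-rpms "$(uname -r)"`.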
Expected behavior
The NVIDIA GPU Operator's driver installer container should install all necessary packages successfully, including kernel-headers and kernel-devel.
Environment (please provide the following information):
- gpu-driver-container source (Commit SHA or image digest): Any
- NVIDIA Driver Version: Any
- Host OS: RHEL 9
- Kernel Version: 5.14.0-570.112.1.el9_6.x86_64 (or any RHEL 9 EUS kernel)
- Container Runtime Version: Any
- CPU Architecture: x86_64
- GPU Model(s): Any
If applicable, also provide:
- Kubernetes Distro and Version: OpenShift
- NVIDIA GPU Operator version: Any
Information to attach (optional if deemed irrelevant)
- Output of nvidia-smi
- Container logs
- Kernel logs (dmesg)
- Driver install/build output
Source of the dnf config-manager command quoted above: gpu-driver-container/rhel9/nvidia-driver, line 109 in 0757b16