Switch to Open NVIDIA Kernel Modules on OpenShift
Step-by-step guide to migrate the NVIDIA GPU Operator from proprietary to open kernel modules on OpenShift, enabling DMA-BUF and GPUDirect Storage support.
💡 Quick Answer: Set `driver.kernelModuleType: open` in the ClusterPolicy, reboot GPU worker nodes to clear proprietary modules from kernel memory, then let the GPU Operator rebuild driver pods with open kernel modules.
The NVIDIA open kernel modules are required for DMA-BUF-based GPUDirect RDMA and GPUDirect Storage. This recipe walks through the migration from proprietary to open modules with minimal disruption.
Before You Start
Verify your GPUs support open kernel modules (Turing architecture or newer):
```shell
nvidia-smi --query-gpu=gpu_name,compute_cap --format=csv
```

Compute capability 7.5+ (Turing) is required.
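The capability check above can be turned into a small gate script (a sketch: `check_compute_cap` is a hypothetical helper for this recipe, and the 7.5 threshold comes from the Turing requirement stated above):

```shell
# Print any GPU whose compute capability is below 7.5 (pre-Turing).
# check_compute_cap is a hypothetical helper; it reads "name, cap" CSV
# lines on stdin and reports GPUs that cannot use open kernel modules.
check_compute_cap() {
  awk -F', ' '$2 + 0 < 7.5 { print $1 " (compute cap " $2 ") cannot use open modules" }'
}

# Run only where nvidia-smi is available:
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=gpu_name,compute_cap --format=csv,noheader | check_compute_cap
fi
```

An empty output means every detected GPU meets the requirement.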
Step 1 – Check Current Module Type

```shell
# Check what the Operator is currently using
oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.spec.driver.kernelModuleType}'

# Check what modules are loaded on a node
oc debug node/<gpu-node>
chroot /host
modinfo nvidia | grep -E "license|filename"
```

Proprietary modules show `license: NVIDIA`. Open modules show `license: Dual MIT/GPL`.
Step 2 – Update the ClusterPolicy

```shell
oc edit clusterpolicy gpu-cluster-policy
```

```yaml
spec:
  driver:
    kernelModuleType: open
```

At this point the driver pods will attempt to restart, but proprietary modules may still be loaded in kernel memory on the host.
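For a non-interactive change, the same field can be set with a merge patch (a sketch; `gpu-cluster-policy` is the ClusterPolicy name used throughout this recipe):

```shell
# Merge patch that sets spec.driver.kernelModuleType to "open".
PATCH='{"spec":{"driver":{"kernelModuleType":"open"}}}'

# Apply it (requires cluster access):
#   oc patch clusterpolicy gpu-cluster-policy --type merge -p "$PATCH"
```

This is convenient in automation where `oc edit` is not an option.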
Step 3 – Cordon and Drain GPU Nodes

Process one node at a time to maintain cluster availability:

```shell
# Cordon the node to prevent scheduling
oc adm cordon <gpu-node>

# Drain workloads
oc adm drain <gpu-node> --ignore-daemonsets --delete-emptydir-data
```

Step 4 – Reboot the Node
Reboot clears all in-memory kernel modules:

```shell
oc debug node/<gpu-node>
chroot /host
systemctl reboot
```

Step 5 – Verify Open Modules After Reboot
```shell
oc debug node/<gpu-node>
chroot /host

# Verify open kernel module loaded
modinfo nvidia | grep license
# Expected: license: Dual MIT/GPL

# Verify all NVIDIA modules are on disk
find /lib/modules/$(uname -r) -name "nvidia*.ko" | head -10

# modinfo should work for all modules
modinfo nvidia_fs
modinfo nvidia_uvm
```

Step 6 – Uncordon the Node
```shell
oc adm uncordon <gpu-node>
```

Step 7 – Repeat for All GPU Nodes
Repeat Steps 3–6 for each GPU worker node in the cluster.
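The per-node loop of Steps 3–6 can be sketched as a script. Assumptions to adjust for your cluster: the node selector uses the `nvidia.com/gpu.present=true` label applied by Node Feature Discovery, `rolling_migrate` is a hypothetical helper, and the `OC` variable simply lets you dry-run the loop with `OC=echo`:

```shell
# OC defaults to the real client; override with OC=echo for a dry run.
OC="${OC:-oc}"

rolling_migrate() {
  # One positional argument per node name; processes nodes one at a time.
  for node in "$@"; do
    "$OC" adm cordon "$node"
    "$OC" adm drain "$node" --ignore-daemonsets --delete-emptydir-data
    "$OC" debug "node/$node" -- chroot /host systemctl reboot
    "$OC" wait --for=condition=Ready "node/$node" --timeout=15m
    # Only return the node to service once the open module is confirmed
    "$OC" debug "node/$node" -- chroot /host modinfo nvidia | grep -q 'Dual MIT/GPL' \
      && "$OC" adm uncordon "$node"
  done
}

# Example (real run, requires cluster access):
#   rolling_migrate $(oc get nodes -l nvidia.com/gpu.present=true -o name)
```

Processing sequentially keeps GPU capacity available on the remaining nodes while each one reboots.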
Step 8 – Verify Cluster-Wide Status

```shell
# All driver pods should be Running
oc get pods -n gpu-operator -l app=nvidia-driver-daemonset

# Check driver container logs for open module confirmation
oc logs -n gpu-operator ds/nvidia-driver-daemonset -c nvidia-driver-ctr | grep kernel_module_type
# Expected: kernel_module_type=open
```

Rollback
If you need to revert:

```shell
oc edit clusterpolicy gpu-cluster-policy
```

```yaml
spec:
  driver:
    kernelModuleType: proprietary
```

Reboot nodes and restart driver pods. Note that DMA-BUF and GDS will no longer be available.
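As with the forward migration, the rollback can be applied non-interactively with a merge patch (a sketch mirroring the Step 2 edit):

```shell
# Merge patch reverting spec.driver.kernelModuleType to "proprietary".
PATCH='{"spec":{"driver":{"kernelModuleType":"proprietary"}}}'

# Apply it (requires cluster access):
#   oc patch clusterpolicy gpu-cluster-policy --type merge -p "$PATCH"
```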
Why This Matters
Open kernel modules provide full on-disk module management, enable DMA-BUF and GPUDirect Storage, and align with NVIDIA's recommended configuration for modern GPU Operator deployments.
