Configuration Β· Advanced Β· ⏱ 60 minutes Β· K8s 1.28+

Switch to Open NVIDIA Kernel Modules on OpenShift

Step-by-step guide to migrate the NVIDIA GPU Operator from proprietary to open kernel modules on OpenShift, enabling DMA-BUF and GPUDirect Storage support.

By Luca Berton β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Set driver.kernelModuleType: open in the ClusterPolicy, reboot GPU worker nodes to clear proprietary modules from kernel memory, then let the GPU Operator rebuild driver pods with open kernel modules.

The NVIDIA open kernel modules are required for DMA-BUF, GPUDirect RDMA, and GPUDirect Storage (GDS). This recipe walks through the migration from proprietary to open modules with minimal disruption.

Before You Start

Verify your GPUs support open kernel modules (Turing architecture or newer):

nvidia-smi --query-gpu=gpu_name,compute_cap --format=csv

Compute capability 7.5+ (Turing) is required.
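If you want to script this check, the helper below compares a compute capability value against the 7.5 threshold. It is a sketch: `check_open_support` is a hypothetical name, and in practice you would feed it the second CSV column from the `nvidia-smi` query above.

```shell
# Hypothetical helper: decide open-module eligibility from a compute
# capability value (e.g. "8.0" from the nvidia-smi CSV output above).
check_open_support() {
  cap="$1"
  # Turing is 7.5; anything numerically lower cannot use open modules.
  awk -v c="$cap" 'BEGIN { exit !(c >= 7.5) }' && echo supported || echo unsupported
}

check_open_support 8.0   # A100 (Ampere) -> supported
check_open_support 7.0   # V100 (Volta)  -> unsupported
```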

Step 1 β€” Check Current Module Type

# Check what the Operator is currently using
oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.spec.driver.kernelModuleType}'

# Check what modules are loaded on a node
oc debug node/<gpu-node>
chroot /host
modinfo nvidia | grep -E "license|filename"

Proprietary modules show license: NVIDIA. Open modules show license: Dual MIT/GPL.
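The license string is easy to classify in a script. The sketch below assumes the two exact strings described above; `driver_flavor` is a hypothetical helper, and on a node without the driver installed it simply reports `unknown`.

```shell
# Sketch: map the modinfo license string to a driver flavor label.
driver_flavor() {
  case "$1" in
    "Dual MIT/GPL") echo open ;;
    NVIDIA)         echo proprietary ;;
    *)              echo unknown ;;
  esac
}

# Extract the license field from modinfo (empty if the module is absent).
license=$(modinfo nvidia 2>/dev/null | awk -F': *' '/^license:/ {print $2}')
driver_flavor "$license"
```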

Step 2 β€” Update the ClusterPolicy

oc edit clusterpolicy gpu-cluster-policy
spec:
  driver:
    kernelModuleType: open

At this point the driver pods will attempt to restart, but proprietary modules may still be loaded in kernel memory on the host.
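If you prefer a non-interactive, scriptable change over `oc edit`, a merge patch sets the same field. This is a sketch using the same ClusterPolicy name as above; the `oc patch` invocation is left commented because it requires cluster access.

```shell
# Merge patch that flips the module type (same field as the YAML above).
PATCH='{"spec":{"driver":{"kernelModuleType":"open"}}}'
echo "$PATCH"

# Apply it against the cluster:
# oc patch clusterpolicy gpu-cluster-policy --type merge -p "$PATCH"
```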

Step 3 β€” Cordon and Drain GPU Nodes

Process one node at a time to maintain cluster availability:

# Cordon the node to prevent scheduling
oc adm cordon <gpu-node>

# Drain workloads
oc adm drain <gpu-node> --ignore-daemonsets --delete-emptydir-data

Step 4 β€” Reboot the Node

Reboot clears all in-memory kernel modules:

oc debug node/<gpu-node>
chroot /host
systemctl reboot

Step 5 β€” Verify Open Modules After Reboot

oc debug node/<gpu-node>
chroot /host

# Verify open kernel module loaded
modinfo nvidia | grep license
# Expected: license: Dual MIT/GPL

# Verify all NVIDIA modules are on disk
find /lib/modules/$(uname -r) -name "nvidia*.ko" | head -10

# modinfo should work for all modules
modinfo nvidia_fs
modinfo nvidia_uvm
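The per-module checks above can be batched. The helper below is a sketch: `check_modules` is a hypothetical name, and the module list you pass is an assumption to adjust per driver version (for example, `nvidia_fs` only ships when GDS is enabled).

```shell
# Sketch: report which of the given modules fail to resolve via modinfo.
check_modules() {
  missing=""
  for m in "$@"; do
    modinfo "$m" >/dev/null 2>&1 || missing="$missing $m"
  done
  if [ -z "$missing" ]; then echo "all present"; else echo "missing:$missing"; fi
}

check_modules nvidia nvidia_uvm nvidia_modeset nvidia_drm
```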

Step 6 β€” Uncordon the Node

oc adm uncordon <gpu-node>

Step 7 β€” Repeat for All GPU Nodes

Repeat Steps 3–6 for each GPU worker node in the cluster.
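On larger clusters, Steps 3–6 can be wrapped in a per-node routine. The sketch below is a hedged starting point, not a definitive script: the node name is a placeholder, the `oc wait` readiness gate and 15-minute timeout are assumptions, and `DRY_RUN=1` (the default here) only prints the commands. Re-run the Step 5 checks on each node before uncordoning it for real.

```shell
# Dry-run by default: print each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

migrate_node() {
  node="$1"
  run oc adm cordon "$node"
  run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data
  run oc debug "node/$node" -- chroot /host systemctl reboot
  # Assumption: wait for the rebooted node to report Ready before uncordoning.
  run oc wait --for=condition=Ready "node/$node" --timeout=15m
  run oc adm uncordon "$node"
}

migrate_node worker-gpu-1   # placeholder node name
```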

Step 8 β€” Verify Cluster-Wide Status

# All driver pods should be Running
oc get pods -n gpu-operator -l app=nvidia-driver-daemonset

# Check driver container logs for open module confirmation
oc logs -n gpu-operator ds/nvidia-driver-daemonset -c nvidia-driver-ctr | grep kernel_module_type
# Expected: kernel_module_type=open

Rollback

If you need to revert:

oc edit clusterpolicy gpu-cluster-policy
spec:
  driver:
    kernelModuleType: proprietary

Reboot nodes and restart driver pods. Note that DMA-BUF and GDS will no longer be available.

Why This Matters

Open kernel modules provide full on-disk module management, enable DMA-BUF and GPUDirect Storage, and align with NVIDIA’s recommended configuration for modern GPU Operator deployments.

#nvidia #gpu-operator #kernel-modules #open-kernel #openshift #migration
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
