Troubleshooting · Advanced · ⏱ 15 minutes · K8s 1.28+

Diagnose NVIDIA Memory-Only Kernel Modules on OpenShift

Understand why lsmod shows NVIDIA modules loaded but modinfo fails, and how the GPU Operator's proprietary driver container inserts modules without installing .ko files on disk.

By Luca Berton • 📖 5 min read

💡 Quick Answer: lsmod reads /proc/modules (in-memory state), while modinfo searches for .ko files on disk. The proprietary NVIDIA driver container uses insmod to load modules directly into memory without installing .ko files under /lib/modules/, so modinfo fails.

A confusing situation arises on OpenShift when lsmod shows NVIDIA modules loaded but modinfo cannot find them.

The Symptom

lsmod | grep nvidia_fs
# nvidia_fs  323584  0

modinfo nvidia_fs
# modinfo: ERROR: Module nvidia_fs not found.

The module is loaded and functioning, yet modinfo reports it does not exist.

Why This Happens

Two tools, two data sources:

Tool    | Data source               | What it shows
lsmod   | /proc/modules             | Kernel memory: what is currently loaded
modinfo | /lib/modules/$(uname -r)/ | Disk: where .ko files are stored
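To make the lsmod side of the table concrete, here is a minimal sketch that rebuilds lsmod's main columns directly from /proc/modules. It never looks at /lib/modules, which is why a module can appear in this view with no file behind it. The function name proc_lsmod and the PROC_MODULES override are hypothetical, added only so the sketch can be tested away from a real node.

```shell
#!/bin/sh
# Sketch: rebuild lsmod's core columns straight from /proc/modules.
# Nothing here reads /lib/modules, so memory-only modules still show up.
# PROC_MODULES is a hypothetical override for testing; the default is the real file.
proc_lsmod() {
  local proc
  proc="${PROC_MODULES:-/proc/modules}"
  # /proc/modules fields: name size refcount dependents state address.
  # Dependents are comma-terminated ("a,b,") or "-" when there are none.
  awk '{
    line = $1 " " $2 " " $3
    if ($4 != "-") { deps = $4; sub(/,$/, "", deps); line = line " " deps }
    print line
  }' "$proc"
}
```

On a node, proc_lsmod | grep '^nvidia' should list the same modules as lsmod | grep nvidia, even for modules that modinfo cannot find.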

The NVIDIA GPU Operator's proprietary driver container works like this:

  1. Extracts the .run installer inside the container
  2. Runs the installer with the --no-kernel-modules flag, so the installer itself puts no kernel modules on disk
  3. Uses insmod to insert the .ko files directly from the container filesystem into the host kernel
  4. Never copies those .ko files to /lib/modules/ on the host

This leaves the kernel with loaded modules that have no backing file on the host disk.

How to Confirm

oc debug node/<node-name>
chroot /host

# Check for .ko files on disk (plain or compressed, e.g. .ko.xz)
find /lib/modules/$(uname -r) \( -name 'nvidia*.ko' -o -name 'nvidia*.ko.*' \)

# Compare with loaded modules
lsmod | grep nvidia

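Any module that lsmod lists but find does not return is memory-only. Compare by name rather than by count, and note that file names may use hyphens (nvidia-fs.ko) where lsmod shows underscores (nvidia_fs).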
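The comparison can be scripted. The sketch below assumes a POSIX shell on the host (after chroot /host); the function name and the PROC_MODULES / MODULE_DIR overrides are hypothetical. It matches by name, accepts compressed module files, and tries both the underscore and hyphen spellings of each name.

```shell
#!/bin/sh
# Sketch: list NVIDIA modules that are loaded (per /proc/modules) but have no
# .ko file under the running kernel's module tree.
# PROC_MODULES and MODULE_DIR are hypothetical overrides for testing.
memory_only_nvidia_modules() {
  local proc dir name alt found
  proc="${PROC_MODULES:-/proc/modules}"
  dir="${MODULE_DIR:-/lib/modules/$(uname -r)}"
  for name in $(awk '$1 ~ /^nvidia/ { print $1 }' "$proc"); do
    # The kernel reports names with "_", but files may use "-" (nvidia-fs.ko).
    alt=$(printf '%s' "$name" | tr '_' '-')
    # Match plain and compressed (.ko.xz, .ko.zst, ...) module files.
    found=$(find "$dir" \( -name "$name.ko" -o -name "$name.ko.*" \
      -o -name "$alt.ko" -o -name "$alt.ko.*" \) 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
      echo "$name"
    fi
  done
}
```

On a node showing the symptom above, memory_only_nvidia_modules would be expected to print nvidia_fs.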

Impact

Memory-only modules cause problems when:

  • GDS tries to load its own nvidia_fs.ko and fails with insmod: File exists
  • Module updates fail because there is no on-disk file to replace
  • modinfo cannot report module metadata or version info
  • depmod cannot track their dependencies, because it only indexes .ko files under /lib/modules/
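modinfo is not the only source of version information. For a loaded module, sysfs exposes attributes under /sys/module/<name>/, including a version file when the module declares one. A minimal sketch, where the function name and the SYS_MODULE_DIR test override are hypothetical:

```shell
#!/bin/sh
# Sketch: read a loaded module's version from sysfs. This reflects the
# in-memory module, so it works even when modinfo finds no .ko file.
# The version file exists only if the module declares a version.
# SYS_MODULE_DIR is a hypothetical override for testing.
loaded_module_version() {
  local dir
  dir="${SYS_MODULE_DIR:-/sys/module}/$1"
  if [ ! -d "$dir" ]; then
    echo "$1: not loaded" >&2
    return 1
  fi
  if [ -r "$dir/version" ]; then
    cat "$dir/version"
  else
    echo "$1: loaded, but exposes no version attribute" >&2
    return 2
  fi
}
```

For the core nvidia module, /proc/driver/nvidia/version also reports the loaded driver version.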

Resolution

Switch the driver to NVIDIA's open kernel modules, which install the .ko files on disk:

oc edit clusterpolicy gpu-cluster-policy
# In the editor, set:
spec:
  driver:
    kernelModuleType: open
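For a non-interactive change (for example in a script or pipeline), a JSON merge patch sets the same field. This sketch assumes a GPU Operator release whose ClusterPolicy supports spec.driver.kernelModuleType and a policy named gpu-cluster-policy; the function name is hypothetical, and the accepted values checked in the case statement are an assumption, so confirm them against your installed CRD.

```shell
#!/bin/sh
# Sketch: set spec.driver.kernelModuleType without opening an editor.
# Assumes the ClusterPolicy CRD supports this field; the accepted values
# below are an assumption based on recent GPU Operator releases.
set_kernel_module_type() {
  local mtype policy
  mtype="${1:-open}"
  policy="${2:-gpu-cluster-policy}"
  case "$mtype" in
    open|proprietary|auto) ;;
    *) echo "unsupported kernelModuleType: $mtype" >&2; return 1 ;;
  esac
  # A merge patch changes only this field and leaves the rest of the spec intact.
  oc patch clusterpolicy "$policy" --type merge \
    -p "{\"spec\":{\"driver\":{\"kernelModuleType\":\"$mtype\"}}}"
}
```

After running set_kernel_module_type open, oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.spec.driver.kernelModuleType}' should print open.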

After switching, reboot the nodes and restart the driver pods. Then verify on each node from the oc debug shell (after chroot /host):

# Both commands should succeed
lsmod | grep nvidia_fs
modinfo nvidia_fs

# .ko file exists on disk
ls -la /lib/modules/$(uname -r)/extra/nvidia*.ko
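Checking one module and one file does not cover the whole driver stack. The sketch below walks every loaded NVIDIA module, confirms that modinfo resolves a file for it, and compares the on-disk version (modinfo -F version) with the in-memory one from sysfs, which is where silent version mismatches would surface. The function name and the PROC_MODULES / SYS_MODULE_DIR overrides are hypothetical.

```shell
#!/bin/sh
# Sketch: after switching, check every loaded NVIDIA module.
# Prints OK, MEMORY-ONLY, or MISMATCH per module; returns 1 if any check fails.
# PROC_MODULES and SYS_MODULE_DIR are hypothetical overrides for testing.
verify_nvidia_modules() {
  local proc sys rc name file disk_ver mem_ver
  proc="${PROC_MODULES:-/proc/modules}"
  sys="${SYS_MODULE_DIR:-/sys/module}"
  rc=0
  for name in $(awk '$1 ~ /^nvidia/ { print $1 }' "$proc"); do
    # modinfo -n prints the resolved .ko path, or fails if none exists on disk.
    if ! file=$(modinfo -n "$name" 2>/dev/null); then
      echo "MEMORY-ONLY $name"
      rc=1
      continue
    fi
    disk_ver=$(modinfo -F version "$name" 2>/dev/null) || disk_ver=""
    # The sysfs version file exists only if the module declares a version.
    mem_ver=$(cat "$sys/$name/version" 2>/dev/null) || mem_ver=""
    if [ -n "$mem_ver" ] && [ "$disk_ver" != "$mem_ver" ]; then
      echo "MISMATCH $name loaded=$mem_ver disk=$disk_ver ($file)"
      rc=1
    else
      echo "OK $name $file"
    fi
  done
  return "$rc"
}
```

A clean run after the switch would print only OK lines and return 0.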

Why This Matters

Memory-only modules create invisible version mismatches and block GDS initialization. Switching to the open kernel modules provides full on-disk module management, working modinfo output, and compatibility with GPU Operator features, such as GDS, that memory-only modules break.

Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
