Troubleshoot nvidia-fs Module Conflict on OpenShift
Diagnose and fix the 'insmod: ERROR: could not insert module nvidia-fs.ko: File exists' error when enabling GPUDirect Storage with the NVIDIA GPU Operator.
💡 Quick Answer: The insmod: File exists error for nvidia-fs.ko occurs when the host has the module loaded from a previous or proprietary driver installation but the .ko file is missing on disk. Switch to kernelModuleType: open and reboot the node to clear stale modules.
When GDS is enabled in the ClusterPolicy, the nvidia-fs-ctr container may enter CrashLoopBackOff with the error:
Loading nvidia-fs...
insmod nvidia-fs.ko
insmod: ERROR: could not insert module nvidia-fs.ko: File exists
Root Cause
The host kernel already has nvidia_fs loaded in memory, but the module was inserted by the proprietary driver container without placing a matching .ko file on disk. When the GDS container tries to load its own copy, the kernel rejects it because a module with that name already exists.
Diagnose the Problem
Open a debug shell on the affected node:
oc debug node/<node-name>
chroot /host
Check module state:
# Module is loaded in memory
lsmod | grep nvidia_fs
# Output: nvidia_fs 323584 0
# But no .ko file exists on disk
modinfo nvidia_fs
# Output: modinfo: ERROR: Module nvidia_fs not found.
# Confirm the file is missing
find /lib/modules/$(uname -r) -name "nvidia*fs*"
# Output: empty
If lsmod shows the module but modinfo fails, you have memory-only modules from the proprietary driver stack.
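The three checks above can be folded into one helper. This is a minimal sketch, not part of the GPU Operator tooling: `nvidia_fs_state` and its two optional arguments are hypothetical names introduced here, and it reads /proc/modules (the same data lsmod prints) and uses the find pattern from above instead of calling modinfo.

```shell
#!/bin/sh
# Sketch: classify the nvidia_fs module state on a node.
# Intended to be run inside the `chroot /host` debug shell.
# Both arguments are hypothetical overrides, mainly useful for testing:
#   $1 = file listing loaded modules (default: /proc/modules)
#   $2 = directory to search for the module file
#        (default: /lib/modules/$(uname -r))

nvidia_fs_state() {
  modules_file="${1:-/proc/modules}"
  module_dir="${2:-/lib/modules/$(uname -r)}"

  loaded=no
  on_disk=no

  # /proc/modules has one loaded module per line, module name first.
  if grep -q '^nvidia_fs ' "$modules_file" 2>/dev/null; then
    loaded=yes
  fi

  # Same search pattern as the find command in the diagnosis steps.
  if [ -n "$(find "$module_dir" -name 'nvidia*fs*' 2>/dev/null | head -n 1)" ]; then
    on_disk=yes
  fi

  case "${loaded}:${on_disk}" in
    yes:no)  echo "memory-only" ;;       # the conflict described here
    yes:yes) echo "loaded-with-file" ;;  # expected healthy state
    no:yes)  echo "file-only" ;;         # file present, module not loaded
    *)       echo "absent" ;;
  esac
}
```

Calling `nvidia_fs_state` with no arguments on the node checks the running kernel; a result of `memory-only` corresponds to the lsmod-succeeds, modinfo-fails situation.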
Fix: Switch to Open Kernel Module
GDS v2.17.5+ requires the Open Kernel Module. Update your ClusterPolicy:
oc edit clusterpolicy gpu-cluster-policy
spec:
  driver:
    kernelModuleType: open   # Required for GDS
  gds:
    enabled: true
Clear Stale Modules
Reboot each GPU worker node to unload the proprietary modules:
oc debug node/<node-name>
chroot /host
systemctl reboot
After reboot, restart the driver pods:
oc delete pod -n gpu-operator -l app=nvidia-driver-daemonset
Verify the Fix
oc debug node/<node-name>
chroot /host
# Module loaded and file exists
lsmod | grep nvidia_fs
modinfo nvidia_fs
# Verify .ko file is on disk
find /lib/modules/$(uname -r) -name "nvidia*fs*"
Both lsmod and modinfo should succeed, and the .ko file should exist under /lib/modules/.
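When several GPU workers were rebooted, the same verification can be run from a workstation instead of opening a shell on each node. The following is a rough sketch under a few assumptions: `oc` is already logged in with permission to start debug pods, and `check_node`, `check_nodes`, and the `NVFS_OK` marker are hypothetical names introduced here. It matches on a printed marker and does not rely on the exit code of `oc debug`.

```shell
#!/bin/sh
# Sketch: run the lsmod + modinfo verification on one or more nodes.
# Node names are passed as arguments; nothing here is GPU Operator API.

check_node() {
  node="$1"
  # Print NVFS_OK only if the module is loaded AND modinfo can find it.
  # A failed debug session simply yields no marker and is reported as FAILED.
  out=$(oc debug "node/${node}" -- chroot /host sh -c \
    'lsmod | grep -q "^nvidia_fs " && modinfo nvidia_fs >/dev/null 2>&1 && echo NVFS_OK' \
    2>/dev/null) || true

  case "$out" in
    *NVFS_OK*) echo "${node}: OK" ;;
    *)         echo "${node}: FAILED"; return 1 ;;
  esac
}

check_nodes() {
  rc=0
  for node in "$@"; do
    check_node "$node" || rc=1
  done
  # Non-zero if any node failed the check.
  return "$rc"
}
```

For example, `check_nodes worker-gpu-0 worker-gpu-1` (placeholder node names) prints one line per node and returns non-zero if any node still lacks the module or its file.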
Why This Matters
The File exists error prevents GDS from initializing, blocking direct GPU-to-storage DMA transfers. Switching to the open kernel module ensures the driver container properly manages all module files on disk.

