Troubleshooting • advanced • ⏱ 30 minutes • K8s 1.28+

Troubleshoot nvidia-fs Module Conflict on OpenShift

Diagnose and fix the 'insmod: ERROR: could not insert module nvidia-fs.ko: File exists' error when enabling GPUDirect Storage with the NVIDIA GPU Operator.

By Luca Berton • 5 min read

💡 Quick Answer: The insmod: File exists error for nvidia-fs.ko occurs when nvidia_fs is already loaded in the host kernel, left over from a previous or proprietary driver installation, while no matching .ko file exists on disk. Set kernelModuleType: open in the ClusterPolicy and reboot the node to clear the stale module.

When GDS is enabled in the ClusterPolicy, the nvidia-fs-ctr container may enter CrashLoopBackOff with the error:

Loading nvidia-fs...
insmod nvidia-fs.ko
insmod: ERROR: could not insert module nvidia-fs.ko: File exists

Root Cause

The host kernel already has nvidia_fs loaded in memory, but the module was inserted by the proprietary driver container without placing a matching .ko file on disk. When the GDS container tries to load its own copy, the kernel rejects it with EEXIST (reported by insmod as "File exists") because a module with that name is already loaded.

Diagnose the Problem

Open a debug shell on the affected node:

oc debug node/<node-name>
chroot /host

Check module state:

# Module is loaded in memory
lsmod | grep nvidia_fs
# Output: nvidia_fs  323584  0

# But no .ko file exists on disk
modinfo nvidia_fs
# Output: modinfo: ERROR: Module nvidia_fs not found.

# Confirm the file is missing
find /lib/modules/$(uname -r) -name "nvidia*fs*"
# Output: empty

If lsmod shows the module but modinfo fails, you have memory-only modules from the proprietary driver stack.
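
The two checks above can be folded into a single helper. What follows is a minimal sketch, not part of the GPU Operator: the function name nvidia_fs_state and its output labels (stale, ok, not-loaded) are made up for illustration. It reads a /proc/modules-style file instead of calling lsmod, and reuses the same find pattern as the commands above.

```shell
#!/bin/sh
# Hypothetical helper: classify the nvidia_fs module state on a node.
#   $1: /proc/modules-style file (default: /proc/modules)
#   $2: module tree to search    (default: /lib/modules/$(uname -r))
# Prints one of: stale | ok | not-loaded
nvidia_fs_state() {
    modules_file="${1:-/proc/modules}"
    module_dir="${2:-/lib/modules/$(uname -r)}"
    loaded=no
    on_disk=no

    # Each /proc/modules line starts with the module name and a space.
    if grep -q '^nvidia_fs ' "$modules_file" 2>/dev/null; then
        loaded=yes
    fi

    # Same pattern as the find command above.
    if [ -n "$(find "$module_dir" -name 'nvidia*fs*' 2>/dev/null | head -n 1)" ]; then
        on_disk=yes
    fi

    case "$loaded:$on_disk" in
        yes:no)  echo "stale" ;;       # in memory only: the failure case
        yes:yes) echo "ok" ;;          # loaded and backed by a file on disk
        *)       echo "not-loaded" ;;  # nothing in memory to conflict with
    esac
}
```

Run nvidia_fs_state with no arguments inside the chroot /host shell; a result of stale corresponds to the lsmod-succeeds, modinfo-fails situation described above.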

Fix: Switch to Open Kernel Module

GDS v2.17.5+ requires the Open Kernel Module. Update your ClusterPolicy:

oc edit clusterpolicy gpu-cluster-policy

# In the editor, set these fields:
spec:
  driver:
    kernelModuleType: open    # Required for GDS
  gds:
    enabled: true
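
For scripted or repeatable changes, the same two fields can be applied without an interactive editor. This is a sketch that assumes the ClusterPolicy is named gpu-cluster-policy as above; the function name enable_gds_open_module and the GDS_PATCH variable are illustrative, and the patch touches only the fields shown in the snippet.

```shell
#!/bin/sh
# Merge patch carrying the two fields from the ClusterPolicy snippet above.
GDS_PATCH='{"spec":{"driver":{"kernelModuleType":"open"},"gds":{"enabled":true}}}'

# Hypothetical wrapper: enable_gds_open_module [policy-name]
# The policy name defaults to the one used in this recipe.
enable_gds_open_module() {
    policy="${1:-gpu-cluster-policy}"
    oc patch clusterpolicy "$policy" --type merge -p "$GDS_PATCH"
}
```

Call enable_gds_open_module once, then confirm the result with oc get clusterpolicy gpu-cluster-policy -o yaml before rebooting any nodes.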

Clear Stale Modules

Reboot each GPU worker node to unload the proprietary modules:

oc debug node/<node-name>
chroot /host
systemctl reboot

After reboot, restart the driver pods:

oc delete pod -n gpu-operator -l app=nvidia-driver-daemonset
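
On a cluster running workloads, it may be preferable to cordon and drain each node before the reboot shown above. The sketch below is one way to do that and is not part of the original procedure: the function names are hypothetical, and the drain flags and 10-minute timeout are assumptions that should be checked against what your workloads tolerate.

```shell
#!/bin/sh
# Hypothetical helper: cordon, drain, and reboot one GPU worker node.
reboot_gpu_node() {
    node="$1"
    [ -n "$node" ] || { echo "usage: reboot_gpu_node <node-name>" >&2; return 2; }

    # Stop new pods from landing on the node, then evict what is running.
    oc adm cordon "$node" || return 1
    oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m || return 1

    # Same reboot as above, run non-interactively through the debug pod.
    # The debug session is cut off by the reboot, so a non-zero exit
    # status here does not necessarily mean the reboot failed.
    oc debug "node/$node" -- chroot /host systemctl reboot
}

# Run this once the node reports Ready again (check with: oc get nodes).
uncordon_gpu_node() {
    oc adm uncordon "$1"
}
```

Handling one node at a time keeps the remaining GPU capacity available while each node clears its stale modules.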

Verify the Fix

oc debug node/<node-name>
chroot /host

# Module loaded and file exists
lsmod | grep nvidia_fs
modinfo nvidia_fs

# Verify .ko file is on disk
find /lib/modules/$(uname -r) -name "nvidia*fs*"

Both lsmod and modinfo should succeed, and the .ko file should exist under /lib/modules/.
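
Besides the node-level checks, it can help to confirm that no driver pod is still crash-looping. The filter below is a small sketch: the function name crashlooping_pods is illustrative, and it assumes the default column layout of oc get pods --no-headers (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical filter: read `oc get pods --no-headers` output on stdin and
# print the names of pods whose STATUS column is CrashLoopBackOff.
crashlooping_pods() {
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage, with the namespace and label used earlier in this recipe:
#   oc get pods -n gpu-operator -l app=nvidia-driver-daemonset --no-headers | crashlooping_pods
```

Empty output means none of the listed pods is in CrashLoopBackOff.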

Why This Matters

The File exists error prevents GDS from initializing, blocking direct GPU-to-storage DMA transfers. Switching to the open kernel module ensures the driver container properly manages all module files on disk.

#nvidia #gpu #gds #nvidia-fs #troubleshooting #openshift