Enable GPUDirect Storage on OpenShift
Configure GPUDirect Storage (GDS) with the NVIDIA GPU Operator on OpenShift, including the Open Kernel Module requirement and nvidia-fs verification.
💡 Quick Answer: GDS requires the Open Kernel Module (`driver.kernelModuleType: open`). Set `gds.enabled: true` in the ClusterPolicy, and the GPU Operator deploys the `nvidia-fs-ctr` container to load the `nvidia-fs` kernel module.
GPUDirect Storage (GDS) enables direct DMA transfers between GPU memory and storage, bypassing CPU bounce buffers. Starting with GPU Operator v23.9.1, GDS requires the NVIDIA Open Kernel Module.
Prerequisites
| Requirement | Value |
|---|---|
| GPU Operator | v23.9.1+ |
| GDS Driver | v2.17.5+ |
| Kernel Module | Open (kernelModuleType: open) |
| Kernel | 5.12+ |
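Before making changes, it is worth confirming the prerequisites are met. A quick sketch, assuming an OLM-based install in the `gpu-operator` namespace used throughout this recipe:

```shell
# Check the installed GPU Operator version (ClusterServiceVersion)
oc get csv -n gpu-operator | grep gpu-operator

# Check the kernel version on a GPU worker node (GDS needs 5.12+)
oc debug node/<node-name> -- chroot /host uname -r
```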
Step 1 – Configure the ClusterPolicy
```shell
oc edit clusterpolicy gpu-cluster-policy
```
Set the required fields:
```yaml
spec:
  driver:
    kernelModuleType: open  # Required for GDS
  gds:
    enabled: true
```
Step 2 – Apply and Restart
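If you prefer a non-interactive change over `oc edit`, the same two fields can be set with a merge patch (a sketch, using the same ClusterPolicy name as above):

```shell
oc patch clusterpolicy gpu-cluster-policy --type merge \
  -p '{"spec":{"driver":{"kernelModuleType":"open"},"gds":{"enabled":true}}}'
```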
```shell
# Restart driver pods to pick up the new configuration
oc delete pod -n gpu-operator -l app=nvidia-driver-daemonset
```
The GPU Operator will:
- Build the Open Kernel Module for your host kernel
- Deploy the `nvidia-fs-ctr` container inside each driver pod
- Load the `nvidia_fs` kernel module on each GPU node
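To watch this rollout complete, you can follow the driver pods with the same label selector used for the restart:

```shell
# Watch the driver DaemonSet pods terminate and come back Ready
oc get pods -n gpu-operator -l app=nvidia-driver-daemonset -w
```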
Step 3 – Verify Pod Structure
```shell
oc describe pod -n gpu-operator nvidia-driver-daemonset-xxxxx
```
Confirm these containers are present:
- `nvidia-driver-ctr` – main GPU driver
- `nvidia-fs-ctr` – GDS filesystem module
If `driver.rdma.enabled=true` is set as well, you will also see `nvidia-peermem-ctr`.
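A quicker way to list just the container names, without scanning the full `describe` output (a sketch; substitute your actual pod name):

```shell
oc get pod -n gpu-operator nvidia-driver-daemonset-xxxxx \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
```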
Step 4 – Verify Kernel Modules
SSH into a GPU worker node:
```shell
oc debug node/<node-name>
chroot /host
lsmod | grep nvidia_fs
modinfo nvidia_fs
```
Both module checks should succeed. If `modinfo` fails, see the related recipe on troubleshooting nvidia-fs module conflicts.
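Beyond the module checks, NVIDIA ships a `gdscheck` utility with the CUDA toolkit that reports end-to-end GDS support. Running it from a container that has the CUDA toolkit installed is a useful follow-up; note the path varies by CUDA version, so this is a sketch:

```shell
# Inside a pod/container with the CUDA toolkit installed
/usr/local/cuda/gds/tools/gdscheck -p
# Review the output for supported filesystems (e.g., NVMe, NFS) and driver status
```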
Step 5 – Verify All Pods Are Running
```shell
oc get pod -n gpu-operator
```
The driver DaemonSet pods should show `3/3 Running` (driver, peermem, and fs containers) with no CrashLoopBackOff errors. If RDMA is not enabled, there is no peermem container, so expect `2/2` instead.
Common Pitfall
If `gds.enabled=true` is set but `driver.kernelModuleType` is `proprietary` or `auto` (resolving to proprietary), the `nvidia-fs-ctr` container will fail with:

```shell
insmod: ERROR: could not insert module nvidia-fs.ko: File exists
```

This happens because the proprietary driver stack inserts modules into kernel memory without placing `.ko` files on disk, creating a mismatch with the GDS container. The fix is to explicitly set `kernelModuleType: open`.
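Before re-enabling GDS after hitting this error, it helps to confirm what the ClusterPolicy actually specifies. A sketch:

```shell
# Empty output means the field is unset (i.e., the default applies)
oc get clusterpolicy gpu-cluster-policy \
  -o jsonpath='{.spec.driver.kernelModuleType}'
```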
Why This Matters
GDS eliminates CPU bounce buffers for storage I/O, reducing latency and CPU overhead. This is critical for AI/ML pipelines that load large datasets from NFS or NVMe storage directly into GPU memory.

