Storage • Advanced • ⏱ 45 minutes • K8s 1.28+

Enable GPUDirect Storage on OpenShift

Configure GPUDirect Storage (GDS) with the NVIDIA GPU Operator on OpenShift, including the Open Kernel Module requirement and nvidia-fs verification.

By Luca Berton • 📖 5 min read

💡 Quick Answer: GDS requires the Open Kernel Module (driver.kernelModuleType: open). Set gds.enabled: true in the ClusterPolicy, and the GPU Operator deploys the nvidia-fs-ctr container to load the nvidia-fs kernel module.

GPUDirect Storage (GDS) enables direct DMA transfers between GPU memory and storage, bypassing CPU bounce buffers. Starting with GPU Operator v23.9.1, GDS requires the NVIDIA Open Kernel Module.

Prerequisites

Requirement      Value
GPU Operator     v23.9.1+
GDS Driver       v2.17.5+
Kernel Module    Open (kernelModuleType: open)
Kernel           5.12+
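The kernel row of the table can be checked mechanically on any node; a minimal sketch that compares the running kernel against the 5.12 floor (version parsing is simplified and assumes an `x.y.z-…` style string from `uname -r`):

```shell
# Compare the running kernel against the 5.12 minimum required for GDS.
# Parsing is simplified: it assumes a version string like 5.14.0-284.el9.
kernel="$(uname -r)"
major="${kernel%%.*}"
rest="${kernel#*.}"
minor="${rest%%.*}"

if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 12 ]; }; then
  echo "kernel $kernel meets the GDS minimum (5.12+)"
else
  echo "kernel $kernel is below the GDS minimum (5.12+)"
fi
```

Run this inside `oc debug node/<node-name>` (after `chroot /host`) to check each GPU worker rather than your workstation.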

Step 1 — Configure the ClusterPolicy

oc edit clusterpolicy gpu-cluster-policy

Set the required fields:

spec:
  driver:
    kernelModuleType: open    # Required for GDS
  gds:
    enabled: true
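The same two fields can also be set non-interactively; a sketch using a merge patch (the ClusterPolicy name matches the one edited above, and the commented command needs a logged-in oc session):

```shell
# Merge patch carrying the two fields from Step 1
patch='{"spec":{"driver":{"kernelModuleType":"open"},"gds":{"enabled":true}}}'
echo "$patch"

# To apply it (requires cluster access):
#   oc patch clusterpolicy/gpu-cluster-policy --type merge -p "$patch"
```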

Step 2 — Apply and Restart

# Restart driver pods to pick up the new configuration
oc delete pod -n gpu-operator -l app=nvidia-driver-daemonset

The GPU Operator will:

  1. Build the Open Kernel Module for your host kernel
  2. Deploy the nvidia-fs-ctr container inside each driver pod
  3. Load the nvidia_fs kernel module on each GPU node
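The steps above take a few minutes; a hedged way to wait for the rebuild to finish (the DaemonSet name is an assumption — confirm it with `oc get ds -n gpu-operator`):

```shell
# Wait for all driver pods to be rebuilt and Ready (10-minute ceiling).
# The DaemonSet name is an assumption; confirm with: oc get ds -n gpu-operator
wait_cmd="oc rollout status daemonset/nvidia-driver-daemonset -n gpu-operator --timeout=600s"

if command -v oc >/dev/null 2>&1; then
  $wait_cmd
else
  echo "run on a workstation with cluster access: $wait_cmd"
fi
```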

Step 3 — Verify Pod Structure

oc describe pod -n gpu-operator nvidia-driver-daemonset-xxxxx

Confirm these containers are present:

  • nvidia-driver-ctr β€” main GPU driver
  • nvidia-fs-ctr β€” GDS filesystem module

If driver.rdma.enabled=true is also set, you will also see nvidia-peermem-ctr.
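The container layout across all driver pods can also be pulled in one query; a sketch, assuming the same label used by the restart command in Step 2 (peermem only appears with RDMA enabled):

```shell
# One-line overview of container names per driver pod.
# The label matches the restart command in Step 2.
label="app=nvidia-driver-daemonset"

if command -v oc >/dev/null 2>&1; then
  oc get pod -n gpu-operator -l "$label" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'
else
  echo "oc not found; run this on a machine with cluster access"
fi
```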

Step 4 — Verify Kernel Modules

Open a debug shell on a GPU worker node:

oc debug node/<node-name>
chroot /host
lsmod | grep nvidia_fs
modinfo nvidia_fs

Both commands should succeed. If modinfo fails, see the related recipe on troubleshooting nvidia-fs module conflicts.
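What a healthy result looks like can be sketched offline; the sample below is illustrative `lsmod` output (module sizes are made up), and the grep mirrors the check above:

```shell
# Illustrative lsmod output from a GDS-enabled node (sizes are made up)
lsmod_sample='Module                  Size  Used by
nvidia_fs             323584  0
nvidia              62771200  43 nvidia_fs'

# Anchored match so the plain "nvidia" module line does not count
if printf '%s\n' "$lsmod_sample" | grep -q '^nvidia_fs '; then
  gds_status="nvidia_fs loaded"
else
  gds_status="nvidia_fs missing"
fi
echo "$gds_status"
```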

Step 5 — Verify All Pods Are Running

oc get pod -n gpu-operator

The driver DaemonSet pods should show all containers Ready — 3/3 when the driver, peermem, and fs containers are all enabled; without RDMA, expect one container fewer — with no CrashLoopBackOff errors.

Common Pitfall

If gds.enabled=true is set but driver.kernelModuleType is proprietary or auto (resolving to proprietary), the nvidia-fs-ctr container will fail with:

insmod: ERROR: could not insert module nvidia-fs.ko: File exists

This happens because the proprietary driver stack inserts modules into kernel memory without placing .ko files on disk, creating a mismatch with the GDS container. The fix is to explicitly set kernelModuleType: open.
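When hitting this error, the first thing to confirm is which module type the ClusterPolicy actually requests; a sketch, guarded so it degrades gracefully off-cluster:

```shell
# The value must be "open"; "proprietary" (or "auto" resolving to it)
# triggers the insmod failure described above.
expected="open"

if command -v oc >/dev/null 2>&1; then
  actual="$(oc get clusterpolicy gpu-cluster-policy \
    -o jsonpath='{.spec.driver.kernelModuleType}')"
  if [ "$actual" = "$expected" ]; then
    echo "kernelModuleType OK"
  else
    echo "kernelModuleType is '$actual'; set it to '$expected'"
  fi
else
  echo "oc not found; run this on a machine with cluster access"
fi
```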

Why This Matters

GDS eliminates CPU bounce buffers for storage I/O, reducing latency and CPU overhead. This is critical for AI/ML pipelines that load large datasets from NFS or NVMe storage directly into GPU memory.

#nvidia #gpu #gds #gpudirect #storage #openshift #gpu-operator
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
