Enable GPUDirect Storage in ClusterPolicy
Enable NVIDIA GPUDirect Storage (GDS) in the GPU Operator ClusterPolicy for direct GPU-to-NVMe data paths. Covers driver module configuration and verification.
Quick Answer: Enable GDS by setting driver.manager.env with ENABLE_GPU_DIRECT_STORAGE=true in your ClusterPolicy, then add gds.enabled: true in the gds section. This loads the nvidia-fs kernel module, enabling direct data transfer between GPUs and NVMe storage, bypassing the CPU.
The Problem
AI training workloads can spend 30-40% of their time waiting on data I/O from storage to GPU memory. The default path is NVMe → CPU → System RAM → PCIe → GPU memory. GPUDirect Storage (GDS) eliminates the CPU bottleneck by transferring data directly from NVMe to GPU memory via DMA.
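To put the I/O-wait figure in perspective, here is a rough back-of-the-envelope sketch using Amdahl's law. It assumes the 30-40% range above is the only part of the job that GDS accelerates, and it takes the "up to 3x" I/O speedup quoted in this guide as a best case. The function name `overall_speedup` is made up for illustration.

```shell
#!/usr/bin/env bash
# Estimate end-to-end speedup when only the I/O portion gets faster (Amdahl's law).
#   $1 = io_fraction: share of wall time spent waiting on I/O (0..1)
#   $2 = io_speedup:  factor by which the I/O portion is accelerated
overall_speedup() {
  local io_fraction="$1" io_speedup="$2"
  awk -v f="$io_fraction" -v s="$io_speedup" \
    'BEGIN { printf "%.2f\n", 1 / ((1 - f) + f / s) }'
}

overall_speedup 0.30 3   # low end of the I/O-wait range  -> prints 1.25
overall_speedup 0.40 3   # high end of the I/O-wait range -> prints 1.36
```

Under these assumptions, a best-case 3x I/O improvement translates to roughly a 1.25x to 1.36x end-to-end gain, so measure your own I/O fraction before expecting more.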
The Solution
Step 1: Verify Prerequisites
# Check NVIDIA driver version (GDS requires 525.60+)
oc exec -n gpu-operator $(oc get pod -n gpu-operator -l app=nvidia-driver-daemonset -o name | head -1) -- nvidia-smi --query-gpu=driver_version --format=csv,noheader
# 535.129.03 ✅
# Verify NVMe drives are available on GPU nodes
oc debug node/gpu-worker-1 -- chroot /host nvme list
Step 2: Update ClusterPolicy
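Before changing the ClusterPolicy, it may help to script the driver-version gate from Step 1 rather than eyeballing it. The following is a minimal sketch, assuming GNU `sort -V` is available. The helper name `version_ge` and the hard-coded sample version are illustrative only; in practice you would capture the `nvidia-smi` output from Step 1 instead.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

MIN_DRIVER="525.60"           # minimum stated in this guide
driver_version="535.129.03"   # sample value; replace with the nvidia-smi output from Step 1

if version_ge "$driver_version" "$MIN_DRIVER"; then
  echo "driver $driver_version meets the GDS minimum ($MIN_DRIVER)"
else
  echo "driver $driver_version is older than $MIN_DRIVER" >&2
fi
```

Version sort compares each dotted component numerically, so 525.105.17 correctly ranks above 525.60, which a plain string comparison would get wrong.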
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  driver:
    enabled: true
    version: "535.129.03"
    manager:
      env:
        - name: ENABLE_GPU_DIRECT_STORAGE
          value: "true"
  gds:
    enabled: true
    version: "v2.17.5"
    image: nvidia-fs
    repository: nvcr.io/nvidia/cloud-native
Apply:
oc apply -f clusterpolicy.yaml
# Or patch existing policy
oc patch clusterpolicy cluster-policy --type=merge -p '{
  "spec": {
    "driver": {
      "manager": {
        "env": [{"name": "ENABLE_GPU_DIRECT_STORAGE", "value": "true"}]
      }
    },
    "gds": {
      "enabled": true
    }
  }
}'
Step 3: Verify GDS Is Active
# Check nvidia-fs module is loaded
oc debug node/gpu-worker-1 -- chroot /host lsmod | grep nvidia_fs
# nvidia_fs 282624 0
# Check GDS driver pods are running
oc get pods -n gpu-operator -l app=nvidia-gds
# nvidia-gds-xxxxx 1/1 Running 0 5m
# Verify GDS functionality: the nvidia-fs driver exposes its stats under /proc once loaded
oc debug node/gpu-worker-1 -- chroot /host cat /proc/driver/nvidia-fs/stats
Step 4: Test GDS Performance
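Before running the benchmark pod, the module check from Step 3 can be turned into a pass/fail gate that is usable in scripts or CI. This is a minimal sketch: the function name `has_nvidia_fs` is hypothetical, and the sample input simply mirrors the `lsmod` line shown in Step 3.

```shell
#!/usr/bin/env bash
# Read `lsmod` output on stdin; succeed only if the nvidia_fs module is listed.
# Matching on the first column avoids false positives from the "Used by" column.
has_nvidia_fs() {
  awk '$1 == "nvidia_fs" { found = 1 } END { exit !found }'
}

# Sample input mirroring the Step 3 output; on a cluster, pipe in the output of:
#   oc debug node/gpu-worker-1 -- chroot /host lsmod
sample_lsmod='Module                  Size  Used by
nvidia_fs             282624  0'

if printf '%s\n' "$sample_lsmod" | has_nvidia_fs; then
  echo "nvidia_fs loaded"
else
  echo "nvidia_fs missing" >&2
fi
```

Unlike a bare `grep nvidia_fs`, the exit status here reflects only an exact module-name match, so a dependent module that merely lists nvidia_fs as a user does not count as a pass.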
apiVersion: v1
kind: Pod
metadata:
  name: gds-benchmark
spec:
  containers:
    - name: benchmark
      image: nvcr.io/nvidia/cuda:12.4.0-devel-ubuntu22.04
      command: ["bash", "-c", "apt-get update && apt-get install -y gds-tools && gdscheck -p"]
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        - name: nvme-data
          mountPath: /data
  volumes:
    - name: nvme-data
      hostPath:
        path: /mnt/nvme
        type: Directory
Common Issues
nvidia_fs Module Not Loading
# Check driver container logs for GDS errors
oc logs -n gpu-operator -l app=nvidia-driver-daemonset -c nvidia-driver | grep -i gds
# Common cause: kernel headers mismatch
# Fix: ensure the driver container matches the node's kernel version
GDS Pod CrashLoopBackOff
# Check GDS DaemonSet logs
oc logs -n gpu-operator -l app=nvidia-gds
# Common cause: GDS version incompatible with the driver version
# Verify compatibility matrix: https://docs.nvidia.com/gpudirect-storage/
Best Practices
- Use NVMe-backed PVs: GDS only benefits from direct NVMe access, not network storage
- Pin the GDS version: match it to your NVIDIA driver version per the compatibility matrix
- Test with gdscheck: verify GDS is working before deploying AI workloads
- Monitor with DCGM: track DCGM_FI_PROF_PCIE_RX_BYTES and DCGM_FI_PROF_PCIE_TX_BYTES
Key Takeaways
- GDS enables direct NVMe-to-GPU data transfer, bypassing the CPU (up to 3x I/O speedup)
- Enable via ClusterPolicy: driver.manager.env.ENABLE_GPU_DIRECT_STORAGE=true + gds.enabled: true
- Requires NVIDIA driver 525.60+ and the nvidia-fs kernel module
- Only benefits workloads reading from local NVMe, not network storage

