GPU Operator ClusterPolicy RDMA and GDS Configuration
Configure NVIDIA GPU Operator ClusterPolicy to disable RDMA and enable GPUDirect Storage (GDS). Control nvidia-peermem, nvidia-fs modules, driver
💡 Quick Answer: In the GPU Operator ClusterPolicy, set driver.rdma.enabled: false to skip nvidia-peermem (no GPUDirect RDMA), and gds.enabled: true to deploy GPUDirect Storage (nvidia-fs) for direct NVMe-to-GPU transfers. When IOMMU is enabled (iommu=on, strict mode), GDS still works, but RDMA may need iommu=pt for full performance. These are independent toggles: you can have GDS without RDMA.
The Problem
- Not all GPU clusters need RDMA (single-node inference, no SR-IOV NICs)
- GDS is needed for fast checkpoint/data loading from local NVMe but RDMA is not
- Enabling RDMA when no InfiniBand/RoCE fabric exists causes module load errors
- Need to understand which driver components to enable for each topology
- IOMMU strict mode works with GDS but can conflict with RDMA nvidia-peermem
The Solution
ClusterPolicy: RDMA Disabled + GDS Enabled
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  operator:
    defaultRuntime: containerd
  driver:
    enabled: true
    version: "560.35.03"
    rdma:
      enabled: false        # RDMA disabled (no nvidia-peermem)
      useHostMofed: false   # Irrelevant when rdma.enabled=false
  gds:
    enabled: true           # GDS enabled (nvidia-fs for direct NVMe access)
    version: "2.17.5"
  devicePlugin:
    enabled: true
  toolkit:
    enabled: true
  gfd:
    enabled: true
  dcgmExporter:
    enabled: true
What Each Toggle Controls
Setting                    | Module loaded  | Feature enabled
---------------------------+----------------+-------------------------------------------
driver.rdma.enabled: true  | nvidia-peermem | GPUDirect RDMA (GPU↔NIC direct DMA)
driver.rdma.enabled: false | (none)         | No RDMA; network traffic stages via CPU
---------------------------+----------------+-------------------------------------------
gds.enabled: true          | nvidia-fs      | GPUDirect Storage (GPU↔NVMe direct)
gds.enabled: false         | (none)         | No GDS; storage I/O goes via CPU bounce buffer
Components deployed:
rdma.enabled: true → nvidia-peermem-ctr (container in the driver pod)
gds.enabled: true → nvidia-fs-ctr (DaemonSet or driver pod sidecar)
When to Disable RDMA
Disable driver.rdma.enabled when:
❌ No InfiniBand or RoCE NICs installed
❌ Single-node GPU server (no multi-node training)
❌ Using the shared RDMA plugin instead (it exposes /dev/infiniband without nvidia-peermem)
❌ MLNX_OFED not installed on host (nvidia-peermem won't load anyway)
❌ Running inference only (no collective communication)
❌ IOMMU strict mode conflicts with RDMA (use GDS only)
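The first condition in the list above can be checked directly on a node. The following is a minimal sketch, assuming the host exposes RDMA devices (InfiniBand or RoCE) under /sys/class/infiniband; suggest_rdma_toggle is a hypothetical helper name, not part of the GPU Operator. A device being present does not prove that a fabric is cabled or that MLNX_OFED is installed, so treat the result as a hint only.

```shell
#!/usr/bin/env bash
# Sketch: suggest a value for driver.rdma.enabled based on whether the node
# exposes any RDMA devices. The sysfs directory is a parameter so the
# function can also be pointed at a fake directory tree.

# suggest_rdma_toggle [SYSFS_IB_DIR]
# Prints "true" if at least one RDMA device entry exists, otherwise "false".
suggest_rdma_toggle() {
  local ib_dir="${1:-/sys/class/infiniband}"
  local dev
  for dev in "$ib_dir"/*; do
    # When nothing matches, the glob stays literal, so test for existence.
    if [ -e "$dev" ]; then
      echo "true"
      return 0
    fi
  done
  echo "false"
}
```

On a node, `suggest_rdma_toggle` with no argument reads the real sysfs path; the printed value is only a suggestion for driver.rdma.enabled.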
Keep driver.rdma.enabled: true when:
✅ Multi-node training with InfiniBand/RoCE
✅ GPUDirect RDMA needed for NCCL performance
✅ SR-IOV NICs with RDMA capability
✅ Host has MLNX_OFED or inbox RDMA drivers
When to Enable GDS
Enable gds.enabled when:
✅ Local NVMe drives for checkpoints (direct GPU↔NVMe, bypass CPU)
✅ RAPIDS/cuDF workloads reading Parquet from local disk
✅ Training with large datasets on local storage
✅ Checkpoint frequency is high (saves CPU cycles)
Disable gds.enabled when:
❌ No local NVMe (data comes from network: NFS, S3, Ceph)
❌ Inference only (model loaded once at startup)
❌ nvidia-fs causes conflicts with other storage drivers
❌ Kernel too old (< 5.4; nvidia-fs won't load)
RDMA Disabled + GDS Enabled + IOMMU Enabled
# Full ClusterPolicy for: no RDMA, with GDS, IOMMU strict/on
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  driver:
    enabled: true
    rdma:
      enabled: false   # No RDMA (no nvidia-peermem)
      useHostMofed: false
      # With iommu=on (strict), nvidia-peermem can have issues
      # Disabling RDMA avoids this entirely
  gds:
    enabled: true      # GDS works fine with IOMMU enabled
    # nvidia-fs uses kernel DMA APIs that work with IOMMU translation
    # No performance penalty for GDS with IOMMU (NVMe is local PCIe)
  devicePlugin:
    enabled: true
    config:
      name: device-plugin-config
  toolkit:
    enabled: true
    env:
      - name: CONTAINERD_CONFIG
        value: /etc/containerd/config.toml
Verify Configuration
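For scripted verification (for example in a post-deploy job), the module checks in this section can be reduced to a pass/fail helper. The sketch below assumes the target state of this article (nvidia_peermem absent, nvidia_fs present). check_modules is a hypothetical name, and it only parses text in the format that lsmod prints.

```shell
#!/usr/bin/env bash
# Sketch: read `lsmod` output on stdin and compare it with the expected
# state for "RDMA disabled, GDS enabled". Returns non-zero on a mismatch.
check_modules() {
  local lsmod_out rc=0
  lsmod_out="$(cat)"

  # lsmod prints the module name at the start of the line, then spaces.
  if printf '%s\n' "$lsmod_out" | grep -q '^nvidia_peermem '; then
    echo "nvidia_peermem: loaded (unexpected with rdma disabled)"
    rc=1
  else
    echo "nvidia_peermem: not loaded (expected)"
  fi

  if printf '%s\n' "$lsmod_out" | grep -q '^nvidia_fs '; then
    echo "nvidia_fs: loaded (expected)"
  else
    echo "nvidia_fs: not loaded (GDS unavailable)"
    rc=1
  fi

  return "$rc"
}
```

On a node, run it as `lsmod | check_modules`; the exit status can gate a pipeline step.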
# Check ClusterPolicy status
kubectl get clusterpolicy cluster-policy -o yaml | grep -A5 "rdma\|gds"
# Check driver pod: should NOT have nvidia-peermem container
kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset -o yaml | grep -c peermem
# 0 (if rdma disabled)
# Check GDS pod/container
kubectl get pods -n gpu-operator -l app=nvidia-fs
# or
kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset -o yaml | grep nvidia-fs
# On node: verify modules
kubectl debug node/gpu-worker-0 -it --image=busybox -- chroot /host bash -c '
  echo "=== nvidia-peermem (should NOT be loaded) ==="
  lsmod | grep nvidia_peermem || echo "NOT LOADED (correct)"
  echo "=== nvidia-fs (should be loaded) ==="
  lsmod | grep nvidia_fs || echo "NOT LOADED (problem!)"
  echo "=== IOMMU status ==="
  dmesg | grep "Default domain" | tail -1
'
IOMMU Impact on GDS vs RDMA
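Which IOMMU mode applies to a node depends on how it was booted. The following is a minimal sketch that classifies the kernel command line into the modes discussed in this section. classify_iommu is a hypothetical helper, the flag list is a simplification, and `iommu=on` is matched only because this article uses that spelling. The "Default domain" line in dmesg remains the authoritative source, since kernel defaults vary by distribution.

```shell
#!/usr/bin/env bash
# Sketch: classify IOMMU boot flags. With no argument it reads /proc/cmdline.
# Prints one of: passthrough, translated, off, default
# ("default" means no known flag was found, so the kernel build default applies).
classify_iommu() {
  local cmdline="${1:-$(cat /proc/cmdline)}"
  local mode="default"
  local arg
  # Intentional word splitting: kernel parameters are space separated.
  for arg in $cmdline; do
    case "$arg" in
      iommu=pt|iommu.passthrough=1)
        mode="passthrough" ;;
      iommu=off|intel_iommu=off|amd_iommu=off)
        mode="off" ;;
      intel_iommu=on|amd_iommu=on|iommu=on)
        # Do not override an explicit passthrough request seen earlier.
        [ "$mode" = "passthrough" ] || mode="translated" ;;
    esac
  done
  echo "$mode"
}
```

A result of `translated` corresponds to the strict column of the comparison, and `passthrough` to the iommu=pt column.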
                      | iommu=off | iommu=pt | iommu=on (strict)
----------------------+-----------+----------+--------------------------
GPUDirect RDMA        | ✅ Fast   | ✅ Fast  | ⚠️ 10-15% slower
(nvidia-peermem)      |           |          | May have issues
----------------------+-----------+----------+--------------------------
GPUDirect Storage     | ✅ Fast   | ✅ Fast  | ✅ Fast
(nvidia-fs)           |           |          | (local PCIe, no penalty)
----------------------+-----------+----------+--------------------------
Regular GPU compute   | ✅        | ✅       | ✅
Why GDS works with strict IOMMU but RDMA may not:
- GDS: NVMe and GPU on same PCIe tree, IOMMU handles local DMA efficiently
- RDMA: NIC does remote DMA to GPU memory; IOMMU translation adds latency on every network packet (thousands per second)
OpenShift ClusterPolicy with GDS
# On OpenShift, GPU Operator is installed via OLM
# Edit the ClusterPolicy via:
oc edit clusterpolicy cluster-policy
# Or apply declaratively:
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  driver:
    enabled: true
    rdma:
      enabled: false
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      waitForCompletion:
        timeoutSeconds: 1800
  gds:
    enabled: true
  devicePlugin:
    enabled: true
  validator:
    plugin:
      env:
        - name: WITH_GDS
          value: "true"
Combining with Shared RDMA Plugin
# Scenario: RDMA disabled in GPU Operator, but using shared RDMA plugin separately
# This is valid! The shared RDMA plugin manages /dev/infiniband access
# without nvidia-peermem (no GPUDirect; falls back to host memory staging)
# GPU Operator:
spec:
  driver:
    rdma:
      enabled: false   # No nvidia-peermem from GPU Operator
# Separately deployed:
# - k8s-rdma-shared-dev-plugin (gives pods /dev/infiniband access)
# - NCCL will use NET/IB transport WITHOUT GDRDMA suffix
# - Data path: GPU → CPU memory → NIC → wire (slower but works)
# If you need GDRDMA with shared plugin:
# Option 1: Set driver.rdma.enabled: true (GPU Operator loads peermem)
# Option 2: Load nvidia-peermem manually on host (outside GPU Operator)
Common Issues
GDS pod CrashLoopBackOff after enabling
- Cause: Kernel version incompatible, or nvidia-fs conflicts with an existing storage driver
- Fix: Check GDS pod logs; verify kernel ≥ 5.4; check for conflicting modules (nvme_fabrics)
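The kernel check in the fix above can be scripted. The following is a minimal sketch, assuming GNU `sort -V` is available on the host; kernel_at_least is a hypothetical helper, and 5.4 is the threshold stated in this article rather than a value verified against a specific nvidia-fs release.

```shell
#!/usr/bin/env bash
# Sketch: succeed if kernel release HAVE is at least version WANT.
# kernel_at_least HAVE WANT   e.g. kernel_at_least "$(uname -r)" 5.4
kernel_at_least() {
  local have="${1%%-*}"   # strip distro suffix such as "-91-generic"
  local want="$2"
  # sort -V orders version strings; if the smallest of the two is WANT,
  # then HAVE is greater than or equal to WANT.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

For example, `kernel_at_least "$(uname -r)" 5.4 || echo "kernel below the stated minimum"` can run inside the node debug session used for verification.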
"nvidia_peermem: module not found" warnings (rdma disabled)
- Cause: Some NCCL containers try to load peermem at runtime
- Fix: Expected when rdma is disabled; NCCL falls back to the non-GDRDMA path automatically
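To confirm which path NCCL actually took, look at a log captured with NCCL_DEBUG=INFO and check whether any NET/IB transport line carries the GDRDMA suffix. The following is a minimal sketch; nccl_uses_gdrdma is a hypothetical helper, and the exact log line layout varies between NCCL versions, so the pattern is deliberately loose.

```shell
#!/usr/bin/env bash
# Sketch: succeed if an NCCL debug log shows GPUDirect RDMA in use.
# nccl_uses_gdrdma LOGFILE
nccl_uses_gdrdma() {
  # Matches transport strings such as "via NET/IB/0/GDRDMA".
  grep -q 'NET/IB/.*GDRDMA' "$1"
}
```

With RDMA disabled in the ClusterPolicy, `nccl_uses_gdrdma job.log` is expected to fail, which matches the CPU-staged fallback described above (job.log is a placeholder file name).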
GDS enabled but cuFile operations fail
- Cause: nvidia-fs loaded but cuFile not configured, or the filesystem doesn't support GDS
- Fix: GDS requires ext4/XFS on NVMe; create the /etc/cufile.json config; verify with gdscheck
ClusterPolicy changes not taking effect
- Cause: The operator needs to restart driver pods, or nodes need to be drained
- Fix: GPU Operator restarts the driver DaemonSet pods automatically. Wait for the rollout to finish and check it with kubectl rollout status
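As an alternative to watching the DaemonSet, the toggle change and the wait can be combined in one script. The following is a minimal sketch under several assumptions: the ClusterPolicy is named cluster-policy as in this article, the resource reports status.state as ready once reconciled, and kubectl is recent enough to support `kubectl wait --for=jsonpath`. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a JSON merge patch for the two ClusterPolicy toggles.
# build_toggle_patch RDMA GDS   (each argument must be "true" or "false")
build_toggle_patch() {
  local rdma="$1" gds="$2"
  case "$rdma:$gds" in
    true:true|true:false|false:true|false:false) ;;
    *)
      echo "usage: build_toggle_patch true|false true|false" >&2
      return 2 ;;
  esac
  printf '{"spec":{"driver":{"rdma":{"enabled":%s}},"gds":{"enabled":%s}}}\n' \
    "$rdma" "$gds"
}

# apply_and_wait RDMA GDS
# Applies the patch, then blocks until the ClusterPolicy reports ready.
# The "ready" state value and the 30m timeout are assumptions; adjust them
# to match your operator version and cluster size.
apply_and_wait() {
  local patch
  patch="$(build_toggle_patch "$1" "$2")" || return 1
  kubectl patch clusterpolicy cluster-policy --type merge -p "$patch" || return 1
  kubectl wait clusterpolicy/cluster-policy \
    --for=jsonpath='{.status.state}'=ready --timeout=30m
}
```

Calling `apply_and_wait false true` corresponds to the RDMA-disabled, GDS-enabled configuration used throughout this article. Run `build_toggle_patch false true` on its own first to inspect the patch before applying it.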
Best Practices
- Match toggles to hardware: don't enable RDMA without RDMA NICs
- GDS without RDMA is valid: independent features for different I/O paths
- IOMMU strict + GDS: fine, with no performance penalty for local NVMe
- IOMMU strict + RDMA: avoid; use iommu=pt instead if RDMA is needed
- Validate after changes: check module load status on nodes
- Version-pin GDS: gds.version should match the CUDA toolkit version
- Separate RDMA management: the shared plugin can be used independently of GPU Operator
Key Takeaways
- driver.rdma.enabled: false → no nvidia-peermem → no GPUDirect RDMA
- gds.enabled: true → nvidia-fs deployed → GPUDirect Storage active (GPU↔NVMe)
- GDS and RDMA are independent: enable based on your I/O topology
- IOMMU strict: works with GDS (local PCIe), problematic for RDMA (remote DMA penalty)
- Without RDMA, NCCL still works over the network; it just stages data via CPU (slower)
- GPU Operator manages module lifecycle: no manual modprobe needed
- Shared RDMA plugin can work alongside GPU Operator with rdma disabled (no GDRDMA path)

