Disable GDS and Enable IOMMU Passthrough on K8s GPUs
Disable GPUDirect Storage (GDS) when not needed and configure IOMMU passthrough mode for GPU and NIC device assignment. Kernel parameters, BIOS settings, VFIO
Quick Answer: Disable GDS by setting CUFILE_ENV_PATH_JSON=/dev/null or by uninstalling the nvidia-gds package when you are not using direct storage I/O. Enable IOMMU passthrough with the kernel parameter iommu=pt (or intel_iommu=on iommu=pt on Intel). This keeps the IOMMU active for device isolation but puts DMA in passthrough mode, avoiding the IOMMU translation overhead that can reduce GPUDirect RDMA throughput by 10-15%.
The Problem
- GPUDirect Storage (GDS) loads the nvidia_fs kernel module and related drivers, which can conflict with some workloads
- GDS is only needed for direct NVMe-to-GPU transfers; training and inference jobs that never read or write local NVMe directly do not need it
- IOMMU in full translation mode adds DMA overhead (10-15% bandwidth loss for RDMA)
- Need device passthrough for VFIO (SR-IOV VFs) but without IOMMU performance penalty
- Bare-metal GPU nodes need IOMMU for security isolation without sacrificing performance
The Solution
Disable GPUDirect Storage (GDS)
# Option 1: Disable at runtime (per-pod)
export CUFILE_ENV_PATH_JSON=/dev/null
# Points libcufile at an empty config instead of cufile.json, so it skips initialization
# Option 2: Unload the GDS kernel module (as root; fails while the module is in use)
rmmod nvidia_fs
# Verify
lsmod | grep nvidia_fs
# (should return nothing)
# Option 3: Prevent loading at boot
echo "blacklist nvidia_fs" > /etc/modprobe.d/blacklist-gds.conf
depmod -a
# Note: blacklist only stops automatic loading; an explicit "modprobe nvidia_fs" still works
# Option 4: Uninstall GDS package entirely
apt-get remove nvidia-gds
# or
yum remove nvidia-gds
Disable GDS in GPU Operator
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  driver:
    enabled: true
    rdma:
      enabled: true        # Keep RDMA (nvidia-peermem)
      useHostMofed: true
  gds:
    enabled: false         # Disable GPUDirect Storage
    # GDS components won't be deployed
Disable GDS per Container
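apiVersion: v1
kind: Pod
metadata:
  name: training           # placeholder name; a Pod is not valid without metadata.name
spec:
  containers:
    - name: training
      image: nvcr.io/nvidia/pytorch:24.04-py3
      env:
        # Disable cuFile/GDS in container
        - name: CUFILE_ENV_PATH_JSON
          value: "/dev/null"
        # Alternative: disable via CUDA env
        # (may not be honored by every CUDA/cuFile release; verify before relying on it)
        - name: CUDA_DISABLE_GDS
          value: "1"
Enable IOMMU Passthrough Mode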
# Check current IOMMU status
dmesg | grep -i iommu
grep iommu /proc/cmdline
# For Intel CPUs: enable IOMMU with passthrough
# Edit /etc/default/grub (append to any parameters already on this line):
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"
# For AMD CPUs: the AMD IOMMU is on by default when AMD-Vi is enabled in firmware,
# and amd_iommu= has no "on" value, so only the passthrough flag is needed:
GRUB_CMDLINE_LINUX="iommu=pt"
# Regenerate GRUB and reboot (RHEL family)
grub2-mkconfig -o /boot/grub2/grub.cfg
# or (Ubuntu)
update-grub
reboot
IOMMU Modes Explained
Mode          | Kernel params                                    | DMA performance       | Device isolation             | Use case
--------------+--------------------------------------------------+-----------------------+------------------------------+-----------------------------
Off           | iommu=off or intel_iommu=off                     | Native (best)         | None                         | Trusted bare-metal
Passthrough   | intel_iommu=on iommu=pt                          | Native (best)         | Groups only (no translation) | GPU/RDMA nodes (recommended)
Full (strict) | intel_iommu=on iommu.passthrough=0 iommu.strict=1 | ~85-90% (10-15% loss) | Full DMA remap               | VMs, untrusted devices
Note: the kernel has no iommu=strict value. Full translation is selected with iommu.passthrough=0 (or iommu=nopt on x86); iommu.strict=1 additionally forces synchronous TLB invalidation.
For GPU/RDMA workloads: iommu=pt is the correct choice.
- IOMMU hardware is active (needed for VFIO/SR-IOV device assignment)
- But DMA operations from host-owned devices bypass IOMMU translation (no performance penalty)
- Devices assigned to VFIO still get proper isolation via IOMMU groups
OpenShift MachineConfig for IOMMU Passthrough
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-iommu-passthrough
  labels:
    machineconfiguration.openshift.io/role: gpu-worker
spec:
  kernelArguments:
    - "intel_iommu=on"
    - "iommu=pt"
# Nodes in the pool reboot after the MachineConfigPool applies this
Combined: Disable GDS + IOMMU Passthrough (OpenShift)
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-gpu-node-kernel-params
  labels:
    machineconfiguration.openshift.io/role: gpu-worker
spec:
  kernelArguments:
    - "intel_iommu=on"
    - "iommu=pt"
  config:
    ignition:
      version: 3.4.0
    storage:
      files:
        - path: /etc/modprobe.d/blacklist-gds.conf
          mode: 0644
          contents:
            source: data:,blacklist%20nvidia_fs%0A
        - path: /etc/modules-load.d/nvidia-peermem.conf
          mode: 0644
          contents:
            source: data:,nvidia-peermem%0A
Verify IOMMU Passthrough Active
# Check kernel command line
cat /proc/cmdline
# ... intel_iommu=on iommu=pt ...
# Check IOMMU is active but in passthrough
dmesg | grep -i "iommu"
# [ 0.000000] DMAR: IOMMU enabled
# [ 2.345678] iommu: Default domain type: Passthrough
# Verify IOMMU groups exist (needed for VFIO)
find /sys/kernel/iommu_groups/ -type l | head -10
# /sys/kernel/iommu_groups/0/devices/0000:00:00.0
# /sys/kernel/iommu_groups/1/devices/0000:00:01.0
# Check GPU's IOMMU group
GPU_PCI="0000:41:00.0" # Your GPU's PCIe address
find /sys/kernel/iommu_groups/*/devices -name "$GPU_PCI"
# /sys/kernel/iommu_groups/45/devices/0000:41:00.0
# Verify no DMA translation overhead
dmesg | grep "Passthrough"
# iommu: Default domain type: Passthrough (set via kernel command line)
VFIO Device Assignment with Passthrough
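# VFIO requires IOMMU (even in passthrough mode, it uses IOMMU groups)
# Bind SR-IOV VF to VFIO driver
modprobe vfio-pci
echo "0000:41:00.2" > /sys/bus/pci/drivers/mlx5_core/unbind
# driver_override pins only this VF to vfio-pci; writing "15b3 101e" to new_id
# would also claim every other unbound device with that ID and make the bind step fail
echo "vfio-pci" > /sys/bus/pci/devices/0000:41:00.2/driver_override
echo "0000:41:00.2" > /sys/bus/pci/drivers_probe
# Verify
ls /dev/vfio/
# 45 vfio (group 45 = device's IOMMU group)
Performance Impact Measurement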
# Test RDMA bandwidth with IOMMU modes.
# Run on the server as shown, then on the client with the server hostname appended.
# 1. With iommu=pt (passthrough):
ib_write_bw -d mlx5_0 --use_cuda=0 -s 4194304 --report_gbits
# 4194304 5000 395.2 Gb/sec
# 2. With full translation (iommu.passthrough=0 iommu.strict=1):
ib_write_bw -d mlx5_0 --use_cuda=0 -s 4194304 --report_gbits
# 4194304 5000 340.1 Gb/sec (14% slower!)
# 3. With iommu=off:
ib_write_bw -d mlx5_0 --use_cuda=0 -s 4194304 --report_gbits
# 4194304 5000 396.8 Gb/sec (same as passthrough)
When to Keep GDS Enabled
Keep GDS enabled when:
- Running checkpoint saves to NVMe (direct GPU → NVMe, skips CPU)
- Loading datasets from local NVMe directly to GPU memory
- Running GDS-optimized frameworks (RAPIDS, cuDF, Magnum IO)
Disable GDS when:
- Training with data loaded from network storage (NFS, S3, Ceph)
- Running inference only (model weights loaded once at startup)
- GDS modules conflict with other drivers
- No local NVMe drives on the node
- Debugging GPU/RDMA issues (reduce variable count)
Common Issues
"nvidia_fs: Unknown symbol" after driver update
- Cause: GDS module version mismatch with NVIDIA driver
- Fix: Reinstall nvidia-gds matching the driver version, or blacklist nvidia_fs if GDS is not needed
IOMMU groups too large (multiple devices in one group)
- Cause: Platform doesn't support ACS (Access Control Services) on PCIe switches
- Fix: Enable ACS in BIOS, or use pcie_acs_override=downstream,multifunction (less secure, and only available on kernels that carry the out-of-tree ACS override patch)
GPUDirect RDMA stops working after enabling full IOMMU translation
- Cause: Full IOMMU translation breaks nvidia-peermem DMA mappings on some drivers
- Fix: Use iommu=pt instead of full translation; this keeps isolation with native DMA speed
VFIO bind fails: "No IOMMU group"
- Cause: IOMMU disabled in BIOS (VT-d / AMD-Vi) or kernel
- Fix: Enable VT-d (or AMD-Vi) in BIOS; on Intel, add intel_iommu=on to the kernel parameters; reboot
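For the oversized-group issue above, a small sketch like the following can list the IOMMU groups that hold more than one device. It assumes the usual sysfs layout, /sys/kernel/iommu_groups/<group>/devices/<pci-address>. The function name list_shared_iommu_groups and its optional root argument are illustrative (the argument exists mainly so the function can be pointed at a test directory); neither belongs to any existing tool.

```shell
#!/bin/sh
# list_shared_iommu_groups [ROOT]
# Prints "<group> <device-count>" for every IOMMU group that contains more
# than one device. ROOT defaults to the kernel's sysfs location.
list_shared_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group_dir in "$root"/*; do
        # Skip unmatched globs and entries without a devices directory
        [ -d "$group_dir/devices" ] || continue
        # Each entry under devices/ is one PCI device in this group
        count=$(ls -1 "$group_dir/devices" | wc -l | tr -d ' ')
        if [ "$count" -gt 1 ]; then
            echo "$(basename "$group_dir") $count"
        fi
    done
}
```

Calling list_shared_iommu_groups with no argument inspects the live system. A shared group is not always a problem (the functions of one multifunction device normally share a group), so treat the output as a starting point for checking ACS support rather than as a verdict.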
Best Practices
- iommu=pt for all GPU/RDMA nodes: best performance with device isolation
- Disable GDS unless actively using NVMe-to-GPU transfers: reduces module conflicts
- Keep nvidia-peermem loaded: needed for GPUDirect RDMA regardless of GDS status
- Enable VT-d in BIOS: required for IOMMU even in passthrough mode
- Use MachineConfig/kernel args for consistency, not manual modprobe
- Verify passthrough after reboot: check dmesg | grep "Default domain"
- Don't use iommu=off: it breaks VFIO/SR-IOV device assignment entirely
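The post-reboot check can also be scripted against the kernel command line. The sketch below classifies a command-line string as off, passthrough, or default; the function name iommu_mode_from_cmdline and the three labels are assumptions made for illustration. It only reports what was requested at boot, so the dmesg "Default domain type" line remains the authoritative check.

```shell
#!/bin/sh
# iommu_mode_from_cmdline "CMDLINE"
# Prints one of:
#   off          an *iommu=off parameter is present
#   passthrough  iommu=pt or iommu.passthrough=1 is present
#   default      neither was requested; the kernel build decides
iommu_mode_from_cmdline() {
    mode="default"
    # Unquoted on purpose: split the command line into whitespace-separated args
    for arg in $1; do
        case "$arg" in
            iommu=off|intel_iommu=off|amd_iommu=off)
                mode="off"
                break
                ;;
            iommu=pt|iommu.passthrough=1)
                mode="passthrough"
                ;;
        esac
    done
    echo "$mode"
}
```

On a node this could be run as iommu_mode_from_cmdline "$(cat /proc/cmdline)". An off parameter wins over a passthrough one because a disabled IOMMU leaves nothing to pass through.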
Key Takeaways
- GDS (GPUDirect Storage) ≠ GPUDirect RDMA (GDRDMA): GDS is for NVMe, GDRDMA is for networking
- Disable GDS: CUFILE_ENV_PATH_JSON=/dev/null (runtime) or blacklist nvidia_fs (persistent)
- IOMMU passthrough (iommu=pt): IOMMU hardware active for isolation, no DMA translation overhead
- Performance: passthrough = native speed; strict = 10-15% DMA bandwidth loss
- GPU Operator: gds.enabled: false disables GDS; driver.rdma.enabled: true keeps RDMA
- OpenShift: kernelArguments in MachineConfig for IOMMU; storage files for module blacklist
- VFIO/SR-IOV requires IOMMU enabled; passthrough mode satisfies both performance and isolation
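The bandwidth-loss figure can be checked against the ib_write_bw numbers quoted in the measurement section (395.2 Gb/sec in passthrough, 340.1 Gb/sec with full translation). The relative loss is (baseline - measured) / baseline. A minimal sketch, where bw_loss_pct is a hypothetical helper name and not part of perftest:

```shell
#!/bin/sh
# bw_loss_pct BASELINE MEASURED
# Prints how far MEASURED falls below BASELINE, as a percentage rounded to
# one decimal place. Both arguments use the same unit (for example Gb/sec).
bw_loss_pct() {
    awk -v base="$1" -v got="$2" \
        'BEGIN { printf "%.1f\n", (base - got) / base * 100 }'
}
```

With the article's figures, bw_loss_pct 395.2 340.1 prints 13.9, which matches the roughly 14% slowdown reported for full translation.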
