IOMMU Kernel Parameters for Kubernetes GPU Nodes
Configure IOMMU kernel parameters for optimal GPU and RDMA performance on Kubernetes. Compare intel_iommu, amd_iommu, and iommu settings, passthrough vs off vs
💡 Quick Answer: For GPU/RDMA nodes use iommu=pt (passthrough) together with intel_iommu=on or amd_iommu=on. The IOMMU hardware stays enabled for device isolation, but host-owned devices get identity-mapped DMA that bypasses the translation tables (native speed). intel_iommu=off amd_iommu=off iommu=on turns off both vendor IOMMU drivers. x86 kernels do not recognize iommu=on, so this effectively leaves the node without an active IOMMU. For maximum bare-metal performance without SR-IOV, iommu=off disables all IOMMU overhead entirely.
The Problem
- Different IOMMU parameter combinations have drastically different performance and feature impacts
- SR-IOV and VFIO require IOMMU groups, but full DMA translation costs measurable RDMA bandwidth
- Vendor-specific IOMMU (VT-d / AMD-Vi) vs generic IOMMU subsystem confusion
- Need to balance device security isolation with DMA throughput for GPUs
- Wrong IOMMU settings can break GPUDirect RDMA or prevent SR-IOV device assignment
The Solution
All IOMMU Parameter Combinations
Parameters                                 │ Effect                                 │ Use Case
───────────────────────────────────────────┼────────────────────────────────────────┼──────────────────────
intel_iommu=on iommu=pt                    │ VT-d ON, passthrough DMA               │ GPU+SR-IOV nodes ✅
amd_iommu=on iommu=pt                      │ AMD-Vi ON, passthrough DMA             │ AMD GPU nodes ✅
intel_iommu=off amd_iommu=off iommu=on     │ Vendor drivers OFF, iommu=on ignored   │ Debugging
iommu=pt                                   │ Platform IOMMU, passthrough            │ Auto-detect vendor
iommu=off                                  │ All IOMMU disabled                     │ Bare-metal, no SR-IOV
intel_iommu=on iommu=nopt iommu.strict=1   │ VT-d ON, full DMA remapping            │ VMs, security-first
(no params)                                │ Platform default (varies)              │ Not recommended
───────────────────────────────────────────┴────────────────────────────────────────┴──────────────────────
Note: x86 kernels do not recognize iommu=on or iommu=strict as values of the iommu= parameter. In the comparisons that follow, "iommu=strict" is shorthand for translated DMA with strict IOTLB invalidation, which is set with iommu=nopt iommu.strict=1.
Recommended: GPU Nodes with SR-IOV (iommu=pt)
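The Intel and AMD lines below differ only in the vendor switch. On a mixed fleet, a small helper can pick the right pair from the vendor_id field of /proc/cpuinfo (GenuineIntel or AuthenticAMD). This is a minimal sketch. The function names iommu_args_for_vendor and current_cpu_vendor are hypothetical, and any other vendor string is treated as unsupported:

```sh
#!/bin/sh
# Hypothetical helper: map a CPU vendor string (vendor_id in /proc/cpuinfo)
# to the passthrough IOMMU kernel arguments recommended in this article.
iommu_args_for_vendor() {
    case "$1" in
        GenuineIntel) echo "intel_iommu=on iommu=pt" ;;
        AuthenticAMD) echo "amd_iommu=on iommu=pt" ;;
        *) echo "unknown vendor: $1" >&2; return 1 ;;
    esac
}

# Print the vendor_id of the first CPU listed in /proc/cpuinfo
# (prints nothing on architectures that do not report vendor_id).
current_cpu_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo
}
```

On an Intel node, iommu_args_for_vendor "$(current_cpu_vendor)" prints intel_iommu=on iommu=pt, ready to append to GRUB_CMDLINE_LINUX.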
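# /etc/default/grub (Intel platform); append to any arguments already present
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"
# /etc/default/grub (AMD platform)
GRUB_CMDLINE_LINUX="amd_iommu=on iommu=pt"
# Then regenerate the GRUB config (update-grub on Debian/Ubuntu,
# grub2-mkconfig on RHEL/Fedora) and reboot
# What this does:
# 1. Enables the hardware IOMMU (VT-d or AMD-Vi)
# 2. Creates IOMMU groups (required for VFIO/SR-IOV)
# 3. Sets the default DMA domain to identity ("passthrough"): host-owned devices
#    DMA without address translation, while devices bound to vfio-pci still
#    get a translated domain
# 4. Result: native DMA speed + device isolation capability
Alternative: Generic IOMMU Without Vendor Drivers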
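After a reboot with this combination, you can check whether any IOMMU groups remain by counting the numbered directories under /sys/kernel/iommu_groups. This is a minimal sketch. count_iommu_groups is a hypothetical helper, and the directory is a parameter so it can be pointed at a test tree:

```sh
#!/bin/sh
# Hypothetical helper: count IOMMU groups (numbered subdirectories) under a
# sysfs directory. Prints 0 if the directory is missing or empty.
count_iommu_groups() {
    local dir="${1:-/sys/kernel/iommu_groups}"
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # Each group is a directory named by its number, e.g. 0, 1, 45.
    find "$dir" -mindepth 1 -maxdepth 1 -name '[0-9]*' | wc -l | tr -d ' '
}
```

A result of 0 means vfio-pci has no groups to work with on that boot.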
# /etc/default/grub
GRUB_CMDLINE_LINUX="intel_iommu=off amd_iommu=off iommu=on"
# What this does:
# 1. Disables the vendor-specific IOMMU drivers (VT-d / AMD-Vi DMA remapping OFF)
# 2. iommu=on is not a recognized value of the x86 iommu= parameter, so the
#    kernel ignores it rather than enabling a separate "generic" IOMMU
# 3. IOMMU groups are registered by the vendor drivers, so expect
#    /sys/kernel/iommu_groups/ to be empty even though the firmware still
#    publishes its ACPI DMAR/IVRS table
# 4. No DMA remapping overhead (vendor engine disabled)
# 5. No device isolation: vfio-pci cannot claim devices (short of VFIO's
#    unsafe no-IOMMU mode), so in practice this behaves like iommu=off
# When to use:
# - Vendor IOMMU driver causes issues (rare VT-d bugs with specific hardware)
# - Debugging: check whether a problem disappears with the vendor driver disabled
Bare-Metal Without SR-IOV (iommu=off)
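Before moving a node to iommu=off, confirm that nothing is currently bound to vfio-pci, because those assignments stop working without an IOMMU. The following is a minimal sketch that lists the PCI addresses under the driver's sysfs directory. list_vfio_bound is a hypothetical name, and the directory is a parameter for testing:

```sh
#!/bin/sh
# Hypothetical pre-flight check: list PCI devices currently bound to vfio-pci.
# The driver directory holds one entry per bound device, named by its PCI
# address (e.g. 0000:41:00.0), next to control files such as bind and unbind.
list_vfio_bound() {
    local drv="${1:-/sys/bus/pci/drivers/vfio-pci}" entry name
    [ -d "$drv" ] || return 0   # driver not loaded: nothing bound
    for entry in "$drv"/*; do
        name=$(basename "$entry")
        case "$name" in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:*) echo "$name" ;;
        esac
    done
}
```

If list_vfio_bound prints anything, keep the node on iommu=pt instead.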
# /etc/default/grub
GRUB_CMDLINE_LINUX="iommu=off"
# or explicitly:
GRUB_CMDLINE_LINUX="intel_iommu=off iommu=off"
# What this does:
# 1. Completely disables all IOMMU functionality
# 2. No IOMMU groups created
# 3. No DMA translation (maximum raw performance)
# 4. BREAKS: SR-IOV, VFIO device assignment, secure device isolation
# When to use:
# - Bare-metal GPU nodes without SR-IOV NICs
# - All NICs used as whole PFs (not virtualized)
# - Maximum possible DMA performance (marginal gain over iommu=pt)
# - No virtualization or device passthrough needed
OpenShift MachineConfig Examples
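Option 1 below hardcodes the Intel switch, and an AMD pool needs amd_iommu=on instead. A small generator can render either variant from one template. This is a sketch. render_iommu_mc is a hypothetical helper, and the 99-iommu-passthrough-<vendor> naming simply mirrors the examples:

```sh
#!/bin/sh
# Hypothetical generator: print a MachineConfig that sets passthrough IOMMU
# kernel arguments for one vendor. Arguments: <role> <intel|amd>
render_iommu_mc() {
    local role="$1" vendor="$2"
    case "$vendor" in
        intel|amd) ;;
        *) echo "vendor must be intel or amd" >&2; return 1 ;;
    esac
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-iommu-passthrough-${vendor}
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  kernelArguments:
    - "${vendor}_iommu=on"
    - "iommu=pt"
EOF
}
```

For example: render_iommu_mc gpu-worker amd | oc apply -f -. The role label only takes effect if a MachineConfigPool's machineConfigSelector matches machineconfiguration.openshift.io/role: gpu-worker, so the custom pool has to exist first.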
# Option 1: iommu=pt (recommended for GPU + SR-IOV)
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-iommu-passthrough-intel
  labels:
    machineconfiguration.openshift.io/role: gpu-worker
spec:
  kernelArguments:
    - "intel_iommu=on"
    - "iommu=pt"
---
# Option 2: Generic IOMMU only
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-iommu-generic
  labels:
    machineconfiguration.openshift.io/role: gpu-worker
spec:
  kernelArguments:
    - "intel_iommu=off"
    - "amd_iommu=off"
    - "iommu=on"
Verify Current IOMMU Configuration
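The first command below only lists the IOMMU-related arguments. A small classifier can turn a full command line into the mode it requests. This sketch has three assumptions and limits. A later argument overrides an earlier one. Only the switches discussed in this article are recognized. It cannot see BIOS settings or kernel build defaults, so dmesg remains the authority. classify_iommu_mode is a hypothetical name:

```sh
#!/bin/sh
# Hypothetical classifier: derive the requested IOMMU mode from a kernel
# command line string. Later tokens override earlier ones (an assumption of
# this sketch). Only the "both vendor drivers off" combination is reported
# as no-vendor-driver, since the CPU vendor is not known here.
classify_iommu_mode() {
    local off=0 intel_off=0 amd_off=0 domain="" tok
    for tok in $1; do
        case "$tok" in
            iommu=off)                      off=1 ;;
            intel_iommu=off)                intel_off=1 ;;
            amd_iommu=off)                  amd_off=1 ;;
            iommu=pt|iommu.passthrough=1)   domain=passthrough ;;
            iommu=nopt|iommu.passthrough=0) domain=translated ;;
        esac
    done
    if [ "$off" -eq 1 ]; then
        echo off
    elif [ "$intel_off" -eq 1 ] && [ "$amd_off" -eq 1 ]; then
        echo no-vendor-driver
    elif [ -n "$domain" ]; then
        echo "$domain"
    else
        echo default    # kernel build default decides
    fi
}
```

classify_iommu_mode "$(cat /proc/cmdline)" prints passthrough on a node booted with intel_iommu=on iommu=pt.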
# Check active kernel parameters
cat /proc/cmdline | tr ' ' '\n' | grep -i iommu
# Check IOMMU status in dmesg
dmesg | grep -i "iommu\|dmar\|amd-vi"
# Key lines to look for:
# "DMAR: IOMMU enabled"                     → VT-d active
# "Default domain type: Passthrough"        → passthrough mode ✅
# "Default domain type: Translated"         → translated mode (full DMA remapping, slower)
# "AMD-Vi: IOMMU performance counters..."   → AMD-Vi active
# Check the default domain type of each IOMMU group
cat /sys/kernel/iommu_groups/*/type 2>/dev/null | sort | uniq -c
#     128 identity   (passthrough: devices use identity mapping)
# Translated groups would be counted as DMA or DMA-FQ instead
# List IOMMU groups
ls /sys/kernel/iommu_groups/ | wc -l
# 128 (or similar; should be > 0 if the IOMMU is enabled)
# Find GPU's IOMMU group
lspci -nn | grep NVIDIA
# 41:00.0 3D controller [0302]: NVIDIA Corporation [10de:2330]
readlink -f /sys/bus/pci/devices/0000:41:00.0/iommu_group
# /sys/kernel/iommu_groups/45
Feature Compatibility Matrix
Feature                  │ iommu=off │ iommu=pt │ iommu=strict │ iommu=on (no vendor)
─────────────────────────┼───────────┼──────────┼──────────────┼─────────────────────
GPUDirect RDMA           │ ✅ Fast   │ ✅ Fast  │ ⚠️ Slower    │ ✅ Fast
SR-IOV VF assignment     │ ❌ Broken │ ✅ Works │ ✅ Works     │ ❌ Broken
VFIO device passthrough  │ ❌ Broken │ ✅ Works │ ✅ Works     │ ❌ Broken
DMA performance          │ 100%      │ ~100%    │ 85-90%       │ ~100%
Device isolation         │ None      │ Groups   │ Full remap   │ None
NVIDIA GPU Operator      │ ✅        │ ✅       │ ✅           │ ✅
nvidia-peermem (RDMA)    │ ✅        │ ✅       │ ⚠️ May fail  │ ✅
─────────────────────────┴───────────┴──────────┴──────────────┴─────────────────────
BIOS Settings Required
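You can check from a running system whether the firmware actually exposes the IOMMU. With VT-d or AMD-Vi enabled, the firmware publishes an ACPI DMAR (Intel) or IVRS (AMD) table, which Linux lists under /sys/firmware/acpi/tables. This is a minimal sketch. firmware_iommu_table is a hypothetical helper, and the directory is a parameter for testing:

```sh
#!/bin/sh
# Hypothetical check: report which IOMMU ACPI table the firmware publishes.
# DMAR = Intel VT-d, IVRS = AMD-Vi. Prints "none" if neither is present.
firmware_iommu_table() {
    local dir="${1:-/sys/firmware/acpi/tables}"
    if [ -e "$dir/DMAR" ]; then
        echo DMAR
    elif [ -e "$dir/IVRS" ]; then
        echo IVRS
    else
        echo none
    fi
}
```

On an x86 GPU node, a result of none usually means VT-d or AMD-Vi is still disabled in the BIOS.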
Setting (Intel)          │ Required For
─────────────────────────┼────────────────────────────
VT-d (Virtualization)    │ intel_iommu=on / iommu=pt
ACS (Access Control)     │ Fine-grained IOMMU groups
SR-IOV                   │ Virtual Functions on NICs
Above 4G Decoding        │ Large BAR GPUs (A100/H100)
─────────────────────────┼────────────────────────────
Setting (AMD)            │ Required For
─────────────────────────┼────────────────────────────
AMD-Vi / IOMMU           │ amd_iommu=on / iommu=pt
ACS                      │ Fine-grained IOMMU groups
SR-IOV                   │ Virtual Functions on NICs
─────────────────────────┴────────────────────────────
If BIOS VT-d is OFF:
- intel_iommu=on has no effect (hardware not available)
- No IOMMU groups created
- SR-IOV/VFIO will fail
Performance Benchmark Comparison
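The Relative column in the table below is each configuration's bandwidth divided by the iommu=off baseline. A one-line awk helper reproduces it, which is handy when repeating the test on other hardware. relative_bw is a hypothetical name:

```sh
#!/bin/sh
# Hypothetical helper: express a measured bandwidth as a percentage of a
# baseline, rounded to one decimal place.
# Arguments: <baseline_gbps> <measured_gbps>. Fails if the baseline is not positive.
relative_bw() {
    awk -v base="$1" -v bw="$2" 'BEGIN {
        if (base <= 0) exit 1
        printf "%.1f%%\n", 100 * bw / base
    }'
}
```

With the figures below, relative_bw 396.8 340.1 prints 85.7%, matching the strict row.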
Test: ib_write_bw --use_cuda=0 -s 4194304 (4MB GPUDirect RDMA write)
Configuration                              │ Bandwidth    │ Relative
───────────────────────────────────────────┼──────────────┼─────────
iommu=off                                  │ 396.8 Gb/s   │ 100%
intel_iommu=on iommu=pt                    │ 395.2 Gb/s   │ 99.6%
intel_iommu=off amd_iommu=off iommu=on     │ 394.5 Gb/s   │ 99.4%
intel_iommu=on iommu=strict                │ 340.1 Gb/s   │ 85.7%
───────────────────────────────────────────┴──────────────┴─────────
Key insight: passthrough and generic-only are both ~100% native speed.
Only full strict translation has measurable overhead (14% loss).
Transition Between Modes
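Where the group's type file is writable, a runtime switch generally succeeds only after the devices in the group have been unbound from their drivers. The file accepts the values identity, DMA, DMA-FQ and auto rather than "passthrough". The following is a minimal sketch of a guarded write. set_iommu_group_type and its sysfs-root parameter are hypothetical, and the unbind and rebind steps are left out:

```sh
#!/bin/sh
# Hypothetical guarded write of an IOMMU group's default domain type.
# Arguments: <group_number> <type> [sysfs_root, default /sys]
# Unbind the group's devices from their drivers first; the kernel refuses
# the change while a driver is still bound.
set_iommu_group_type() {
    local group="$1" type="$2" root="${3:-/sys}"
    local file="$root/kernel/iommu_groups/$group/type"
    case "$type" in
        identity|DMA|DMA-FQ|auto) ;;
        *) echo "unsupported type: $type (use identity, DMA, DMA-FQ or auto)" >&2
           return 1 ;;
    esac
    if [ ! -w "$file" ]; then
        echo "cannot write $file (missing group or insufficient privileges)" >&2
        return 1
    fi
    echo "$type" > "$file"
}
```

For example, after unbinding the devices in group 45: set_iommu_group_type 45 identity. A non-zero exit status means the kernel or the guard rejected the change.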
# Check if you can switch from translated to passthrough at runtime (kernel 5.15+).
# The group's devices must first be unbound from their drivers, and the sysfs
# file accepts identity, DMA, DMA-FQ or auto (not "passthrough"):
echo identity > /sys/kernel/iommu_groups/45/type
# May work for individual groups on newer kernels
# But generally: requires reboot with new kernel parameters
# Safe transition procedure:
# 1. Cordon node: kubectl cordon gpu-node-1
# 2. Drain workloads: kubectl drain gpu-node-1 --ignore-daemonsets
# 3. Apply MachineConfig (OpenShift; the Machine Config Operator then drains
#    and reboots the nodes in the pool itself) or edit grub (bare-metal)
# 4. Reboot
# 5. Verify: dmesg | grep "Default domain type"
# 6. Uncordon: kubectl uncordon gpu-node-1
Common Issues
SR-IOV VF assignment (vfio-pci bind) fails with iommu=off
- Cause: VFIO needs IOMMU groups for device isolation
- Fix: Switch to iommu=pt, which gives both performance AND SR-IOV support
"DMAR: IOMMU disabled" despite kernel params
- Cause: VT-d disabled in BIOS
- Fix: Enable VT-d / AMD-Vi in BIOS, reboot, then verify with dmesg | grep DMAR
GPUDirect RDMA bandwidth drops after enabling iommu=strict
- Cause: Full DMA address translation for every transfer
- Fix: Switch to iommu=pt; passthrough gives native speed with isolation
"No IOMMU group" when binding device to VFIO
- Cause: IOMMU not enabled or not detecting the device
- Fix: Verify that intel_iommu=on is in the cmdline AND VT-d is enabled in BIOS; check that the DMAR ACPI table exists
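For the last issue, the quickest check is whether the device has an iommu_group link in sysfs at all. This is the same link the readlink command in the verification section follows. This is a minimal sketch. device_iommu_group is a hypothetical helper, and the sysfs root is a parameter for testing:

```sh
#!/bin/sh
# Hypothetical helper: print the IOMMU group number of a PCI device, or
# "none" if the device has no iommu_group link (IOMMU off or not detected).
# Arguments: <pci_address, e.g. 0000:41:00.0> [sysfs_root, default /sys]
device_iommu_group() {
    local bdf="$1" root="${2:-/sys}"
    local link="$root/bus/pci/devices/$bdf/iommu_group"
    if [ -L "$link" ]; then
        # The link target ends in .../kernel/iommu_groups/<number>
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

device_iommu_group 0000:41:00.0 prints the group number (45 in the earlier example). It prints none when the IOMMU is not active for that device.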
Best Practices
- iommu=pt is the default recommendation: it covers most GPU/RDMA use cases
- Don't use iommu=strict for GPU nodes: 14% bandwidth loss for little benefit on dedicated GPU nodes
- iommu=off only if there is absolutely no SR-IOV: it saves IOMMU group overhead but breaks VFIO
- Always enable VT-d/AMD-Vi in BIOS, even if you plan to use passthrough
- Test RDMA bandwidth after any IOMMU change to verify there is no regression
- Use MachineConfig for fleet consistency; don't rely on manual grub edits
- Document your choice: future operators need to know why the params were set
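The "test RDMA bandwidth after any IOMMU change" practice can become a pass/fail gate that compares a new ib_write_bw result with a recorded baseline. This is a sketch. bw_within_tolerance is a hypothetical helper, and the default 2% tolerance is an assumption to tune for your fabric:

```sh
#!/bin/sh
# Hypothetical regression gate. Exit status:
#   0 if the measured bandwidth is at most max_drop_pct percent below baseline,
#   1 if it dropped further, 2 if the baseline is not positive.
# Arguments: <baseline_gbps> <measured_gbps> [max_drop_pct, default 2]
bw_within_tolerance() {
    awk -v base="$1" -v bw="$2" -v tol="${3:-2}" 'BEGIN {
        if (base <= 0) exit 2
        drop = 100 * (base - bw) / base
        if (drop <= tol) exit 0
        exit 1
    }'
}
```

Using the benchmark figures, bw_within_tolerance 396.8 395.2 succeeds (a 0.4% drop), and bw_within_tolerance 396.8 340.1 fails (a 14.3% drop).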
Key Takeaways
- iommu=pt: IOMMU hardware ON + passthrough DMA = best for GPU + SR-IOV (recommended)
- intel_iommu=off amd_iommu=off iommu=on: both vendor drivers off; x86 kernels ignore iommu=on, so this effectively runs without an IOMMU
- iommu=off: everything disabled (max perf, breaks SR-IOV/VFIO)
- iommu=strict (set as iommu=nopt iommu.strict=1): full DMA remapping (14% bandwidth loss; avoid for GPU nodes)
- Passthrough mode: native DMA speed (~100%) with IOMMU groups for isolation
- BIOS VT-d/AMD-Vi must be enabled for any IOMMU kernel param to take effect
- SR-IOV requires IOMMU groups, so you can't use iommu=off with SR-IOV NICs

