Disable PCIe ACS for GPU-Direct P2P
Disable PCIe Access Control Services (ACS) to enable GPU-Direct peer-to-peer DMA between GPUs and RDMA NICs. Covers BIOS disable, kernel override, and when
💡 Quick Answer: For bare-metal GPU clusters running only AI training (no VMs, no multi-tenant isolation), the simplest path is to disable VT-d/AMD-Vi entirely in BIOS. If you need SR-IOV (which requires IOMMU), keep IOMMU on and add pcie_acs_override=downstream,multifunction to the kernel args (only honored by kernels built with the ACS override patch) to allow GPU-Direct P2P across PCIe switches.
The Problem
ACS (Access Control Services) on PCIe bridges blocks GPU-to-GPU and GPU-to-NIC direct DMA:
- GPUs behind the same PCIe switch can't do P2P transfers
- NCCL falls back to CPU-staged copies (10-30x slower)
- GPU-Direct RDMA path broken between NIC and GPU on same root complex
- IOMMU groups become too large or too restrictive
The Solution
Decision: Do You Need IOMMU at All?
Question                                  → Action
──────────────────────────────────────────────────────────────────
Running VMs on this node?                 → Keep IOMMU enabled
Running SR-IOV (VFs for Pods)?            → Keep IOMMU enabled
Multi-tenant with device isolation?       → Keep IOMMU enabled
Bare-metal, single-tenant, GPUs only?     → DISABLE IOMMU entirely
Need SR-IOV + GPU-Direct P2P?             → IOMMU on + ACS override
Option 1: Disable Virtualization Technology Entirely (Simplest)
If the node is bare-metal, dedicated to GPU training, no SR-IOV needed:
BIOS Settings → Disable All Virtualization:
────────────────────────────────────────────────────────────────
Intel:
  • VT-d (Directed I/O): DISABLED
  • VT-x (Virtualization Tech): Can stay Enabled (needed only for VM-based workloads, not plain containers)
  • SR-IOV: DISABLED (if not using VFs)
  • ACS: N/A (no IOMMU = no ACS enforcement)
AMD:
  • AMD-Vi (IOMMU): DISABLED
  • SVM (Secure Virtual Machine): Can stay Enabled (needed only for VM-based workloads, not plain containers)
  • SR-IOV: DISABLED
  • ACS: N/A
Result: All DMA is direct, with no translation and no ACS enforcement.
GPUDirect P2P and RDMA work at full speed immediately.
# Kernel parameters (no IOMMU at all):
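# If VT-d/AMD-Vi is disabled in BIOS, no iommu parameter is needed. To be explicit,
# add it to GRUB_CMDLINE_LINUX in /etc/default/grub, keeping existing arguments (AMD hosts: amd_iommu=off):
GRUB_CMDLINE_LINUX="intel_iommu=off"
# Then regenerate the GRUB config (update-grub on Debian/Ubuntu, grub2-mkconfig on RHEL-family)
# Verify after reboot:
dmesg | grep -i iommu
# Should show: nothing, or "DMAR: IOMMU disabled"
cat /proc/cmdline
# Shows intel_iommu=off, or no iommu parameter at all
OpenShift MachineConfig (disable IOMMU):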
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-gpu-worker-no-iommu
  labels:
    machineconfiguration.openshift.io/role: gpu-worker
spec:
  kernelArguments:
    - intel_iommu=off
    - pci=realloc
Talos Linux:
machine:
  install:
    extraKernelArgs:
      - intel_iommu=off
      - pci=realloc
Option 2: Keep IOMMU + Disable ACS (Need SR-IOV + GPU-Direct)
When you need both SR-IOV (for RDMA VFs) and GPU-Direct P2P:
# ACS override for downstream ports and multifunction devices
# (only honored by kernels built with the out-of-tree ACS override patch):
pcie_acs_override=downstream,multifunction
# Full kernel args for SR-IOV + GPU-Direct:
intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pci=realloc
OpenShift MachineConfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-gpu-worker-acs-override
  labels:
    machineconfiguration.openshift.io/role: gpu-worker
spec:
  kernelArguments:
    - intel_iommu=on
    - iommu=pt
    - pcie_acs_override=downstream,multifunction
    - pci=realloc
Talos Linux:
machine:
  install:
    extraKernelArgs:
      - intel_iommu=on
      - iommu=pt
      - pcie_acs_override=downstream,multifunction
      - pci=realloc
Option 3: Disable ACS in BIOS Only (Some Vendors)
Some server BIOS expose ACS as a toggle:
BIOS Location (vendor-specific):
────────────────────────────────────────────────────────────────
Dell:
  System BIOS → Integrated Devices → PCIe ACS: Disabled
HPE:
  System Configuration → BIOS → PCIe → ACS Control: Disabled
Supermicro:
  Advanced → PCIe/PCI/PnP → Access Control Services: Disabled
Lenovo:
  UEFI → Devices and I/O → PCIe ACS: Disabled
Note: Not all BIOS versions expose this. If it is not available,
use the kernel parameter override instead.
Verify ACS Status
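# Check if ACS is active on PCIe bridges
# Run as root: setpci needs root to read extended config space (otherwise it silently reports nothing)
for bridge in $(lspci -d ::0604 | awk '{print $1}'); do
  # ACS Control register = ACS extended capability + offset 0x6
  acs=$(setpci -s "$bridge" ECAP_ACS+6.w 2>/dev/null)
  if [ -n "$acs" ] && [ "$acs" != "0000" ]; then
    echo "⚠️ ACS ACTIVE on bridge $bridge: control=$acs"
    lspci -s "$bridge"
  fi
done
# If no output → ACS control bits are clear on every bridge ✅
# If bridges listed → ACS is enabled in hardware and may block P2P ❌
#   (with pcie_acs_override this is expected, see Common Issues)
Verify GPU-Direct P2P Works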
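The `nvidia-smi topo -m` matrix below can also be scanned automatically for GPU/NIC pairs that only meet through the CPU. The following is a minimal sketch. It assumes the whitespace-separated layout shown in this section's sample output. The helper name `flag_slow_links` and its default set of "slow" link types (SYS and PHB) are illustrative choices, not part of nvidia-smi.

```shell
# flag_slow_links (hypothetical helper): reads `nvidia-smi topo -m` output on stdin
# and prints each GPU-row / device-column pair whose link type is listed in $1
# (default "SYS PHB", i.e. paths that have to cross the CPU).
flag_slow_links() {
  awk -v slow="${1:-SYS PHB}" '
    BEGIN { n = split(slow, s, " "); for (k = 1; k <= n; k++) bad[s[k]] = 1 }
    # The first line whose first field is a device name is the header:
    # remember which field positions hold GPU/NIC names (skips CPU Affinity etc.).
    !hdr && $1 ~ /^(GPU|NIC|mlx5)/ {
      for (i = 1; i <= NF; i++) if ($i ~ /^(GPU|NIC|mlx5)/) col[i] = $i
      hdr = 1
      next
    }
    # GPU rows: field i+1 holds the link type for header column i.
    hdr && $1 ~ /^GPU[0-9]+$/ {
      for (i in col) if ($(i + 1) in bad) print $1 " <-> " col[i] ": " $(i + 1)
    }
  ' | LC_ALL=C sort
}
```

Usage: `nvidia-smi topo -m | flag_slow_links`, or `flag_slow_links "SYS"` to report only cross-NUMA pairs. GPU-to-GPU pairs show up twice because the matrix is symmetric.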
# Check P2P connectivity matrix
nvidia-smi topo -m
# Expected output (P2P enabled):
# GPU0 GPU1 GPU2 GPU3 mlx5_0
# GPU0 X NV12 NV12 NV12 SYS
# GPU1 NV12 X NV12 NV12 SYS
# GPU2 NV12 NV12 X NV12 NODE
# GPU3 NV12 NV12 NV12 X NODE
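# Legend:
# NV#  = Bonded set of # NVLinks (NV12 = 12 links, best)
# PIX  = At most one PCIe bridge/switch (good, P2P-capable path)
# PXB  = Multiple PCIe bridges, no host bridge (good)
# PHB  = Through a PCIe host bridge, i.e. the CPU
# NODE = Same NUMA node via PCIe (good)
# SYS  = Cross-NUMA (works but slower)
# X    = Self
# topo -m shows the topology; to see whether P2P is actually allowed, run:
nvidia-smi topo -p2p r
# "NS" or "CNS" instead of "OK" between GPUs behind the same switch → ACS is likely blocking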
# Test P2P bandwidth directly with p2pBandwidthLatencyTest from the CUDA samples
# (recent CUDA toolkits no longer bundle the samples; build it from the NVIDIA/cuda-samples repo)
./p2pBandwidthLatencyTest
# Expected: P2P bandwidth ~25 GB/s per direction (PCIe 4.0 x16)
# If P2P is disabled: "Device=0 CANNOT Access Peer Device=1", and P2P=Enabled results match P2P=Disabled
NCCL Transport Verification After ACS Disable
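With `NCCL_DEBUG=INFO` set as below, NCCL logs one `... via <transport>` line per channel connection, which makes the transport check easy to summarize. This is a minimal sketch. It assumes the `via P2P/...`, `via SHM/...` and `via NET/...` wording this section looks for; the exact strings can vary between NCCL versions. `nccl_transports` is a hypothetical helper name.

```shell
# nccl_transports (hypothetical helper): read an NCCL_DEBUG=INFO log on stdin and
# print "<transport> <count>" for every "via <transport>" token found.
# The NIC index in NET/<name>/<n>/... is dropped (NET/IB/0/GDRDMA -> NET/IB/GDRDMA)
# so that connections over several HCAs are counted together.
nccl_transports() {
  grep -o 'via [A-Za-z0-9/_]*' \
    | sed -e 's/^via //' -e 's#^\(NET/[A-Za-z]*\)/[0-9][0-9]*#\1#' \
    | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `all_reduce_perf -b 8 -e 1G -f 2 -g 8 2>&1 | nccl_transports`. Any `SHM` or `NET/Socket` entries point at the fallbacks listed below.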
export NCCL_DEBUG=INFO
export NCCL_P2P_LEVEL=NVL       # P2P over NVLink only; unset it (or use PXB) to exercise PCIe P2P through switches
export NCCL_NET_GDR_LEVEL=SYS   # Allow GPU-Direct RDMA for inter-node traffic, even across NUMA nodes
export NCCL_IB_HCA=mlx5
# Run all_reduce benchmark (all_reduce_perf from nccl-tests)
all_reduce_perf -b 8 -e 1G -f 2 -g 8
# Look for:
# "P2P/CUMEM" or "P2P/IPC" in channel info → P2P active ✅
# "NET/IB/<n>/GDRDMA" → GPU-Direct RDMA active between nodes ✅
# "SHM" → Shared memory (fallback, slower) ⚠️
# "NET/Socket" → TCP (worst case, ACS or RDMA broken) ❌
Comparison: Performance Impact
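The Impact column in the table below is the relative change against the IOMMU-off baseline: (BW − baseline) / baseline × 100. Here is a small sketch for recomputing it from your own all_reduce_perf bus-bandwidth figures. `pct_change` is a hypothetical helper name.

```shell
# pct_change (hypothetical helper): relative change of a measurement vs. a baseline,
# rounded to a whole percent. $1 = baseline, $2 = measured (same units, e.g. Gb/s).
pct_change() {
  awk -v base="$1" -v bw="$2" 'BEGIN { printf "%.0f%%\n", (bw - base) / base * 100 }'
}
```

For example, `pct_change 380 180` prints `-53%`, which matches the "IOMMU pt + ACS enabled" row.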
Configuration                 All-Reduce BW    Impact
──────────────────────────────────────────────────────────────────
IOMMU off + no ACS            ~380 Gb/s        Baseline (best)
IOMMU pt + ACS override       ~370 Gb/s        -3% (negligible)
IOMMU pt + ACS enabled        ~180 Gb/s        -53% (P2P blocked)
IOMMU strict + ACS enabled    ~120 Gb/s        -68% (worst)
(8× A100/H100 + 4× ConnectX-7, all-reduce across 2 nodes)
Quick Decision Flowchart
Do you run VMs or need device isolation?
├── YES → Keep IOMMU on
│   Do you need SR-IOV?
│   ├── YES → iommu=pt + pcie_acs_override=downstream,multifunction
│   └── NO  → iommu=pt (the kernel still enables ACS with IOMMU on; check it if GPU P2P is slow)
│
└── NO (bare-metal GPU training only)
    → DISABLE VT-d/AMD-Vi in BIOS
      Simplest. Best performance. No ACS issues.
      (You lose: SR-IOV VFs, VM device passthrough)
Common Issues
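"P2P not supported" in nvidia-smi topo after ACS override
- Cause: The running kernel doesn't carry the ACS override patch (mainline and many distro kernels don't), so the parameter is silently ignored
- Fix: Check dmesg | grep -i acs for an ACS override warning; if there is none, use the BIOS toggle (Option 3) or disable IOMMU (Option 1) instead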
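One way to tell whether the override took effect is to compare IOMMU group membership before and after the change. With the override active, devices behind the same switch should end up in separate groups. An empty listing means no IOMMU is active at all. The sketch below reads the groups from sysfs. The root directory is a parameter (default `/sys/kernel/iommu_groups`) so the function can be tried against a fake tree. `list_iommu_groups` is a hypothetical name.

```shell
# list_iommu_groups (hypothetical helper): print "group <n>: <PCI addresses>" for
# every IOMMU group. $1 = sysfs root (default /sys/kernel/iommu_groups).
list_iommu_groups() {
  local root g d
  root="${1:-/sys/kernel/iommu_groups}"
  # Group directories are plain integers; sort numerically so 10 follows 9.
  for g in $(ls "$root" 2>/dev/null | sort -n); do
    printf 'group %s:' "$g"
    # Each entry under devices/ is named after a PCI address (e.g. 0000:01:00.0).
    for d in "$root/$g"/devices/*; do
      [ -e "$d" ] && printf ' %s' "${d##*/}"
    done
    printf '\n'
  done
}
```

Usage: run `list_iommu_groups > before.txt`, apply the kernel args and reboot, then run `list_iommu_groups | diff before.txt -`.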
SR-IOV fails after disabling IOMMU
- Cause: SR-IOV VFs require IOMMU for address translation
- Fix: Can't use SR-IOV without IOMMU; use Option 2 (IOMMU + ACS override)
ACS override in kernel but setpci still shows active
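- Cause: pcie_acs_override doesn't change the hardware register; it only changes how the kernel treats those devices when building IOMMU groups (as if they were ACS-isolated)
- Fix: This is expected; IOMMU grouping changes even if setpci shows ACS bits. If P2P traffic is still redirected, clear the ACS bits in BIOS (Option 3) or, on mainline kernels, with pci=disable_acs_redir=<bridge addresses>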
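setpci prints the ACS Control register as a raw hex word, and not every set bit affects peer-to-peer traffic. Source Validation, for example, does not redirect it. The sketch below decodes the word using the ACS Control bit layout from the PCIe specification, with the same short names `lspci -vvv` prints after `ACSCtl:`. `decode_acs_ctl` and `acs_redirects_p2p` are hypothetical helper names.

```shell
# decode_acs_ctl (hypothetical helper): print the names of the enabled ACS Control bits.
# $1 = 16-bit hex word as printed by `setpci ... ECAP_ACS+6.w` (a 0x prefix is accepted).
decode_acs_ctl() {
  local val i n out
  val=$((0x${1#0x}))
  i=0
  out=""
  # Bits 0..6 of the ACS Control register, lowest bit first.
  for n in SrcValid TransBlk ReqRedir CmpltRedir UpstreamFwd EgressCtrl DirectTrans; do
    [ $(( (val >> i) & 1 )) -eq 1 ] && out="$out $n"
    i=$((i + 1))
  done
  out="${out# }"
  echo "${out:-none}"
}

# acs_redirects_p2p (hypothetical helper): succeeds if P2P Request Redirect (bit 2),
# P2P Completion Redirect (bit 3) or Egress Control (bit 5) is set, i.e. mask 0x2c.
acs_redirects_p2p() {
  [ $(( 0x${1#0x} & 0x2c )) -ne 0 ]
}
```

Usage: `decode_acs_ctl "$(setpci -s <bridge> ECAP_ACS+6.w)"`, where `<bridge>` is one of the addresses the ACS check loop reported.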
Node won't boot after removing IOMMU
- Cause: Some hyperconverged setups depend on IOMMU for storage
- Fix: Only disable IOMMU on dedicated GPU compute nodes, not infra nodes
Best Practices
- Bare-metal AI clusters: just disable VT-d (simplest, fastest, no ACS issues)
- Mixed clusters: use a separate MachineConfigPool so the gpu-worker pool gets its own kernel args
- Document the decision: record why IOMMU is off (the team will forget in 6 months)
- Test after every BIOS update: updates can reset VT-d to Enabled
- Verify with nvidia-smi topo -m, the ground truth for P2P connectivity
- One config per node role: don't apply GPU kernel args to infra nodes
- Cold reboot after BIOS changes: PCIe topology is enumerated at POST only
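The "test after every BIOS update" and "one config per node role" practices can be backed by a check that compares `/proc/cmdline` with the arguments expected for the node's role. This is a sketch. The role names `no-iommu` and `acs-override` are hypothetical labels for Option 1 and Option 2, and the expected argument lists are copied from those options' MachineConfig examples.

```shell
# missing_kernel_args (hypothetical helper): compare a kernel command line with the
# arguments expected for a node role. Role names are illustrative labels:
#   no-iommu     -> Option 1 (Intel; AMD hosts would use amd_iommu=off instead)
#   acs-override -> Option 2
# $2 defaults to the running kernel's /proc/cmdline. Exit 0 = node matches its role.
missing_kernel_args() {
  local role want arg cmdline missing
  role="$1"
  cmdline="${2:-$(cat /proc/cmdline)}"
  missing=""
  case "$role" in
    no-iommu)     want="intel_iommu=off pci=realloc" ;;
    acs-override) want="intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pci=realloc" ;;
    *) echo "unknown role: $role" >&2; return 2 ;;
  esac
  for arg in $want; do
    # Pad with spaces so e.g. "iommu=pt" only matches as a whole argument.
    case " $cmdline " in
      *" $arg "*) ;;
      *) missing="$missing $arg" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
}
```

Usage on a node: `missing_kernel_args acs-override || echo "kernel args drifted"`.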
Key Takeaways
- Simplest fix: Disable VT-d/AMD-Vi in BIOS entirely (if no VMs, no SR-IOV needed)
- If SR-IOV required: keep IOMMU on + iommu=pt + pcie_acs_override=downstream,multifunction
- ACS blocks GPU-to-GPU and GPU-to-NIC peer-to-peer DMA (53%+ bandwidth loss in the benchmark above)
- pcie_acs_override changes how the kernel treats ACS when forming IOMMU groups (hardware unchanged)
- Some BIOS have an explicit ACS toggle (Dell, HPE, Supermicro): disable it there
- Verify with: nvidia-smi topo -m (P2P matrix) + NCCL_DEBUG=INFO (transport selection)
- Performance: IOMMU off ≈ IOMMU pt + ACS override >> ACS enabled (-53%)
- Decision: bare-metal single-tenant → disable VT-d; multi-tenant/SR-IOV → keep + override
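The decision rule above can be encoded for provisioning or CI scripts. This is a minimal sketch using Intel parameter names. The function name and its yes/no inputs are hypothetical, and the printed argument strings are the ones from Options 1 and 2.

```shell
# recommend_iommu_mode (hypothetical helper): encode the decision flowchart.
# $1 = VMs or device isolation needed (yes|no), $2 = SR-IOV needed (yes|no).
# Prints the kernel arguments this article uses for the chosen path.
recommend_iommu_mode() {
  if [ "$2" = yes ]; then
    # SR-IOV needs the IOMMU; add the ACS override for GPU-Direct P2P (Option 2).
    echo "intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pci=realloc"
  elif [ "$1" = yes ]; then
    # VMs or multi-tenant isolation without SR-IOV: keep the IOMMU in passthrough mode.
    echo "intel_iommu=on iommu=pt"
  else
    # Bare-metal single-tenant: disable VT-d/AMD-Vi in BIOS (Option 1).
    echo "intel_iommu=off pci=realloc"
  fi
}
```

For example, `recommend_iommu_mode no no` prints the Option 1 arguments for a dedicated training node.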

