Switch GPUDirect RDMA from nvidia-peermem to DMA-BUF
Migrate from the legacy nvidia-peermem kernel module to the recommended DMA-BUF GPUDirect RDMA path using the NVIDIA GPU Operator.
💡 Quick Answer: Do not set `driver.rdma.enabled=true`; that activates the legacy `nvidia-peermem` path. Instead, set `driver.kernelModuleType=open` and leave RDMA disabled to use the recommended DMA-BUF GPUDirect RDMA transport.
NVIDIA recommends DMA-BUF over the legacy nvidia-peermem kernel module for GPUDirect RDMA. DMA-BUF avoids a separate kernel module and is more future-proof.
Prerequisites Comparison
| Requirement | DMA-BUF | Legacy nvidia-peermem |
|---|---|---|
| GPU Driver | Open Kernel Module | Any |
| CUDA | 11.7+ | No minimum |
| GPU | Turing+ data center | All data center |
| MOFED | Optional | Required |
| Linux Kernel | 5.12+ | No minimum |
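The table above can be checked in one pass with a short script. This is a sketch, not an official tool: the open-kernel-module test assumes `modinfo` reports the open modules' `Dual MIT/GPL` license (the proprietary driver reports `NVIDIA`); the CUDA and GPU checks are left to the `nvidia-smi` queries in Step 1.

```shell
#!/usr/bin/env bash
# Sketch: quick pass over the DMA-BUF prerequisites table.
# Assumption: the open kernel modules report license "Dual MIT/GPL" in modinfo.

# kernel_ok VERSION -- succeeds if VERSION is 5.12 or newer
kernel_ok() {
  local maj min
  maj=${1%%.*}
  min=${1#*.}; min=${min%%.*}
  [ "$maj" -gt 5 ] || { [ "$maj" -eq 5 ] && [ "$min" -ge 12 ]; }
}

kernel_ok "$(uname -r)" && echo "kernel: OK (>= 5.12)" || echo "kernel: too old for DMA-BUF"

# Open kernel module vs. proprietary (assumed license strings)
if modinfo nvidia 2>/dev/null | grep -q 'Dual MIT/GPL'; then
  echo "driver: open kernel module"
fi

# The legacy module must not be loaded for the DMA-BUF path
if lsmod 2>/dev/null | grep -q nvidia_peermem; then
  echo "WARNING: nvidia-peermem still loaded"
fi
```

The version comparison handles distribution suffixes such as `5.15.0-generic` by only parsing the first two numeric fields.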
Step 1: Verify Prerequisites
# Kernel version must be 5.12+
uname -r
# Check GPU architecture
nvidia-smi --query-gpu=gpu_name,compute_cap --format=csv
# Verify current module state
lsmod | grep peermem
Step 2: Install GPU Operator for DMA-BUF
For new installations, simply omit driver.rdma.enabled=true:
# With Network Operator managing NIC drivers
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--version=v25.10.1
# With host-installed MOFED
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--version=v25.10.1 \
--set driver.rdma.useHostMofed=true
Step 3: Migrate Existing Installation
If you previously had driver.rdma.enabled=true, update the ClusterPolicy:
oc edit clusterpolicy gpu-cluster-policy
spec:
  driver:
    kernelModuleType: open
    rdma:
      enabled: false  # Disables legacy nvidia-peermem
Restart driver pods:
oc delete pod -n gpu-operator -l app=nvidia-driver-daemonset
Step 4: Verify DMA-BUF is Active
Confirm the nvidia-peermem-ctr container is absent:
kubectl get ds -n gpu-operator nvidia-driver-daemonset -o yaml | grep -i peermem
# Expected: no output
Check node annotations:
oc get nodes -o json | jq '.items[].metadata.annotations["nvidia.com/gpudirect-dmabuf"]'
Step 5: Validate with NCCL
NCCL_DEBUG=INFO NCCL_IB_HCA=mlx5_0 NCCL_NET_GDR_LEVEL=5 all_reduce_perf
Look for "GPUDirect RDMA DMA-BUF enabled" in the debug output and confirm there is no "using peer memory driver" fallback message.
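If the test output was captured to a file, the relevant lines can be pulled out with grep. A sketch, assuming the log was saved as `nccl.log`; the exact wording of NCCL's debug lines varies by version, so the patterns match loosely:

```shell
# Sketch: scan a captured NCCL debug log (assumed saved as nccl.log).
# NCCL log wording varies by version; match loosely.
grep -Ei 'dma-?buf' nccl.log || echo 'no DMA-BUF lines found'
if grep -qi 'peer memory' nccl.log; then
  echo 'WARNING: legacy nvidia-peermem fallback in use'
fi
```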
Why This Matters
DMA-BUF is the modern, NVIDIA-recommended path that eliminates the nvidia-peermem kernel module dependency, reduces kernel version incompatibilities, and provides better long-term support.
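For automation, the ClusterPolicy change from Step 3 can also be applied non-interactively. A sketch using `oc patch` with a merge patch; the field paths mirror the YAML shown in Step 3 and this assumes cluster access:

```shell
# Sketch: non-interactive version of the Step 3 ClusterPolicy edit.
# Field paths mirror the YAML in Step 3; requires a live cluster.
oc patch clusterpolicy gpu-cluster-policy --type merge \
  -p '{"spec":{"driver":{"kernelModuleType":"open","rdma":{"enabled":false}}}}'
```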
