Networking Β· Advanced Β· ⏱ 45 minutes Β· K8s 1.28+

Switch GPUDirect RDMA from nvidia-peermem to DMA-BUF

Migrate from the legacy nvidia-peermem kernel module to the recommended DMA-BUF GPUDirect RDMA path using the NVIDIA GPU Operator.

By Luca Berton β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Do not set driver.rdma.enabled=true β€” that activates the legacy nvidia-peermem path. Instead, set driver.kernelModuleType=open and leave RDMA disabled to use the recommended DMA-BUF GPUDirect RDMA transport.

NVIDIA recommends DMA-BUF over the legacy nvidia-peermem kernel module for GPUDirect RDMA. DMA-BUF is a standard Linux kernel buffer-sharing interface, so it avoids loading a separate out-of-tree kernel module and is more future-proof.

Prerequisites Comparison

Requirement      DMA-BUF               Legacy nvidia-peermem
GPU Driver       Open Kernel Module    Any
CUDA             11.7+                 No minimum
GPU              Turing+ data center   All data center
MOFED            Optional              Required
Linux Kernel     5.12+                 No minimum

Step 1 β€” Verify Prerequisites

# Kernel version must be 5.12+
uname -r

# Check GPU architecture (Turing and newer report compute capability 7.5+)
nvidia-smi --query-gpu=gpu_name,compute_cap --format=csv

# Verify current module state; an nvidia_peermem entry means the legacy path is active
lsmod | grep peermem
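
The table above also calls for the open GPU kernel module on the DMA-BUF path. One way to check which flavor a node is running, assuming the driver is already installed: the open module reports a Dual MIT/GPL license, while the proprietary one reports NVIDIA.

# Open kernel module prints "Dual MIT/GPL"; proprietary prints "NVIDIA"
modinfo -F license nvidia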

Step 2 β€” Install GPU Operator for DMA-BUF

For new installations, simply omit driver.rdma.enabled=true and set driver.kernelModuleType=open (as in the Quick Answer):

# If needed, add the NVIDIA Helm repository first:
#   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update

# With Network Operator managing NIC drivers
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.10.1 \
  --set driver.kernelModuleType=open

# With host-installed MOFED
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.10.1 \
  --set driver.kernelModuleType=open \
  --set driver.rdma.useHostMofed=true
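
Once Helm returns, it is worth confirming that the driver daemonset rolls out cleanly. The daemonset name below is the default referenced throughout this recipe; the release name varies because of --generate-name.

# List the generated release, then wait for the driver pods to become Ready
helm list -n gpu-operator
kubectl rollout status ds/nvidia-driver-daemonset -n gpu-operator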

Step 3 β€” Migrate Existing Installation

If you previously installed with driver.rdma.enabled=true, update the ClusterPolicy:

kubectl edit clusterpolicy gpu-cluster-policy

spec:
  driver:
    kernelModuleType: open
    rdma:
      enabled: false    # Disables legacy nvidia-peermem
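
For a non-interactive alternative, the same change can be applied as a merge patch (a sketch; adjust the ClusterPolicy name to match your installation):

kubectl patch clusterpolicy gpu-cluster-policy --type merge \
  -p '{"spec":{"driver":{"kernelModuleType":"open","rdma":{"enabled":false}}}}'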

Restart driver pods:

kubectl delete pod -n gpu-operator -l app=nvidia-driver-daemonset

Step 4 β€” Verify DMA-BUF is Active

Confirm that the nvidia-peermem-ctr container is absent from the driver daemonset:

kubectl get ds -n gpu-operator nvidia-driver-daemonset -o yaml | grep -i peermem
# Expected: no output

Check node annotations:

kubectl get nodes -o json | jq '.items[].metadata.annotations["nvidia.com/gpudirect-dmabuf"]'
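
You can also check at the node level that the legacy module is gone (a sketch using an ephemeral debug pod; substitute a real node name):

# Run lsmod on the node via a debug pod; expect no nvidia_peermem entry
kubectl debug node/<gpu-node-name> -it --image=busybox -- chroot /host lsmod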

Step 5 β€” Validate with NCCL

# all_reduce_perf comes from the nccl-tests suite
NCCL_DEBUG=INFO NCCL_IB_HCA=mlx5_0 NCCL_NET_GDR_LEVEL=5 ./all_reduce_perf -b 8 -e 128M -f 2 -g 2

In the NCCL INFO output, look for "GPUDirect RDMA DMA-BUF enabled" and confirm there is no "using peer memory driver" fallback message, which would mean nvidia-peermem is still in use.
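
To run the same validation in-cluster, a minimal Pod spec along these lines works. This is a sketch: the image name is a placeholder for any image that bundles the nccl-tests binaries, and the HCA name should match your hardware.

apiVersion: v1
kind: Pod
metadata:
  name: nccl-dmabuf-check
spec:
  restartPolicy: Never
  containers:
  - name: nccl-test
    image: nccl-tests:latest   # placeholder: any image with nccl-tests built in
    command: ["all_reduce_perf", "-b", "8", "-e", "128M", "-f", "2", "-g", "2"]
    env:
    - name: NCCL_DEBUG
      value: "INFO"
    - name: NCCL_IB_HCA
      value: "mlx5_0"
    - name: NCCL_NET_GDR_LEVEL
      value: "5"
    resources:
      limits:
        nvidia.com/gpu: 2      # two GPUs for the all-reduce

Inspect the NCCL INFO lines with kubectl logs nccl-dmabuf-check once the pod completes.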

Why This Matters

DMA-BUF is the modern, NVIDIA-recommended path that eliminates the nvidia-peermem kernel module dependency, reduces kernel version incompatibilities, and provides better long-term support.

#nvidia #gpu #rdma #dma-buf #gpudirect #networking
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
