Networking · Advanced · ⏱ 60 minutes · K8s 1.28+

Configure GPUDirect RDMA with the NVIDIA GPU Operator

Set up GPUDirect RDMA on Kubernetes using the NVIDIA GPU Operator with either DMA-BUF or legacy nvidia-peermem, including Network Operator integration.

By Luca Berton · 📖 5 min read

💡 Quick Answer: Install the GPU Operator and Network Operator together. DMA-BUF is the default RDMA transport; add --set driver.rdma.enabled=true only if you need the legacy nvidia-peermem path.

GPUDirect RDMA enables direct data transfer between GPUs and RDMA-capable network devices over PCI Express, bypassing staging copies through host memory. The GPU Operator and Network Operator work together to configure the full stack.

Platform Support

GPUDirect RDMA is supported on:

  • Kubernetes on bare metal
  • vSphere VMs with GPU passthrough and vGPU
  • VMware vSphere with Tanzu
  • Red Hat OpenShift (via NVIDIA AI Enterprise)

Installation
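The commands below assume the NVIDIA Helm repository is already configured; if it isn't, add it first:

```shell
# Add the NVIDIA Helm chart repository and refresh the local index
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
```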

helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.10.1

With Host MOFED

helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.10.1 \
  --set driver.rdma.enabled=true \
  --set driver.rdma.useHostMofed=true

Legacy nvidia-peermem Mode

helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.10.1 \
  --set driver.rdma.enabled=true
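Whichever variant you install, wait for the operator pods to become Ready before verifying anything (the 600s timeout is an assumption; driver compilation can take longer on slow nodes):

```shell
# Block until every pod in the gpu-operator namespace reports Ready
kubectl wait --for=condition=Ready pod --all \
  -n gpu-operator --timeout=600s
```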

Verify the Installation

Check the driver DaemonSet structure:

kubectl describe ds -n gpu-operator nvidia-driver-daemonset

Look for:

  • mofed-validation init container β€” waits for network drivers
  • nvidia-driver-ctr β€” main driver container
  • nvidia-peermem-ctr β€” present only when driver.rdma.enabled=true
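Rather than scanning the full describe output, you can list the init containers and main containers directly with a jsonpath query (a quick sketch; the DaemonSet name matches the command above):

```shell
# Print init-container names, then container names, of the driver DaemonSet
kubectl get ds -n gpu-operator nvidia-driver-daemonset \
  -o jsonpath='{.spec.template.spec.initContainers[*].name}{"\n"}{.spec.template.spec.containers[*].name}{"\n"}'
```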

If using legacy mode, verify the module loaded:

kubectl logs -n gpu-operator ds/nvidia-driver-daemonset -c nvidia-peermem-ctr

Expected output:

successfully loaded nvidia-peermem module
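You can also confirm the kernel module from inside the driver container; note that lsmod reports the module name with an underscore:

```shell
# Check that the nvidia_peermem module is loaded on the node
kubectl exec -n gpu-operator ds/nvidia-driver-daemonset \
  -c nvidia-driver-ctr -- lsmod | grep nvidia_peermem
```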

Set Up a Test Network

Create a secondary macvlan network for RDMA traffic:

apiVersion: mellanox.com/v1alpha1
kind: MacvlanNetwork
metadata:
  name: rdma-test-network
spec:
  networkNamespace: "default"
  master: "ens64np1"   # Replace with your IB interface
  mode: "bridge"
  mtu: 1500
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.2.225/28"
    }
Apply the manifest and confirm the network was created:

kubectl apply -f rdma-test-network.yaml
kubectl get macvlannetworks rdma-test-network
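To exercise the network, attach a pod to it via the standard Multus annotation. This is a minimal sketch: the pod name, image, and sleep command are assumptions — substitute an image that carries your RDMA test tooling.

```shell
# Create a test pod attached to the secondary RDMA network via Multus
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod                               # assumption: any unique name works
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-test-network  # attach the macvlan network above
spec:
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # assumption: swap in your test image
    command: ["sleep", "infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]   # RDMA memory registration requires locked pages
EOF
```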

Why This Matters

GPUDirect RDMA eliminates CPU-staged copies for GPU-to-GPU network communication, enabling near line-rate throughput for distributed training and HPC workloads.
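A common end-to-end check is perftest's ib_write_bw built with CUDA support, run between two pods on the secondary network. This is a sketch, not a definitive procedure: the device name, GPU index, and server IP are placeholders — use the values from your own cluster.

```shell
# On the server pod: listen for an RDMA write test using GPU memory
ib_write_bw -d mlx5_0 --use_cuda=0

# On the client pod: connect to the server's rdma-test-network address
ib_write_bw -d mlx5_0 --use_cuda=0 192.168.2.226
```

If the reported bandwidth approaches the NIC's line rate, the GPU-direct path is working; a large drop usually means traffic is falling back to host-memory staging.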

#nvidia #gpu #rdma #gpudirect #networking #gpu-operator
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
