NCCL DMABUF Enable for GPUDirect RDMA on Kubernetes
Enable NCCL DMA-BUF support for GPUDirect RDMA in Kubernetes GPU clusters. Covers NCCL_DMABUF_ENABLE=1, kernel requirements, nvidia-peermem vs dmabuf, GPU
💡 Quick Answer: Set NCCL_DMABUF_ENABLE=1 to enable DMA-BUF based GPUDirect RDMA in NCCL 2.17+. This is the modern replacement for nvidia-peermem kernel module registration. Requires Linux kernel ≥ 5.12, CUDA ≥ 12.0, and MOFED ≥ 5.5. Verify with NCCL_DEBUG=INFO: look for "GPU Direct RDMA Enabled" and "DMA-BUF" in the logs.
The Problem
- Legacy GPUDirect RDMA relied on the nvidia-peermem kernel module for GPU-to-NIC DMA
- nvidia-peermem has kernel version compatibility issues and requires module loading
- DMA-BUF is the upstream Linux kernel standard for cross-device DMA (no out-of-tree modules)
- Need to enable DMA-BUF in NCCL without breaking fallback to peermem
- Must verify the correct path is active in production
The Solution
Enabling DMA-BUF in NCCL
# In MPIJob worker and launcher env:
env:
  - name: NCCL_DMABUF_ENABLE
    value: "1"      # Enable DMA-BUF for GPUDirect RDMA
  - name: NCCL_NET_GDR_LEVEL
    value: "PHB"    # Control which GPU-NIC pairs use RDMA
  - name: NCCL_DEBUG
    value: "INFO"   # Verify DMA-BUF is active
DMA-BUF vs nvidia-peermem
Feature            │ nvidia-peermem            │ DMA-BUF (dmabuf)
───────────────────┼───────────────────────────┼──────────────────────
Kernel module      │ nvidia-peermem.ko         │ None (in-kernel)
Min kernel version │ Any (out-of-tree)         │ 5.12+
Registration       │ Module load + ib_register │ Automatic via fd
NCCL support       │ NCCL 2.x (default)        │ NCCL 2.17+ (opt-in)
GPU Operator       │ driver.rdma.enabled       │ No config needed
Stability          │ Can break on upgrades     │ Upstream-stable
CUDA requirement   │ CUDA 11.x+                │ CUDA 12.0+
───────────────────┴───────────────────────────┴──────────────────────
DMA-BUF is preferred when kernel and CUDA versions support it.
nvidia-peermem remains as fallback for older kernels.
Verifying DMA-BUF in NCCL Logs
# With NCCL_DMABUF_ENABLE=1 and NCCL_DEBUG=INFO:
# Success: DMA-BUF active
NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 0 (distance 9 <= 9), read 1 mode Default
NCCL INFO Using DMA-BUF for GPU Direct RDMA
# Fallback to peermem (DMA-BUF not available):
NCCL INFO GPU Direct RDMA Enabled for GPU 0 / HCA 0, using nvidia-peermem
# No RDMA at all (distance too far or disabled):
NCCL INFO Channel 0/0 : 0[0] -> 2[0] [send] via NET/IB/0   # No GDRDMA suffix
Kernel Requirements Check
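The kernel threshold used in this section (5.12) can also be tested from a script, for example in a node preflight job. The following is a minimal sketch rather than an official tool: kernel_at_least is a hypothetical helper name, and it assumes the release string begins with MAJOR.MINOR, as uname -r output normally does.

```shell
# kernel_at_least RELEASE MAJOR MINOR   (hypothetical helper, not part of NCCL)
# Returns 0 when RELEASE (e.g. "5.14.0-284.el9") is >= MAJOR.MINOR,
# 1 when it is older, and 2 when RELEASE cannot be parsed.
kernel_at_least() {
  local release major rest minor
  release=$1
  case $release in
    *.*) ;;                 # needs at least one dot
    *) return 2 ;;
  esac
  major=${release%%.*}      # text before the first dot
  rest=${release#*.}        # text after the first dot
  minor=${rest%%[!0-9]*}    # leading digits of the remainder
  case $major in ''|*[!0-9]*) return 2 ;; esac
  case $minor in '') return 2 ;; esac
  [ "$major" -gt "$2" ] && return 0
  [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ] && return 0
  return 1
}

# Example use on a node:
#   kernel_at_least "$(uname -r)" 5 12 && echo "kernel new enough for DMA-BUF"
```

The check only covers the kernel version; the CUDA and MOFED requirements still need their own checks.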
# Verify kernel version (need >= 5.12)
uname -r
# Expected: 5.14.0-xxx or higher (RHEL 9 / OpenShift 4.12+)
# Check DMA-BUF support in kernel config
grep CONFIG_DMA_SHARED_BUFFER /boot/config-$(uname -r)
# Expected: CONFIG_DMA_SHARED_BUFFER=y
# Verify nvidia-peermem is loaded (backup path)
lsmod | grep nvidia_peermem
# nvidia_peermem 16384 0
# Check if MOFED supports DMA-BUF
ofed_info -s
# Expected: MLNX_OFED_LINUX-5.5-x.x.x or higher
GPU Operator ClusterPolicy Configuration
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  driver:
    enabled: true
    rdma:
      enabled: true        # Loads nvidia-peermem (backup for DMA-BUF)
      useHostMofed: true   # Use host MOFED instead of container MOFED
  gds:
    enabled: true          # GPUDirect Storage (nvidia-fs)
# DMA-BUF doesn't need GPU Operator configuration;
# it's enabled at the NCCL level via environment variable
Pod Spec with DMA-BUF
containers:
  - name: worker
    image: registry.example.com/nccl-validator:v6
    env:
      - name: NCCL_DMABUF_ENABLE
        value: "1"
      - name: NCCL_NET_GDR_LEVEL
        value: "SYS"
      - name: NCCL_IB_DISABLE
        value: "0"
    resources:
      limits:
        nvidia.com/gpu: 2
        openshift.io/mellanoxnics: 1   # Provides RDMA VF
    securityContext:
      capabilities:
        add:
          - NET_RAW                    # For RDMA verbs
    volumeMounts:
      - name: dshm
        mountPath: /dev/shm
Relationship to Other NCCL Settings
NCCL_DMABUF_ENABLE=1    → Enables DMA-BUF path for GPU memory registration
NCCL_NET_GDR_LEVEL=PHB  → Controls which GPU-NIC pairs can use GPUDirect
NCCL_IB_DISABLE=0       → Enables InfiniBand/RoCE transport (required)
NCCL_NET_PLUGIN=none    → Disables IB plugin (falls back to socket)
                          Remove this to enable RDMA!
NCCL_NET_GDR_READ=0/1   → Allow remote-side GDR reads (advanced)
Flow:
NCCL_IB_DISABLE=0 → IB transport enabled
  → NCCL_NET_GDR_LEVEL=PHB → check PCIe distance
    → distance OK → NCCL_DMABUF_ENABLE=1 → register GPU memory via DMA-BUF
      → GPUDirect RDMA active (GPU → NIC → wire → NIC → GPU, zero CPU copies)
Common Issues
DMA-BUF not activating despite NCCL_DMABUF_ENABLE=1
- Cause: Kernel too old (< 5.12) or CUDA < 12.0
- Fix: Upgrade to RHEL 9 / OpenShift 4.12+ and CUDA 12.x container
nvidia-peermem and DMA-BUF both loaded
- Cause: Normal. NCCL prefers DMA-BUF when available, falls back to peermem
- Fix: No action needed. Both can coexist.
NCCL_NET_PLUGIN=none disables RDMA entirely
- Cause: "none" means no network plugin, so socket transport only
- Fix: Remove the NCCL_NET_PLUGIN env var to let NCCL auto-detect the IB plugin
Performance same with and without DMABUF_ENABLE
- Cause: nvidia-peermem already providing GPUDirect path
- Fix: Both paths achieve similar performance. DMA-BUF advantage is stability, not speed.
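Several of the issues above come down to environment variables, so they can be caught before a job starts. This is a minimal sketch that encodes only the rules stated in this article (NCCL_NET_PLUGIN=none and a non-zero NCCL_IB_DISABLE block RDMA, and NCCL_DMABUF_ENABLE should be 1). lint_nccl_env is a hypothetical helper, not part of NCCL.

```shell
# lint_nccl_env   (hypothetical helper)
# Reads NCCL_* variables from the current environment and prints one WARN
# line per setting that this article flags as a problem.
# Returns 0 when nothing was flagged, 1 otherwise.
lint_nccl_env() {
  local rc
  rc=0
  if [ "${NCCL_NET_PLUGIN:-}" = "none" ]; then
    echo "WARN: NCCL_NET_PLUGIN=none is set; remove it so NCCL can auto-detect the IB plugin"
    rc=1
  fi
  # Unset is treated as 0 here (assumption: IB transport is on by default).
  if [ "${NCCL_IB_DISABLE:-0}" != "0" ]; then
    echo "WARN: NCCL_IB_DISABLE=${NCCL_IB_DISABLE} turns off the IB/RoCE transport"
    rc=1
  fi
  if [ "${NCCL_DMABUF_ENABLE:-}" != "1" ]; then
    echo "WARN: NCCL_DMABUF_ENABLE is not set to 1"
    rc=1
  fi
  return "$rc"
}

# Example use as the first step of a container entrypoint:
#   lint_nccl_env || echo "continuing, but GPUDirect RDMA via DMA-BUF may be inactive"
```

The function only inspects configuration; the NCCL_DEBUG=INFO log remains the way to confirm which path was actually taken.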
Best Practices
- Always set NCCL_DMABUF_ENABLE=1 on CUDA 12+ with kernel 5.12+
- Keep nvidia-peermem loaded as fallback; it doesn't conflict
- Don't set NCCL_NET_PLUGIN=none in production, because that disables RDMA
- Use NCCL_DEBUG=INFO to verify which path is active
- Test with and without to confirm RDMA is actually improving bandwidth
- Check NCCL_IB_DISABLE=0: DMA-BUF is useless if IB transport is off
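Verifying the active path can be scripted by matching the log. This is a minimal sketch that uses only the sample strings shown in this article as patterns. The exact wording of NCCL log lines can differ between versions, so the patterns are assumptions to adjust against real output. classify_gdr_path is a hypothetical helper name.

```shell
# classify_gdr_path   (hypothetical helper)
# Reads NCCL_DEBUG=INFO output on stdin and prints one of:
#   dmabuf        a "Using DMA-BUF" line was seen
#   peermem       GPUDirect RDMA is reported via nvidia-peermem
#   rdma-unknown  GPUDirect RDMA is enabled but the path is not named
#   none          no GPUDirect RDMA line was found
# Patterns come from this article's sample logs (assumption, see lead-in).
classify_gdr_path() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'Using DMA-BUF'; then
    echo dmabuf
  elif printf '%s\n' "$log" | grep -q 'nvidia-peermem'; then
    echo peermem
  elif printf '%s\n' "$log" | grep -q 'GPU Direct RDMA Enabled'; then
    echo rdma-unknown
  else
    echo none
  fi
}

# Example use (the pod name is a placeholder):
#   kubectl logs <worker-pod> | classify_gdr_path
```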
Key Takeaways
- NCCL_DMABUF_ENABLE=1 is the modern GPUDirect RDMA registration method
- Requires kernel ≥ 5.12, CUDA ≥ 12.0, MOFED ≥ 5.5
- Coexists with nvidia-peermem (NCCL prefers DMA-BUF when both available)
- No GPU Operator configuration needed: purely an NCCL environment variable
- NCCL_NET_PLUGIN=none disables RDMA; remove it for production
- Verify via NCCL_DEBUG=INFO: "GPU Direct RDMA Enabled" + "DMA-BUF"
- Combined with NCCL_NET_GDR_LEVEL to control which pairs use GPUDirect
