πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
Storage intermediate ⏱ 15 minutes K8s 1.28+

NFSoRDMA Persistent Volume for Kubernetes

Create PersistentVolumes and StorageClasses for NFSoRDMA storage with RDMA transport, optimized mount options, and ReadWriteMany access.

By Luca Berton β€’ β€’ πŸ“– 6 min read

πŸ’‘ Quick Answer: Create a PersistentVolume with nfs.server pointing to the RDMA IP and mountOptions: [proto=rdma, port=20049, vers=4.1] to get NFS storage over RDMA with ReadWriteMany access.

The Problem

Standard NFS over TCP wastes CPU on protocol overhead and maxes out at 3-4 GB/s even on 100GbE. AI training workloads need shared storage with near-line-rate throughput and low latency. You need to expose NFSoRDMA storage as Kubernetes PersistentVolumes that pods can consume transparently.

The Solution

Create PVs with RDMA mount options that bypass TCP/IP, delivering 10-12 GB/s throughput with microsecond latency. Use static PVs for dedicated shares or the NFS CSI driver for dynamic provisioning.

Static PV with RDMA Mount Options

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfsordma-data
  labels:
    storage-type: rdma
    purpose: ai-training
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.100.0.1      # RDMA interface IP of NFS server
    path: /export/ai-data
  mountOptions:
    - proto=rdma
    - port=20049             # NFSoRDMA port (not standard 2049)
    - vers=4.1
    - rsize=1048576          # 1MB read blocks
    - wsize=1048576          # 1MB write blocks
    - hard
    - timeo=600
    - retrans=2
    - nconnect=8             # 8 parallel RDMA connections
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ai-training-data
  namespace: ai-workloads
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Ti
  selector:
    matchLabels:
      storage-type: rdma
      purpose: ai-training

Multiple PVs for Different Workloads

# Dataset storage β€” large sequential reads
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfsordma-datasets
  labels:
    storage-type: rdma
    purpose: datasets
spec:
  capacity:
    storage: 50Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.100.0.1
    path: /export/datasets
  mountOptions:
    - proto=rdma
    - port=20049
    - vers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - nconnect=16            # More connections for parallel reads
    - noatime                # Skip access time updates
---
# Checkpoint storage β€” frequent small writes
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfsordma-checkpoints
  labels:
    storage-type: rdma
    purpose: checkpoints
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.100.0.1
    path: /export/checkpoints
  mountOptions:
    - proto=rdma
    - port=20049
    - vers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - sync                   # Sync writes for checkpoint integrity
    - nconnect=4
---
# Model output storage
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfsordma-models
  labels:
    storage-type: rdma
    purpose: models
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.100.0.1
    path: /export/models
  mountOptions:
    - proto=rdma
    - port=20049
    - vers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - nconnect=8

PVCs for AI Training Pods

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datasets
  namespace: ai-workloads
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Ti
  selector:
    matchLabels:
      storage-type: rdma
      purpose: datasets
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: checkpoints
  namespace: ai-workloads
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Ti
  selector:
    matchLabels:
      storage-type: rdma
      purpose: checkpoints
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models
  namespace: ai-workloads
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Ti
  selector:
    matchLabels:
      storage-type: rdma
      purpose: models

Pod Using NFSoRDMA Volumes

apiVersion: v1
kind: Pod
metadata:
  name: training-job
  namespace: ai-workloads
spec:
  nodeSelector:
    feature.node.kubernetes.io/rdma: "true"
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.03-py3
      command: ["torchrun", "--nproc_per_node=8", "train.py"]
      resources:
        limits:
          nvidia.com/gpu: 8
      volumeMounts:
        - name: datasets
          mountPath: /data
          readOnly: true
        - name: checkpoints
          mountPath: /checkpoints
        - name: models
          mountPath: /models
  volumes:
    - name: datasets
      persistentVolumeClaim:
        claimName: datasets
    - name: checkpoints
      persistentVolumeClaim:
        claimName: checkpoints
    - name: models
      persistentVolumeClaim:
        claimName: models

Dynamic Provisioning with NFS CSI Driver

# Install NFS CSI driver
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
  --namespace kube-system \
  --set driver.mountPermissions=0777
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfsordma-dynamic
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.100.0.1
  share: /export/dynamic
mountOptions:
  - proto=rdma
  - port=20049
  - vers=4.1
  - rsize=1048576
  - wsize=1048576
  - hard
  - nconnect=8
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
---
# Dynamic PVC β€” CSI driver creates subdirectory automatically
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-rdma-storage
  namespace: ai-workloads
spec:
  storageClassName: nfsordma-dynamic
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Gi

Verify RDMA Mount

# Check mount is using RDMA transport
kubectl exec -n ai-workloads training-job -- mount | grep nfs
# 10.100.0.1:/export/datasets on /data type nfs4 (ro,proto=rdma,port=20049,...)

# Verify RDMA connections
kubectl exec -n ai-workloads training-job -- cat /proc/mounts | grep rdma

# Performance test from inside the pod
kubectl exec -n ai-workloads training-job -- \
  dd if=/data/test-file of=/dev/null bs=1M count=4096 iflag=direct
# Expected: 8-12 GB/s with RDMA + jumbo frames

# Check NFS stats
kubectl exec -n ai-workloads training-job -- nfsstat -c
graph TD
    A[PersistentVolume] -->|proto=rdma port=20049| B[NFS Server RDMA IP]
    C[PVC datasets] --> A
    D[PVC checkpoints] --> E[PV checkpoints]
    E -->|proto=rdma| B
    F[Training Pod] -->|/data| C
    F -->|/checkpoints| D
    F -->|/models| G[PVC models]
    
    H[StorageClass nfsordma-dynamic] -->|CSI Driver| I[Dynamic PV]
    I -->|proto=rdma| B
    
    J[Node Selector rdma=true] -->|Ensures| K[Pod on RDMA-capable node]

Common Issues

  • Mount fails with proto=rdma β€” node must have RDMA kernel modules loaded (rdma_cm, ib_core); verify with lsmod | grep rdma
  • PVC stuck Pending β€” check label selectors match between PV and PVC; verify PV is Available not Released
  • Slow performance despite RDMA β€” check nconnect value; verify jumbo frames (MTU 9000) end-to-end; check noatime for read-heavy workloads
  • Permission denied on mount β€” NFS server exports must allow the node IPs; check /etc/exports on NFS server
  • Pod scheduled on non-RDMA node β€” add nodeSelector with feature.node.kubernetes.io/rdma: "true" to ensure RDMA-capable placement

Best Practices

  • Use proto=rdma,port=20049 β€” standard NFS port 2049 uses TCP
  • Set nconnect=8 minimum, nconnect=16 for dataset reads with many workers
  • Always use hard mount β€” soft mounts can cause silent data corruption
  • Separate PVs by purpose (datasets, checkpoints, models) for different mount options
  • Use noatime for read-heavy workloads to reduce metadata overhead
  • Use sync for checkpoint PVs to ensure write durability
  • Label PVs with storage-type: rdma for selector-based PVC binding
  • Add nodeSelector to pods to ensure scheduling on RDMA-capable nodes

Key Takeaways

  • NFSoRDMA PVs use proto=rdma,port=20049 mount options for RDMA transport
  • ReadWriteMany access enables shared storage across multi-node training jobs
  • Static PVs for dedicated shares, NFS CSI driver for dynamic provisioning
  • nconnect multiplies parallelism, rsize/wsize=1048576 optimizes block size
  • Always pair with jumbo frames (MTU 9000) and RDMA node selectors
#nfsordma #rdma #persistent-volume #nfs #storage
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens