NFSoRDMA Troubleshooting and Performance
Troubleshoot NFS over RDMA connectivity issues, diagnose TCP fallback, tune performance, and benchmark RDMA throughput on Kubernetes workers.
Quick Answer: Check mountstats for xprt: rdma (not tcp). If RDMA fails, verify: 1) xprtrdma module loaded, 2) NFS server port 20049 RDMA enabled, 3) no VLAN tagging on the interface, 4) MTU matches end-to-end.
The Problem
NFSoRDMA fails silently: NFS mounts succeed but fall back to TCP without any error. You see normal NFS operation but at TCP speeds instead of RDMA speeds. Common scenarios:
- Mount works but uses TCP: most common issue, hard to detect
- RDMA connection refused: NFS server not configured for RDMA
- Intermittent RDMA failures: MTU mismatch, switch misconfiguration
- Performance below expectations: wrong rsize/wsize, congestion, buffer issues
The Solution
Step 1: Diagnostic Checklist
Run through this checklist systematically:
# 1. Is the xprtrdma module loaded?
oc debug node/worker-0 -- chroot /host lsmod | grep rdma
# Expected: rpcrdma (current kernels list xprtrdma under this name; xprtrdma is an alias), ib_core, mlx5_ib, rdma_cm
# 2. Are RDMA devices present?
oc debug node/worker-0 -- chroot /host rdma link show
# Expected: link mlx5_0/1 state ACTIVE netdev ens3f0
# 3. Is the NIC RDMA-capable and up?
oc debug node/worker-0 -- chroot /host ibstat
# Expected: State: Active, Physical state: LinkUp
# 4. Is there a VLAN sub-interface? (should NOT exist)
oc debug node/worker-0 -- chroot /host ip -d link show | grep vlan
# Expected: nothing on RDMA interfaces
# 5. Does MTU match?
oc debug node/worker-0 -- chroot /host ip link show ens3f0 | grep mtu
# Expected: mtu 9000
# 6. Is the NFS server listening on RDMA?
oc debug node/worker-0 -- chroot /host \
rpcinfo -T rdma 10.90.0.1 nfs 4
# Expected: program 100003 version 4 ready and waiting
# 7. Can RDMA reach the server?
oc debug node/worker-0 -- chroot /host \
ib_write_lat -d mlx5_0 10.90.0.1
# Expected: latency ~1-2 microseconds
Step 2: Detect TCP Fallback
The most critical check: is NFS actually using RDMA?
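One way to script this check is to parse mountstats text and print the transport for each NFS mount. The sketch below is a minimal example, assuming the usual mountstats layout (a `device ... mounted on <path> with fstype nfs4` line followed by an `xprt:` line); the function names `nfs_transports` and `all_rdma` are hypothetical helpers, not part of any tool.

```shell
# Print "<mount point> <transport>" for every NFS mount found in
# mountstats-formatted text read from stdin.
# With nconnect, a mount may report several xprt lines (one per connection).
nfs_transports() {
  awk '
    $1 == "device" && $7 == "fstype" && $8 ~ /^nfs/ { mp = $5; next }
    $1 == "device" { mp = ""; next }          # non-NFS mount: ignore its lines
    $1 == "xprt:" && mp != "" { print mp, $2 }
  '
}

# Exit 0 only if every NFS transport in the input is rdma.
# Prints one line per mount that fell back to another transport.
all_rdma() {
  nfs_transports | awk '
    $2 != "rdma" { print "TCP fallback on " $1; bad = 1 }
    END { exit bad + 0 }
  '
}

# Usage (not executed here):
#   oc debug node/worker-0 -- chroot /host cat /proc/self/mountstats | all_rdma
```

Note that `all_rdma` also succeeds when the input contains no NFS mounts at all, so it is worth confirming the mount exists before relying on the exit status.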
# Check mountstats for transport type
oc debug node/worker-0 -- chroot /host \
cat /proc/self/mountstats | grep -A20 "10.90.0.1"
# Look for this line:
# xprt: rdma 0 0 ... <- RDMA is working
# xprt: tcp 0 ... <- fell back to TCP!
# Quick one-liner
oc debug node/worker-0 -- chroot /host \
grep -E "xprt:" /proc/self/mountstats
Step 3: Fix Common TCP Fallback Causes
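Before working through the causes in this step, it can help to confirm which kernel modules are actually missing. The following is a rough sketch that reads `lsmod` output on stdin and prints any module from the checklist in Step 1 that is absent. It assumes the RPC-over-RDMA transport may be listed as either `rpcrdma` or `xprtrdma` depending on the kernel; `missing_rdma_modules` is a hypothetical helper name.

```shell
# Read `lsmod` output on stdin and print each expected RDMA module
# that is not loaded. Prints nothing when all are present.
missing_rdma_modules() {
  awk '
    { loaded[$1] = 1 }                        # first column is the module name
    END {
      # Assumption: the NFS RDMA transport may show up under either name.
      if (!("rpcrdma" in loaded) && !("xprtrdma" in loaded)) print "xprtrdma"
      if (!("ib_core" in loaded)) print "ib_core"
      if (!("mlx5_ib" in loaded)) print "mlx5_ib"   # Mellanox/NVIDIA NICs only
      if (!("rdma_cm" in loaded)) print "rdma_cm"
    }
  '
}

# Usage (not executed here):
#   oc debug node/worker-0 -- chroot /host lsmod | missing_rdma_modules
```

An empty result only rules out Cause 1; the port, VLAN, and server-side causes still need to be checked separately.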
# Cause 1: Module not loaded
oc debug node/worker-0 -- chroot /host modprobe xprtrdma
# Permanent fix: MachineConfig (see nfsordma-worker-node-setup)
# Cause 2: Wrong port
# Must use port=20049 for RDMA, NOT default 2049
mount -t nfs4 -o rdma,port=20049 10.90.0.1:/export /mnt
# Cause 3: VLAN interface instead of dedicated NIC
# Remove any VLAN sub-interface on the RDMA NIC
ip link delete ens3f0.90 2>/dev/null
# Use switch access mode instead
# Cause 4: NFS server not configured for RDMA
# On NFS server, check /etc/nfs.conf:
# [nfsd]
# rdma=y
# rdma-port=20049
Step 4: Performance Benchmarking
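The raw RDMA tests in this step report gigabits per second, while dd and fio report bytes per second, so the two sets of numbers are not directly comparable. A small conversion sketch, assuming decimal units (1 GB/s = 8 Gbps); the helper names `gbps_to_gbytes` and `link_utilization` are hypothetical.

```shell
# Convert a link rate in Gbps to GB/s (decimal units).
gbps_to_gbytes() {
  LC_ALL=C awk -v g="$1" 'BEGIN { printf "%.2f\n", g / 8 }'
}

# Percentage of a link's raw rate used by an application-level throughput.
# $1 = measured throughput in GB/s, $2 = raw link rate in Gbps
link_utilization() {
  LC_ALL=C awk -v b="$1" -v g="$2" 'BEGIN { printf "%.1f\n", b * 800 / g }'
}

# Usage (not executed here):
#   gbps_to_gbytes 100        # raw ceiling of a 100 Gbps link in GB/s
#   link_utilization 5 100    # share of that link used by a 5 GB/s dd result
```

By this arithmetic a 100 Gbps link tops out at 12.5 GB/s, so an NFS result of 2-5 GB/s uses roughly 16-40% of the raw rate.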
# Raw RDMA bandwidth test
# Server side:
ib_write_bw -d mlx5_0 --report_gbits
# Client side:
oc debug node/worker-0 -- chroot /host \
ib_write_bw -d mlx5_0 --report_gbits 10.90.0.1
# Expected: 90-100 Gbps for ConnectX-6, 40-50 for ConnectX-5
# Raw RDMA latency test
# Server: ib_write_lat -d mlx5_0
# Client:
oc debug node/worker-0 -- chroot /host \
ib_write_lat -d mlx5_0 10.90.0.1
# Expected: 1-2 microseconds
# NFS over RDMA throughput (sequential write)
oc debug node/worker-0 -- chroot /host \
dd if=/dev/zero of=/mnt/nfsordma/bench bs=1M count=4096 oflag=direct
# Expected: 2-5 GB/s (vs 0.5-1 GB/s over TCP)
# NFS over RDMA throughput (fio random read)
oc debug node/worker-0 -- chroot /host \
fio --name=randread --ioengine=libaio --direct=1 \
--bs=4k --iodepth=64 --numjobs=4 \
--rw=randread --size=1G \
--directory=/mnt/nfsordma \
--group_reporting
Step 5: Performance Tuning
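Requested rsize and wsize values are upper bounds: the client and server negotiate the effective sizes, which then appear in the mount's option string in /proc/mounts or mountstats. The sketch below pulls a single option out of such a comma-separated string so the effective value can be compared with what was requested. `mount_opt` is a hypothetical helper, and the sample option string in the usage comment is illustrative only.

```shell
# Print the value of option $2 from the comma-separated option string $1.
# Prints nothing if the option is absent or has no value (e.g. "hard").
mount_opt() {
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F= -v key="$2" '$1 == key { print $2; exit }'
}

# Usage (not executed here):
#   opts=$(awk '$2 == "/mnt/nfsordma" { print $4 }' /proc/mounts)
#   [ "$(mount_opt "$opts" rsize)" = "1048576" ] || echo "rsize was negotiated down"
#   [ "$(mount_opt "$opts" proto)" = "rdma" ]    || echo "not using RDMA"
```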
# Optimal NFS mount options for RDMA
mount -t nfs4 -o \
rdma,\
port=20049,\
vers=4.2,\
rsize=1048576,\
wsize=1048576,\
hard,\
nconnect=8 \
10.90.0.1:/exports/data /mnt/nfsordma
# nconnect=8 creates multiple RDMA connections
# rsize/wsize=1M maximizes per-operation transfer size
# PV with optimized mount options
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfsordma-tuned
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.90.0.1
    path: /exports/data
  mountOptions:
    - rdma
    - port=20049
    - vers=4.2
    - rsize=1048576
    - wsize=1048576
    - hard
    - nconnect=8
Step 6: Monitor RDMA Health
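Counter dumps are long and mostly zero, so it can be useful to show only the error-type counters that have actually incremented. The following is a minimal sketch that filters `ethtool -S` style output (`name: value` lines) down to non-zero drop, error, and discard counters. `nonzero_error_counters` is a hypothetical helper name, and counter names vary by driver.

```shell
# Read `ethtool -S <iface>` output on stdin and print "name=value"
# for every drop/err/discard counter whose value is greater than zero.
nonzero_error_counters() {
  awk -F': *' '
    { name = $1; sub(/^[ \t]+/, "", name) }   # strip leading indentation
    name ~ /drop|err|discard/ && $2 + 0 > 0 { print name "=" $2 }
  '
}

# Usage (not executed here):
#   oc debug node/worker-0 -- chroot /host ethtool -S ens3f0 | nonzero_error_counters
```

These counters are cumulative since the driver was loaded, so comparing two snapshots taken a few minutes apart says more than a single reading.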
# Check RDMA error counters
oc debug node/worker-0 -- chroot /host \
rdma statistic show link mlx5_0/1
# Check for packet drops
oc debug node/worker-0 -- chroot /host \
ethtool -S ens3f0 | grep -E "drop|err|discard"
# Monitor NFS RDMA statistics
oc debug node/worker-0 -- chroot /host \
nfsstat -c | head -20
# Watch for RDMA connection resets
oc debug node/worker-0 -- chroot /host \
dmesg | grep -i rdma | tail -20
Troubleshooting Decision Tree
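The decision tree in this section can also be written as a small function that maps observed facts to the next action, which is handy in runbooks or node health scripts. This is only a sketch of the same branching logic; `next_action` and its argument order are hypothetical, and the caller is assumed to have gathered the inputs with the commands from Steps 1 to 3.

```shell
# Map observations to the next troubleshooting action.
# $1 = transport from mountstats (rdma|tcp)
# $2 = xprtrdma loaded (yes|no)
# $3 = port used by the mount
# $4 = VLAN sub-interface on the RDMA NIC (yes|no)
# $5 = server has RDMA enabled (yes|no)
next_action() {
  if [ "$1" = "rdma" ]; then
    echo "RDMA active: tune rsize, wsize, nconnect if slow"
  elif [ "$2" != "yes" ]; then
    echo "modprobe xprtrdma"
  elif [ "$3" != "20049" ]; then
    echo "Remount with port=20049"
  elif [ "$4" = "yes" ]; then
    echo "Remove VLAN, use switch access mode"
  elif [ "$5" != "yes" ]; then
    echo "Enable rdma=y in nfs.conf"
  else
    echo "Check MTU and switch config"
  fi
}

# Usage (not executed here):
#   next_action tcp yes 2049 no yes
```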
flowchart TD
A[NFS mount works but slow] --> B{Check mountstats xprt}
B -->|xprt tcp| C[TCP fallback - not using RDMA]
B -->|xprt rdma| D[RDMA active - tune performance]
C --> E{xprtrdma loaded?}
E -->|No| F[modprobe xprtrdma]
E -->|Yes| G{Port 20049?}
G -->|No using 2049| H[Remount with port=20049]
G -->|Yes| I{VLAN on NIC?}
I -->|Yes| J[Remove VLAN - use switch access mode]
I -->|No| K{Server RDMA enabled?}
K -->|No| L[Enable rdma=y in nfs.conf]
K -->|Yes| M[Check MTU and switch config]
D --> N{Speed as expected?}
N -->|No| O[Tune rsize wsize nconnect]
N -->|Yes| P[Working correctly]
Common Issues
RDMA connection resets under load
# Check for RNR (Receiver Not Ready) retries
oc debug node/worker-0 -- chroot /host \
rdma statistic show link mlx5_0/1 | grep rnr
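# Increase RNR retry count
# Note: rnr_retry is normally a per-connection (queue pair) attribute; confirm your kernel exposes this sysctl first (sysctl -a | grep -i rnr)
oc debug node/worker-0 -- chroot /host \
sysctl -w net.rdma.rnr_retry=7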
# Check for congestion
oc debug node/worker-0 -- chroot /host \
mlnx_qos -i ens3f0
nconnect not supported
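Whether nconnect is available can be estimated from the kernel release string. The sketch below compares the major and minor numbers of a `uname -r` style string against a minimum version; `kernel_at_least` is a hypothetical helper. Vendor kernels sometimes backport features, so a failing check is a hint to investigate, not proof that the option is unsupported.

```shell
# Succeed if kernel release $1 (e.g. "5.14.0-284.el9.x86_64") is at
# least version $2.$3. Runs in a subshell so no variables leak out.
kernel_at_least() (
  ver=${1%%-*}          # drop the distribution suffix: 5.14.0
  major=${ver%%.*}      # 5
  rest=${ver#*.}        # 14.0
  minor=${rest%%.*}     # 14
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
)

# Usage (not executed here):
#   rel=$(oc debug node/worker-0 -- chroot /host uname -r)
#   kernel_at_least "$rel" 5 3 || echo "upstream nconnect support not expected"
```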
# nconnect requires kernel 5.3+ (NFSv3 or NFSv4.x)
# Check kernel version
oc debug node/worker-0 -- chroot /host uname -r
# If nconnect is not available, increase rsize/wsize instead
# and use multiple PV mounts from different export paths
RDMA works node-to-node but not to NFS server
# Switch may have different VLAN config for server ports
# Verify server port is also in access mode for the same VLAN
# Check ARP resolution
oc debug node/worker-0 -- chroot /host \
arping -I ens3f0 10.90.0.1
# Check for firewall blocking RDMA
# RDMA uses different ports than TCP NFS
# Ensure port 20049 (NFS RDMA) is open
Best Practices
- Always verify with mountstats: the only reliable way to confirm RDMA transport
- Benchmark before and after: measure TCP NFS first, then RDMA, to quantify improvement
- Use nconnect=8 for multi-stream parallelism: multiplies throughput for concurrent I/O
- Set rsize=1048576 and wsize=1048576: 1MB buffers maximize RDMA transfer efficiency
- Monitor RDMA error counters: rdma statistic show catches hardware issues early
- Check dmesg for RDMA errors: silent failures often log kernel messages
- Keep a TCP NFS backup path: if RDMA fails, workloads can fall back to TCP NFS on the management network
Key Takeaways
- NFSoRDMA silently falls back to TCP: always verify with /proc/self/mountstats
- The diagnostic checklist: module loaded -> RDMA devices present -> no VLAN tagging -> MTU match -> server port 20049
- nconnect=8 with large rsize/wsize maximizes RDMA throughput
- Raw RDMA should deliver 90-100 Gbps (ConnectX-6); NFS over RDMA achieves 2-5 GB/s at the application level
- Monitor rdma statistic show and ethtool -S for hardware-level errors
- TCP fallback is the most common problem: caused by missing module, wrong port, VLAN tagging, or server misconfiguration

