Troubleshooting · Advanced · ⏱ 25 minutes · K8s 1.28+

Diagnose GPU Peer-to-Peer Latency with NCCL Tests

Use NCCL point-to-point and collective tests to isolate GPU peer-to-peer latency issues between GPU pairs in multi-node Kubernetes clusters.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Compare latency with small-message runs such as all_reduce_perf -b 8 -e 8M -f 2 -g 1 across different GPU pairs and nodes to identify outliers.

Consistently high small-message latency usually points to a topology or transport-path issue (e.g., PCIe instead of NVLink, or TCP instead of RDMA).
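To compare pairs numerically, the small-message latency can be pulled out of the nccl-tests output with a one-liner. A minimal sketch: the sample row piped in below is hypothetical, but follows the nccl-tests table layout (size, count, type, redop, root, then out-of-place time in microseconds as column 6).

```shell
# Print the 8-byte out-of-place latency (us) from an all_reduce_perf run;
# column 6 is the out-of-place time in the nccl-tests output layout.
# The sample row here is hypothetical.
awk '$1 == 8 { print $6 }' <<'EOF'
       8             2     float     sum      -1    24.31    0.00    0.00      0    23.98    0.00    0.00      0
EOF
```

Running this over each GPU pair's output turns the comparison into a short list of numbers, where outliers stand out immediately.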

Fast Latency Test

all_reduce_perf -b 8 -e 8M -f 2 -g 1
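Because the goal is pairwise latency, the point-to-point test from nccl-tests is a useful complement to all_reduce_perf. A sketch, assuming the nccl-tests binaries are on the pod's PATH; the device indices are assumptions:

```shell
# Point-to-point latency between two GPUs in one pod.
# -b 8 -e 8M -f 2 : sweep message sizes from 8 B to 8 MB, doubling each step
# -g 2            : drive two GPUs from one process, exercising the P2P path
sendrecv_perf -b 8 -e 8M -f 2 -g 2

# Restrict which physical GPUs form the pair (indices 0 and 3 are examples):
CUDA_VISIBLE_DEVICES=0,3 sendrecv_perf -b 8 -e 8M -f 2 -g 2
```

Iterating CUDA_VISIBLE_DEVICES over the possible pairs isolates a single slow link rather than averaging it into a collective.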

Isolation Strategy

  1. Test within one node first.
  2. Test cross-node with the same pod specs.
  3. Repeat with pinned nodes and interfaces.
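Step 3 can be made concrete by fixing both the scheduling and the transport. A sketch, where the pod name and interface name are assumptions for your cluster:

```shell
# Confirm which node the pod landed on (hypothetical pod name):
kubectl get pod nccl-worker-0 -o wide

# Re-run the test with the data interface pinned and NCCL's choices logged:
kubectl exec nccl-worker-0 -- env \
  NCCL_SOCKET_IFNAME=eth1 \
  NCCL_DEBUG=INFO \
  all_reduce_perf -b 8 -e 8M -f 2 -g 1
```

If the numbers change when the interface is pinned, NCCL's default interface selection was part of the problem.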

Correlate With Topology

Inside each pod:

nvidia-smi topo -m

Use topology distance to explain expected latency differences.
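A sketch of reading the matrix from inside a pod (the pod name is an assumption); the legend values below are the standard nvidia-smi path types, roughly ordered from fastest to slowest:

```shell
# Show the GPU connectivity matrix inside the pod.
# Legend, roughly fastest to slowest:
#   NV#       - NVLink
#   PIX       - single PCIe bridge
#   PXB / PHB - multiple PCIe bridges / PCIe host bridge
#   NODE      - same NUMA node, crossing PCIe host bridges
#   SYS       - crosses the inter-socket (SMP) interconnect
kubectl exec gpu-pod -- nvidia-smi topo -m
```

A pair connected via SYS is expected to be slower than an NV# pair; only latency that deviates from what the matrix predicts needs further digging.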

Common Root Causes

  • Wrong data interface selected
  • RDMA disabled or unavailable
  • Mixed firmware/driver versions across nodes
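Each of these causes has a quick check. A sketch, with hypothetical pod names; the NCCL_DEBUG output format and tool invocations are standard, but adapt names to your cluster:

```shell
# 1. Which interface/transport did NCCL actually pick? (NET lines in the log)
kubectl exec nccl-worker-0 -- env NCCL_DEBUG=INFO \
  all_reduce_perf -b 8 -e 8 -g 1 2>&1 | grep 'NCCL INFO NET'

# 2. Is RDMA present and usable inside the pod?
kubectl exec nccl-worker-0 -- ibv_devinfo

# 3. Any driver skew across nodes?
for p in nccl-worker-0 nccl-worker-1; do
  kubectl exec "$p" -- nvidia-smi --query-gpu=driver_version --format=csv,noheader
done
```

Seeing "NET/Socket" where you expected "NET/IB" in step 1 confirms the first two causes at once: NCCL fell back to TCP because RDMA was not usable.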
#nccl #latency #p2p #gpu #troubleshooting
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
