Validate GPU & NIC Topology Before NCCL Ben...
Inspect node-level GPU, NIC, and PCI topology on Kubernetes workers to predict and explain NCCL benchmark performance before running tests.
π‘ Quick Answer: Run
nvidia-smi topo -mandlspcimapping checks first; poor physical topology often explains low NCCL bandwidth without any software bug.
Topology awareness prevents false conclusions during NCCL troubleshooting.
Commands to Run
nvidia-smi topo -m
lspci | grep -Ei 'NVIDIA|Mellanox|Ethernet|Infiniband'What to Confirm
- GPUs used by your pod are local to the expected PCI root complex.
- High-speed NICs are attached to suitable CPU/PCI paths.
- Node hardware is homogeneous across benchmark participants.
Practical Outcome
Use topology results to define realistic performance targets for intra-node and inter-node NCCL tests.
Validate GPU Topology for NCCL
Verify GPU topology and NVLink connectivity before running distributed training to ensure optimal NCCL performance.
Check GPU Topology
# Run nvidia-smi topo in a K8s pod
kubectl exec -it gpu-worker -- nvidia-smi topo -m
# Expected output for 8ΓH100 with NVSwitch:
# GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
# GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18
# GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18
# ...
# NV18 = NVLink (18 links, full mesh via NVSwitch)
# SYS = PCIe + QPI (cross-socket, worst case)
# PHB = PCIe (same host bridge)Validate NVLink Bandwidth
apiVersion: batch/v1
kind: Job
metadata:
name: nvlink-bandwidth-test
spec:
template:
spec:
containers:
- name: test
image: nvcr.io/nvidia/pytorch:25.11-py3
command: ["bash", "-c"]
args:
- |
# P2P bandwidth test between all GPU pairs
/usr/local/cuda/extras/demo_suite/p2pBandwidthLatencyTest
# NCCL all_reduce test (measures collective bandwidth)
cd /opt/nccl-tests && ./build/all_reduce_perf -b 1M -e 1G -f 2 -g 8
resources:
limits:
nvidia.com/gpu: 8
restartPolicy: NeverInterpret NCCL Test Results
# Good result (H100 NVSwitch): ~450 GB/s bus bandwidth
# size time algbw busbw
# 1073741824 2.56 419.43 733.00
# Bad result (PCIe fallback): ~25 GB/s
# Check: is NVLink actually connected? Run nvidia-smi nvlink -s
kubectl exec gpu-worker -- nvidia-smi nvlink -sTopology Validation DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: gpu-topology-validator
spec:
selector:
matchLabels:
app: gpu-topology-validator
template:
spec:
nodeSelector:
nvidia.com/gpu.present: "true"
containers:
- name: validator
image: nvcr.io/nvidia/pytorch:25.11-py3
command: ["bash", "-c"]
args:
- |
TOPO=$(nvidia-smi topo -m)
if echo "$TOPO" | grep -q "SYS"; then
echo "WARNING: Cross-socket GPU communication detected"
echo "$TOPO"
exit 1
fi
echo "GPU topology OK β all NVLink connected"
sleep infinity
resources:
limits:
nvidia.com/gpu: 1
Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
