Compare NCCL Intra-Node vs Inter-Node Performance
Build a repeatable comparison between local and cross-node NCCL throughput to validate GPU cluster interconnect scaling and identify bottlenecks early.
💡 Quick Answer: Run the same all_reduce_perf profile in single-node and dual-node scenarios, then compare bandwidth ratios to detect network bottlenecks.
This comparison shows whether your network fabric is limiting distributed workloads.
Standard Test Profile
all_reduce_perf -b 8 -e 1G -f 2 -g 1
Test Design
- Intra-node: two or more GPUs on one node
- Inter-node: same GPU count split across two nodes
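One way to keep the two scenarios comparable is to generate both launch commands from the same profile, so only the host layout changes. The sketch below assumes an MPI-enabled build of all_reduce_perf launched through Open MPI's mpirun with its `-H host:slots` syntax. The host names `node-a` and `node-b` and the helper `launch_command` are hypothetical, and with `-g 1` each MPI rank is assumed to drive one GPU.

```python
import shlex

# The standard test profile, unchanged between the two scenarios.
PROFILE = ["all_reduce_perf", "-b", "8", "-e", "1G", "-f", "2", "-g", "1"]


def launch_command(hosts, total_gpus):
    """Build an mpirun argument list that spreads total_gpus ranks evenly over hosts.

    Assumes Open MPI's mpirun and one GPU per rank (-g 1).
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if total_gpus % len(hosts) != 0:
        raise ValueError("total_gpus must divide evenly across hosts")
    slots = total_gpus // len(hosts)
    host_spec = ",".join(f"{host}:{slots}" for host in hosts)
    return ["mpirun", "-np", str(total_gpus), "-H", host_spec, *PROFILE]


# Hypothetical host names; replace with the nodes under test.
intra = launch_command(["node-a"], 2)            # two GPUs on one node
inter = launch_command(["node-a", "node-b"], 2)  # same GPU count, one per node

print(shlex.join(intra))
print(shlex.join(inter))
```

Holding the GPU count and profile fixed means any bandwidth difference between the two runs can be attributed to the path between nodes rather than to a change in the test itself.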
Analysis
- Record top algbw for both scenarios
- Compute inter/intra ratio
- Investigate large drops with topology and network checks
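The first two analysis steps can be scripted against saved all_reduce_perf output. This is a minimal sketch, not part of nccl-tests: the helpers `peak_algbw` and `inter_intra_ratio` are hypothetical, and the parser assumes each result row starts with the message size in bytes and ends with eight columns (time, algbw, busbw, #wrong for out-of-place, then the same four for in-place). If your build prints a different layout, adjust the column index.

```python
def peak_algbw(log_text):
    """Return the highest out-of-place algbw (GB/s) found in an all_reduce_perf log."""
    best = 0.0
    for line in log_text.splitlines():
        fields = line.split()
        # Result rows start with the message size in bytes. Header and summary
        # rows start with '#', and debug lines start with other text, so both
        # are skipped here.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        try:
            # Assumed layout: the seventh column from the right is the
            # out-of-place algbw.
            algbw = float(fields[-7])
        except ValueError:
            continue
        best = max(best, algbw)
    return best


def inter_intra_ratio(intra_log, inter_log):
    """Return peak inter-node algbw divided by peak intra-node algbw."""
    intra = peak_algbw(intra_log)
    inter = peak_algbw(inter_log)
    if intra == 0.0:
        raise ValueError("no algbw rows found in the intra-node log")
    return inter / intra
```

Pass the captured text of each run, for example the contents of two log files, to `inter_intra_ratio`. A ratio close to 1 suggests the fabric keeps up with the local interconnect, while a much smaller ratio is the signal to start the topology and network checks.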
Target Outcome
A clear, documented baseline for the communication penalty to expect when traffic crosses nodes.

