ai intermediate ⏱ 20 minutes K8s 1.28+

Compare NCCL Intra-Node vs Inter-Node Performance

Build a repeatable comparison between local and cross-node NCCL throughput to validate GPU cluster interconnect scaling and identify bottlenecks early.

By Luca Berton • 5 min read

💡 Quick Answer: Run the same all_reduce_perf profile in single-node and dual-node scenarios, then compare bandwidth ratios to detect network bottlenecks.

This comparison shows whether your network fabric is limiting distributed workloads.

Standard Test Profile

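all_reduce_perf -b 8 -e 1G -f 2 -g 1

The profile sweeps message sizes from 8 bytes to 1G, doubling at each step. With -g 1, each process drives a single GPU, so covering several GPUs means launching one rank per GPU (for example through MPI).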

Test Design

  • Intra-node: two or more GPUs on one node
  • Inter-node: same GPU count split across two nodes
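The two scenarios can share the same profile and differ only in rank placement. The sketch below assumes nccl-tests was built with MPI support (MPI=1), that Open MPI's mpirun is on the PATH, and that the binary lives at ./build/all_reduce_perf. The hostnames node-a and node-b and the function names are hypothetical placeholders. The functions only print the commands so they can be reviewed before anything runs.

```shell
#!/usr/bin/env bash
# Sketch only: prints launch commands, it does not execute them.
# Assumptions (not from the recipe): nccl-tests built with MPI=1,
# Open MPI's mpirun, and the binary path below.
NCCL_TEST_BIN="${NCCL_TEST_BIN:-./build/all_reduce_perf}"

# Profile from the recipe:
#   -b 8   smallest message size (8 bytes)
#   -e 1G  largest message size
#   -f 2   multiply the size by 2 between steps
#   -g 1   one GPU per rank, so ranks == GPUs
PROFILE_ARGS="-b 8 -e 1G -f 2 -g 1"

# intra_node_cmd GPUS HOST
# All ranks are placed on a single node.
intra_node_cmd() {
  echo "mpirun -np $1 --host $2:$1 ${NCCL_TEST_BIN} ${PROFILE_ARGS}"
}

# inter_node_cmd GPUS HOST_A HOST_B
# The same rank count is split evenly across two nodes.
inter_node_cmd() {
  if [ $(( $1 % 2 )) -ne 0 ]; then
    echo "GPU count must be even to split across two nodes" >&2
    return 1
  fi
  per_node=$(( $1 / 2 ))
  echo "mpirun -np $1 --host $2:${per_node},$3:${per_node} ${NCCL_TEST_BIN} ${PROFILE_ARGS}"
}
```

For example, `intra_node_cmd 2 node-a` and `inter_node_cmd 2 node-a node-b` describe the same two-GPU job, once on one node and once with one GPU on each node. Keeping the GPU count identical means any bandwidth difference comes from the interconnect rather than from the job size.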

Analysis

  • Record the peak algbw (algorithm bandwidth, in GB/s) for both scenarios
  • Compute the inter-node/intra-node ratio
  • Investigate large drops with topology and network checks
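The first two analysis steps can be scripted. The sketch below assumes each run's output was saved to a log file (intra.log and inter.log are hypothetical names) and that the out-of-place algbw is the 7th whitespace-separated column of each data row, as in recent nccl-tests output. Check the header of your own logs and pass a different column number if the layout differs. The function names are illustrative.

```shell
#!/usr/bin/env bash
# peak_algbw LOGFILE [COLUMN]
# Prints the highest algbw value found in the data rows of an nccl-tests log.
# Data rows start with a number (the message size); header and summary
# lines start with '#' and are skipped.
# Assumption: out-of-place algbw is column 7 unless COLUMN is given.
peak_algbw() {
  awk -v col="${2:-7}" '
    /^[[:space:]]*[0-9]/ && NF >= col {
      if ($col + 0 > max) max = $col + 0
    }
    END { printf "%.2f\n", max }
  ' "$1"
}

# bw_ratio INTER INTRA
# Prints inter/intra rounded to two decimals; fails if INTRA is not positive.
bw_ratio() {
  awk -v inter="$1" -v intra="$2" 'BEGIN {
    if (intra + 0 <= 0) {
      print "intra-node bandwidth must be > 0" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", inter / intra
  }'
}

# Usage, once both logs exist:
#   intra=$(peak_algbw intra.log)
#   inter=$(peak_algbw inter.log)
#   echo "inter/intra ratio: $(bw_ratio "$inter" "$intra")"
```

A ratio close to 1 means crossing nodes costs little for this profile. A much lower ratio is the cue for the topology and network checks.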

Target Outcome

A clear, documented baseline for the communication penalty to expect when traffic crosses nodes.
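Once a baseline ratio is written down, later runs can be checked against it. This is a minimal sketch, not part of the recipe itself: the function name check_ratio is hypothetical, and the baseline and tolerance values passed in must come from your own measurements.

```shell
#!/usr/bin/env bash
# check_ratio RATIO BASELINE TOLERANCE
# Succeeds when RATIO is no more than TOLERANCE below BASELINE,
# otherwise reports a regression and returns a non-zero status.
# All three values are supplied by the caller; none are recommended defaults.
check_ratio() {
  awk -v r="$1" -v b="$2" -v tol="$3" 'BEGIN {
    if (r + 0 < b - tol) {
      printf "REGRESSION: ratio %.2f is more than %.2f below baseline %.2f\n", r, tol, b
      exit 1
    }
    printf "OK: ratio %.2f within %.2f of baseline %.2f\n", r, tol, b
  }'
}
```

Because the function returns a non-zero status on a regression, it can gate a scheduled job or pipeline step that reruns the comparison after driver, firmware, or network changes.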

#nccl #intra-node #inter-node #benchmarking #gpu
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.