πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
Networking advanced ⏱ 15 minutes K8s 1.28+

NCCL_SOCKET_IFNAME Environment Variable Guide

Configure NCCL_SOCKET_IFNAME for multi-node GPU training on Kubernetes. Network interface selection, bonding, InfiniBand, and troubleshooting NCCL timeouts.

By Luca Berton β€’ β€’ πŸ“– 6 min read

πŸ’‘ Quick Answer: `NCCL_SOCKET_IFNAME` tells NCCL which network interface to use for GPU-to-GPU communication. Set it to your high-speed interface (e.g., `eth0`, `bond0`, `ib0`) to avoid NCCL using a slow or wrong interface. Prefix with `^{name}` to exclude interfaces.

The Problem

Multi-node GPU training on Kubernetes uses NCCL (NVIDIA Collective Communications Library) for GPU-to-GPU data transfers. By default, NCCL auto-detects the network interface β€” but on nodes with multiple NICs (management, storage, high-speed fabric), it often picks the wrong one, causing:

  • NCCL timeouts (`NCCL WARN Timeout`)
  • Extremely slow all-reduce operations
  • Training jobs hanging at initialization
  • Non-deterministic failures (works sometimes, not others)
flowchart LR
    subgraph Node["GPU Node (Multiple NICs)"]
        GPU["GPUs"]
        ETH0["eth0 (1Gbps mgmt)"]
        BOND0["bond0 (100Gbps)"]
        IB0["ib0 (200Gbps IB)"]
        LO["lo (loopback)"]
    end
    GPU -->|"❌ NCCL picks eth0"| ETH0
    GPU -->|"βœ… Want NCCL to use"| IB0

The Solution

Basic Usage

env:
  # Use a specific interface
  - name: NCCL_SOCKET_IFNAME
    value: "eth0"

  # Use InfiniBand
  - name: NCCL_SOCKET_IFNAME
    value: "ib0"

  # Use a bonded interface
  - name: NCCL_SOCKET_IFNAME
    value: "bond0"

  # Prefix match (any interface starting with "eth")
  - name: NCCL_SOCKET_IFNAME
    value: "eth"

  # Exclude interfaces (use ^ prefix)
  - name: NCCL_SOCKET_IFNAME
    value: "^lo,docker0,veth"

Common Configurations

Cluster TypeNCCL_SOCKET_IFNAMENotes
Cloud (AWS/GCP/Azure)`eth0` or `ens5`Primary high-speed NIC
On-prem InfiniBand`ib0`IB fabric for RDMA
Bonded NICs`bond0`Aggregated link
SR-IOV`net1`Multus secondary interface
Exclusion list`^lo,docker0,veth,flannel,cni0`Block known slow/CNI interfaces

Kubernetes Deployment Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: distributed-training
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: nvcr.io/nvidia/pytorch:24.04-py3
          env:
            # Network interface selection
            - name: NCCL_SOCKET_IFNAME
              value: "bond0"

            # Complementary NCCL settings
            - name: NCCL_IB_DISABLE
              value: "0"           # Enable InfiniBand (0=enabled)
            - name: NCCL_IB_HCA
              value: "mlx5"        # Use Mellanox HCA
            - name: NCCL_NET_GDR_LEVEL
              value: "5"           # GPUDirect RDMA level
            - name: NCCL_DEBUG
              value: "INFO"        # Enable debug logging

            # Timeout (increase for large clusters)
            - name: NCCL_TIMEOUT
              value: "1800000"     # 30 minutes in ms
          resources:
            limits:
              nvidia.com/gpu: 8

How to Find the Right Interface

# On a GPU node, check available interfaces
kubectl exec -it gpu-pod -- ip addr show

# Look for high-speed interfaces:
# - ib0/ib1: InfiniBand (100-400 Gbps)
# - bond0: Bonded NICs
# - eth0/ens5: Primary ethernet
# - net1/net2: SR-IOV/Multus secondary interfaces

# Check interface speed
kubectl exec -it gpu-pod -- ethtool eth0 | grep Speed
# Speed: 100000Mb/s  ← 100 Gbps, good for NCCL

kubectl exec -it gpu-pod -- ethtool eth1 | grep Speed
# Speed: 1000Mb/s    ← 1 Gbps, too slow for NCCL

# For InfiniBand
kubectl exec -it gpu-pod -- ibstat | grep -A5 "Port 1"
# Rate: 200 Gbps

Using Downward API for Dynamic Configuration

# If interface name varies by node, use an init container
initContainers:
  - name: detect-interface
    image: busybox
    command:
      - sh
      - -c
      - |
        # Find the fastest non-loopback interface
        IFACE=$(ip route get 10.0.0.1 | awk '{print $5; exit}')
        echo "NCCL_SOCKET_IFNAME=$IFACE" > /config/nccl.env
    volumeMounts:
      - name: config
        mountPath: /config

containers:
  - name: trainer
    envFrom:
      - configMapRef:
          name: nccl-config
    # Or read from the init container output

PyTorch Distributed Training Job

apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: multi-node-training
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      template:
        spec:
          containers:
            - name: pytorch
              image: nvcr.io/nvidia/pytorch:24.04-py3
              env:
                - name: NCCL_SOCKET_IFNAME
                  value: "^lo,docker0"
                - name: NCCL_DEBUG
                  value: "INFO"
              resources:
                limits:
                  nvidia.com/gpu: 8
    Worker:
      replicas: 3
      template:
        spec:
          containers:
            - name: pytorch
              image: nvcr.io/nvidia/pytorch:24.04-py3
              env:
                - name: NCCL_SOCKET_IFNAME
                  value: "^lo,docker0"
                - name: NCCL_DEBUG
                  value: "INFO"
              resources:
                limits:
                  nvidia.com/gpu: 8
VariablePurposeExample
`NCCL_SOCKET_IFNAME`Network interface for sockets`bond0` or `^lo,docker0`
`NCCL_IB_DISABLE`Disable InfiniBand`0` (enabled) or `1` (disabled)
`NCCL_IB_HCA`InfiniBand HCA device`mlx5_0` or `mlx5`
`NCCL_NET_GDR_LEVEL`GPUDirect RDMA level`5` (max)
`NCCL_TIMEOUT`Operation timeout (ms)`1800000` (30 min)
`NCCL_DEBUG`Debug logging level`INFO`, `WARN`, `TRACE`
`NCCL_TOPO_DUMP_FILE`Dump topology to file`/tmp/nccl-topo.xml`
`NCCL_P2P_LEVEL`P2P communication level`NVL` (NVLink)
`NCCL_ALGO`Algorithm selection`Ring`, `Tree`, `CollNet`

Debug NCCL Interface Selection

# Set NCCL_DEBUG=INFO to see which interface NCCL picks
# In pod logs you'll see:
# NCCL INFO NET/Socket : Using [0]bond0:10.0.1.5<0>
# NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB

# If you see the wrong interface:
# NCCL INFO NET/Socket : Using [0]eth1:192.168.1.5<0>  ← management NIC!
# Fix: set NCCL_SOCKET_IFNAME=bond0 (or ^eth1)

# Dump topology for analysis
env:
  - name: NCCL_TOPO_DUMP_FILE
    value: "/tmp/nccl-topo.xml"

Common Issues

IssueCauseFix
`NCCL WARN Timeout`Wrong interface selectedSet `NCCL_SOCKET_IFNAME` explicitly
Slow all-reduceUsing 1Gbps management NICPoint to high-speed fabric interface
`No usable listening socket`Interface doesn’t exist in podCheck `ip addr` inside pod; may need Multus for secondary NICs
Works single-node, fails multiLoopback used intra-nodeExclude `lo`: `NCCL_SOCKET_IFNAME=^lo`
`Call to ibv_modify_qp failed`IB interface wrongSet `NCCL_IB_HCA=mlx5_0` explicitly
Intermittent timeoutsInterface flappingUse bonded interface; increase `NCCL_TIMEOUT`

Best Practices

  • Always set `NCCL_SOCKET_IFNAME` explicitly β€” never rely on auto-detection in production
  • Use exclusion (`^`) for flexibility β€” `^lo,docker0,veth` works across node types
  • Enable `NCCL_DEBUG=INFO` during setup β€” verify interface selection, disable in production
  • Match interface across all nodes β€” all workers must use the same interface name
  • Increase `NCCL_TIMEOUT` for large clusters β€” default may be too short for 32+ nodes
  • Test with `nccl-tests` first β€” validate networking before training

Key Takeaways

  • `NCCL_SOCKET_IFNAME` controls which network interface NCCL uses for GPU communication
  • Prefix match (`eth`) selects any interface starting with that name
  • Exclusion (`^lo,docker0`) blocks specific interfaces β€” often more portable
  • Always verify with `NCCL_DEBUG=INFO` to confirm the right interface is selected
  • On InfiniBand clusters, also set `NCCL_IB_HCA` and `NCCL_IB_DISABLE=0`
  • Wrong interface selection is the #1 cause of multi-node NCCL timeouts
#nccl #gpu-training #distributed-training #environment-variables #infiniband
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens