πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
ai advanced ⏱ 25 minutes K8s 1.28+

Physical AI and Robotics Orchestration

Orchestrate physical AI and robotics fleets with Kubernetes. ROS 2 on K8s, robot fleet management, edge-cloud hybrid, NVIDIA Isaac.

By Luca Berton β€’ β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Physical AI deploys AI models into robots, drones, and autonomous vehicles. Kubernetes orchestrates the cloud side: model training, simulation (NVIDIA Isaac Sim), fleet OTA updates, telemetry collection, and edge node management. ROS 2 workloads run as pods with DDS networking, while K3s/MicroK8s runs on edge devices for local inference.

The Problem

AI is moving off screens into the physical world β€” warehouse robots, delivery drones, autonomous forklifts, and smart factories. These systems need: cloud infrastructure for training and simulation, edge computing for real-time inference, fleet management for hundreds of robots, OTA model updates, and telemetry pipelines. Kubernetes provides the orchestration layer across cloud and edge.

flowchart TB
    subgraph CLOUD["☁️ Cloud Kubernetes"]
        TRAIN["Model Training<br/>(GPU cluster)"]
        SIM["NVIDIA Isaac Sim<br/>(digital twin)"]
        FLEET["Fleet Manager<br/>(OTA updates)"]
        TEL["Telemetry Pipeline<br/>(data lake)"]
    end
    
    subgraph EDGE["πŸ€– Edge (K3s per robot)"]
        INF["Local Inference<br/>(Jetson/GPU)"]
        ROS["ROS 2 Stack<br/>(navigation, perception)"]
        SENSOR["Sensors<br/>(cameras, LiDAR)"]
    end
    
    CLOUD <-->|"GitOps OTA<br/>Model sync"| EDGE
    EDGE -->|"Telemetry<br/>MQTT/gRPC"| TEL
    TRAIN -->|"New model"| FLEET
    SIM -->|"Validated model"| FLEET
    FLEET -->|"Push update"| EDGE

The Solution

ROS 2 on Kubernetes

# ROS 2 navigation stack as a Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robot-navigation
spec:
  template:
    spec:
      hostNetwork: true              # DDS needs multicast
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: nav2
          image: myorg/ros2-nav2:humble
          env:
            - name: ROS_DOMAIN_ID
              value: "42"
            - name: RMW_IMPLEMENTATION
              value: "rmw_cyclonedds_cpp"
            - name: CYCLONEDDS_URI
              value: "/config/cyclonedds.xml"
          volumeMounts:
            - name: dds-config
              mountPath: /config
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
        
        # Perception with GPU inference
        - name: perception
          image: myorg/ros2-perception:humble
          env:
            - name: MODEL_PATH
              value: "/models/yolov8-robotics.onnx"
          resources:
            limits:
              nvidia.com/gpu: 1      # Jetson or discrete GPU
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: dds-config
          configMap:
            name: cyclonedds-config
        - name: models
          persistentVolumeClaim:
            claimName: perception-models

NVIDIA Isaac Sim on Cloud K8s

# Digital twin simulation for robot testing
apiVersion: batch/v1
kind: Job
metadata:
  name: isaac-sim-test
spec:
  template:
    spec:
      containers:
        - name: isaac-sim
          image: nvcr.io/nvidia/isaac-sim:4.2.0
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: SCENARIO
              value: "/scenarios/warehouse-navigation.usd"
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 8211    # Streaming
      restartPolicy: Never

Fleet OTA Updates via GitOps

# FluxCD syncs model updates to edge clusters
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: robot-fleet-config
spec:
  interval: 5m
  url: https://github.com/myorg/robot-fleet
  ref:
    branch: production
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: robot-models
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: robot-fleet-config
  path: ./models/v2.3
  prune: true
  # Staged rollout: canary robot first
  patches:
    - target:
        kind: Deployment
        name: perception
      patch: |
        - op: replace
          path: /spec/template/spec/containers/0/image
          value: myorg/perception:v2.3

Edge Device Management (K3s)

# Install K3s on each robot/edge device
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="agent" \
  K3S_URL=https://fleet-server:6443 \
  K3S_TOKEN=<token> sh -

# Label edge nodes by robot type
kubectl label node robot-agv-001 \
  robot-type=agv \
  location=warehouse-a \
  hardware=jetson-orin

Telemetry Pipeline

# MQTT broker for robot telemetry
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mqtt-broker
spec:
  template:
    spec:
      containers:
        - name: mosquitto
          image: eclipse-mosquitto:2
          ports:
            - containerPort: 1883
---
# Telemetry collector: MQTT β†’ Kafka β†’ Data Lake
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telemetry-collector
spec:
  template:
    spec:
      containers:
        - name: collector
          image: myorg/telemetry-collector:v1.0
          env:
            - name: MQTT_BROKER
              value: "mqtt://mqtt-broker:1883"
            - name: KAFKA_BROKERS
              value: "kafka:9092"
            - name: TOPICS
              value: "robot/+/sensors,robot/+/status,robot/+/diagnostics"

Common Issues

IssueCauseFix
DDS multicast not workingNetworkPolicy blocks multicastUse hostNetwork or Zenoh bridge
Edge node disconnectsUnreliable networkK3s works offline; sync when connected
Model too large for edgeJetson has limited RAMQuantize model (INT8/FP16 for TensorRT)
OTA update bricking robotBad model deployedStaged rollout with health checks, auto-rollback
Sensor data overwhelming cloudHigh-frequency telemetryEdge preprocessing, send only anomalies/summaries

Best Practices

  • K3s for edge, full K8s for cloud β€” K3s is lightweight enough for robots
  • GitOps for fleet updates β€” version-controlled, auditable, rollback-capable
  • Staged rollouts β€” canary one robot before fleet-wide update
  • Edge preprocessing β€” run inference and filtering locally, send results to cloud
  • Simulate before deploying β€” NVIDIA Isaac Sim validates models in digital twin
  • Monitor robot health β€” Prometheus + Grafana for fleet-wide metrics

Key Takeaways

  • Physical AI deploys AI into robots, drones, and autonomous vehicles
  • Kubernetes orchestrates cloud training, simulation, fleet management, and telemetry
  • ROS 2 workloads run as pods with DDS networking (hostNetwork for multicast)
  • K3s on edge devices provides Kubernetes API for robot software management
  • GitOps enables versioned, staged OTA updates across robot fleets
  • 2026 trend: AI moving from screens to warehouses, factories, and roads
#physical-ai #robotics #ros2 #edge-computing #nvidia-isaac
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens