ai beginner ⏱ 15 minutes K8s 1.28+

CUDA Version Compatibility K8s Guide

Match CUDA versions with GPU drivers and container images on Kubernetes. Forward compatibility, driver requirements, and container toolkit matrix.

By Luca Berton • 5 min read

Quick Answer: The CUDA version inside the container image must be supported by the NVIDIA driver on the host node. Check the driver with nvidia-smi, then choose a compatible, pinned container image from the NVIDIA NGC catalog.

The Problem

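A GPU container image ships its own CUDA runtime, while the NVIDIA driver is provided by the host node, so the two can drift apart. If the image's CUDA version is newer than the node's driver supports, CUDA fails to initialize inside the pod. Without proper setup, GPU workloads on Kubernetes also suffer from wasted resources, failed scheduling, or degraded inference performance.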

The Solution

Prerequisites

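# Verify GPU nodes are available
kubectl get nodes -l nvidia.com/gpu.present=true
# Allocatable should list nvidia.com/gpu alongside cpu and memory
kubectl describe node <gpu-node> | grep -A8 "Allocatable"

# Check NVIDIA driver and CUDA: the nvidia-smi header reports the driver
# version and the highest CUDA version that driver supports
kubectl exec -it <gpu-pod> -- nvidia-smi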
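For scripted checks, the driver version can be read in a machine-friendly form. The following is a minimal sketch rather than part of the original recipe. It assumes `kubectl` is on the PATH and that the pod image ships `nvidia-smi`. The helper names `parse_driver_versions` and `driver_versions_in_pod` are hypothetical. It relies on `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one driver version per visible GPU.

```python
import subprocess


def parse_driver_versions(output: str) -> list[tuple[int, ...]]:
    """Turn one-version-per-line text into comparable integer tuples."""
    versions = []
    for line in output.splitlines():
        line = line.strip()
        if line:
            versions.append(tuple(int(part) for part in line.split(".")))
    return versions


def driver_versions_in_pod(pod: str, namespace: str = "default") -> list[tuple[int, ...]]:
    """Query the driver version reported for each GPU visible inside a running pod."""
    result = subprocess.run(
        [
            "kubectl", "exec", "-n", namespace, pod, "--",
            "nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if kubectl or nvidia-smi fails
    )
    return parse_driver_versions(result.stdout)
```

Returning tuples instead of strings means versions compare numerically, so `(535, 104, 5) > (470, 57, 2)` behaves as expected.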

Configuration

# CUDA Version Compatibility K8s Guide: production configuration
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
  namespace: gpu-inference
spec:
  containers:
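  - name: inference
    image: nvcr.io/nvidia/pytorch:24.07-py3
    # Placeholder that keeps the container alive for the exec checks in this
    # recipe; replace it with your real inference entrypoint
    command: ["sleep", "infinity"]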
    resources:
      limits:
        nvidia.com/gpu: 1
      requests:
        nvidia.com/gpu: 1
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "all"
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: "compute,utility"
  nodeSelector:
    nvidia.com/gpu.present: "true"
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

Deployment

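# Create the namespace used by the manifest if it does not exist yet
kubectl create namespace gpu-inference

# Apply GPU workload
kubectl apply -f gpu-workload.yaml

# Verify GPU allocation
kubectl describe pod gpu-workload -n gpu-inference | grep -A3 "Limits"

# Monitor GPU utilization
kubectl exec -it gpu-workload -n gpu-inference -- nvidia-smi dmon -s pucvmet -d 5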

Verification

# Check GPU is accessible inside the pod
kubectl exec -it gpu-workload -n gpu-inference -- python3 -c "
import torch
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version PyTorch was built with: {torch.version.cuda}')
print(f'GPU count: {torch.cuda.device_count()}')
print(f'GPU name: {torch.cuda.get_device_name(0)}')
print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
"

graph TD
    A[GPU Node] --> B[NVIDIA Driver]
    B --> C[Container Toolkit]
    C --> D[Device Plugin]
    D --> E[Pod GPU Access]
    E --> F{Inference / Training}
    F --> G[Monitor with nvidia-smi]
    G --> H[Scale with HPA/KEDA]

Common Issues

GPU not visible inside pod

Check that the NVIDIA device plugin DaemonSet is running on the node. Verify with kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset. If missing, the GPU Operator may need reinstalling.
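On clusters with many GPU nodes, it can help to list which nodes lack a running device plugin pod. The sketch below is an illustration under stated assumptions: it takes the JSON printed by `kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset -o json` together with a list of GPU node names that you supply. The function name `nodes_without_running_plugin` is hypothetical.

```python
import json


def nodes_without_running_plugin(pod_list_json: str, gpu_nodes: list[str]) -> list[str]:
    """Return the GPU nodes that have no device plugin pod in the Running phase."""
    pods = json.loads(pod_list_json).get("items", [])
    # Collect the nodes that host at least one Running device plugin pod.
    covered = {
        pod.get("spec", {}).get("nodeName")
        for pod in pods
        if pod.get("status", {}).get("phase") == "Running"
    }
    # Preserve the caller's node order in the result.
    return [node for node in gpu_nodes if node not in covered]
```

A node that appears in the result is a candidate for the "GPU not visible inside pod" symptom. The check only looks at the pod phase, so a Running pod that is failing its own readiness checks would still count as covered.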

CUDA version mismatch

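The CUDA version in the container must be supported by the host driver. Run nvidia-smi on the node: the header shows the driver version and the highest CUDA version that driver supports. Then select a container image from the NVIDIA NGC catalog whose CUDA version does not exceed it, or use NVIDIA's forward-compatibility packages on GPUs where NVIDIA supports them.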
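A coarse version of this check can be automated. The sketch below compares a driver version against the minimum Linux driver that NVIDIA's CUDA Compatibility documentation lists for minor-version compatibility within a CUDA major release. The table values are an assumption taken from that documentation at the time of writing, and the names `MIN_LINUX_DRIVER` and `driver_supports_cuda` are hypothetical. An individual NGC image may require a newer driver than this lower bound, so its release notes remain the authority.

```python
# Minimum Linux driver per CUDA major version (minor-version compatibility).
# Assumption: values copied from NVIDIA's CUDA Compatibility documentation;
# re-check them before relying on this table.
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Convert a dotted version string such as '535.104.05' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda(driver_version: str, cuda_version: str) -> bool:
    """Return True if the driver meets the minimum for the CUDA major version."""
    cuda_major = parse_version(cuda_version)[0]
    minimum = MIN_LINUX_DRIVER.get(cuda_major)
    if minimum is None:
        raise ValueError(f"No minimum driver recorded for CUDA {cuda_major}.x")
    return parse_version(driver_version) >= minimum
```

For example, `driver_supports_cuda("470.57.02", "12.2")` returns `False`, which suggests either upgrading the node driver or choosing an image built on CUDA 11.x.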

Out of memory on GPU

Reduce batch size, enable gradient checkpointing for training, or use model quantization (AWQ/GPTQ) for inference. Monitor with nvidia-smi to track peak memory usage.
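To see why quantization helps, a back-of-the-envelope estimate of weight memory is often enough. The sketch below is a rough illustration: it counts model weights only and ignores activations, KV cache, and framework overhead. The function name `weight_memory_gb` is hypothetical, and the 4-bit figure assumes a typical AWQ/GPTQ setting.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 7e9  # illustrative 7-billion-parameter model
    print(f"FP16 weights:  {weight_memory_gb(params, 16):.1f} GB")
    print(f"4-bit weights: {weight_memory_gb(params, 4):.1f} GB")
```

Comparing the estimate with the total memory printed in the Verification step gives a quick sense of how much headroom remains for activations and batching.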

Best Practices

  • Always set resources.limits for nvidia.com/gpu; without it, pods won't get GPU access
  • Use node selectors or affinity to target specific GPU types (A100, H100, etc.)
  • Monitor GPU utilization with DCGM Exporter + Prometheus; idle GPUs waste expensive resources
  • Pin CUDA container versions; don't use latest tags in production
  • Enable GPU health checks with liveness probes that verify CUDA functionality
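The pinning rule from the list above can be enforced mechanically, for example in a CI step that scans manifests. The sketch below is one possible check, not an official tool. The function name `is_pinned` is hypothetical, and it treats an image as pinned when it carries a digest or any tag other than `latest`.

```python
def is_pinned(image: str) -> bool:
    """Return True if the image reference has a digest or a non-'latest' tag."""
    if "@" in image:
        # Digest reference such as name@sha256:<hex>; always immutable.
        return True
    # Look only at the last path segment so a registry port (host:5000/...)
    # is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        # No tag at all: the container runtime falls back to 'latest'.
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"
```

With this check, `nvcr.io/nvidia/pytorch:24.07-py3` from the Configuration section passes, while the same image without a tag or with `:latest` is rejected.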

Key Takeaways

  • CUDA version compatibility between the container image and the host driver is critical for production GPU workloads on Kubernetes
  • Proper resource configuration prevents scheduling failures and resource waste
  • Monitor GPU utilization to right-size allocations and reduce cloud costs
  • Use NVIDIA GPU Operator for automated driver and toolkit lifecycle management
  • Combine with KEDA or custom metrics HPA for GPU-aware autoscaling
#cuda #compatibility #driver-version #container
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
