GPU Node Provisioning for Kubernetes
Automate GPU node provisioning for Kubernetes with Karpenter, Cluster Autoscaler, and cloud-specific node pools for AI and ML workloads.
💡 Quick Answer: Use Karpenter with GPU-aware NodePools to automatically provision the right GPU instance type based on pod requirements. Define constraints for instance families (p5, g6), GPU count, and spot vs on-demand. Implement scale-to-zero for dev/staging GPU nodes.
The Problem
GPU nodes are expensive (from under $1 to roughly $100 per hour, depending on instance type and pricing model), and manually managing node pools for different GPU types (T4, A10, A100, H100) wastes money. You need automatic provisioning that matches GPU requests to the cheapest available instance, scales to zero when idle, and uses spot instances where possible.
The Solution
Karpenter NodePool for GPUs
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-inference
spec:
  template:
    spec:
      requirements:
        # p4d only ships as p4d.24xlarge (8 GPUs), so the GPU count filter below excludes it
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g6", "g5", "p4d"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        # Karpenter's own GPU count label; nvidia.com/gpu.count is only set after a node joins
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: In
          values: ["1", "2", "4"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: gpu-nodeclass
  limits:
    nvidia.com/gpu: 32
  disruption:
    # v1beta1 accepts consolidateAfter (for example 10m) only together with WhenEmpty
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-training-spot
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["p4d", "p5"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # Karpenter's own GPU count label; nvidia.com/gpu.count is only set after a node joins
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: In
          values: ["8"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: gpu-nodeclass
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
  limits:
    nvidia.com/gpu: 64
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
EC2NodeClass for GPU AMI
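apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: gpu-nodeclass
spec:
  amiFamily: AL2
  # v1beta1 requires role or instanceProfile; this role name is a placeholder assumption
  role: KarpenterNodeRole-my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
        volumeType: gp3
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  userData: |
    #!/bin/bash
    # Install NVIDIA drivers
    # Usually only needed for custom AMIs: the EKS-optimized accelerated AL2 AMI
    # typically ships with NVIDIA drivers, and the package name depends on the configured repo
    yum install -y nvidia-driver-latest
Cluster Autoscaler Alternative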
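# Illustrative settings only. Upstream Cluster Autoscaler reads these as command-line
# flags on its Deployment (--nodes=0:8:gpu-a100, --scale-down-utilization-threshold=0.3,
# --scale-down-unneeded-time=10m), not from a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
data:
  config: |
    nodeGroups:
      - name: gpu-a100
        minSize: 0
        maxSize: 8
    scaleDownUtilizationThreshold: 0.3
    scaleDownUnneededTime: 10m
GPU Instance Decision Matrix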
| Workload | GPU | Instance (AWS) | Pricing | Approx. cost/hr |
|---|---|---|---|---|
| Small inference | T4 | g4dn.xlarge | Spot | $0.16 |
| Medium inference | L4 | g6.xlarge | Spot | $0.24 |
| Large inference | A10G | g5.xlarge | On-demand | $1.01 |
| Training (single) | 8× A100 | p4d.24xlarge | Spot | $9.83 |
| Training (multi) | 8× H100 | p5.48xlarge | Spot | $32.77 |
graph TD
    POD[GPU Pod Created<br/>requests: nvidia.com/gpu: 1] --> KARP[Karpenter<br/>Find cheapest instance]
    KARP -->|Inference pod| G6[g6.xlarge<br/>L4 GPU, on-demand $0.80/hr]
    KARP -->|Training pod| P4D[p4d.24xlarge<br/>8× A100, spot $9.83/hr]
    IDLE[No GPU pods<br/>for 10 min] --> SCALE[Scale to zero<br/>Node terminated]
    SCALE --> SAVE[💰 $0/hr]
Common Issues
GPU node provisioning slow (5-10 minutes)
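GPU instances take longer to launch than CPU instances. To pre-warm capacity, keep minSize: 1 on critical Cluster Autoscaler node groups. With Karpenter, which has no minSize setting, use consolidationPolicy: WhenEmpty instead of WhenUnderutilized so that a node is removed only once it has no workload pods left, which reduces re-provisioning churn.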
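Because Karpenter NodePools have no minSize, one way to approximate a warm node is a low-priority placeholder pod that reserves a GPU and is preempted as soon as a real workload needs it. The sketch below is an assumption, not part of the original recipe: the gpu-placeholder names and the priority value are hypothetical, and it targets the gpu-inference NodePool through the karpenter.sh/nodepool label.

```yaml
# Hypothetical negative priority so any regular GPU pod can preempt the placeholder
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-placeholder
value: -10
globalDefault: false
description: "Placeholder pods that keep one GPU node warm"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-placeholder
  template:
    metadata:
      labels:
        app: gpu-placeholder
    spec:
      priorityClassName: gpu-placeholder
      terminationGracePeriodSeconds: 0
      nodeSelector:
        karpenter.sh/nodepool: gpu-inference
      containers:
        # The pause container does no work; it only holds the GPU reservation
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            limits:
              nvidia.com/gpu: 1
```

The trade-off is that one GPU node is billed continuously, so this is probably only worthwhile for latency-sensitive pools.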
Spot GPU instance reclaimed during training
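Combine checkpointing with Karpenter's interruption handling, which needs an SQS interruption queue configured. AWS gives a two-minute warning before reclaiming a spot instance, so set terminationGracePeriodSeconds: 120 and make sure a checkpoint can be written within that window.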
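A training workload that follows this advice might look like the sketch below. It assumes the gpu-training-spot NodePool and its nvidia.com/gpu taint from earlier in this recipe. The Job name, image, command, and PVC name are hypothetical placeholders, and the training script is assumed to trap SIGTERM and write a checkpoint.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-example                      # hypothetical name
spec:
  backoffLimit: 10                         # headroom for pod replacements; tune for your job
  template:
    spec:
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 120   # matches the two-minute spot warning
      nodeSelector:
        karpenter.sh/nodepool: gpu-training-spot
      tolerations:
        # Tolerate the taint set on the training NodePool
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: trainer
          image: registry.example.com/trainer:latest   # hypothetical image
          # Hypothetical entrypoint: expected to checkpoint on SIGTERM and
          # resume from the newest checkpoint when the pod restarts
          command: ["python", "train.py", "--checkpoint-dir=/checkpoints"]
          resources:
            limits:
              nvidia.com/gpu: 8
          volumeMounts:
            - name: checkpoints
              mountPath: /checkpoints
      volumes:
        - name: checkpoints
          persistentVolumeClaim:
            claimName: training-checkpoints   # hypothetical PVC on shared storage
```

Checkpoints go to a persistent volume rather than node-local disk, since the replacement pod will most likely land on a different node.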
Best Practices
- Separate NodePools for inference and training: different instance types and pricing
- Spot for training, on-demand for inference: training can tolerate interruption
- Scale-to-zero for dev/staging: no GPU cost when idle
- GPU taints: prevent non-GPU pods from accidentally scheduling on expensive nodes
- 200GB root volume: GPU drivers and model cache need space
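As an illustration of how several of these practices combine, here is a minimal sketch of a dev/staging NodePool that uses spot capacity, taints its nodes, and removes them once they sit empty. The gpu-dev name, the instance families, and the GPU limit are assumptions chosen for the example. It reuses the gpu-nodeclass EC2NodeClass from this recipe.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-dev                    # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g6", "g5"]     # assumed: small GPUs are enough for dev work
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: gpu-nodeclass
      taints:
        # Keeps non-GPU pods off these nodes unless they tolerate the taint
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
  limits:
    nvidia.com/gpu: 4              # assumed small cap for non-production
  disruption:
    # Nodes with no workload pods are removed after 5 minutes
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
```

Karpenter launches nodes only for pending pods, so this pool starts at zero nodes and returns to zero once the last GPU pod finishes.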
Key Takeaways
- Karpenter automatically provisions the cheapest GPU instance matching pod requirements
- Separate NodePools for inference (on-demand, small GPUs) and training (spot, large GPUs)
- Scale-to-zero saves 100% cost when no GPU workloads are running
- GPU taints prevent non-GPU pods from wasting expensive GPU nodes
- Spot instances save 60-80% for training with checkpoint-based fault tolerance