Karpenter Node Autoscaling for Kubernetes
Replace Cluster Autoscaler with Karpenter for faster, smarter node provisioning. Right-sized instances, spot fallback, consolidation, and GPU-aware scaling.
π‘ Quick Answer: Replace Cluster Autoscaler with Karpenter for faster, smarter node provisioning. Right-sized instances, spot fallback, consolidation, and GPU-aware scaling.
The Problem
Cluster Autoscaler scales node groups β Karpenter provisions individual nodes. It picks the optimal instance type, size, and purchase option (on-demand vs spot) for each pending pod. Nodes launch in seconds, not minutes.
The Solution
Step 1: Install Karpenter
# AWS EKS
export KARPENTER_VERSION="0.37.0"
export CLUSTER_NAME="my-cluster"
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version "$KARPENTER_VERSION" \
--namespace kube-system \
--set "settings.clusterName=$CLUSTER_NAME" \
--set "settings.interruptionQueue=$CLUSTER_NAME" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1GiStep 2: Create NodePool
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64", "arm64"] # Consider ARM for cost savings
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"] # Prefer spot, fallback to on-demand
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["c", "m", "r"] # Compute, general, memory optimized
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["5"] # Only gen 5+ instances
nodeClassRef:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
name: default
limits:
cpu: 1000 # Max 1000 vCPUs total
memory: 4000Gi
disruption:
consolidationPolicy: WhenUnderutilized # Auto-consolidate idle nodes
expireAfter: 720h # Replace nodes every 30 days
---
# GPU NodePool β separate pool for GPU workloads
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: gpu
spec:
template:
spec:
requirements:
- key: karpenter.k8s.aws/instance-family
operator: In
values: ["p4d", "p5", "g5", "g6"]
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"] # GPUs: on-demand only
taints:
- key: nvidia.com/gpu
value: "true"
effect: NoSchedule
nodeClassRef:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
name: gpu-nodes
limits:
nvidia.com/gpu: 32 # Max 32 GPUs totalStep 3: EC2NodeClass
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: default
spec:
amiFamily: AL2023
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: my-cluster
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: my-cluster
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
encrypted: trueKarpenter vs Cluster Autoscaler
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scaling unit | Node group | Individual node |
| Instance selection | Fixed per group | Dynamic per pod |
| Provisioning speed | 2-5 minutes | 30-60 seconds |
| Spot handling | Per node group | Per node with fallback |
| Consolidation | Limited | Built-in |
| GPU awareness | Basic | Advanced |
| Multi-arch | Separate groups | Automatic |
graph TD
A[Pending Pod: 4 CPU, 8Gi, GPU] --> B[Karpenter Controller]
B --> C{Find optimal instance}
C --> D[Check spot availability]
D -->|Spot available| E[Launch p5.xlarge spot]
D -->|No spot| F[Launch p5.xlarge on-demand]
E --> G[Node ready in 30s]
F --> G
G --> H[Pod scheduled]
I[Consolidation loop] -->|Node <50% utilized| J[Cordon + drain + terminate]Best Practices
- Start small and iterate β donβt over-engineer on day one
- Monitor and measure β you canβt improve what you donβt measure
- Automate repetitive tasks β reduce human error and toil
- Document your decisions β future you will thank present you
Key Takeaways
- This is essential knowledge for production Kubernetes operations
- Start with the simplest approach that solves your problem
- Monitor the impact of every change you make
- Share knowledge across your team with internal runbooks

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
