K8s HPA: Autoscale on CPU and Memory
Configure a Kubernetes HorizontalPodAutoscaler (HPA) to scale on CPU and memory utilization. Covers target utilization, minReplicas, maxReplicas, and scaling behavior.
Quick Answer:
kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10 creates an HPA that scales between 2 and 10 replicas, targeting 70% CPU utilization. Pods MUST have CPU requests set: HPA calculates utilization as current_usage / request. Formula: desiredReplicas = ceil(currentReplicas × (currentMetric / targetMetric)).
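The formula above can be sketched in a few lines of Python. This is a simplified model, not the controller's actual code: the function name and parameters are hypothetical, and the 10% tolerance band (the kube-controller-manager default) is an addition that the text above does not mention.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=None, tolerance=0.1):
    """Simplified sketch of desiredReplicas = ceil(current * (metric / target))."""
    ratio = current_metric / target_metric
    # Assumption (not in the article): if the ratio is within the tolerance
    # band around 1.0, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the minReplicas / maxReplicas bounds.
    desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired


# 3 replicas at 90% CPU against a 70% target, bounded to 2..20 replicas.
print(desired_replicas(3, 90, 70, min_replicas=2, max_replicas=20))
```

With these inputs the raw result is ceil(3.86), so the sketch asks for 4 replicas. A very high ratio is capped at max_replicas, and a ratio close to 1.0 leaves the count untouched.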
The Problem
Fixed replica counts waste resources or can't handle traffic spikes:
- 10 replicas at 3 AM = wasted compute
- 2 replicas during Black Friday = outage
- Manual scaling is slow and error-prone
- Need to balance cost vs performance automatically
The Solution
Create HPA
# Imperative (quick)
kubectl autoscale deployment web-app \
  --cpu-percent=70 \
  --min=2 \
  --max=20
# Check status
kubectl get hpa
# NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app   Deployment/web-app   23%/70%   2         20        3
HPA YAML (autoscaling/v2)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU utilization target
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # 70% of CPU request
    # Memory utilization target
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # 80% of memory request
  # Scaling behavior
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 60s before scaling up
      policies:
        - type: Percent
          value: 100                    # Double replicas at most
          periodSeconds: 60
        - type: Pods
          value: 4                      # Add at most 4 pods
          periodSeconds: 60
      selectPolicy: Max                 # Use whichever adds more
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5min before scaling down
      policies:
        - type: Percent
          value: 10                     # Remove 10% at a time
          periodSeconds: 60
Prerequisites
# Pods MUST have resource requests: HPA needs them!
containers:
  - name: web
    image: myapp:v2
    resources:
      requests:
        cpu: 200m       # HPA: 70% of 200m = 140m target
        memory: 256Mi   # HPA: 80% of 256Mi = ~205Mi target
      limits:
        cpu: 500m
        memory: 512Mi
# metrics-server must be running
kubectl get pods -n kube-system | grep metrics-server
# If missing:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics
kubectl top pods
How HPA Calculates Replicas
Current: 3 replicas, each using 180m CPU, request = 200m
Average utilization: 180m / 200m = 90%
Target: 70%
desiredReplicas = ceil(3 × (90 / 70)) = ceil(3.86) = 4
→ Scale up to 4 replicas
Monitor HPA
# Current state
kubectl get hpa web-app-hpa
# NAME TARGETS MINPODS MAXPODS REPLICAS
# web-app-hpa 45%/70%,60%/80% 2 20 5
# Detailed status
kubectl describe hpa web-app-hpa
# Events:
# Normal SuccessfulRescale 4m horizontal-pod-autoscaler New size: 5; reason: cpu above target
# HPA conditions
kubectl get hpa web-app-hpa -o yaml | grep -A10 conditions
Scaling on Multiple Metrics
# HPA evaluates ALL metrics and picks the highest replica count
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
# If CPU says 5 replicas and memory says 8 replicas → 8 replicas
Common Issues
HPA shows <unknown>/70% for targets
Pods don't have resource requests set, or metrics-server is not running. Add resources.requests to all containers and confirm that kubectl top pods returns data.
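To see why a missing request leaves the target unknown, here is a minimal Python sketch of the utilization calculation (current_usage / request). The function name and the dictionary keys are hypothetical, and the model is simplified to one container per pod; it is an illustration, not the controller's implementation.

```python
def average_utilization(pods):
    """Return average utilization in percent, or None when it cannot be computed.

    pods: list of dicts with hypothetical keys 'usage_m' and 'request_m'
    (millicores). A value of None models a missing metric or a missing request.
    """
    if not pods:
        return None
    for pod in pods:
        # No request set, or no usage reported by metrics-server:
        # there is nothing to divide, so the result is unknown.
        if not pod.get("request_m") or pod.get("usage_m") is None:
            return None
    total_usage = sum(pod["usage_m"] for pod in pods)
    total_request = sum(pod["request_m"] for pod in pods)
    return total_usage * 100 // total_request


healthy = [{"usage_m": 180, "request_m": 200}] * 3
no_request = [{"usage_m": 180, "request_m": None}]
print(average_utilization(healthy))
print(average_utilization(no_request))
```

The first call has both usage and requests, so a percentage can be computed. The second returns None, which corresponds to the unknown target that kubectl get hpa displays.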
HPA not scaling down
Default stabilizationWindowSeconds for scale-down is 300s (5 min). HPA waits to avoid flapping. Check behavior.scaleDown.
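The effect of the scale-down window can be illustrated with a small Python sketch. It assumes a simplified model in which the HPA remembers recent replica recommendations and, when scaling down, acts on the highest one still inside the window. The function name and the (timestamp, replicas) tuples are hypothetical.

```python
def stabilized_scale_down(recommendations, window_seconds, now):
    """Pick the highest recommendation made within the last window_seconds.

    recommendations: list of (timestamp_seconds, replicas) tuples.
    Returns None if no recommendation falls inside the window.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent) if recent else None


# Load peaked at t=0 (8 replicas recommended), then dropped.
history = [(0, 8), (60, 6), (120, 3), (180, 3), (240, 3), (300, 3)]
print(stabilized_scale_down(history, window_seconds=300, now=300))

# Two minutes later the peak has aged out of the 300s window.
history += [(360, 3), (420, 3)]
print(stabilized_scale_down(history, window_seconds=300, now=420))
```

In this model the replica count stays at the earlier peak until that recommendation is older than the window, which is why a scale-down can appear delayed by about five minutes with the default setting.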
HPA keeps scaling up and down (flapping)
Set stabilizationWindowSeconds higher (5-10 min for scale-down). Or increase the gap between target and actual utilization.
Memory-based HPA doesn't scale down
Memory often doesn't decrease after load drops (JVM, Go GC). CPU-based HPA is more reliable for scale-down. Use memory HPA as a ceiling only.
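Because the HPA takes the highest replica count across all metrics, a memory metric that stays high can hold the replica count up even when CPU has dropped. The Python sketch below illustrates this with made-up utilization numbers; the function names are hypothetical and the tolerance band is ignored for simplicity.

```python
import math


def replicas_for_metric(current_replicas, current_pct, target_pct):
    """Per-metric proposal: ceil(currentReplicas * (currentMetric / targetMetric))."""
    return math.ceil(current_replicas * current_pct / target_pct)


def combined_replicas(current_replicas, metrics):
    """metrics maps a metric name to (current_pct, target_pct); the highest proposal wins."""
    return max(replicas_for_metric(current_replicas, current, target)
               for current, target in metrics.values())


# Hypothetical state after a traffic spike: CPU is idle again,
# but memory was never released by the runtime.
after_spike = {"cpu": (20, 70), "memory": (78, 80)}
print(replicas_for_metric(8, 20, 70))       # CPU alone
print(replicas_for_metric(8, 78, 80))       # memory alone
print(combined_replicas(8, after_spike))    # what the HPA acts on
```

CPU alone would allow a drop to 3 replicas, but memory still proposes 8, so the combined result stays at 8.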
Best Practices
- Always set CPU requests on pods: HPA can't work without them
- Target 70% CPU utilization: leaves headroom for spikes
- Use behavior to control scale speed: fast up, slow down
- Don't HPA on memory alone: memory rarely decreases, causes thrashing
- Combine with Cluster Autoscaler: HPA scales pods, CA scales nodes
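As a rough illustration of how behavior policies bound scale-up speed, the Python sketch below computes the highest replica count a set of scaleUp policies would allow within one period. The function name and the dictionary shape are hypothetical. It assumes no other scaling happened earlier in the period and that percentage results are rounded up.

```python
def scale_up_limit(period_start_replicas, policies, select_policy="Max"):
    """Upper bound on replicas after one policy period (simplified model)."""
    proposals = []
    for policy in policies:
        if policy["type"] == "Pods":
            # Add at most `value` pods per period.
            proposals.append(period_start_replicas + policy["value"])
        elif policy["type"] == "Percent":
            # Grow by at most `value` percent per period, rounded up
            # (integer ceiling division; the rounding is an assumption).
            proposals.append(-(-period_start_replicas * (100 + policy["value"]) // 100))
    return max(proposals) if select_policy == "Max" else min(proposals)


# Policies mirroring a scaleUp section with Percent=100 and Pods=4.
policies = [{"type": "Percent", "value": 100}, {"type": "Pods", "value": 4}]
print(scale_up_limit(3, policies))    # small deployment: the Pods policy is larger
print(scale_up_limit(10, policies))   # larger deployment: the Percent policy is larger
```

With selectPolicy: Max, a small deployment is governed by the fixed pod count and a larger one by the percentage, so scale-up stays fast at both sizes. Choosing Min instead would pick the more conservative of the two.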
Key Takeaways
- HPA scales pod replicas based on CPU/memory utilization or custom metrics
- Pods must have resources.requests set for utilization calculation
- metrics-server is required for CPU/memory based HPA
- Use behavior to control scaling speed and prevent flapping
- Scale up fast (60s window), scale down slow (300s window) for stability

