Vertical Pod Autoscaler (VPA) Guide
Configure Kubernetes Vertical Pod Autoscaler to automatically right-size container CPU and memory requests based on actual usage.
💡 Quick Answer: VPA monitors Pod resource usage over time and automatically adjusts CPU/memory requests to match actual consumption, eliminating over-provisioning (waste) and under-provisioning (OOM/throttling).
The Problem
Static resource requests are almost always wrong:
- Over-provisioned: wasting 40-60% of cluster resources (industry average)
- Under-provisioned: OOMKilled or CPU-throttled under load
- Manual tuning doesn't scale (hundreds of workloads)
- Resource usage changes over time (traffic patterns, code changes)
The Solution
Install VPA
```bash
# Clone the VPA repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components
./hack/vpa-up.sh

# Or via Helm
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace vpa --create-namespace
```
VPA Modes
```text
Mode      Behavior
--------  ---------------------------------------------------------------
Off       Only generates recommendations (read-only)
Initial   Sets requests on Pod creation only (no restart)
Auto      Updates running Pods (evicts + recreates with new requests)
Recreate  Evicts + recreates Pods to apply new requests (Auto currently behaves the same way)
```
Basic VPA Configuration
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsOnly
```
Recommendation-Only Mode (Safe Start)
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Only recommend, don't act
```
```bash
# Read recommendations
kubectl describe vpa my-app-vpa

# Output:
#   Recommendation:
#     Container Recommendations:
#       Container Name:  my-app
#       Lower Bound:     Cpu: 25m, Memory: 128Mi
#       Target:          Cpu: 100m, Memory: 256Mi
#       Uncapped Target: Cpu: 100m, Memory: 256Mi
#       Upper Bound:     Cpu: 500m, Memory: 1Gi
```
VPA with HPA (Combined)
```yaml
# VPA controls memory, HPA controls CPU scaling
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        controlledResources: ["memory"]  # VPA only manages memory
        # CPU managed by HPA (replica scaling)
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Production VPA Pattern
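Because Auto mode applies new requests by evicting Pods, a production setup is often paired with a PodDisruptionBudget, which the VPA updater honors because it evicts through the Kubernetes eviction API. A minimal sketch for the api-server Deployment used in this section, assuming its Pods carry the label app: api-server and that one unavailable Pod at a time is acceptable (the label, the budget name, and the value are illustrative assumptions, since the source does not show the Deployment):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb        # hypothetical name
spec:
  maxUnavailable: 1           # assumption: tolerate one disrupted Pod at a time
  selector:
    matchLabels:
      app: api-server         # assumption: label on the api-server Pods
```

The budget caps how many Pods voluntary evictions can take down at once. It complements the VPA's own minReplicas setting rather than replacing it.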
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2  # Don't evict if < 2 replicas ready
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
      - containerName: sidecar
        mode: "Off"  # Don't touch sidecar resources
```
Monitor VPA Decisions
```bash
# Check all VPA recommendations
kubectl get vpa -A -o custom-columns=\
'NAME:.metadata.name,MODE:.spec.updatePolicy.updateMode,CPU:.status.recommendation.containerRecommendations[0].target.cpu,MEM:.status.recommendation.containerRecommendations[0].target.memory'

# Watch for VPA evictions
kubectl get events --field-selector reason=EvictedByVPA -A

# Prometheus metrics
# vpa_recommender_recommendation_latency_seconds
# vpa_updater_evicted_pods_total
# vpa_status_recommendation{resource="cpu|memory",bound="target|lower|upper"}
```
Common Issues
VPA keeps evicting Pods during peak traffic
- Cause: Auto mode evicts Pods to apply new requests
- Fix: Set minReplicas in updatePolicy, or use "Initial" mode plus a rolling restart
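The "Initial" mode option can be sketched as follows. This reuses the my-app names from the earlier examples and is an illustration of the fix, not a tuned configuration:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"  # apply recommendations only when a Pod is created
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
```

With this mode VPA never evicts running Pods, so new requests take effect only when Pods are recreated, for example by running kubectl rollout restart deployment/my-app during a quiet window.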
VPA and HPA conflict on CPU
- Cause: Both try to manage CPU: HPA adds replicas while VPA increases per-Pod CPU
- Fix: Let HPA manage CPU (horizontal), VPA manage memory only
Recommendations stuck at initial values
- Cause: Not enough metrics history (VPA needs 8+ hours of data)
- Fix: Wait 24-48h for stable recommendations; check metrics-server is running
VPA sets requests too low
- Cause: Low traffic period skewed the recommendation
- Fix: Set minAllowed to prevent requests from dropping below a safe threshold
Best Practices
- Start with "Off" mode: read recommendations for 1 week before enabling Auto
- Set minAllowed/maxAllowed to prevent extreme values
- Use VPA for memory and HPA for CPU to get the best of both worlds
- Exclude sidecars: set mode: "Off" for istio-proxy and logging sidecars
- Set minReplicas: 2 in updatePolicy to prevent evicting the last Pod
- Use controlledValues: RequestsOnly so VPA adjusts requests only and leaves limits to the Pod spec or LimitRange
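Several of these practices can be combined in one manifest. The sketch below reuses the my-app names and the bounds from the basic configuration. It assumes the sidecar container is named istio-proxy, as in a typical Istio injection, and relies on the "*" entry acting as the default policy for every container without its own entry:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # start read-only; switch to "Auto" after reviewing recommendations
    minReplicas: 2      # only matters once evictions are enabled
  resourcePolicy:
    containerPolicies:
      - containerName: istio-proxy   # assumption: name of the injected sidecar
        mode: "Off"                  # VPA leaves this container alone
      - containerName: "*"           # default policy for all other containers
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
        controlledValues: RequestsOnly
```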
Key Takeaways
- VPA right-sizes containers based on actual usage (target = P90 usage + buffer)
- Main modes: Off (recommend only), Initial (set on create), Auto (evict + recreate)
- Combine with HPA: VPA for memory, HPA for CPU for optimal scaling
- Needs 24-48h of metrics history for stable recommendations
- minAllowed/maxAllowed prevent VPA from setting extreme values
- Production: start Off, validate recommendations, then enable Auto
- Can save 30-50% of cluster cost by eliminating over-provisioning

