πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
Autoscaling intermediate ⏱ 15 minutes K8s 1.28+

Kubernetes Vertical Pod Autoscaler VPA Guide

Deploy and configure the Vertical Pod Autoscaler (VPA) on Kubernetes. Auto-adjust CPU and memory requests based on actual usage, right-size

By Luca Berton β€’ β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: The Vertical Pod Autoscaler (VPA) automatically adjusts pod CPU and memory requests/limits based on historical usage. Install with ./hack/vpa-up.sh from the autoscaler repo, create a VPA resource targeting your Deployment, and set updateMode: "Auto" for automatic adjustment or "Off" for recommendations only.

The Problem

  • Developers guess resource requests β€” too low causes OOMKilled, too high wastes cluster capacity
  • Manual right-sizing requires constant monitoring and adjustment
  • Applications’ resource needs change over time (traffic patterns, data growth)
  • Over-provisioned clusters waste 40-60% of allocated resources
  • Under-provisioned pods get throttled (CPU) or killed (memory)

The Solution

Install VPA

# Clone the autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components (admission controller, recommender, updater)
./hack/vpa-up.sh

# Verify
kubectl get pods -n kube-system | grep vpa
# vpa-admission-controller-xxx   1/1   Running   0   1m
# vpa-recommender-xxx            1/1   Running   0   1m
# vpa-updater-xxx                1/1   Running   0   1m

VPA in Recommendation Mode (Safe Start)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"    # Only recommend, don't change pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "50m"
          memory: "64Mi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"
        controlledResources: ["cpu", "memory"]
# Check recommendations
kubectl describe vpa my-app-vpa -n production
# Recommendation:
#   Container Recommendations:
#     Container Name: my-app
#     Lower Bound:    Cpu: 100m, Memory: 128Mi
#     Target:         Cpu: 250m, Memory: 384Mi   ← Use this
#     Upper Bound:    Cpu: 1000m, Memory: 1Gi
#     Uncapped Target: Cpu: 250m, Memory: 384Mi

VPA in Auto Mode

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"       # Automatically adjust + restart pods
    minReplicas: 2           # Don't evict if fewer than 2 replicas
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "4Gi"
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits  # or RequestsOnly

Update Modes

Mode       β”‚ Behavior
───────────┼─────────────────────────────────────────────────────────
Off        β”‚ Only generates recommendations (no pod changes)
           β”‚ Safe for production exploration
───────────┼─────────────────────────────────────────────────────────
Initial    β”‚ Sets resources only at pod creation (no evictions)
           β”‚ Good for Jobs and one-shot pods
───────────┼─────────────────────────────────────────────────────────
Auto       β”‚ Evicts and recreates pods with new resource values
           β”‚ Full automation (ensure PDB protects availability)
───────────┴─────────────────────────────────────────────────────────

VPA with Multiple Containers

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: multi-container-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: "200m"
          memory: "256Mi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"
      - containerName: sidecar
        mode: "Off"    # Don't touch sidecar resources
      - containerName: init-container
        mode: "Off"    # Don't touch init containers

Common Issues

VPA evicting pods too frequently

  • Cause: Small resource changes trigger unnecessary restarts
  • Fix: Set minReplicas > 1; use PodDisruptionBudget; widen min/max range

VPA and HPA conflict

  • Cause: Both trying to adjust the same deployment β€” VPA changes requests, HPA scales replicas
  • Fix: Use VPA for CPU/memory right-sizing + HPA for replica scaling on custom metrics (not CPU)

β€œcannot use VPA with HPA on CPU”

  • Cause: HPA targets CPU utilization percentage β€” VPA changes the denominator
  • Fix: Use HPA with custom/external metrics (not cpu/memory) alongside VPA

Recommendations show 0 after VPA creation

  • Cause: VPA needs 24-48h of metrics history to generate reliable recommendations
  • Fix: Wait for data collection; ensure metrics-server is running

Best Practices

  1. Start with Off mode β€” observe recommendations before enabling auto-adjustment
  2. Set min/max bounds β€” prevent recommendations from going too low (OOM) or too high (waste)
  3. Use PodDisruptionBudgets β€” protect availability when VPA evicts pods
  4. Don’t combine VPA + HPA on CPU β€” they conflict; use custom metrics with HPA
  5. controlledValues: RequestsOnly β€” let limits float with a ratio to requests
  6. Monitor recommendation stability β€” erratic recommendations indicate variable workloads (use HPA instead)

Key Takeaways

  • VPA auto-adjusts CPU/memory requests based on actual historical usage
  • Install with ./hack/vpa-up.sh from kubernetes/autoscaler repo
  • Three modes: Off (recommend only), Initial (set at creation), Auto (evict + update)
  • Recommendations include lower bound, target, and upper bound
  • VPA + HPA: safe only when HPA uses custom metrics (not CPU/memory)
  • Set minAllowed/maxAllowed to prevent extreme values
  • Auto mode evicts pods β€” use PDB to maintain availability during restarts
#vpa #autoscaling #resource-management #right-sizing #vertical-scaling
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens