πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
Autoscaling intermediate ⏱ 15 minutes K8s 1.28+

Vertical Pod Autoscaler Deep Dive

Configure VPA for automatic memory and CPU right-sizing in Kubernetes. Recommendation modes, update policies, VPA with HPA coexistence, and GPU workload tuning.

By Luca Berton β€’ β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Deploy VPA in Off mode first to get recommendations without changes, then switch to Auto for memory only (let HPA handle CPU scaling). Set minAllowed and maxAllowed to prevent extreme values. VPA and HPA can coexist if they control different resources.

The Problem

Developers guess resource requests/limits at deploy time β€” usually too high (wasting cluster capacity) or too low (causing OOM kills and throttling). VPA observes actual usage and adjusts requests automatically, but misconfiguration can cause pod restarts or conflict with HPA.

The Solution

Step 1: Recommendation Mode (Safe Start)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"

Check recommendations:

kubectl describe vpa my-app-vpa
# Recommendation:
#   Container: app
#     Lower Bound:  Cpu: 25m, Memory: 128Mi
#     Target:       Cpu: 100m, Memory: 256Mi
#     Upper Bound:  Cpu: 500m, Memory: 1Gi
#     Uncapped:     Cpu: 100m, Memory: 256Mi

Step 2: Auto Mode with Bounds

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: "4"
          memory: 8Gi
        controlledResources: ["cpu", "memory"]

VPA + HPA Coexistence

# VPA controls memory only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: app
        controlledResources: ["memory"]
        minAllowed:
          memory: 128Mi
        maxAllowed:
          memory: 8Gi
---
# HPA controls CPU scaling (replica count)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
graph TD
    VPA[VPA<br/>controlledResources: memory] -->|Adjust memory<br/>requests/limits| POD[Pod]
    HPA[HPA<br/>metric: CPU utilization] -->|Scale replicas| DEPLOY[Deployment<br/>replicas: 2-10]
    DEPLOY --> POD
    
    VPA -.->|❌ Does NOT control| CPU[CPU resources]
    HPA -.->|❌ Does NOT control| MEM[Memory resources]

Update Modes

ModeBehaviorUse Case
OffRecommendations only, no changesInitial assessment
InitialSet resources on pod creation onlyAvoid mid-lifecycle restarts
RecreateEvict and recreate pods to apply changesStateless services
AutoSame as Recreate (in-place update planned)Production workloads

Common Issues

VPA keeps restarting pods

VPA evicts pods to apply new resource values (in-place resize is not yet GA). Use updateMode: Initial to only set resources at pod creation.

VPA and HPA fighting over CPU

Never let both VPA and HPA control the same resource. Use controlledResources: ["memory"] on VPA when HPA targets CPU.

VPA recommends extremely low values

Set minAllowed to prevent VPA from under-provisioning. This is common for services with spiky traffic.

Best Practices

  • Start with Off mode β€” observe recommendations for a week before enabling Auto
  • Set minAllowed and maxAllowed β€” prevent extreme right-sizing
  • VPA for memory, HPA for CPU β€” the safe coexistence pattern
  • Use Initial mode for StatefulSets β€” avoid evicting database pods
  • Monitor VPA events β€” kubectl get events --field-selector reason=EvictedByVPA

Key Takeaways

  • VPA right-sizes resource requests based on observed usage
  • Start in Off mode to see recommendations without changes
  • VPA + HPA coexistence: VPA controls memory, HPA controls CPU/replicas
  • minAllowed/maxAllowed are essential guardrails
  • VPA evicts pods to apply changes (no in-place resize yet in GA)
  • Initial mode is safest for stateful workloads
#vpa #autoscaling #right-sizing #resource-optimization
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens