Autoscaling · Intermediate · ⏱ 20 minutes · K8s 1.25+

HPA in Kubernetes: Horizontal Pod Autoscaler Guide

Configure HPA in Kubernetes for automatic horizontal pod autoscaling based on CPU, memory, and custom metrics. Step-by-step HPA examples and best practices.

By Luca Berton 📖 6 min read

💡 Quick Answer: Create an HPA with kubectl autoscale deployment <name> --cpu-percent=80 --min=2 --max=10. Ensure metrics-server is installed (kubectl top pods should work). For custom metrics, install prometheus-adapter. Set resources.requests on your pods—HPA uses these to calculate utilization percentage.

The Problem

Your application traffic varies throughout the day. Running too few pods causes performance issues during peak times, while running too many wastes resources during quiet periods.

The Solution

Use Horizontal Pod Autoscaler (HPA) to automatically scale your pods based on observed metrics like CPU utilization, memory usage, or custom application metrics.

Prerequisites: Install metrics-server

HPA requires metrics-server to get resource metrics:

# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system

# If not installed, install it
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify it’s working:

kubectl top nodes
kubectl top pods

Basic HPA: Scale on CPU

Step 1: Create a Deployment with Resource Requests

HPA needs resource requests to calculate utilization:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m      # Required for HPA!
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
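The utilization percentage HPA reports is measured against the container's request, not its limit. A rough sketch of the arithmetic, using illustrative numbers rather than live cluster data:

```shell
# Illustrative numbers, not from a live cluster: HPA measures utilization
# against resources.requests, not limits.
usage_m=70       # current CPU usage in millicores (as reported by metrics-server)
request_m=100    # resources.requests.cpu from the Deployment above
utilization=$(( usage_m * 100 / request_m ))
echo "${utilization}%"   # prints 70%
```

This is why omitting `resources.requests` breaks HPA entirely: with no request, there is no denominator for the percentage.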

Step 2: Create HPA

Using kubectl:

kubectl autoscale deployment my-app \
  --min=2 \
  --max=10 \
  --cpu-percent=70

Or using YAML:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
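Under the hood, the controller applies the scaling formula documented for HPA: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`. A small sketch with illustrative values:

```shell
# Sketch of the HPA scaling formula (per the Kubernetes docs):
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# The numbers below are illustrative, not read from a cluster.
current_replicas=2
current_utilization=105   # average CPU utilization across pods, in percent
target_utilization=70
# integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "desiredReplicas=$desired"   # ceil(2 * 105 / 70) = 3
```

So at 105% average utilization against a 70% target, two replicas become three, which brings average utilization back toward the target.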

HPA with Multiple Metrics

Scale based on both CPU and memory:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Scale up if CPU > 70%
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # OR if memory > 80%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Note: HPA uses the metric that results in the highest replica count.
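To make that rule concrete, here is a sketch with illustrative numbers: the controller computes a desired replica count per metric and keeps the largest.

```shell
# Illustrative: with 4 replicas, CPU at 90% (target 70) and memory at 60%
# (target 80), HPA evaluates each metric independently and takes the max.
replicas=4
cpu_desired=$(( (replicas * 90 + 70 - 1) / 70 ))   # ceil(360/70) = 6
mem_desired=$(( (replicas * 60 + 80 - 1) / 80 ))   # ceil(240/80) = 3
if [ "$cpu_desired" -gt "$mem_desired" ]; then desired=$cpu_desired; else desired=$mem_desired; fi
echo "desiredReplicas=$desired"   # 6, driven by the CPU metric
```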

Scale Based on Custom Metrics

For advanced scenarios, scale based on application metrics like requests per second.

Using Prometheus Adapter

First, install Prometheus and the Prometheus Adapter:

# Add Prometheus community charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack

# Install prometheus-adapter
helm install prometheus-adapter prometheus-community/prometheus-adapter

HPA with Custom Metrics

Scale based on HTTP requests per second:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale based on requests per second per pod
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # 100 RPS per pod
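For the `http_requests_per_second` metric to exist, prometheus-adapter needs a rule that maps a Prometheus series onto pods. A hedged sketch of such a rule as Helm values for the `prometheus-adapter` chart; the series name `http_requests_total` is an assumption and must match whatever counter your application actually exports:

```yaml
# values.yaml for the prometheus-adapter chart (sketch).
# Assumes the app exports a counter named http_requests_total
# labeled with namespace and pod.
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"     # exposes http_requests_per_second
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

You can check that the adapter is serving the metric with `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1`.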

Scaling Behavior Configuration

Control how fast HPA scales up and down:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 10           # Scale down max 10% at a time
          periodSeconds: 60
        - type: Pods
          value: 2            # Or max 2 pods at a time
          periodSeconds: 60
      selectPolicy: Min       # Use the policy that removes fewer pods
    scaleUp:
      stabilizationWindowSeconds: 0    # Scale up immediately
      policies:
        - type: Percent
          value: 100          # Can double pods
          periodSeconds: 15
        - type: Pods
          value: 4            # Or add 4 pods at a time
          periodSeconds: 15
      selectPolicy: Max       # Use the policy that adds more pods
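To see how `selectPolicy` resolves competing policies, here is an illustrative calculation for the `scaleUp` block above, starting from 3 replicas:

```shell
# Illustrative: per 15s period, Percent 100 allows doubling (+3 pods from 3)
# and Pods allows +4. selectPolicy: Max picks whichever change is larger.
replicas=3
by_percent=$(( replicas * 100 / 100 ))   # +3
by_pods=4                                # +4
if [ "$by_percent" -gt "$by_pods" ]; then step=$by_percent; else step=$by_pods; fi
echo "scale-up ceiling this period: $(( replicas + step ))"   # 3 + 4 = 7
```

Note that once the replica count grows, the Percent policy starts to dominate: from 8 replicas, doubling (+8) beats Pods (+4).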

Monitoring HPA

Check HPA status:

kubectl get hpa my-app-hpa

# Output:
# NAME         REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# my-app-hpa   Deployment/my-app   45%/70%   2         10        3          5m

Detailed view:

kubectl describe hpa my-app-hpa

Watch scaling events:

kubectl get hpa my-app-hpa -w

Testing HPA

Generate load to trigger scaling:

# Run a load generator
kubectl run load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://my-app-service; done"

# Watch HPA react
kubectl get hpa my-app-hpa -w

# Clean up
kubectl delete pod load-generator

Common Issues

HPA shows “unknown” for metrics

kubectl get hpa
# NAME         TARGETS       MINPODS   MAXPODS
# my-app-hpa   <unknown>/70%  2         10

Causes:

  1. metrics-server not installed
  2. No resource requests defined on containers
  3. Pods haven’t started yet

HPA not scaling up

Check the HPA's conditions to see why, for example whether it has already reached maxReplicas or a metric is unavailable:

kubectl describe hpa my-app-hpa | grep -A5 Conditions

Scaling too aggressively

Adjust the behavior section to add stabilization windows and limit scale velocity.

Complete Production Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: production-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
    # Primary: CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Secondary: Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 50
          periodSeconds: 30
        - type: Pods
          value: 5
          periodSeconds: 30
      selectPolicy: Max

Summary

You’ve learned how to:

  1. Set up metrics-server for resource metrics
  2. Create HPA for CPU-based scaling
  3. Configure multi-metric scaling
  4. Control scaling behavior
  5. Troubleshoot common HPA issues

Key takeaway: Always define resource requests on your containers for HPA to work correctly.

📘 Go Further with Kubernetes Recipes

Love this recipe? There’s so much more! This is just one of 100+ hands-on recipes in our comprehensive Kubernetes Recipes book.

Inside the book, you’ll master:

  • ✅ Production-ready deployment strategies
  • ✅ Advanced networking and security patterns
  • ✅ Observability, monitoring, and troubleshooting
  • ✅ Real-world best practices from industry experts

“The practical, recipe-based approach made complex Kubernetes concepts finally click for me.”

👉 Get Your Copy Now — Start building production-grade Kubernetes skills today!

Frequently Asked Questions

What is HPA in Kubernetes?

HPA (Horizontal Pod Autoscaler) automatically scales the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. When demand increases, HPA adds more pods; when demand decreases, it scales down to save resources.

How does HPA work with VPA?

HPA scales horizontally (more replicas) while VPA scales vertically (more CPU/memory per pod). Use HPA for stateless workloads that scale out, and VPA for stateful workloads or when adding replicas isn’t practical. Don’t use both on the same CPU/memory metric simultaneously.

What metrics can HPA use?

HPA supports three metric types: Resource metrics (CPU, memory via metrics-server), Custom metrics (application-specific like requests-per-second via Prometheus Adapter), and External metrics (cloud provider metrics like SQS queue depth).
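An External metric target looks like the following fragment; it requires an external metrics provider (such as KEDA or a cloud-specific adapter) to be installed, and the metric name and label here are hypothetical:

```yaml
metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_length        # hypothetical metric name from your provider
        selector:
          matchLabels:
            queue: my-queue           # hypothetical label
      target:
        type: AverageValue
        averageValue: "30"            # target ~30 queued messages per replica
```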

#hpa #autoscaling #metrics #cpu #memory #scaling
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

