Autoscaling | advanced | ⏱ 20 minutes | K8s 1.28+

HPA Behavior and Scaling Policies

Configure HPA scaling behavior with stabilization windows, scaling policies, and rate limiting. Fine-tune scale-up and scale-down speed.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Use spec.behavior in the HPA to control scaling speed. Set stabilizationWindowSeconds to prevent flapping, use policies to cap the scaling rate (for example, at most 4 pods per 60 seconds), and choose selectPolicy: Min for conservative scaling. Scale-up and scale-down are configured independently.

The Problem

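Default HPA scaling can be too aggressive (adding too many pods at once) or too slow (not responding fast enough to traffic spikes). Without a behavior section, the controller may scale up by the larger of 100% or 4 pods every 15 seconds with no stabilization window, and may remove up to 100% of pods every 15 seconds after a 300-second stabilization window. You need fine-grained control over how fast pods scale up and down.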

The Solution

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
      selectPolicy: Min

Behavior Configuration Explained

Scale-Up Policies

behavior:
  scaleUp:
    # No stabilization: react immediately to load
    stabilizationWindowSeconds: 0
    policies:
      # Allow doubling replicas per minute
      - type: Percent
        value: 100
        periodSeconds: 60
      # OR add up to 4 pods per minute
      - type: Pods
        value: 4
        periodSeconds: 60
    # Use whichever policy allows MORE pods
    selectPolicy: Max
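To make the effect of selectPolicy concrete, here is a minimal Python sketch of the scale-up limit for the two policies above. It is a simplified model, not the controller's code: it assumes no scaling happened earlier in the current period (so the period-start replica count equals the current count) and that Percent results are rounded up. The helper name scale_up_limit is hypothetical.

```python
def scale_up_limit(current, policies, select_policy="Max"):
    """Return the highest replica count the scale-up policies allow.

    Simplified model (assumption): no scale events earlier in the period,
    so the period-start replica count equals `current`.
    """
    if select_policy == "Disabled":
        return current  # scaling in this direction is switched off
    proposals = []
    for policy in policies:
        if policy["type"] == "Pods":
            proposals.append(current + policy["value"])
        elif policy["type"] == "Percent":
            # Integer ceiling of current * (1 + value / 100)
            proposals.append(-(-current * (100 + policy["value"]) // 100))
        else:
            raise ValueError(f"unknown policy type: {policy['type']}")
    # Max picks the policy allowing the largest change, Min the smallest
    return max(proposals) if select_policy == "Max" else min(proposals)


policies = [
    {"type": "Percent", "value": 100, "periodSeconds": 60},
    {"type": "Pods", "value": 4, "periodSeconds": 60},
]
print(scale_up_limit(3, policies))          # Pods wins: 3 + 4 = 7
print(scale_up_limit(10, policies))         # Percent wins: 10 * 2 = 20
print(scale_up_limit(10, policies, "Min"))  # smaller change: 10 + 4 = 14
```

Under these assumptions the Pods policy is the more generous one for small deployments, and the Percent policy takes over once there are more than 4 replicas.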

Scale-Down Policies

behavior:
  scaleDown:
    # Wait 5 minutes before scaling down (prevent flapping)
    stabilizationWindowSeconds: 300
    policies:
      # Remove max 10% of pods per minute
      - type: Percent
        value: 10
        periodSeconds: 60
    # Conservative: use policy allowing FEWER removals
    selectPolicy: Min
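The 10% rule above can be worked through with a short Python sketch. It is a simplified model under stated assumptions: one scaling step per period, load low enough that the HPA always wants fewer pods than the policy allows, and a Percent result that is truncated to a whole replica count (so the number of pods removed rounds up). The helper name scale_down_limit is hypothetical, and minReplicas is not modeled.

```python
def scale_down_limit(current, policies, select_policy="Max"):
    """Return the lowest replica count the scale-down policies allow.

    Simplified model (assumption): no scale events earlier in the period.
    """
    if select_policy == "Disabled":
        return current  # scale-down is switched off
    proposals = []
    for policy in policies:
        if policy["type"] == "Pods":
            proposals.append(current - policy["value"])
        elif policy["type"] == "Percent":
            # Truncate current * (1 - value / 100) to a whole replica count
            proposals.append(current * (100 - policy["value"]) // 100)
        else:
            raise ValueError(f"unknown policy type: {policy['type']}")
    # For scale-down, Min (smallest change) keeps the HIGHEST replica count
    limit = max(proposals) if select_policy == "Min" else min(proposals)
    return max(limit, 0)


policies = [{"type": "Percent", "value": 10, "periodSeconds": 60}]
replicas = 50
for minute in range(1, 4):
    replicas = scale_down_limit(replicas, policies, "Min")
    print(minute, replicas)  # 1 45, then 2 40, then 3 36
```

With a single policy, Min and Max give the same result; selectPolicy only matters once scaleDown lists two or more policies.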

Disable Scale-Down Entirely

behavior:
  scaleDown:
    selectPolicy: Disabled

Real-World Patterns

Aggressive Scale-Up, Conservative Scale-Down

Best for web applications with bursty traffic:

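behavior:
  scaleUp:
    # React immediately to load
    stabilizationWindowSeconds: 0
    policies:
      # Allow adding up to 200% (tripling) every 30 seconds
      - type: Percent
        value: 200
        periodSeconds: 30
  scaleDown:
    # Wait 10 minutes before scaling down
    stabilizationWindowSeconds: 600
    policies:
      # Remove at most 1 pod every 5 minutes
      - type: Pods
        value: 1
        periodSeconds: 300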
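To get a feel for how quickly a 200% policy grows a deployment, the following Python sketch lists the replica counts after each scaling step. It assumes sustained load that always asks for more pods than the policy allows, exactly one step per period, and borrows minReplicas 3 and maxReplicas 50 from the full manifest earlier in this recipe. The helper name scale_up_steps is hypothetical.

```python
def scale_up_steps(start, percent, max_replicas):
    """List replica counts after each scaling step until max_replicas is hit.

    Assumes every step uses the full Percent allowance (rounded up).
    """
    if start < 1 or percent <= 0:
        raise ValueError("start must be >= 1 and percent must be > 0")
    steps = [start]
    while steps[-1] < max_replicas:
        allowed = -(-steps[-1] * (100 + percent) // 100)  # integer ceiling
        steps.append(min(allowed, max_replicas))
    return steps


steps = scale_up_steps(start=3, percent=200, max_replicas=50)
print(steps)           # [3, 9, 27, 50]
print(len(steps) - 1)  # 3 scaling steps
```

Because the first step can happen as soon as the metric crosses the target, three steps with a 30-second period land within roughly a minute in this model, ignoring metric collection delay and pod startup time.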

Gradual Scaling for Stateful Workloads

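behavior:
  scaleUp:
    # Require 2 minutes of sustained load before scaling up
    stabilizationWindowSeconds: 120
    policies:
      # Add at most 2 pods every 2 minutes
      - type: Pods
        value: 2
        periodSeconds: 120
  scaleDown:
    # Wait 10 minutes before scaling down
    stabilizationWindowSeconds: 600
    policies:
      # Remove at most 1 pod every 10 minutes
      - type: Pods
        value: 1
        periodSeconds: 600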
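A Pods policy makes scaling time roughly linear, which is easy to estimate. The Python sketch below gives a rough best-case figure under these assumptions: the stabilization window delays the first step by its full length, each later step waits one full period, and metric lag and pod startup are ignored. The helper name min_seconds_to_scale and the replica counts 3 and 11 are hypothetical.

```python
import math


def min_seconds_to_scale(start, target, pods_per_period, period_seconds,
                         stabilization_seconds=0):
    """Rough best-case time for a Pods policy to move from start to target."""
    if pods_per_period < 1 or period_seconds < 1:
        raise ValueError("pods_per_period and period_seconds must be >= 1")
    steps = math.ceil(abs(target - start) / pods_per_period)
    if steps == 0:
        return 0
    # First step after the stabilization window, then one step per period
    return stabilization_seconds + (steps - 1) * period_seconds


# Scale-up: 3 -> 11 replicas, 2 pods per 120s, 120s stabilization
print(min_seconds_to_scale(3, 11, 2, 120, 120))   # 480
# Scale-down: 11 -> 3 replicas, 1 pod per 600s, 600s stabilization
print(min_seconds_to_scale(11, 3, 1, 600, 600))   # 4800
```

In this estimate, growing by 8 pods takes about 8 minutes while shrinking by the same amount takes about 80 minutes, which is the asymmetry this pattern is designed to produce.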

Monitoring HPA Behavior

# Check current scaling decisions (the HPA lives in the production namespace)
kubectl describe hpa webapp-hpa -n production

# Watch scaling events
kubectl get events -n production --field-selector reason=SuccessfulRescale

# View HPA conditions
kubectl get hpa webapp-hpa -n production -o jsonpath='{.status.conditions[*].message}'
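The jsonpath query prints only the messages. If you also want the condition type and reason, a small Python helper can format the JSON form of the same object. This is a sketch: summarize_conditions is a hypothetical helper, and sample_hpa is hand-written illustrative input rather than captured cluster output, so the exact reason and message wording may differ on a real cluster.

```python
def summarize_conditions(hpa):
    """Return one line per entry in the HPA's status.conditions."""
    lines = []
    for cond in hpa.get("status", {}).get("conditions", []):
        lines.append(
            f"{cond['type']}={cond['status']} "
            f"[{cond.get('reason', '')}] {cond.get('message', '')}"
        )
    return lines


# Hand-written sample for illustration only (not real cluster output)
sample_hpa = {
    "status": {
        "conditions": [
            {
                "type": "AbleToScale",
                "status": "True",
                "reason": "ReadyForNewScale",
                "message": "recommended size matches current size",
            },
            {
                "type": "ScalingLimited",
                "status": "True",
                "reason": "ScaleUpLimit",
                "message": "the desired replica count is increasing "
                           "faster than the maximum scale rate",
            },
        ]
    }
}

for line in summarize_conditions(sample_hpa):
    print(line)
```

To run it against a cluster, one option is to save the output of kubectl get hpa webapp-hpa -n production -o json and load it with json.load before passing it to the helper. A ScalingLimited condition is the one to watch when tuning policies, since it indicates the desired replica count was capped.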

Common Issues

| Issue | Cause | Fix |
|---|---|---|
| Flapping (scale up/down loop) | stabilizationWindowSeconds too short | Increase to 300-600s for scale-down |
| Slow response to spikes | stabilizationWindowSeconds too long on scale-up | Set to 0 for scale-up |
| Over-provisioning | selectPolicy: Max on scale-up | Use a Percent policy with lower values |
| Pods never scale down | selectPolicy: Disabled on scale-down | Remove it or change to Min |

Best Practices

  1. Always set stabilization for scale-down: 300s minimum prevents flapping
  2. Use Percent for scale-up: scales proportionally regardless of current size
  3. Use Pods for scale-down: predictable, gradual reduction
  4. Monitor with events: SuccessfulRescale events show actual scaling decisions
  5. Test with load generators: verify behavior before production
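The first three practices can be turned into a quick check on a behavior block before it is applied. The Python sketch below works on the behavior section already loaded into a dict (for example from a parsed manifest). The helper name review_behavior and the warning texts are hypothetical, and the checks are advisory: they simply mirror the list above.

```python
def review_behavior(behavior):
    """Return advisory warnings for an HPA `behavior` mapping."""
    warnings = []
    up = behavior.get("scaleUp", {})
    down = behavior.get("scaleDown", {})

    # Practice 1: an omitted scale-down window falls back to the 300s default
    if down.get("stabilizationWindowSeconds", 300) < 300:
        warnings.append("scaleDown stabilization window is under 300s")
    # Practice 2: prefer a Percent policy for scale-up
    if up.get("policies") and not any(
        p["type"] == "Percent" for p in up["policies"]
    ):
        warnings.append("scaleUp has no Percent policy")
    # Practice 3: prefer a Pods policy for scale-down
    if down.get("policies") and not any(
        p["type"] == "Pods" for p in down["policies"]
    ):
        warnings.append("scaleDown has no Pods policy")
    return warnings


risky = {
    "scaleDown": {
        "stabilizationWindowSeconds": 60,
        "policies": [{"type": "Percent", "value": 50, "periodSeconds": 60}],
    }
}
print(review_behavior(risky))
# ['scaleDown stabilization window is under 300s', 'scaleDown has no Pods policy']
```

A check like this does not replace load testing; it only catches configurations that drift from the guidelines.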

Key Takeaways

  • behavior.scaleUp and behavior.scaleDown are configured independently
  • stabilizationWindowSeconds prevents rapid oscillation
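  • selectPolicy: Max (the default) applies the policy that allows the largest change, Min applies the one that allows the smallest change, and Disabled turns scaling off in that direction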
  • Multiple policies can be defined; selectPolicy picks which to apply
  • Set scale-up stabilization to 0 for immediate response to traffic spikes
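How a stabilization window dampens oscillation can be shown in a few lines of Python. This is a simplified per-direction model, not the controller's implementation: it assumes the HPA keeps a history of recent replica recommendations and, within the window, uses the highest one when scaling down and the lowest one when scaling up. The helper name stabilized and the sample history are hypothetical.

```python
def stabilized(recommendations, now, window_seconds, direction):
    """Pick a replica recommendation after applying a stabilization window.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs,
    including the newest recommendation.
    """
    recent = [r for t, r in recommendations if now - t <= window_seconds]
    if not recent:
        raise ValueError("no recommendation inside the window")
    # Scale-down trusts the highest recent value, scale-up the lowest
    return max(recent) if direction == "down" else min(recent)


# Illustrative history: load dips at t=60 and t=180 but spikes in between
history = [(0, 10), (60, 4), (120, 9), (180, 5)]
print(stabilized(history, now=180, window_seconds=300, direction="down"))  # 10
print(stabilized(history, now=180, window_seconds=0, direction="down"))    # 5
```

With a 300-second window the brief dips are ignored and the deployment stays at 10 replicas; with a window of 0 it would follow every dip, which is the flapping the window is meant to prevent.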
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
