K8s Horizontal Scaling: Manual and Auto
Scale Kubernetes workloads horizontally with kubectl scale, HPA, and KEDA. Covers replica management and event-driven scaling strategies.
π‘ Quick Answer: Manual:
kubectl scale deployment/app --replicas=5. Auto: HPA scales on CPU/memory (kubectl autoscale deployment/app --min=2 --max=10 --cpu-percent=70). Event-driven: KEDA scales on queue length, cron, Prometheus metrics. Combine HPA + PDB for safe scaling. For scale-to-zero, use KEDA β HPA minimum is 1.
The Problem
Static replica counts waste resources or cause outages:
- Fixed replicas β over-provisioned during low traffic
- Fixed replicas β under-provisioned during peak
- Manual scaling β human latency, error-prone
- Need event-driven scaling (queue depth, scheduled traffic)
The Solution
Manual Scaling
# Scale deployment
kubectl scale deployment/web --replicas=5
# Scale statefulset
kubectl scale statefulset/db --replicas=3
# Scale replicaset directly (not recommended)
kubectl scale rs/web-abc123 --replicas=5
# Conditional scale (only if current replicas match)
kubectl scale deployment/web --replicas=5 --current-replicas=3
# Scale to zero (for non-production)
kubectl scale deployment/web --replicas=0HPA (CPU/Memory)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web
minReplicas: 2
maxReplicas: 20
metrics:
# CPU target
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Memory target
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100 # Double replicas
periodSeconds: 60
- type: Pods
value: 4 # Or add 4 pods
periodSeconds: 60
selectPolicy: Max # Use whichever adds more
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5min before scaling down
policies:
- type: Percent
value: 25 # Remove 25% of replicas
periodSeconds: 60
selectPolicy: Min # Conservative scale-down# Quick HPA creation
kubectl autoscale deployment/web --min=2 --max=10 --cpu-percent=70
# Monitor HPA
kubectl get hpa -w
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
# web-hpa Deployment/web 45%/70% 2 20 3
# Detailed status
kubectl describe hpa web-hpaCustom Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web
minReplicas: 2
maxReplicas: 50
metrics:
# Prometheus custom metric (via prometheus-adapter)
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 100 # Scale when >100 rps per pod
# External metric (e.g., queue length)
- type: External
external:
metric:
name: sqs_queue_length
selector:
matchLabels:
queue: orders
target:
type: Value
value: 50 # Scale when queue > 50KEDA (Event-Driven Autoscaling)
# Install KEDA
# helm repo add kedacore https://kedacore.github.io/charts
# helm install keda kedacore/keda -n keda-system
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-processor
spec:
scaleTargetRef:
name: order-processor # Deployment name
minReplicaCount: 0 # Scale to zero!
maxReplicaCount: 50
cooldownPeriod: 300
triggers:
# Scale on RabbitMQ queue length
- type: rabbitmq
metadata:
queueName: orders
host: amqp://rabbitmq.default:5672
queueLength: "10" # 1 pod per 10 messages
# Scale on Prometheus metric
- type: prometheus
metadata:
serverAddress: http://prometheus:9090
metricName: http_requests_total
query: sum(rate(http_requests_total{app="web"}[2m]))
threshold: "100"
# Scale on cron schedule
- type: cron
metadata:
timezone: UTC
start: "0 8 * * 1-5" # Weekdays 8 AM
end: "0 18 * * 1-5" # Weekdays 6 PM
desiredReplicas: "10" # Business hours scaling
---
# Scale Jobs (for batch processing)
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: batch-processor
spec:
jobTargetRef:
template:
spec:
containers:
- name: processor
image: batch:v1
restartPolicy: Never
triggers:
- type: rabbitmq
metadata:
queueName: batch-jobs
queueLength: "1"
maxReplicaCount: 20
successfulJobsHistoryLimit: 5
failedJobsHistoryLimit: 3Scaling + PDB (Safe Scaling)
# PodDisruptionBudget β protect during scale-down
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: web-pdb
spec:
selector:
matchLabels:
app: web
minAvailable: 2 # Always keep 2 running
# Or: maxUnavailable: 1 # Allow 1 down at a timeScaling Strategy Comparison
Manual (kubectl scale):
β
Simple, immediate
β Requires human intervention
β Use for: one-time adjustments, emergencies
HPA (CPU/Memory):
β
Built-in, no extra components
β Min 1 replica, lag on sudden spikes
β Use for: steady web traffic
HPA (Custom Metrics):
β
Scale on business metrics
β Requires metrics pipeline (Prometheus + adapter)
β Use for: RPS-based scaling
KEDA:
β
Scale to zero, 50+ event sources
β Extra component to manage
β Use for: event-driven, queue-based, scheduled scalingCommon Issues
HPA shows βunknownβ targets
Metrics server not installed or pod has no resource requests. HPA needs requests to calculate utilization.
Scaling too aggressive (flapping)
Add stabilization windows in behavior. Scale-down stabilization of 300s prevents rapid fluctuations.
KEDA not scaling to zero
Check minReplicaCount: 0 and cooldownPeriod. KEDA needs the trigger source to report 0 events.
Best Practices
- Always set resource requests β HPA requires them for CPU/memory scaling
- Use stabilization windows β prevent flapping on bursty traffic
- Combine HPA + PDB β scale safely without dropping below minimum
- KEDA for event-driven β queues, cron, external metrics
- Monitor actual vs desired β
kubectl get hpa -wto verify behavior
Key Takeaways
- Manual:
kubectl scalefor immediate, one-time changes - HPA: auto-scale on CPU/memory or custom metrics (min 1 replica)
- KEDA: event-driven scaling with scale-to-zero support (50+ triggers)
- Always set stabilization windows to prevent rapid scaling oscillation
- PDB ensures minimum availability during scale-down operations

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
