How to Scale Based on Custom Metrics
Scale your workloads based on application-specific metrics like queue depth, request latency, or business KPIs using the custom metrics API with Prometheus Adapter.
Architecture Overview
flowchart LR
HPA[HPA] --> PA[Prometheus Adapter]
PA --> P[Prometheus]
PA --> API["custom.metrics.k8s.io<br/>external.metrics.k8s.io"]Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--set prometheus.url=http://prometheus.monitoring.svc \
  --set prometheus.port=9090
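Once the chart is installed, the adapter registers the custom and external metrics APIs with the API server. A quick sanity check:
# Confirm the metrics APIs are registered
kubectl get apiservices | grep metrics.k8s.io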
Configure Custom Metrics Rules
# prometheus-adapter-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  namespace: monitoring
data:
  config.yaml: |
    rules:
    # HTTP requests per second
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    # Request latency (p99)
    - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_bucket$"
        as: "${1}_p99"
      metricsQuery: 'histogram_quantile(0.99, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'
    # Active connections
    - seriesQuery: 'app_active_connections{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)$"
        as: "active_connections"
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
    # Queue depth, served via external.metrics.k8s.io (used by the worker HPA below)
    externalRules:
    - seriesQuery: 'rabbitmq_queue_messages{namespace!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
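Apply the ConfigMap and restart the adapter so it reloads the rules (the Deployment name below assumes the Helm release name used above):
kubectl apply -f prometheus-adapter-config.yaml
kubectl rollout restart deployment/prometheus-adapter -n monitoring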
Verify Custom Metrics Available
# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# Query specific metric
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
# List external metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .HPA with Custom Metrics
HPA with Custom Metrics
# hpa-custom-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 2
maxReplicas: 20
metrics:
# Scale on requests per second per pod
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 1000 # 1000 req/s per pod
# Scale on latency
- type: Pods
pods:
metric:
        name: http_request_duration_seconds_p99
target:
type: AverageValue
averageValue: 500m # 500ms p99 latency target
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
      selectPolicy: Max
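To watch the HPA react to live metric values, apply it and follow the replica count:
kubectl apply -f hpa-custom-metrics.yaml
kubectl get hpa api-hpa --watch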
HPA with External Metrics
# hpa-external-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: queue-worker-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: queue-worker
minReplicas: 1
maxReplicas: 50
metrics:
# Scale based on queue depth (external metric)
- type: External
external:
metric:
name: rabbitmq_queue_messages
selector:
matchLabels:
queue: tasks
target:
type: AverageValue
        averageValue: 10 # 10 messages per pod
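With an AverageValue target, the HPA divides the metric total by the per-pod target, so 300 queued messages yields 30 workers. To inspect the raw external metric (assuming the workers run in the default namespace):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/rabbitmq_queue_messages?labelSelector=queue%3Dtasks" | jq .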
Application Exposing Custom Metrics
# app-with-metrics.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: api
        image: api-server:v1
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: metrics
Note the scrape annotations sit on the pod template, where Prometheus annotation-based discovery looks for them, not on the Deployment object itself.
Sample metrics endpoint output:
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/users",status="200"} 15234
http_requests_total{method="POST",path="/api/orders",status="201"} 3421
# HELP http_request_duration_seconds HTTP request latency
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 14532
http_request_duration_seconds_bucket{le="0.5"} 15100
http_request_duration_seconds_bucket{le="1.0"} 15200
http_request_duration_seconds_bucket{le="+Inf"} 15234
# HELP app_active_connections Current active connections
# TYPE app_active_connections gauge
app_active_connections 42
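To spot-check the endpoint yourself, port-forward to the Deployment and curl it:
kubectl port-forward deploy/api-server 9090:9090 &
curl -s localhost:9090/metrics | head -20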
Multiple Metrics Combined
# hpa-multiple-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 3
maxReplicas: 100
metrics:
# CPU as baseline
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Custom requests metric
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 500
# Memory as safety
- type: Resource
resource:
name: memory
target:
type: Utilization
        averageUtilization: 80
When several metrics are listed, the HPA computes a desired replica count for each and scales to the highest, so any single signal can trigger a scale-up.
Object Metrics (Ingress-based)
# hpa-object-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: ingress-based-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 2
maxReplicas: 50
metrics:
- type: Object
object:
metric:
name: requests_per_second
describedObject:
apiVersion: networking.k8s.io/v1
kind: Ingress
name: web-ingress
target:
type: Value
        value: 10000 # Total 10k req/s across all pods
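Object metrics also need an adapter rule that maps a Prometheus series to the Ingress object. A sketch for ingress-nginx, whose nginx_ingress_controller_requests series carries an ingress label (add it under rules: in the adapter ConfigMap above):
    - seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          ingress: {resource: "ingress", group: "networking.k8s.io"}
      name:
        matches: "^nginx_ingress_controller_requests$"
        as: "requests_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'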
Debug HPA Scaling
# Check HPA status
kubectl describe hpa api-hpa
# View HPA events
kubectl get events --field-selector involvedObject.name=api-hpa
# Check current metric values
kubectl get hpa api-hpa -o yaml | grep -A 20 "status:"
# Verify adapter is serving metrics
kubectl logs -n monitoring -l app=prometheus-adapter
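If no metrics appear at all, check that the APIService registration itself is healthy:
# Inspect the custom metrics APIService conditions
kubectl describe apiservice v1beta1.custom.metrics.k8s.io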
Custom Metric ServiceMonitor
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: api-server
namespace: monitoring
spec:
selector:
matchLabels:
app: api-server
namespaceSelector:
matchNames:
- default
endpoints:
- port: metrics
interval: 15s
    path: /metrics
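A ServiceMonitor selects Services, not pods, so a Service labeled app: api-server with a port named metrics must also exist. A minimal sketch:
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: default
  labels:
    app: api-server
spec:
  selector:
    app: api-server
  ports:
  - name: metrics
    port: 9090
    targetPort: metrics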
Summary
Custom metrics HPA enables scaling based on business-relevant metrics. Deploy Prometheus Adapter to bridge Prometheus metrics to the Kubernetes metrics API. Configure rules to expose application metrics like request rate, latency, and queue depth for intelligent autoscaling.