Kubernetes HPA Custom Metrics Prometheus Adapter
Configure Kubernetes Horizontal Pod Autoscaler with custom Prometheus metrics via the Prometheus Adapter. Scale on request latency, queue depth, GPU
💡 Quick Answer: The Prometheus Adapter exposes Prometheus metrics through the Kubernetes custom metrics API, enabling HPA to scale on any metric. Install the adapter with Helm, configure metric rules to map PromQL queries to the custom.metrics.k8s.io API, then reference metrics in your HPA spec with type: Pods or type: Object.
The Problem
- Default HPA only scales on CPU and memory, which is insufficient for many workloads
- Need to scale on: request latency (P99), queue depth, active connections, GPU utilization
- Prometheus has the metrics but HPA can't access them directly
- Custom metrics API bridge is complex to configure
- Business metrics (orders/sec, active users) should drive scaling
The Solution
Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090 \
  --values adapter-values.yaml
Configure Metric Rules
# adapter-values.yaml
rules:
  default: false
  custom:
    # Rule 1: HTTP requests per second per pod
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    # Rule 2: Request latency P99
    - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: ".*"
        as: "http_request_duration_p99"
      metricsQuery: 'histogram_quantile(0.99, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'
    # Rule 3: Queue depth (custom metric attached to the namespace)
    - seriesQuery: 'rabbitmq_queue_messages{namespace!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        matches: "^(.*)$"
        as: "queue_messages"
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
    # Rule 4: GPU utilization per pod
    - seriesQuery: 'DCGM_FI_DEV_GPU_UTIL{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)$"
        as: "gpu_utilization"
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
  external:
    # External metrics (not associated with K8s objects)
    - seriesQuery: 'sqs_queue_depth{queue_name!=""}'
      name:
        matches: "^(.*)$"
        as: "sqs_queue_depth"
      metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
Verify Custom Metrics Available
# Check that the custom metrics API is registered (lists the exposed metrics)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# Query one metric for all pods in the production namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
# Check external metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
HPA with Custom Metrics
# Scale on requests per second
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Scale on RPS per pod (target 100 req/s per pod)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    # Also consider CPU (but not as primary)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 50
          periodSeconds: 30
HPA with External Metrics (Queue-Based)
# Scale workers based on queue depth
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue_name: "orders-queue"
        target:
          # AverageValue divides the queue depth by the replica count;
          # type: Value would compare the total depth against the target instead
          type: AverageValue
          averageValue: "30"  # Scale up when >30 messages per replica
HPA with Object Metrics (Ingress RPS)
# Scale based on Ingress requests per second
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Object
      object:
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: frontend-ingress
        # Requires an adapter rule that exposes requests_per_second for Ingress objects
        metric:
          name: requests_per_second
        target:
          type: Value
          value: "1000"  # Scale when total ingress RPS > 1000
Common Issues
"unable to fetch metrics from custom metrics API"
- Cause: Prometheus Adapter not running, or rules misconfigured
- Fix: Check adapter pod logs; test with kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
Metrics return 0 or missing
- Cause: PromQL query returns no data; label matchers don't match
- Fix: Test the query directly in Prometheus; verify pod/namespace labels exist
HPA not scaling despite high metric value
- Cause: stabilizationWindowSeconds preventing rapid changes, or metric below target
- Fix: Check kubectl describe hpa; reduce stabilization window; verify target value
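To make the stabilization effect concrete, here is a minimal sketch of how a stabilization window filters recommendations. It is not the controller's code: the function names and the sample recommendation values are hypothetical, and the sketch ignores scaling policies and the current replica count.

```shell
#!/usr/bin/env bash
# Sketch of HPA stabilization (illustrative only).
# Arguments: replica recommendations computed during the window (sample values).

# Scale-down: the highest recommendation in the window wins,
# so a brief dip in load does not remove pods right away.
stabilized_scale_down() {
  local best="$1" r
  for r in "$@"; do
    if [ "$r" -gt "$best" ]; then best="$r"; fi
  done
  echo "$best"
}

# Scale-up: the lowest recommendation in the window wins,
# so a spike must persist for the whole window before pods are added.
stabilized_scale_up() {
  local best="$1" r
  for r in "$@"; do
    if [ "$r" -lt "$best" ]; then best="$r"; fi
  done
  echo "$best"
}

stabilized_scale_down 12 9 7 8   # prints 12
stabilized_scale_up 4 9 10 10    # prints 4
```

In the second call a single low recommendation inside the window keeps the result at 4, which is one way a high metric value can coexist with an HPA that has not yet scaled.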
"no matches for kind HorizontalPodAutoscaler in version autoscaling/v2"
- Cause: Kubernetes version too old (autoscaling/v2 is stable since 1.23)
- Fix: Use autoscaling/v2beta2 for K8s <1.23
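The version rule above can be captured in a small helper. This is a sketch under the assumption that you only need to choose between the two API versions named in this section; hpa_api_version is a hypothetical function name, not part of kubectl.

```shell
#!/usr/bin/env bash
# Pick the HPA apiVersion from the cluster's minor version (e.g. "22" or "27+").
hpa_api_version() {
  local minor="${1%%[!0-9]*}"   # drop suffixes such as "+" that some distributions report
  if [ "$minor" -ge 23 ]; then
    echo "autoscaling/v2"
  else
    echo "autoscaling/v2beta2"
  fi
}

hpa_api_version 22    # prints autoscaling/v2beta2
hpa_api_version 27+   # prints autoscaling/v2

# Against a live cluster (requires kubectl and jq):
#   hpa_api_version "$(kubectl version -o json | jq -r '.serverVersion.minor')"
```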
Best Practices
- Test PromQL before configuring adapter: verify query returns expected data
- Use rate() for counter metrics: raw counters aren't useful for scaling
- Set stabilization windows: prevent flapping (5min down, 30s up is common)
- Combine multiple metrics: HPA uses the highest recommendation
- Scale on leading indicators: queue depth beats CPU (scales before overload)
- Set appropriate averageValue: represents per-pod target, not total
- Monitor HPA decisions: kubectl describe hpa shows scaling rationale
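The "highest recommendation" and "per-pod target" points are easier to see with numbers. The following is a worked sketch of the replica arithmetic using integer math. The function names and sample values are hypothetical, and the sketch leaves out details the real controller handles, such as the tolerance band, pod readiness, and missing metrics.

```shell
#!/usr/bin/env bash
# Worked HPA arithmetic with integers (illustrative only).

# Ratio-based metrics (Pods AverageValue, Resource Utilization):
# desired = ceil(current_replicas * current_value / target_value)
desired_replicas() {
  local current="$1" value="$2" target="$3"
  echo $(( (current * value + target - 1) / target ))
}

# External metric with an AverageValue target:
# desired = ceil(total_metric_value / per_pod_target)
desired_from_total() {
  local total="$1" per_pod="$2"
  echo $(( (total + per_pod - 1) / per_pod ))
}

# Take the highest per-metric recommendation, then clamp to min/max replicas.
combine() {
  local min="$1" max="$2" best=0 r
  shift 2
  for r in "$@"; do
    if [ "$r" -gt "$best" ]; then best="$r"; fi
  done
  if [ "$best" -lt "$min" ]; then best="$min"; fi
  if [ "$best" -gt "$max" ]; then best="$max"; fi
  echo "$best"
}

rps=$(desired_replicas 4 150 100)   # 4 pods averaging 150 req/s, target 100 per pod
cpu=$(desired_replicas 4 50 70)     # 4 pods at 50% CPU, target 70%
echo "rps=$rps cpu=$cpu final=$(combine 2 20 "$rps" "$cpu")"
# prints: rps=6 cpu=3 final=6
```

With a queue of 450 messages and a per-pod target of 30, desired_from_total 450 30 yields 15 replicas, which shows why the target is read as a per-pod figure and not a total.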
Key Takeaways
- Prometheus Adapter bridges Prometheus metrics to the Kubernetes custom metrics API
- HPA can scale on any Prometheus metric: RPS, latency, queue depth, GPU util
- Three metric types: Pods (per-pod average), Object (single resource), External (non-K8s)
- Rules map PromQL queries to API-compatible metric names
- The behavior field controls scale-up/down speed and stabilization
- Leading indicators (queue depth, latency) are better scaling signals than CPU
- Always verify metrics with kubectl get --raw before creating HPA
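For the final verification step, a small helper can build the raw API paths so the same check is easy to repeat per metric. This is a sketch: custom_metric_path and external_metric_path are hypothetical helper names, and the namespace and metric names are the ones used in the examples in this recipe.

```shell
#!/usr/bin/env bash
# Build raw API paths for metrics served by the adapter.

# Per-pod custom metric for all pods in a namespace.
custom_metric_path() {
  local namespace="$1" metric="$2"
  printf '/apis/custom.metrics.k8s.io/v1beta1/namespaces/%s/pods/*/%s\n' "$namespace" "$metric"
}

# External metric as seen from a namespace.
external_metric_path() {
  local namespace="$1" metric="$2"
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s\n' "$namespace" "$metric"
}

custom_metric_path production http_requests_per_second
external_metric_path production sqs_queue_depth

# Against a live cluster (requires kubectl and jq):
#   kubectl get --raw "$(custom_metric_path production http_requests_per_second)" | jq .
```

An empty items list or a NotFound response from these paths usually means the adapter rule did not match any series, so it is worth fixing that before creating the HPA.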
