Custom Metrics with Prometheus Adapter
Expose application metrics to Kubernetes HPA via Prometheus Adapter. Configure custom.metrics.k8s.io for HTTP requests per second, queue depth.
π‘ Quick Answer: Deploy Prometheus Adapter, configure metric rules to map Prometheus queries to the
custom.metrics.k8s.ioAPI, then create an HPA targeting your custom metric (e.g.,http_requests_per_second). The adapter bridges Prometheus and the Kubernetes metrics API.
The Problem
HPAβs built-in CPU/memory metrics donβt reflect application-level load. An API server might be at 20% CPU but have 1000 queued requests. GPU inference services need to scale on request concurrency, not CPU. Custom metrics from Prometheus enable HPA to scale on what actually matters.
The Solution
Install Prometheus Adapter
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--set prometheus.url=http://prometheus-server.monitoring.svc \
--set prometheus.port=9090Configure Metric Rules
# values.yaml for prometheus-adapter
rules:
custom:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
matches: "^(.*)_total$"
as: "${1}_per_second"
metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
- seriesQuery: 'gpu_utilization_percent{namespace!="",pod!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
as: "gpu_utilization"
metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>})'
- seriesQuery: 'request_queue_depth{namespace!="",pod!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
as: "request_queue_depth"
metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>})'Verify Custom Metrics API
# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'
# Query specific metric
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .HPA with Custom Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-server-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 2
maxReplicas: 20
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Pods
value: 1
periodSeconds: 120GPU Inference Scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: inference-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: inference-server
minReplicas: 1
maxReplicas: 8
metrics:
- type: Pods
pods:
metric:
name: request_queue_depth
target:
type: AverageValue
averageValue: "5"
- type: Pods
pods:
metric:
name: gpu_utilization
target:
type: AverageValue
averageValue: "80"graph LR
APP[Application<br/>exports metrics] -->|Scrape| PROM[Prometheus]
PROM -->|PromQL query| ADAPTER[Prometheus Adapter<br/>custom.metrics.k8s.io]
ADAPTER -->|Metrics API| HPA[HPA Controller]
HPA -->|Scale| DEPLOY[Deployment<br/>replicas: 2-20]Common Issues
Custom metric returns βnot foundβ
Check that Prometheus has the metric: curl prometheus:9090/api/v1/query?query=http_requests_total. Then verify adapter rules match the series name.
HPA shows βunable to fetch metricsβ
The adapter might not be running or the APIService isnβt registered:
kubectl get apiservice v1beta1.custom.metrics.k8s.ioBest Practices
- Use
rate()for counter metrics β raw counters always increase; rate gives meaningful per-second values - Set
stabilizationWindowSecondsβ prevents flapping (scale-down should be slower than scale-up) - Multiple metrics in HPA β HPA uses the metric that recommends the HIGHEST replica count
- Test with
kubectl get --rawβ verify metrics are available before creating HPA
Key Takeaways
- Prometheus Adapter bridges Prometheus metrics to Kubernetes custom metrics API
- HPA can scale on any Prometheus metric β HTTP RPS, queue depth, GPU utilization
- Metric rules transform Prometheus queries into the
custom.metrics.k8s.ioformat - Multiple metrics per HPA: highest recommendation wins
- Scale-down should be slower than scale-up to prevent flapping

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
