Autoscaling advanced ⏱ 20 minutes K8s 1.28+

Custom Metrics with Prometheus Adapter

Expose application metrics to the Kubernetes HPA via Prometheus Adapter. Configure custom.metrics.k8s.io to serve metrics such as HTTP requests per second and queue depth.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Deploy Prometheus Adapter, configure metric rules to map Prometheus queries to the custom.metrics.k8s.io API, then create an HPA targeting your custom metric (e.g., http_requests_per_second). The adapter bridges Prometheus and the Kubernetes metrics API.

The Problem

HPA's built-in CPU/memory metrics don't reflect application-level load. An API server might be at 20% CPU but have 1000 queued requests. GPU inference services need to scale on request concurrency, not CPU. Custom metrics from Prometheus enable HPA to scale on what actually matters.

The Solution

Install Prometheus Adapter

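# Add the chart repository first (skip if it is already configured)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090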

Configure Metric Rules

# values.yaml for prometheus-adapter
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      # Aggregate per pod so each pod maps to exactly one series
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    
    - seriesQuery: 'gpu_utilization_percent{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "gpu_utilization"
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
    
    - seriesQuery: 'request_queue_depth{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "request_queue_depth"
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
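
To make the first rule more concrete, the following Python sketch imitates what the name and metricsQuery fields do. It is an illustration only: the adapter itself is written in Go and uses Go templates, the helper names expose_name and expand_query are hypothetical, and the exact label-matcher string the adapter generates is an assumption. The query template uses the common sum(...) by (<<.GroupBy>>) form, which keeps one series per pod.

```python
import re

def expose_name(series, matches=r"^(.*)_total$", as_template=r"\1_per_second"):
    # Mirrors name.matches / name.as; the rule's ${1} corresponds to \1 here.
    if not re.match(matches, series):
        return None  # the rule does not apply to this series
    return re.sub(matches, as_template, series)

def expand_query(template, series, label_matchers, group_by):
    # Plain string substitution standing in for the adapter's templating.
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", label_matchers)
            .replace("<<.GroupBy>>", group_by))

name = expose_name("http_requests_total")
query = expand_query(
    "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)",
    "http_requests_total",
    'namespace="production",pod=~"api-server-abc|api-server-def"',  # assumed format
    "pod",
)
print(name)
print(query)
```

A series that does not end in _total, such as request_queue_depth, is not matched by this rule, which is why the other two rules name their metrics explicitly.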

Verify Custom Metrics API

# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Query specific metric
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
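
Values in the API response are Kubernetes quantities, so a reading such as "85500m" means 85.5. The sketch below parses a response into per-pod numbers. The sample body is hypothetical and trimmed, its field layout is assumed to follow the v1beta1 API, and parse_quantity handles only a small subset of quantity suffixes.

```python
import json
from decimal import Decimal

# Multipliers for the subset of quantity suffixes handled in this sketch.
_SUFFIXES = {"m": Decimal("0.001"), "k": Decimal(1000), "M": Decimal(1000000)}

def parse_quantity(text):
    """Convert a quantity string such as "1500m" or "42" to a float."""
    if text and text[-1] in _SUFFIXES:
        return float(Decimal(text[:-1]) * _SUFFIXES[text[-1]])
    return float(Decimal(text))

# Hypothetical response body, trimmed to the fields used below.
sample = json.loads("""
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {"describedObject": {"kind": "Pod", "namespace": "production", "name": "api-server-abc"},
     "metricName": "http_requests_per_second", "value": "85500m"},
    {"describedObject": {"kind": "Pod", "namespace": "production", "name": "api-server-def"},
     "metricName": "http_requests_per_second", "value": "120"}
  ]
}
""")

per_pod = {item["describedObject"]["name"]: parse_quantity(item["value"])
           for item in sample["items"]}
average = sum(per_pod.values()) / len(per_pod)
print(per_pod, average)
```

The average across pods is the number the HPA compares against an AverageValue target.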

HPA with Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
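
As a rough guide to how this manifest behaves, the sketch below applies the documented HPA ratio formula, desiredReplicas = ceil(currentReplicas × currentValue / targetValue), followed by the 50% scale-up policy and the replica bounds. It is a simplification: stabilization windows, unready pods, and missing metrics are ignored, the 0.1 tolerance is the controller's default, and the input readings are hypothetical.

```python
import math

def desired_replicas(current, avg_value, target, tolerance=0.1):
    """Core HPA ratio calculation for an AverageValue target."""
    ratio = avg_value / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no change
    return math.ceil(current * ratio)

def apply_scale_up_limit(current, desired, percent=50):
    """Cap one scale-up step to a percentage increase (Percent policy)."""
    limit = math.ceil(current * (1 + percent / 100))
    return min(desired, limit)

def clamp(value, lo=2, hi=20):
    """Keep the result within minReplicas and maxReplicas."""
    return max(lo, min(hi, value))

current = 4
# Hypothetical reading: 250 requests/s per pod against a target of 100.
want = desired_replicas(current, avg_value=250, target=100)
step = clamp(apply_scale_up_limit(current, want))
print(want, step)
```

With these inputs the controller wants 10 replicas, but the Percent policy lets it grow only from 4 to 6 in the first 60-second period.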

GPU Inference Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: request_queue_depth
        target:
          type: AverageValue
          averageValue: "5"
    - type: Pods
      pods:
        metric:
          name: gpu_utilization
        target:
          type: AverageValue
          averageValue: "80"
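
When an HPA lists several metrics, each one produces its own replica recommendation and the largest is used. The sketch below illustrates this for the two metrics above, using hypothetical readings and a simplified version of the ratio formula (default 0.1 tolerance, no stabilization or readiness handling).

```python
import math

def recommend(current, avg_value, target, tolerance=0.1):
    """Replica recommendation for a single AverageValue metric."""
    ratio = avg_value / target
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * avg_value / target)

def combine(current, observations, min_replicas=1, max_replicas=8):
    """observations maps metric name -> (current average, target averageValue)."""
    per_metric = {name: recommend(current, avg, target)
                  for name, (avg, target) in observations.items()}
    chosen = max(per_metric.values())  # the highest recommendation wins
    return per_metric, max(min_replicas, min(max_replicas, chosen))

per_metric, replicas = combine(3, {
    "request_queue_depth": (12, 5),  # hypothetical reading: queue is backing up
    "gpu_utilization": (60, 80),     # hypothetical reading: GPUs below target
})
print(per_metric, replicas)
```

Here the queue depth asks for 8 replicas while GPU utilization alone would keep 3, so the deployment scales to 8.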

Architecture overview (Mermaid diagram source):

graph LR
    APP[Application<br/>exports metrics] -->|Scrape| PROM[Prometheus]
    PROM -->|PromQL query| ADAPTER[Prometheus Adapter<br/>custom.metrics.k8s.io]
    ADAPTER -->|Metrics API| HPA[HPA Controller]
    HPA -->|Scale| DEPLOY[Deployment<br/>replicas: 2-20]

Common Issues

Custom metric returns "not found"

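Check that Prometheus has the metric, for example from inside the cluster: curl 'http://prometheus-server.monitoring.svc:9090/api/v1/query?query=http_requests_total' (quote the URL so the shell does not interpret the ?). Then verify that the adapter rule's seriesQuery matches the series name and that the series carries non-empty namespace and pod labels.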
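
A frequent cause of "not found" is a series that exists in Prometheus but lacks the namespace or pod label the rule requires. The sketch below builds an instant-query URL for the Prometheus HTTP API and filters a response for usable series. The response body is hypothetical, and the helper names are illustrative rather than part of any library.

```python
import json
from urllib.parse import urlencode

def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"

def series_with_labels(response, required=("namespace", "pod")):
    """Return the series in a query response that carry every required label."""
    results = response.get("data", {}).get("result", [])
    return [r for r in results if all(label in r["metric"] for label in required)]

url = build_query_url("http://prometheus-server.monitoring.svc:9090",
                      "http_requests_total")

# Hypothetical response: one series has pod/namespace labels, one does not.
response = json.loads("""
{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "http_requests_total", "namespace": "production", "pod": "api-server-abc"},
   "value": [1700000000, "1234"]},
  {"metric": {"__name__": "http_requests_total", "job": "legacy"},
   "value": [1700000000, "99"]}
]}}
""")
usable = series_with_labels(response)
print(url)
print(len(usable))
```

If the filtered list is empty, the adapter has nothing to attach to a pod, even though the metric name itself exists.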

HPA shows "unable to fetch metrics"

The adapter might not be running, or its APIService might not be registered:

kubectl get apiservice v1beta1.custom.metrics.k8s.io
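
The AVAILABLE column of that command reflects the Available condition in the APIService status. For scripted checks, the sketch below reads the same condition from the JSON form of the object. The sample document is hypothetical, and the "NoAvailableCondition" fallback is a label invented for this example.

```python
import json

def apiservice_available(apiservice):
    """Return (is_available, reason, message) from an APIService object."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return (cond.get("status") == "True",
                    cond.get("reason", ""),
                    cond.get("message", ""))
    return False, "NoAvailableCondition", ""  # fallback label used by this sketch only

# Hypothetical output of:
#   kubectl get apiservice v1beta1.custom.metrics.k8s.io -o json
sample = json.loads("""
{"metadata": {"name": "v1beta1.custom.metrics.k8s.io"},
 "spec": {"service": {"name": "prometheus-adapter", "namespace": "monitoring"}},
 "status": {"conditions": [
   {"type": "Available", "status": "False", "reason": "MissingEndpoints",
    "message": "service has no ready endpoints"}
 ]}}
""")
ok, reason, message = apiservice_available(sample)
print(ok, reason, message)
```

A False status together with its reason usually points to the adapter pod or its Service rather than to the HPA itself.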

Best Practices

  • Use rate() for counter metrics: raw counters only ever increase (apart from resets), so rate() is what yields a meaningful per-second value
  • Set stabilizationWindowSeconds: it prevents flapping (scale-down should be slower than scale-up)
  • Multiple metrics in one HPA: the controller uses the metric that recommends the highest replica count
  • Test with kubectl get --raw: verify metrics are available before creating the HPA
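
To show why a raw counter is a poor scaling signal, the sketch below derives a per-second rate from timestamped counter samples. It is a deliberately simplified stand-in for Prometheus's rate(): counter resets are handled, but the extrapolation to window boundaries that rate() performs is not. The sample data is hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value) pairs.

    A value lower than its predecessor is treated as a counter reset,
    i.e. a restart from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical samples over a 2-minute window, 30 seconds apart.
steady = [(0, 100), (30, 400), (60, 700), (90, 1000), (120, 1300)]
with_reset = [(0, 900), (30, 1200), (60, 150), (90, 450), (120, 750)]
print(simple_rate(steady), simple_rate(with_reset))
```

The counter values themselves keep growing regardless of load, while the derived rate stays near the actual request throughput, which is the quantity an HPA target can be compared against.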

Key Takeaways

  • Prometheus Adapter bridges Prometheus metrics to the Kubernetes custom metrics API
  • HPA can scale on any Prometheus metric the adapter exposes, such as HTTP RPS, queue depth, or GPU utilization
  • Metric rules transform Prometheus queries into the custom.metrics.k8s.io format
  • Multiple metrics per HPA: highest recommendation wins
  • Scale-down should be slower than scale-up to prevent flapping
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
