Observability · intermediate · ⏱ 20 minutes · K8s 1.28+

K8s Pod Resource Monitoring with Grafana

Monitor Kubernetes pod CPU and memory with Grafana dashboards, using Prometheus queries for resource usage and for tracking usage against requests and limits.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Use Prometheus + Grafana to monitor pod CPU and memory. Key queries: rate(container_cpu_usage_seconds_total[5m]) for actual CPU (the raw metric is a cumulative counter of CPU-seconds, so it needs rate()), container_memory_working_set_bytes for memory, both compared against kube_pod_container_resource_requests and kube_pod_container_resource_limits. Import Grafana dashboard 15759 for a ready-made pod resource overview.

The Problem

kubectl top shows only a point-in-time snapshot, but right-sizing pods, catching memory leaks, and preventing OOMKills all require historical trends. Grafana dashboards backed by Prometheus data show actual vs. requested resources (are you over- or under-provisioning?), trends over time, and per-namespace costs, and Prometheus alert rules can warn you when containers approach their limits.

flowchart LR
    PODS["Pods"] -->|"cAdvisor metrics"| PROM["Prometheus"]
    KSM["kube-state-metrics"] -->|"Request/limit metadata"| PROM
    PROM -->|"PromQL queries"| GRAFANA["Grafana Dashboards"]
    GRAFANA --> ALERTS["Alert Rules"]
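Grafana runs each panel's PromQL through the Prometheus HTTP API, and you can call the same API directly when scripting reports. The sketch below assumes a Prometheus reachable at a hypothetical `http://prometheus:9090`; it builds an instant-query URL for `/api/v1/query` and parses the `vector` result format into a pod-to-value map. The sample response is illustrative, not real data.

```python
import json
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace with your own service URL.
PROMETHEUS_URL = "http://prometheus:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for Prometheus's /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: str, label: str) -> dict:
    """Map one label (e.g. "pod") to its sample value in a vector result."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [unix_ts, "value"]};
    # Prometheus encodes sample values as strings.
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


if __name__ == "__main__":
    query = ('sum(rate(container_cpu_usage_seconds_total'
             '{namespace="my-app", container!=""}[5m])) by (pod)')
    print(build_query_url(PROMETHEUS_URL, query))
    # Illustrative response body in the API's shape (values are made up).
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"pod": "web-0"}, "value": [1700000000, "0.25"]},
            {"metric": {"pod": "web-1"}, "value": [1700000000, "0.5"]},
        ]},
    })
    print(parse_vector(sample, "pod"))  # {'web-0': 0.25, 'web-1': 0.5}
```

To run it against a real server, fetch the URL with `urllib.request.urlopen` and pass the decoded response body to `parse_vector`.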

The Solution

Essential PromQL Queries

# CPU usage rate (cores) per pod
sum(rate(container_cpu_usage_seconds_total{
  namespace="my-app", container!=""
}[5m])) by (pod)

# Memory usage (bytes) per pod
sum(container_memory_working_set_bytes{
  namespace="my-app", container!=""
}) by (pod)

# CPU usage vs requests (percentage)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, namespace)
/
sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod, namespace)
* 100

# Memory usage vs limits (OOMKill risk)
sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace)
/
sum(kube_pod_container_resource_limits{resource="memory"}) by (pod, namespace)
* 100

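# Containers with no CPU request set (bad practice). kube-state-metrics emits no
# request series at all for such containers, so "== 0" would never match;
# compare against the full container list instead.
kube_pod_container_info
unless on(namespace, pod, container)
kube_pod_container_resource_requests{resource="cpu"}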

# Top 10 memory consumers
topk(10,
  sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace)
)

# CPU throttling percentage (share of CFS periods in which containers were throttled)
sum(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (pod)
/
sum(rate(container_cpu_cfs_periods_total{container!=""}[5m])) by (pod)
* 100
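The CPU queries above turn cumulative counters into rates: `container_cpu_usage_seconds_total` counts CPU-seconds, so its per-second increase is the number of cores in use, and dividing by the request gives a percentage; the throttling query divides two counter increases the same way. The sketch below reproduces that arithmetic on hypothetical scrape samples. It is a simplification: like PromQL it treats a drop in a counter as a reset, but unlike the real `rate()` it does not extrapolate to the edges of the window.

```python
def counter_increase(samples: list) -> float:
    """Increase of a counter across ordered samples, treating any drop as a
    reset (the counter restarted from zero), as PromQL's rate() does."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def per_second_rate(samples: list, window_seconds: float) -> float:
    """Simplified rate(): increase divided by the window length, no extrapolation."""
    return counter_increase(samples) / window_seconds


def throttling_percent(throttled: list, periods: list) -> float:
    """Share of CFS periods that were throttled, mirroring the throttling query."""
    total_periods = counter_increase(periods)
    if total_periods == 0:
        return 0.0  # no CFS periods recorded: nothing to throttle
    return counter_increase(throttled) * 100 / total_periods


if __name__ == "__main__":
    # Hypothetical samples scraped every 15 s over a 60 s window.
    cpu_seconds = [100.0, 103.0, 106.0, 109.0, 112.0]  # 12 CPU-seconds used
    cores = per_second_rate(cpu_seconds, 60)           # 0.2 cores
    request_cores = 0.25                               # a 250m CPU request
    print(f"{cores:.2f} cores = {cores / request_cores * 100:.0f}% of request")

    throttled = [0, 30, 60, 90, 120]    # container_cpu_cfs_throttled_periods_total
    periods = [0, 150, 300, 450, 600]   # container_cpu_cfs_periods_total
    print(f"throttled in {throttling_percent(throttled, periods):.0f}% of periods")
```

With these numbers the script prints `0.20 cores = 80% of request` and `throttled in 20% of periods`.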

Grafana Dashboard JSON

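{
  "title": "Pod Resource Monitor",
  "templating": {
    "list": [{
      "name": "namespace",
      "type": "query",
      "query": "label_values(kube_pod_info, namespace)",
      "refresh": 1,
      "multi": true,
      "includeAll": true
    }]
  },
  "panels": [
    {
      "title": "CPU Usage vs Requests",
      "type": "timeseries",
      "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
      "targets": [{
        "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=~\"$namespace\",container!=\"\"}[5m])) by (pod)",
        "legendFormat": "{{pod}} usage"
      }, {
        "expr": "sum(kube_pod_container_resource_requests{namespace=~\"$namespace\",resource=\"cpu\"}) by (pod)",
        "legendFormat": "{{pod}} request"
      }]
    },
    {
      "title": "Memory Usage vs Limits",
      "type": "timeseries",
      "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
      "fieldConfig": {"defaults": {"unit": "bytes"}},
      "targets": [{
        "expr": "sum(container_memory_working_set_bytes{namespace=~\"$namespace\",container!=\"\"}) by (pod)",
        "legendFormat": "{{pod}} usage"
      }, {
        "expr": "sum(kube_pod_container_resource_limits{namespace=~\"$namespace\",resource=\"memory\"}) by (pod)",
        "legendFormat": "{{pod}} limit"
      }]
    },
    {
      "title": "CPU Throttling %",
      "type": "gauge",
      "gridPos": {"x": 0, "y": 8, "w": 12, "h": 8},
      "targets": [{
        "expr": "sum(rate(container_cpu_cfs_throttled_periods_total{namespace=~\"$namespace\",container!=\"\"}[5m])) / sum(rate(container_cpu_cfs_periods_total{namespace=~\"$namespace\",container!=\"\"}[5m])) * 100"
      }],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "min": 0,
          "max": 100,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "green", "value": null},
              {"color": "yellow", "value": 25},
              {"color": "red", "value": 50}
            ]
          }
        }
      }
    },
    {
      "title": "Restarts with Last Reason OOMKilled (24h)",
      "type": "stat",
      "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8},
      "targets": [{
        "expr": "sum(increase(kube_pod_container_status_restarts_total{namespace=~\"$namespace\"}[24h]) * on(namespace, pod, container) group_left (kube_pod_container_status_last_terminated_reason{namespace=~\"$namespace\",reason=\"OOMKilled\"} == 1))"
      }]
    }
  ]
}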
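The gauge's thresholds color the throttling percentage green below 25, yellow from 25, and red from 50. Assuming Grafana's usual rule that a value takes the color of the highest step whose lower bound it reaches (the first step acting as the base for everything below), the mapping can be sketched as:

```python
import math

# (lower bound, color) steps matching the CPU Throttling % gauge, sorted
# ascending; the first step is the base and covers everything below the next.
THROTTLE_STEPS = [(-math.inf, "green"), (25.0, "yellow"), (50.0, "red")]


def threshold_color(value: float, steps=THROTTLE_STEPS) -> str:
    """Return the color of the highest step whose bound the value reaches."""
    color = steps[0][1]
    for bound, step_color in steps:
        if value < bound:
            break
        color = step_color
    return color


if __name__ == "__main__":
    for pct in (10.0, 25.0, 49.9, 72.5):
        print(pct, threshold_color(pct))
```

A bound is inclusive in this model, so exactly 25% already shows yellow.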

Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
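metadata:
  name: pod-resource-alerts
  # The Prometheus Operator only loads PrometheusRules whose labels match the
  # ruleSelector of your Prometheus resource; add those labels here.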
spec:
  groups:
    - name: pod-resources
      rules:
        - alert: PodMemoryNearLimit
          expr: |
            (container_memory_working_set_bytes{container!=""}
            / on(pod,namespace,container) kube_pod_container_resource_limits{resource="memory"})
            > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is using >90% of its memory limit"
            
        - alert: PodCPUThrottled
          expr: |
            rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
            / rate(container_cpu_cfs_periods_total{container!=""}[5m])
            > 0.5
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is throttled in >50% of CPU periods"

        - alert: PodOverProvisionedCPU
          expr: |
            (sum(rate(container_cpu_usage_seconds_total{container!=""}[24h])) by (pod,namespace)
            / sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod,namespace))
            < 0.1
          for: 24h
          labels:
            severity: info
          annotations:
            summary: "Pod {{ $labels.pod }} is using <10% of its CPU request; consider reducing it"
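Each rule above fires only after its `expr` has stayed true for the whole `for` duration; until then the alert is pending, and an evaluation where the condition is false resets it. The sketch below models that state machine for the `PodMemoryNearLimit` rule (ratio above 0.9, `for: 5m`), assuming a hypothetical 60-second evaluation interval and made-up ratio values. It is a simplified model, not Prometheus's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertState:
    state: str = "inactive"               # inactive -> pending -> firing
    active_since: Optional[float] = None  # when the condition first became true


def evaluate(prev: AlertState, condition: bool, now: float, for_seconds: float) -> AlertState:
    """One rule evaluation: go pending when true, fire once true for `for_seconds`."""
    if not condition:
        return AlertState()  # any false evaluation resets the alert
    since = prev.active_since if prev.active_since is not None else now
    if now - since >= for_seconds:
        return AlertState("firing", since)
    return AlertState("pending", since)


def simulate(ratios: List[float], interval: float, for_seconds: float,
             threshold: float = 0.9) -> List[str]:
    """Evaluate `ratio > threshold` every `interval` seconds; return each state."""
    state = AlertState()
    history = []
    for i, ratio in enumerate(ratios):
        state = evaluate(state, ratio > threshold, i * interval, for_seconds)
        history.append(state.state)
    return history


if __name__ == "__main__":
    # Hypothetical working-set / limit ratios, one per 60 s evaluation.
    ratios = [0.85, 0.92, 0.93, 0.95, 0.91, 0.94, 0.96]
    print(simulate(ratios, interval=60, for_seconds=300))
    # ['inactive', 'pending', 'pending', 'pending', 'pending', 'pending', 'firing']
```

The same model explains why `PodOverProvisionedCPU` needs roughly two days of data: a 24h rate window plus `for: 24h`.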

Import Pre-Built Dashboards

# Popular community dashboards:
# 15759: Kubernetes Pod Resources Overview
# 6417:  Kubernetes Cluster (Prometheus)
# 3119:  Kubernetes Cluster Monitoring
# 1860:  Node Exporter Full (already covered in our guide)

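# Import via Grafana UI: + → Import dashboard → enter the dashboard ID
# Or via API: /api/dashboards/import expects the full dashboard JSON (an ID alone
# is not enough), so download it from grafana.com first. The input name
# DS_PROMETHEUS must match the "__inputs" section of the downloaded dashboard.
curl -s https://grafana.com/api/dashboards/15759/revisions/latest/download -o dashboard.json
jq '{dashboard: ., overwrite: true,
     inputs: [{name: "DS_PROMETHEUS", type: "datasource", pluginId: "prometheus", value: "Prometheus"}]}' dashboard.json \
  | curl -X POST http://grafana:3000/api/dashboards/import \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $GRAFANA_TOKEN" \
      -d @-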
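The same import can be scripted in Python with only the standard library. This sketch downloads a dashboard's JSON from grafana.com and builds the POST request for Grafana's `/api/dashboards/import` endpoint. The Grafana URL, token, datasource name `Prometheus`, and input name `DS_PROMETHEUS` are assumptions carried over from the curl example; check the `__inputs` section of the downloaded JSON for the input name your dashboard actually uses.

```python
import json
import urllib.request

GRAFANA_COM = "https://grafana.com/api/dashboards/{id}/revisions/latest/download"


def fetch_dashboard(dashboard_id: int) -> dict:
    """Download a community dashboard's JSON from grafana.com."""
    url = GRAFANA_COM.format(id=dashboard_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def build_import_request(grafana_url: str, token: str, dashboard: dict,
                         datasource: str = "Prometheus",
                         input_name: str = "DS_PROMETHEUS") -> urllib.request.Request:
    """Wrap dashboard JSON in the body that /api/dashboards/import expects."""
    payload = {
        "dashboard": dashboard,
        "overwrite": True,
        # Binds the dashboard's datasource placeholder to an existing datasource.
        "inputs": [{
            "name": input_name,
            "type": "datasource",
            "pluginId": "prometheus",
            "value": datasource,
        }],
    }
    return urllib.request.Request(
        f"{grafana_url}/api/dashboards/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urlopen(build_import_request("http://grafana:3000", token, fetch_dashboard(15759)))`.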

Common Issues

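Issue | Cause | Fix
No container metrics | Kubelet cAdvisor endpoint (/metrics/cadvisor) not scraped | Check the kubelet/cAdvisor targets in Prometheus (Status → Targets)
kube_pod_container_resource_* missing | kube-state-metrics not installed | Install kube-state-metrics
Dashboard shows no data | Wrong Prometheus datasource selected | Verify the datasource in Grafana
CPU throttling high but average usage low | Bursts exceed the CPU limit within individual 100 ms CFS periods; averages hide them | Raise the CPU limit, or remove it and keep the request
Memory panel shows RSS instead of working set | Wrong metric | Use container_memory_working_set_bytes (what the kubelet uses for eviction, and the best proxy for OOMKill risk)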
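Why can throttling be high while average usage is low? CFS enforces a CPU limit per 100 ms period: a 0.5-core limit grants 50 ms of CPU time per period, and work beyond that waits for the next period even if the container is idle most of the time. The sketch below is a deliberately simplified, single-threaded model with made-up demand: unfinished work carries over as backlog, and it ignores multiple threads, which use up the quota faster in wall-clock time.

```python
def simulate_cfs(demand_ms, limit_cores, period_ms=100.0):
    """Return (average cores used, % of periods throttled) for a CPU limit.

    demand_ms: CPU time (ms) the workload wants in each period.
    Unfinished work is carried over as backlog into the next period.
    """
    quota_ms = limit_cores * period_ms
    backlog = 0.0
    used_total = 0.0
    throttled = 0
    for demand in demand_ms:
        wanted = backlog + demand
        used = min(wanted, quota_ms)
        if wanted > quota_ms:
            throttled += 1  # quota ran out before the work was done
        backlog = wanted - used
        used_total += used
    periods = len(demand_ms)
    avg_cores = used_total / (periods * period_ms)
    return avg_cores, throttled * 100 / periods


if __name__ == "__main__":
    # Hypothetical bursty pattern: two 80 ms bursts, then 8 idle periods,
    # repeated 6 times (60 periods = 6 s).
    demand = ([80.0, 80.0] + [0.0] * 8) * 6
    for limit in (0.5, 1.0):
        cores, pct = simulate_cfs(demand, limit)
        print(f"limit={limit} cores: avg {cores:.2f} cores, throttled {pct:.0f}% of periods")
    # limit=0.5 cores: avg 0.16 cores, throttled 30% of periods
    # limit=1.0 cores: avg 0.16 cores, throttled 0% of periods
```

Average usage is identical under both limits; only the higher limit lets each burst finish inside its period, which is why the fix in the table is to raise or remove the limit rather than the request.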

Best Practices

  • Use working_set_bytes, not RSS: the kubelet uses working set for eviction, and it tracks OOMKill risk most closely
  • Track requests AND limits: requests affect scheduling; limits drive CPU throttling and OOMKills
  • Alert at >90% of the memory limit: this leaves time to react before an OOMKill
  • Monitor CPU throttling: sustained high throttling means CPU limits are too low for the workload's bursts
  • Review over-provisioned pods weekly: usage below 10% of requests means reserved capacity is going unused
  • Use namespace-scoped dashboards: team owners can monitor their own workloads
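A weekly over-provisioning review comes down to comparing average usage with requests. The sketch below assumes you have already exported per-pod averages (for example from the usage-vs-requests query) into hypothetical `PodCPU` records; it flags pods below 10% of their CPU request and sorts them by cores reserved but unused.

```python
from typing import List, NamedTuple


class PodCPU(NamedTuple):
    namespace: str
    pod: str
    usage_cores: float    # e.g. weekly average of the CPU usage rate
    request_cores: float  # summed CPU requests of the pod's containers


def over_provisioned(pods: List[PodCPU], max_ratio: float = 0.10) -> List[PodCPU]:
    """Pods using less than max_ratio of their CPU request, most wasteful first."""
    flagged = [p for p in pods
               if p.request_cores > 0 and p.usage_cores / p.request_cores < max_ratio]
    # Unused reserved capacity = request minus actual usage.
    return sorted(flagged, key=lambda p: p.request_cores - p.usage_cores, reverse=True)


if __name__ == "__main__":
    pods = [
        PodCPU("shop", "api-7d9f", 0.05, 1.0),
        PodCPU("shop", "worker-5c2b", 0.02, 0.5),
        PodCPU("data", "db-0", 0.90, 1.0),
        PodCPU("data", "cron-x1", 0.0, 0.0),  # no request set: skipped
    ]
    for p in over_provisioned(pods):
        print(f"{p.namespace}/{p.pod}: {p.usage_cores:.2f} of {p.request_cores:.2f} cores")
```

Sorting by absolute unused cores rather than by ratio puts the pods whose requests block the most schedulable capacity at the top of the list.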

Key Takeaways

  • Prometheus + Grafana provides historical pod resource monitoring
  • Key metrics: cpu_usage, memory_working_set, cpu_throttled, resource_requests/limits
  • Dashboard 15759 is a great starting point for pod resource overview
  • Alert on memory approaching limits (>90%) and high CPU throttling (>50%)
  • Track usage vs requests to right-size pods and reduce waste
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
