Observability intermediate ⏱ 15 minutes K8s 1.28+

Grafana Kubernetes Monitoring Dashboards Guide

Deploy and configure Grafana dashboards for Kubernetes monitoring including dashboard 6417 for pod metrics, dashboard 315 for cluster overview, and custom

By Luca Berton • 📖 6 min read

💡 Quick Answer: Grafana dashboard 6417 (“Kubernetes Pods”) and dashboard 315 (“Kubernetes Cluster Monitoring”) are among the most widely used community dashboards. Import them by ID in the Grafana UI, or provision them as ConfigMaps with the kube-prometheus-stack Helm chart. Both require a Prometheus data source that scrapes kube-state-metrics and kubelet/cAdvisor.

The Problem

  • Kubernetes generates thousands of metrics but has no built-in visualization
  • Setting up dashboards from scratch is time-consuming and error-prone
  • Community dashboards (6417, 315) require specific Prometheus labels to work
  • Dashboard provisioning must survive pod restarts (GitOps-friendly)
  • GPU, storage, and networking metrics need additional dashboards beyond defaults
  • kube-prometheus-stack ships dashboards but they may not cover all use cases

The Solution

Install Grafana with kube-prometheus-stack

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# "admin" is a placeholder; set a strong password outside of throwaway test clusters
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword="admin" \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.size=10Gi

Dashboard 6417: Kubernetes Pods Monitoring

Dashboard 6417 provides per-pod CPU, memory, network, and filesystem metrics.

# Import via Grafana ConfigMap provisioning
# (skeleton only: the empty "panels" array stands in for the exported dashboard JSON)
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-k8s-pods
  namespace: monitoring
  labels:
    grafana_dashboard: "1"    # Sidecar picks up ConfigMaps with this label
data:
  k8s-pods.json: |-
    {
      "__inputs": [{
        "name": "DS_PROMETHEUS",
        "type": "datasource",
        "pluginId": "prometheus"
      }],
      "id": null,
      "uid": "k8s-pods-6417",
      "title": "Kubernetes Pods",
      "tags": ["kubernetes", "pods"],
      "timezone": "browser",
      "panels": []
    }
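
The ConfigMap above can also be generated from an exported dashboard file instead of being written by hand. The following is a minimal Python sketch of that wrapping step. The function name `build_dashboard_configmap` and the sample dashboard dict are illustrative assumptions, not part of the chart. The output is JSON, which `kubectl apply -f` accepts in place of YAML.

```python
import json


def build_dashboard_configmap(name, namespace, filename, dashboard):
    """Wrap a dashboard dict in a ConfigMap that the Grafana sidecar can discover."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Must match sidecar.dashboards.label / labelValue in the Helm values.
            "labels": {"grafana_dashboard": "1"},
        },
        # ConfigMap values are strings, so the dashboard is serialized here.
        "data": {filename: json.dumps(dashboard, indent=2)},
    }


# Hypothetical stand-in for a dashboard exported from the Grafana UI.
exported = {"uid": "k8s-pods-6417", "title": "Kubernetes Pods", "panels": []}
manifest = build_dashboard_configmap(
    "grafana-dashboard-k8s-pods", "monitoring", "k8s-pods.json", exported
)
print(json.dumps(manifest, indent=2))
```

Redirecting the printed manifest to a file gives something that can be committed to Git and applied like any other manifest.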

Import by ID (easiest method):

1. Open Grafana → Dashboards → Import
2. Enter dashboard ID: 6417
3. Select Prometheus data source
4. Click Import
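
The same import can be scripted when the dashboard JSON needs to live in Git. The sketch below assumes the grafana.com download endpoint for dashboard revisions and assumes the dashboard uses a `${DS_PROMETHEUS}` placeholder, matching the `__inputs` entry shown earlier. Other dashboards may name their input differently, so check the `__inputs` section of the file first. All function names here are hypothetical.

```python
import json
import urllib.request


def download_url(gnet_id, revision):
    """grafana.com download URL for one revision of a community dashboard."""
    return f"https://grafana.com/api/dashboards/{gnet_id}/revisions/{revision}/download"


def bind_datasource(dashboard_text, datasource, input_name="DS_PROMETHEUS"):
    """Replace the ${DS_PROMETHEUS}-style placeholder with a concrete data source name."""
    return dashboard_text.replace("${" + input_name + "}", datasource)


def fetch_dashboard(gnet_id, revision, datasource="Prometheus"):
    """Download a dashboard and return it as a dict (needs network access)."""
    with urllib.request.urlopen(download_url(gnet_id, revision), timeout=30) as resp:
        text = resp.read().decode("utf-8")
    return json.loads(bind_datasource(text, datasource))
```

Calling `fetch_dashboard(6417, 1)` would return the dashboard as a dict with the placeholder already bound to a data source named "Prometheus".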

Required metrics (from kube-state-metrics + cAdvisor):
• container_cpu_usage_seconds_total
• container_memory_working_set_bytes
• container_network_receive_bytes_total
• container_network_transmit_bytes_total
• container_fs_usage_bytes
• kube_pod_info
• kube_pod_container_resource_requests
• kube_pod_container_resource_limits
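
Before importing, it can help to confirm that Prometheus actually has these series. A minimal sketch follows, using the Prometheus HTTP API endpoint that lists the values of the `__name__` label. The constant `REQUIRED_6417` simply copies the list above, and the function names are illustrative.

```python
import json
import urllib.request

REQUIRED_6417 = [
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
    "container_fs_usage_bytes",
    "kube_pod_info",
    "kube_pod_container_resource_requests",
    "kube_pod_container_resource_limits",
]


def missing_metrics(required, available):
    """Return the required metric names that Prometheus does not know about."""
    known = set(available)
    return [name for name in required if name not in known]


def list_metric_names(prometheus_url):
    """Ask Prometheus for every metric name it has stored (needs network access)."""
    url = prometheus_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]
```

With Prometheus port-forwarded to an assumed local address such as `http://localhost:9090`, `missing_metrics(REQUIRED_6417, list_metric_names("http://localhost:9090"))` returns the names that would leave panels empty.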

Dashboard 315: Kubernetes Cluster Monitoring

Dashboard 315 provides a cluster-level overview: node status, total pods, CPU/memory pressure, and namespace breakdown.

Import by ID: 315
Name: "Kubernetes Cluster Monitoring (via Prometheus)"

Required metrics:
• node_cpu_seconds_total (node-exporter)
• node_memory_MemAvailable_bytes (node-exporter)
• kubelet_running_pods
• kube_node_status_condition
• kube_namespace_status_phase
• kube_deployment_status_replicas

Provision Dashboards via Helm Values

# kube-prometheus-stack values for dashboard provisioning
grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          folder: "Kubernetes"
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default

  dashboards:
    default:
      # Import community dashboards by ID
      kubernetes-pods:
        gnetId: 6417
        revision: 1
        datasource: Prometheus
      cluster-monitoring:
        gnetId: 315
        revision: 3
        datasource: Prometheus
      node-exporter-full:
        gnetId: 1860
        revision: 33
        datasource: Prometheus
      nginx-ingress:
        gnetId: 9614
        revision: 1
        datasource: Prometheus

  # Sidecar for ConfigMap-based dashboards
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"
      searchNamespace: ALL
      folderAnnotation: grafana_folder
      provider:
        foldersFromFilesStructure: true
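
Because every `gnetId` entry should carry a pinned `revision`, a small check can catch entries that omit one. The sketch below assumes the values file has already been parsed into a dict (for example with a YAML library, which is not part of the standard library). The sample `values` literal and the function name are hypothetical.

```python
def unpinned_dashboards(values):
    """List 'provider/name' entries that set gnetId but no revision."""
    problems = []
    providers = values.get("grafana", {}).get("dashboards", {}) or {}
    for provider, dashboards in providers.items():
        for name, spec in (dashboards or {}).items():
            if "gnetId" in spec and "revision" not in spec:
                problems.append(f"{provider}/{name}")
    return sorted(problems)


# Hypothetical values: the second entry forgets to pin a revision.
values = {
    "grafana": {
        "dashboards": {
            "default": {
                "kubernetes-pods": {"gnetId": 6417, "revision": 1, "datasource": "Prometheus"},
                "cluster-monitoring": {"gnetId": 315, "datasource": "Prometheus"},
            }
        }
    }
}
print(unpinned_dashboards(values))  # prints ['default/cluster-monitoring']
```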

Essential Kubernetes Dashboards

ID     | Name                                    | Focus
───────┼─────────────────────────────────────────┼──────────────────────
6417   | Kubernetes Pods                         | Per-pod CPU/mem/net/fs
315    | Kubernetes Cluster Monitoring           | Cluster overview
1860   | Node Exporter Full                      | Per-node system metrics
9614   | NGINX Ingress Controller                | Ingress traffic/errors
7249   | Kubernetes Cluster (Prometheus)         | Namespace breakdown
14205  | Kubernetes PVC (Volumes)                | PV/PVC utilization
12006  | Kubernetes apiserver                    | API server performance
13332  | kube-state-metrics v2                   | KSM full metrics
14981  | CoreDNS                                 | DNS query rates/errors
3070   | etcd                                    | etcd cluster health
───────┴─────────────────────────────────────────┴──────────────────────

GPU-Specific:
12239  | NVIDIA DCGM Exporter                    | GPU utilization/temp/mem
18462  | NVIDIA GPU Operator                     | Operator health + GPUs

Custom Dashboard: Namespace Resource Usage

{
  "title": "Namespace Resource Usage",
  "uid": "ns-resources",
  "panels": [
    {
      "title": "CPU Usage by Namespace",
      "type": "timeseries",
      "targets": [{
        "expr": "sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",container!=\"\"}[5m])) by (namespace)",
        "legendFormat": "{{namespace}}"
      }]
    },
    {
      "title": "Memory Usage by Namespace",
      "type": "timeseries",
      "targets": [{
        "expr": "sum(container_memory_working_set_bytes{namespace!=\"\",container!=\"\"}) by (namespace)",
        "legendFormat": "{{namespace}}"
      }]
    },
    {
      "title": "Pod Count by Namespace",
      "type": "stat",
      "targets": [{
        "expr": "count(kube_pod_info) by (namespace)",
        "legendFormat": "{{namespace}}"
      }]
    }
  ]
}
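
The dashboard above always shows every namespace. One way to make it interactive is a `namespace` template variable. The sketch below adds such a variable and rewrites a panel expression to use it. The field values follow the Grafana dashboard JSON model as commonly exported (`refresh: 1` meaning refresh on dashboard load), which is an assumption to verify against your Grafana version. The function names are illustrative.

```python
import json


def add_namespace_variable(dashboard):
    """Return a copy of the dashboard with a 'namespace' query variable added."""
    result = json.loads(json.dumps(dashboard))  # deep copy via JSON round trip
    variables = result.setdefault("templating", {}).setdefault("list", [])
    if not any(v.get("name") == "namespace" for v in variables):
        variables.append({
            "name": "namespace",
            "type": "query",
            # label_values() is a query helper of Grafana's Prometheus data source.
            "query": "label_values(kube_pod_info, namespace)",
            "refresh": 1,  # assumed: 1 = refresh on dashboard load
            "multi": True,
            "includeAll": True,
        })
    return result


def scope_to_namespace(expr):
    """Swap the 'any namespace' matcher for the dashboard variable."""
    return expr.replace('namespace!=""', 'namespace=~"$namespace"')
```

Applying `scope_to_namespace` to each panel's `expr` and then `add_namespace_variable` to the dashboard gives a version that can be filtered from a dropdown.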

Useful PromQL Queries for Kubernetes Dashboards

# Top 10 pods by CPU
topk(10, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, namespace))

# Pods exceeding memory requests
sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace)
/
sum(kube_pod_container_resource_requests{resource="memory"}) by (pod, namespace) > 1

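# OOMKilled pods in last hour
# (last_terminated_reason is a 0/1 gauge, so it is paired with the restart counter instead of wrapped in increase())
(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1)
and on (namespace, pod, container)
(increase(kube_pod_container_status_restarts_total[1h]) > 0)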

# Node CPU saturation (>80%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80

# PVC usage percentage
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100 > 80

# API server request latency P99
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])) by (le, verb))

# Pod restart rate
sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod, namespace) > 3
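
To make the arithmetic behind the node CPU saturation query concrete, here is a simplified worked example in Python. It treats `rate()` as the plain slope between two counter samples. Real Prometheus also handles counter resets and extrapolates to the window edges, so this is an approximation. The sample values are invented.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def cpu_busy_percent(idle_samples_per_core):
    """100 minus the average idle fraction across cores, as in the node CPU query."""
    rates = [per_second_rate(first, last) for first, last in idle_samples_per_core]
    return 100 - (sum(rates) / len(rates)) * 100


# Two cores over a 300 s window: the idle counters grew by 75 s and 37.5 s.
samples = [((0, 1000.0), (300, 1075.0)), ((0, 2000.0), (300, 2037.5))]
print(cpu_busy_percent(samples))  # prints 81.25
```

The idle fractions are 0.25 and 0.125, which average to 0.1875, so the node is 81.25% busy and would cross the 80% threshold in the query.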

GitOps Dashboard Provisioning with ArgoCD

# Store dashboards in Git, deploy via ArgoCD
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-gpu-cluster
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
  annotations:
    grafana_folder: "GPU Monitoring"
data:
  gpu-cluster.json: |
    {
      "title": "GPU Cluster Overview",
      "panels": [
        {
          "title": "GPU Utilization",
          "targets": [{
            "expr": "avg(DCGM_FI_DEV_GPU_UTIL) by (gpu, Hostname)"
          }]
        },
        {
          "title": "GPU Memory Used",
          "targets": [{
            "expr": "DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) * 100"
          }]
        }
      ]
    }
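
When dashboards are stored in Git, a lightweight check before merge can catch broken JSON early. The sketch below is one possible validator, not part of ArgoCD or Grafana. The rules it applies (title present, uid present, no empty `expr`) are assumptions about what a team might want to enforce. Grafana generates a uid when one is missing, so that rule is a recommendation for stable URLs rather than a hard requirement.

```python
import json


def validate_dashboard(text):
    """Return a list of problems found in one dashboard JSON document."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(dashboard, dict):
        return ["top level is not a JSON object"]
    problems = []
    if not dashboard.get("title"):
        problems.append("missing title")
    if not dashboard.get("uid"):
        problems.append("missing uid")
    for index, panel in enumerate(dashboard.get("panels", [])):
        for target in panel.get("targets", []):
            if not target.get("expr", "").strip():
                problems.append(f"panel {index} has an empty expr")
    return problems
```

Run against the GPU dashboard above, this sketch would report only `missing uid`, since that example sets a title and non-empty expressions but no `uid`.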

Common Issues

Dashboard shows “No Data” after import

  • Cause: Prometheus data source not selected, or metric names differ between versions
  • Fix: Verify the data source in dashboard settings; check metric names in Grafana Explore or the Prometheus UI

Dashboard 6417 missing some pods

  • Cause: kube-state-metrics is not watching all namespaces
  • Fix: Ensure KSM has cluster-wide RBAC; check that the --namespaces flag isn't set

Grafana sidecar not picking up ConfigMap dashboards

  • Cause: Wrong label (grafana_dashboard: "1" required) or sidecar not enabled
  • Fix: Check label matches sidecar.dashboards.label in Helm values; verify sidecar container is running

High cardinality causing slow dashboard load

  • Cause: Queries with too many label dimensions (e.g., all pods across all namespaces)
  • Fix: Add namespace filter variable; use topk() to limit series; set appropriate time range

Dashboard metrics missing after kube-prometheus-stack upgrade

  • Cause: Metric names changed between Prometheus/KSM versions (e.g., kube_pod_container_resource_requests_cpu_cores → kube_pod_container_resource_requests{resource="cpu"})
  • Fix: Update dashboard panels to new metric names; import latest dashboard revision
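
For dashboards with many panels, the rename can be applied mechanically. The sketch below rewrites the one mapping named above inside a PromQL string. The `RENAMES` table and `migrate_expr` are hypothetical helpers, and the regular expression assumes label values contain no `}` character, so results should be reviewed before committing.

```python
import re

# Old metric name -> (new metric name, label matcher that replaces the suffix).
RENAMES = {
    "kube_pod_container_resource_requests_cpu_cores":
        ("kube_pod_container_resource_requests", 'resource="cpu"'),
}


def migrate_expr(expr):
    """Rewrite old metric names in a PromQL expression to the labeled form."""
    for old, (new, matcher) in RENAMES.items():
        # Match the old name plus an optional {...} block of existing matchers.
        pattern = re.compile(r"\b" + re.escape(old) + r"\b(?:\{([^}]*)\})?")

        def rewrite(match, new=new, matcher=matcher):
            existing = (match.group(1) or "").strip()
            labels = matcher + ("," + existing if existing else "")
            return new + "{" + labels + "}"

        expr = pattern.sub(rewrite, expr)
    return expr
```

Existing matchers are kept, so `kube_pod_container_resource_requests_cpu_cores{namespace="prod"}` becomes `kube_pod_container_resource_requests{resource="cpu",namespace="prod"}`, and expressions that already use the new name are left unchanged.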

Best Practices

  1. Provision dashboards as ConfigMaps: they survive pod restarts and are GitOps-friendly
  2. Use dashboard folders: organize by team/service via the grafana_folder annotation
  3. Set refresh intervals wisely: 30s for ops dashboards, 5m for capacity planning
  4. Add alerting rules alongside dashboards: Grafana alerts or PrometheusRules
  5. Use template variables: namespace, pod, and node selectors for interactive filtering
  6. Version dashboards in Git: track changes, review before deploying
  7. Limit time ranges: default to 6h/12h; long ranges cause high query load
  8. Test PromQL in Explore first: verify queries return expected data before adding to panels
  9. Pin dashboard revisions: community dashboards update; pin to a tested revision
  10. Separate operational vs. capacity dashboards: different refresh rates and time ranges

Key Takeaways

  • Dashboard 6417 (Kubernetes Pods) and 315 (Cluster Monitoring) are among the most widely used Grafana dashboards for K8s
  • Import by ID in Grafana UI or provision via kube-prometheus-stack Helm values (gnetId)
  • ConfigMap + sidecar pattern enables GitOps dashboard management
  • Required stack: Prometheus + kube-state-metrics + node-exporter + cAdvisor (kube-prometheus-stack deploys the first three; cAdvisor is built into the kubelet and scraped from there)
  • Custom dashboards use PromQL: master sum, rate, topk, histogram_quantile for K8s metrics
  • GPU monitoring needs DCGM Exporter metrics (DCGM_FI_DEV_GPU_UTIL, DCGM_FI_DEV_FB_USED)
  • Label grafana_dashboard: "1" on ConfigMaps for automatic sidecar pickup
  • Always pin community dashboard revisions in production Helm values