πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
Observability intermediate ⏱ 15 minutes K8s 1.28+

Kubernetes Monitoring with Prometheus and Grafana

Set up Kubernetes monitoring with Prometheus and Grafana. Covers kube-prometheus-stack, custom dashboards, alerting rules, and key metrics to monitor.

By Luca Berton β€’ β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Set up Kubernetes monitoring with Prometheus and Grafana. Covers kube-prometheus-stack, custom dashboards, alerting rules, and key metrics to monitor.

The Problem

This is one of the most searched Kubernetes topics. A comprehensive, well-structured guide helps engineers of all levels quickly find actionable solutions.

The Solution

Detailed implementation with production-ready examples below.

Install kube-prometheus-stack

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

Access Dashboards

# Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Open http://localhost:3000 (admin/admin)

# Prometheus
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090

# AlertManager
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-alertmanager 9093:9093

Key Metrics to Monitor

MetricQueryAlert Threshold
CPU usagerate(container_cpu_usage_seconds_total[5m])>80% sustained
Memory usagecontainer_memory_working_set_bytes>85% of limit
Pod restartskube_pod_container_status_restarts_total>5 in 1h
Node disknode_filesystem_avail_bytes<10% free
API server latencyapiserver_request_duration_secondsp99 >1s

Custom Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
spec:
  groups:
    - name: pod-alerts
      rules:
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
        - alert: HighMemoryUsage
          expr: container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} memory >90% of limit"
graph LR
    A[Kubernetes Cluster] --> B[Prometheus - scrapes metrics]
    B --> C[AlertManager - routes alerts]
    B --> D[Grafana - dashboards]
    C --> E[Slack/PagerDuty/Email]
    D --> F[Pre-built K8s dashboards]

Frequently Asked Questions

What’s included in kube-prometheus-stack?

Prometheus, Grafana, AlertManager, node-exporter, kube-state-metrics, and pre-configured dashboards + alerts for Kubernetes. It’s the standard monitoring stack.

Common Issues

Check kubectl describe and kubectl get events first β€” most issues have clear error messages pointing to the root cause.

Best Practices

  • Follow least privilege β€” only grant the access that’s needed
  • Test in staging before applying to production
  • Monitor and alert on key metrics
  • Document your runbooks for the team

Key Takeaways

  • Essential knowledge for Kubernetes operations
  • Start simple and evolve your approach
  • Automation reduces human error
  • Share knowledge with your team
#monitoring #prometheus #grafana #alerting #metrics #kubernetes
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens