Kubernetes Monitoring with Prometheus and Grafana
Set up Kubernetes monitoring with Prometheus and Grafana. Covers kube-prometheus-stack, custom dashboards, alerting rules, and key metrics to monitor.
π‘ Quick Answer: Set up Kubernetes monitoring with Prometheus and Grafana. Covers kube-prometheus-stack, custom dashboards, alerting rules, and key metrics to monitor.
The Problem
This is one of the most searched Kubernetes topics. A comprehensive, well-structured guide helps engineers of all levels quickly find actionable solutions.
The Solution
Detailed implementation with production-ready examples below.
Install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set grafana.adminPassword=admin \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50GiAccess Dashboards
# Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Open http://localhost:3000 (admin/admin)
# Prometheus
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090
# AlertManager
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-alertmanager 9093:9093Key Metrics to Monitor
| Metric | Query | Alert Threshold |
|---|---|---|
| CPU usage | rate(container_cpu_usage_seconds_total[5m]) | >80% sustained |
| Memory usage | container_memory_working_set_bytes | >85% of limit |
| Pod restarts | kube_pod_container_status_restarts_total | >5 in 1h |
| Node disk | node_filesystem_avail_bytes | <10% free |
| API server latency | apiserver_request_duration_seconds | p99 >1s |
Custom Alert Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: custom-alerts
namespace: monitoring
spec:
groups:
- name: pod-alerts
rules:
- alert: PodCrashLooping
expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
for: 1h
labels:
severity: warning
annotations:
summary: "Pod {{ $labels.pod }} is crash looping"
- alert: HighMemoryUsage
expr: container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
for: 5m
labels:
severity: critical
annotations:
summary: "Pod {{ $labels.pod }} memory >90% of limit"graph LR
A[Kubernetes Cluster] --> B[Prometheus - scrapes metrics]
B --> C[AlertManager - routes alerts]
B --> D[Grafana - dashboards]
C --> E[Slack/PagerDuty/Email]
D --> F[Pre-built K8s dashboards]Frequently Asked Questions
Whatβs included in kube-prometheus-stack?
Prometheus, Grafana, AlertManager, node-exporter, kube-state-metrics, and pre-configured dashboards + alerts for Kubernetes. Itβs the standard monitoring stack.
Common Issues
Check kubectl describe and kubectl get events first β most issues have clear error messages pointing to the root cause.
Best Practices
- Follow least privilege β only grant the access thatβs needed
- Test in staging before applying to production
- Monitor and alert on key metrics
- Document your runbooks for the team
Key Takeaways
- Essential knowledge for Kubernetes operations
- Start simple and evolve your approach
- Automation reduces human error
- Share knowledge with your team

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
