Prometheus Alerting Rules for Kubernetes
Write effective Prometheus alerting rules for Kubernetes, with Alertmanager routing, inhibition, silences, and production-ready alert templates for pods, nodes, storage, certificates, and error rates.
Quick Answer: Create PrometheusRule resources with meaningful alerts. Use recording rules for expensive queries, route alerts through Alertmanager with severity-based routing, and add inhibition rules to prevent alert storms. Start with five essential Kubernetes alerts: PodCrashLooping, NodeNotReady, PVCNearlyFull, CertificateExpiringSoon, and HighErrorRate.
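A minimal sketch of the "recording rules for expensive queries" advice, assuming the same http_requests_total metric and status label used by the HighErrorRate alert below. The resource name, group name, and 1m interval are illustrative choices; the record names follow the level:metric:operations naming convention.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: http-recording-rules   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: http.recording
      interval: 1m   # assumed evaluation interval for this group
      rules:
        # Per-service request rate, computed once per evaluation
        - record: service:http_requests:rate5m
          expr: sum by (service) (rate(http_requests_total[5m]))
        # Per-service 5xx rate (status label name depends on your instrumentation)
        - record: service:http_requests_errors:rate5m
          expr: sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
        # Rules in a group run sequentially, so this can reuse the two series above
        - record: service:http_requests_error_ratio:rate5m
          expr: service:http_requests_errors:rate5m / service:http_requests:rate5m
```

An alert can then query the precomputed series, for example expr: service:http_requests_error_ratio:rate5m > 0.05, instead of re-aggregating the raw counters on every evaluation.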
The Problem
Default Prometheus installations come with hundreds of alerts, and many teams end up disabling them wholesale because of alert fatigue. The result is no alerts at all, and issues are discovered by users. You need a curated set of actionable alerts with proper routing and severity.
The Solution
Essential Kubernetes Alerts
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-essential
  namespace: monitoring
  # Add labels here that match your Prometheus CR's ruleSelector;
  # the operator does not load rules from resources the selector doesn't match.
spec:
  groups:
    - name: kubernetes.essential
      rules:
        # At least one restart in every trailing 15m window, for a full hour
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        # Node's Ready condition has not been true for 5 minutes
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not ready"
        # Volume is more than 85% used
        - alert: PVCNearlyFull
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.85
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"
        # More than 5% of a service's requests return 5xx
        # (metric and label names depend on your application's instrumentation)
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
              / sum(rate(http_requests_total[5m])) by (service) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "{{ $labels.service }} has {{ $value | humanizePercentage }} error rate"
        # cert-manager certificate expires in less than 7 days
        - alert: CertificateExpiringSoon
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 86400
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.name }} expires in {{ $value | humanizeDuration }}"
```
Alertmanager Routing
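```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: routing
spec:
  # By default the operator adds a namespace matcher to this route, so it only
  # handles alerts whose namespace label equals this resource's namespace.
  route:
    groupBy: ['alertname', 'namespace']
    groupWait: 30s       # wait before the first notification for a new group
    groupInterval: 5m    # wait before notifying about new alerts in an existing group
    repeatInterval: 12h  # re-send a still-firing alert at most this often
    receiver: default
    routes:
      - matchers:
          - name: severity
            value: critical
        receiver: pagerduty
      - matchers:
          - name: severity
            value: warning
        receiver: slack
  receivers:
    # Catch-all receiver referenced by the route above; it has no integrations,
    # so alerts matching neither child route (e.g. severity=info) send nothing.
    - name: default
    - name: pagerduty
      pagerdutyConfigs:
        - routingKey:
            name: pagerduty-secret
            key: routing-key
    - name: slack
      slackConfigs:
        - channel: '#alerts'
          apiURL:
            name: slack-secret
            key: webhook-url
```
```mermaid
graph TD
    PROM[Prometheus<br/>Evaluate rules] -->|Firing alerts| AM[Alertmanager<br/>Route by severity]
    AM -->|critical| PD[PagerDuty<br/>Wake someone up]
    AM -->|warning| SLACK[Slack<br/>#alerts channel]
    AM -->|info| LOG[Log only]
    AM -->|Inhibition| SUPPRESS[Suppress PodCrash<br/>if NodeNotReady]
```
Common Issues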
Alert fatigue (too many alerts): Start with 5-10 essential alerts. Every alert must have a clear action. If the response is "look at it later," it should be a warning, not critical.
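To make the warning/critical split concrete, here is a sketch of a critical counterpart to the PVCNearlyFull warning: it pages only when the volume is above 85% and, extrapolating the last 6 hours linearly, on track to run out within 4 hours. The alert name, thresholds, and the runbooks.example.com URL are placeholders to adapt.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-critical   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: pvc.critical
      rules:
        # Already above 85% AND predicted (linear fit over 6h) to hit 0 bytes free within 4h
        - alert: PVCFillingUp
          expr: |
            kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.85
            and predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 3600) < 0
          for: 1h
          labels:
            severity: critical
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full and predicted to fill within 4 hours"
            runbook_url: "https://runbooks.example.com/pvc-filling-up"   # placeholder URL
```

The slow-growing case stays a business-hours warning, and only the case that needs action now wakes someone up; the runbook_url annotation gives the responder that action.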
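Alerts firing during maintenance: Use Alertmanager silences, for example amtool silence add alertname=NodeNotReady node=<node-name> --duration=2h --comment="planned maintenance". amtool requires a comment by default and needs --alertmanager.url (or an amtool config file) to reach Alertmanager.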
Best Practices
- Every alert must have a runbook: link it in an annotation such as runbook_url
- Critical = wake someone up: use sparingly
- Warning = investigate during business hours
- Group by namespace: reduces alert spam
- Inhibition: NodeNotReady suppresses pod alerts on that node
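A sketch of that inhibition as an AlertmanagerConfig inhibitRules entry (the resource name is hypothetical; the entry could also be merged into the routing resource). equal: ['node'] only mutes a pod alert when both alerts carry the same node label, and kube_pod_container_status_restarts_total has no node label. For this to match, PodCrashLooping would need to pick one up from kube_pod_info, for example: rate(kube_pod_container_status_restarts_total[15m]) * on (namespace, pod) group_left(node) max by (namespace, pod, node) (kube_pod_info) > 0.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: inhibition   # hypothetical name
spec:
  inhibitRules:
    # Source: the node-level alert that explains the downstream symptoms
    - sourceMatch:
        - name: alertname
          value: NodeNotReady
      # Target: pod-level alerts to mute while the source is firing
      targetMatch:
        - name: alertname
          value: PodCrashLooping
      # Mute only targets whose node label equals the source alert's node label
      equal: ['node']
```

As with routes, the operator by default also adds namespace matchers to inhibit rules defined in an AlertmanagerConfig (see the Alertmanager CR's alertmanagerConfigMatcherStrategy), so a cluster-wide rule like this may fit better in the main Alertmanager configuration.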
Key Takeaways
- Start with 5 essential alerts: PodCrashLooping, NodeNotReady, PVCNearlyFull, CertificateExpiringSoon, HighErrorRate
- Route critical alerts to PagerDuty, warnings to Slack
- Inhibition rules prevent cascading alert storms
- Every alert must have a clear action; if you can't act on it, delete it
- Group alerts by namespace and alertname to reduce noise

