Monitor OpenClaw with Prometheus and Grafana
Set up monitoring for OpenClaw AI gateway on Kubernetes with Prometheus metrics, Grafana dashboards, and alerting for uptime, message throughput, and.
💡 Quick Answer: OpenClaw exposes a Control UI on port 18789. Monitor its health with blackbox-exporter HTTP probes, pod metrics via kube-state-metrics, and container resource usage via cAdvisor. Set up alerts for pod restarts, OOM kills, and service downtime.
Key concept: OpenClaw is a single-process gateway, so monitoring focuses on availability, resource usage, and restart frequency rather than request throughput.
Gotcha: OpenClaw doesn't expose a /metrics Prometheus endpoint natively. Use blackbox-exporter for HTTP health checks and kube-state-metrics for pod-level metrics.
The Problem
- No visibility into whether the AI assistant is online and responding
- Pod crashes go unnoticed until users complain
- No resource usage trending to right-size the deployment
- Channel disconnections (WhatsApp session expiry) aren't detected
The Solution
Combine Kubernetes-native metrics (kube-state-metrics, cAdvisor) with HTTP health probes to build a comprehensive monitoring stack.
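The HTTP probe in this stack depends on a blackbox-exporter module named http_2xx. Many blackbox-exporter installs ship one by default, but if yours does not, a minimal sketch of the module definition could look like the following. The file name and the timeout value are assumptions, not taken from the OpenClaw setup; adjust them to your exporter's configuration.

```yaml
# blackbox.yml: blackbox-exporter configuration (file name is an assumption)
modules:
  http_2xx:
    prober: http
    timeout: 5s                  # assumed value; keep it below the probe interval
    http:
      method: GET
      valid_status_codes: []     # an empty list accepts any 2xx response
      preferred_ip_protocol: ip4 # assumes an IPv4-only cluster network
```

With this module, probe_success is 1 only when the target answers with a 2xx status within the timeout, which is the signal the downtime alert relies on.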
Monitoring Setup
# openclaw-monitoring.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: openclaw-health
  namespace: openclaw
spec:
  selector:
    matchLabels:
      app: openclaw
  # Empty because OpenClaw has no native /metrics endpoint to scrape
  podMetricsEndpoints: []
---
# Blackbox exporter probe for HTTP health
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: openclaw-http
  namespace: openclaw
spec:
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring:9115
  targets:
    staticConfig:
      static:
        - openclaw.openclaw.svc.cluster.local:80
---
# Alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-alerts
  namespace: openclaw
spec:
  groups:
    - name: openclaw.rules
      rules:
        - alert: OpenClawDown
          expr: probe_success{job="probe/openclaw/openclaw-http"} == 0
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw gateway is unreachable"
            description: "HTTP health check failed for 3+ minutes"
        - alert: OpenClawCrashLooping
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw",container="openclaw"}[1h]) > 3
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw is crash-looping ({{ $value }} restarts/hour)"
        - alert: OpenClawOOM
          expr: kube_pod_container_status_last_terminated_reason{namespace="openclaw",reason="OOMKilled"} == 1
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw was OOM killed; increase memory limits"
        - alert: OpenClawHighMemory
          expr: |
            container_memory_working_set_bytes{namespace="openclaw",container="openclaw"} /
            container_spec_memory_limit_bytes{namespace="openclaw",container="openclaw"} > 0.85
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw memory usage above 85%"
        - alert: OpenClawPVCFull
          expr: |
            kubelet_volume_stats_used_bytes{namespace="openclaw",persistentvolumeclaim=~"openclaw.*"} /
            kubelet_volume_stats_capacity_bytes > 0.9
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw PVC 90% full"
Grafana Dashboard
{
  "title": "OpenClaw Gateway",
  "panels": [
    {
      "title": "Gateway Health",
      "type": "stat",
      "targets": [{"expr": "probe_success{job='probe/openclaw/openclaw-http'}"}]
    },
    {
      "title": "Pod Restarts (24h)",
      "type": "stat",
      "targets": [{"expr": "increase(kube_pod_container_status_restarts_total{namespace='openclaw'}[24h])"}]
    },
    {
      "title": "Memory Usage",
      "type": "timeseries",
      "targets": [{"expr": "container_memory_working_set_bytes{namespace='openclaw',container='openclaw'}"}]
    },
    {
      "title": "CPU Usage",
      "type": "timeseries",
      "targets": [{"expr": "rate(container_cpu_usage_seconds_total{namespace='openclaw',container='openclaw'}[5m])"}]
    },
    {
      "title": "PVC Usage",
      "type": "gauge",
      "targets": [{"expr": "kubelet_volume_stats_used_bytes{namespace='openclaw'} / kubelet_volume_stats_capacity_bytes"}]
    }
  ]
}
Common Issues
Issue 1: Blackbox exporter not reaching OpenClaw
# Verify DNS resolution and HTTP reachability from the blackbox-exporter pod
kubectl exec -n monitoring deploy/blackbox-exporter -- \
  wget -qO- http://openclaw.openclaw.svc.cluster.local:80/
# Check NetworkPolicy isn't blocking cross-namespace traffic
kubectl get networkpolicy -n openclaw
Best Practices
- Alert on downtime, not just restarts: use blackbox-exporter HTTP probes
- Track PVC usage: session data grows over time; alert before it's full
- Monitor memory trends: right-size limits based on actual usage
- Set up PagerDuty/Slack alerts: critical alerts should reach you immediately
- Dashboard rotation: include the OpenClaw panel in your NOC dashboard
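To make the PagerDuty/Slack practice concrete, here is a hedged sketch of an Alertmanager configuration that pages on severity: critical and sends everything else to Slack. The receiver names, the Slack channel, the webhook URL, and the routing key are all placeholders rather than values from the OpenClaw setup. It uses the plain Alertmanager config file format; how you load it (for example, through a Secret managed by the Prometheus Operator) depends on your installation.

```yaml
# alertmanager.yml: routing sketch; all names and keys below are placeholders
route:
  receiver: slack-default            # fallback for warning-level alerts
  group_by: ['alertname', 'namespace']
  routes:
    - matchers:
        - severity="critical"        # e.g. OpenClawDown, OpenClawPVCFull
      receiver: pagerduty-oncall

receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME  # placeholder webhook
        channel: '#openclaw-alerts'                           # hypothetical channel
        send_resolved: true
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_WITH_EVENTS_V2_KEY               # placeholder key
```

Routing on the severity label keeps the alert rules themselves unchanged: promoting an alert from Slack to a page only requires changing its label.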
Key Takeaways
- Blackbox-exporter provides HTTP health monitoring for OpenClaw
- kube-state-metrics tracks pod restarts, OOM kills, and lifecycle events
- cAdvisor provides CPU and memory usage for right-sizing
- PVC monitoring prevents session data from filling up storage
- Alerting ensures you know when your AI assistant goes offline
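The right-sizing and PVC takeaways can be pushed one step further with trend-based rules. The sketch below is an illustration, not part of the original setup: the rule names, the 7-day window, and the 4-day prediction horizon are assumptions to tune for your workload.

```yaml
# openclaw-trends.yaml: hypothetical companion rules for sizing and PVC trends
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-trends              # hypothetical name
  namespace: openclaw
spec:
  groups:
    - name: openclaw.trends
      rules:
        # Peak memory over the last 7 days, a candidate basis for the memory limit
        - record: openclaw:memory_working_set_bytes:max_7d
          expr: max_over_time(container_memory_working_set_bytes{namespace="openclaw",container="openclaw"}[7d])
        # 95th percentile CPU (in cores) over 7 days, a candidate basis for the CPU request
        - record: openclaw:cpu_usage_cores:p95_7d
          expr: quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="openclaw",container="openclaw"}[5m])[7d:5m])
        # Fires when the 6h growth trend predicts the PVC will run out within 4 days
        - alert: OpenClawPVCFillingUp
          expr: predict_linear(kubelet_volume_stats_available_bytes{namespace="openclaw",persistentvolumeclaim=~"openclaw.*"}[6h], 4 * 24 * 3600) < 0
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw PVC is predicted to fill within 4 days"
```

The recording rules give stable series to compare against the configured requests and limits, while the predictive alert complements the static 90% threshold by warning earlier when session data grows quickly.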

