Observability intermediate ⏱ 20 minutes K8s 1.28+

Monitor OpenClaw with Prometheus and Grafana

Set up monitoring for the OpenClaw AI gateway on Kubernetes with Prometheus metrics, Grafana dashboards, and alerting for uptime, message throughput, and…

By Luca Berton • 📖 5 min read

💡 Quick Answer: OpenClaw exposes a Control UI on port 18789. Monitor its health with blackbox-exporter HTTP probes, pod metrics via kube-state-metrics, and container resource usage via cAdvisor. Set up alerts for pod restarts, OOM kills, and service downtime.

Key concept: OpenClaw is a single-process gateway, so monitoring focuses on availability, resource usage, and restart frequency rather than request throughput.

Gotcha: OpenClaw doesn't expose a Prometheus /metrics endpoint natively. Use blackbox-exporter for HTTP health checks and kube-state-metrics for pod-level metrics.
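
For reference, here is a minimal sketch of the blackbox-exporter module that HTTP health checks of this kind rely on. The http_2xx module ships in the exporter's default configuration, so you only need this if your deployment overrides that file. The timeout and IP protocol values are assumptions, not settings taken from OpenClaw.

```yaml
# blackbox.yml (blackbox-exporter configuration file)
modules:
  http_2xx:
    prober: http
    timeout: 5s                    # assumed value; tune to your scrape timeout
    http:
      method: GET
      valid_status_codes: []       # an empty list accepts any 2xx response
      preferred_ip_protocol: ip4   # assumption: Services resolve to IPv4 addresses
```

Any non-2xx response or a timeout makes the probe report probe_success as 0, which is the signal an uptime alert can key on.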

The Problem

  • No visibility into whether the AI assistant is online and responding
  • Pod crashes go unnoticed until users complain
  • No resource usage trending to right-size the deployment
  • Channel disconnections (WhatsApp session expiry) aren't detected

The Solution

Combine Kubernetes-native metrics (kube-state-metrics, cAdvisor) with HTTP health probes to build a comprehensive monitoring stack.

Monitoring Setup

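# openclaw-monitoring.yaml
# Placeholder PodMonitor: OpenClaw has no /metrics endpoint, so the endpoint list
# is empty and this object scrapes nothing. Pod-level metrics come from
# kube-state-metrics and cAdvisor instead.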
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: openclaw-health
  namespace: openclaw
spec:
  selector:
    matchLabels:
      app: openclaw
  podMetricsEndpoints: []
---
# Blackbox exporter probe for HTTP health
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: openclaw-http
  namespace: openclaw
spec:
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring:9115
  targets:
    staticConfig:
      static:
        - openclaw.openclaw.svc.cluster.local:80
---
# Alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-alerts
  namespace: openclaw
spec:
  groups:
    - name: openclaw.rules
      rules:
        - alert: OpenClawDown
          expr: probe_success{job="probe/openclaw/openclaw-http"} == 0
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw gateway is unreachable"
            description: "HTTP health check failed for 3+ minutes"
        
        - alert: OpenClawCrashLooping
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw",container="openclaw"}[1h]) > 3
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw is crash-looping ({{ $value }} restarts/hour)"
        
        - alert: OpenClawOOM
          expr: kube_pod_container_status_last_terminated_reason{namespace="openclaw",reason="OOMKilled"} == 1
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw was OOM killed; increase memory limits"
        
        - alert: OpenClawHighMemory
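          # The "> 0" filter skips containers without a memory limit (reported as 0),
          # which would otherwise divide by zero and fire permanently.
          expr: |
            container_memory_working_set_bytes{namespace="openclaw",container="openclaw"} /
            (container_spec_memory_limit_bytes{namespace="openclaw",container="openclaw"} > 0) > 0.85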
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw memory usage above 85%"
        
        - alert: OpenClawPVCFull
          expr: |
            kubelet_volume_stats_used_bytes{namespace="openclaw",persistentvolumeclaim=~"openclaw.*"} /
            kubelet_volume_stats_capacity_bytes > 0.9
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw PVC 90% full"
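
Alert rules are easy to get subtly wrong, so it can help to unit-test them before deploying. The sketch below uses promtool's rule test format to check that OpenClawDown stays silent during the 3-minute hold period and fires afterwards. It assumes the groups list from the PrometheusRule spec has been saved as a plain Prometheus rule file named openclaw-rules.yaml; both file names are hypothetical.

```yaml
# openclaw-rules-test.yaml (hypothetical file name)
# Run with: promtool test rules openclaw-rules-test.yaml
rule_files:
  - openclaw-rules.yaml   # assumed: the "groups:" list copied out of the PrometheusRule spec

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Six consecutive failed probes, one per minute (t=0m through t=5m).
      - series: 'probe_success{job="probe/openclaw/openclaw-http"}'
        values: '0 0 0 0 0 0'
    alert_rule_test:
      # At 2m the alert is still pending (for: 3m), so no firing alert is expected.
      - eval_time: 2m
        alertname: OpenClawDown
        exp_alerts: []
      # At 4m the hold period has elapsed and the alert should be firing.
      - eval_time: 4m
        alertname: OpenClawDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: probe/openclaw/openclaw-http
            exp_annotations:
              summary: "OpenClaw gateway is unreachable"
              description: "HTTP health check failed for 3+ minutes"
```

The same pattern extends to the other alerts by supplying input series for the restart, memory, and volume metrics.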

Grafana Dashboard

{
  "title": "OpenClaw Gateway",
  "panels": [
    {
      "title": "Gateway Health",
      "type": "stat",
      "targets": [{"expr": "probe_success{job='probe/openclaw/openclaw-http'}"}]
    },
    {
      "title": "Pod Restarts (24h)",
      "type": "stat",
      "targets": [{"expr": "increase(kube_pod_container_status_restarts_total{namespace='openclaw'}[24h])"}]
    },
    {
      "title": "Memory Usage",
      "type": "timeseries",
      "targets": [{"expr": "container_memory_working_set_bytes{namespace='openclaw',container='openclaw'}"}]
    },
    {
      "title": "CPU Usage",
      "type": "timeseries",
      "targets": [{"expr": "rate(container_cpu_usage_seconds_total{namespace='openclaw',container='openclaw'}[5m])"}]
    },
    {
      "title": "PVC Usage",
      "type": "gauge",
      "targets": [{"expr": "kubelet_volume_stats_used_bytes{namespace='openclaw'} / kubelet_volume_stats_capacity_bytes"}]
    }
  ]
}

Common Issues

Issue 1: Blackbox exporter not reaching OpenClaw

# Verify DNS resolution and HTTP reachability from the blackbox-exporter pod
kubectl exec -n monitoring deploy/blackbox-exporter -- \
  wget -qO- http://openclaw.openclaw.svc.cluster.local:80/

# Check NetworkPolicy isn't blocking cross-namespace traffic
kubectl get networkpolicy -n openclaw
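
If a default-deny policy in the openclaw namespace turns out to be the cause, a policy along these lines would admit probe traffic from the monitoring namespace. This is a sketch under assumptions: the policy name is hypothetical, the pods are assumed to carry the app: openclaw label, and the container is assumed to listen on 18789 (the Control UI port) behind the Service's port 80.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-blackbox-probe        # hypothetical name
  namespace: openclaw
spec:
  podSelector:
    matchLabels:
      app: openclaw                 # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # kubernetes.io/metadata.name is set automatically on every namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 18789               # assumption: container port, not the Service port
```

NetworkPolicy ports match the pod's port rather than the Service port, so adjust the value if your Deployment listens elsewhere.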

Best Practices

  1. Alert on downtime, not just restarts: use blackbox-exporter HTTP probes
  2. Track PVC usage: session data grows over time, so alert before the volume fills
  3. Monitor memory trends: right-size limits based on actual usage
  4. Set up PagerDuty/Slack alerts: critical alerts should reach you immediately
  5. Dashboard rotation: include the OpenClaw panel in your NOC dashboard
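
As one way to wire up the notification step, the sketch below routes alerts from the openclaw namespace to Slack using the Prometheus Operator's AlertmanagerConfig resource. The resource name, Secret name, Secret key, and channel are all hypothetical, and it assumes your Alertmanager is configured to select AlertmanagerConfig objects from this namespace.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: openclaw-routing              # hypothetical name
  namespace: openclaw
spec:
  route:
    receiver: openclaw-slack
    groupBy: ["alertname"]
    groupWait: 30s
    repeatInterval: 4h                # assumed value; tune to your on-call policy
  receivers:
    - name: openclaw-slack
      slackConfigs:
        - apiURL:
            name: openclaw-alerting   # hypothetical Secret in the openclaw namespace
            key: slack-webhook-url    # hypothetical key holding the webhook URL
          channel: "#openclaw-alerts" # hypothetical channel
          sendResolved: true
```

A PagerDuty receiver follows the same shape with pagerdutyConfigs and a routingKey Secret reference in place of the Slack fields.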

Key Takeaways

  • Blackbox-exporter provides HTTP health monitoring for OpenClaw
  • kube-state-metrics tracks pod restarts, OOM kills, and lifecycle events
  • cAdvisor provides CPU and memory usage for right-sizing
  • PVC monitoring prevents session data from filling up storage
  • Alerting ensures you know when your AI assistant goes offline
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
