Observability · Intermediate · ⏱ 20 minutes · K8s 1.28+

Monitor OpenClaw with Prometheus and Grafana on Kubernetes

Set up monitoring for the OpenClaw AI gateway on Kubernetes with Prometheus metrics, Grafana dashboards, and alerting for uptime, pod health, and resource usage.

By Luca Berton

💡 Quick Answer: OpenClaw exposes a Control UI on port 18789. Monitor its health with blackbox-exporter HTTP probes, pod metrics via kube-state-metrics, and container resource usage via cAdvisor. Set up alerts for pod restarts, OOM kills, and service downtime.

Key concept: OpenClaw is a single-process gateway, so monitoring focuses on availability, resource usage, and restart frequency rather than request throughput.

Gotcha: OpenClaw doesn't expose a /metrics Prometheus endpoint natively. Use blackbox-exporter for HTTP health checks and kube-state-metrics for pod-level metrics.
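Behind the scenes, Prometheus doesn't probe the target directly: it scrapes blackbox-exporter's /probe endpoint, passing the target and module as query parameters, and blackbox-exporter performs the HTTP check on its behalf. A minimal sketch of that URL construction (hostnames taken from the manifests in this recipe, function name is illustrative):

```python
from urllib.parse import urlencode

def blackbox_probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Build the URL Prometheus scrapes to make blackbox-exporter probe a target."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"

url = blackbox_probe_url(
    "blackbox-exporter.monitoring:9115",
    "openclaw.openclaw.svc.cluster.local:80",
)
print(url)
```

The Probe resource below generates exactly this kind of scrape target for you, so you never write the URL by hand; the sketch is just to show where `probe_success` comes from.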

The Problem

  • No visibility into whether the AI assistant is online and responding
  • Pod crashes go unnoticed until users complain
  • No resource usage trending to right-size the deployment
  • Channel disconnections (WhatsApp session expiry) aren't detected

The Solution

Combine Kubernetes-native metrics (kube-state-metrics, cAdvisor) with HTTP health probes to build a comprehensive monitoring stack.

Monitoring Setup

# openclaw-monitoring.yaml
# OpenClaw exposes no /metrics endpoint, so there is nothing for a
# PodMonitor to scrape. Pod-level metrics come from kube-state-metrics
# and cAdvisor, which kube-prometheus-stack scrapes by default.
---
# Blackbox exporter probe for HTTP health
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: openclaw-http
  namespace: openclaw
spec:
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring:9115
  targets:
    staticConfig:
      static:
        - openclaw.openclaw.svc.cluster.local:80
---
# Alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-alerts
  namespace: openclaw
spec:
  groups:
    - name: openclaw.rules
      rules:
        - alert: OpenClawDown
          expr: probe_success{job="probe/openclaw/openclaw-http"} == 0
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw gateway is unreachable"
            description: "HTTP health check failed for 3+ minutes"
        
        - alert: OpenClawCrashLooping
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw",container="openclaw"}[1h]) > 3
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw is crash-looping ({{ $value }} restarts/hour)"
        
        - alert: OpenClawOOM
          expr: kube_pod_container_status_last_terminated_reason{namespace="openclaw",reason="OOMKilled"} == 1
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw was OOM killed β€” increase memory limits"
        
        - alert: OpenClawHighMemory
          expr: |
            container_memory_working_set_bytes{namespace="openclaw",container="openclaw"} /
            container_spec_memory_limit_bytes{namespace="openclaw",container="openclaw"} > 0.85
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw memory usage above 85%"
        
        - alert: OpenClawPVCFull
          expr: |
            kubelet_volume_stats_used_bytes{namespace="openclaw",persistentvolumeclaim=~"openclaw.*"} /
            kubelet_volume_stats_capacity_bytes{namespace="openclaw",persistentvolumeclaim=~"openclaw.*"} > 0.9
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw PVC 90% full"

Grafana Dashboard

{
  "title": "OpenClaw Gateway",
  "panels": [
    {
      "title": "Gateway Health",
      "type": "stat",
      "targets": [{"expr": "probe_success{job='probe/openclaw/openclaw-http'}"}]
    },
    {
      "title": "Pod Restarts (24h)",
      "type": "stat",
      "targets": [{"expr": "increase(kube_pod_container_status_restarts_total{namespace='openclaw'}[24h])"}]
    },
    {
      "title": "Memory Usage",
      "type": "timeseries",
      "targets": [{"expr": "container_memory_working_set_bytes{namespace='openclaw',container='openclaw'}"}]
    },
    {
      "title": "CPU Usage",
      "type": "timeseries",
      "targets": [{"expr": "rate(container_cpu_usage_seconds_total{namespace='openclaw',container='openclaw'}[5m])"}]
    },
    {
      "title": "PVC Usage",
      "type": "gauge",
      "targets": [{"expr": "kubelet_volume_stats_used_bytes{namespace='openclaw'} / kubelet_volume_stats_capacity_bytes"}]
    }
  ]
}
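Before importing the dashboard through Grafana's HTTP API, it is worth sanity-checking that every panel actually carries a PromQL expression, since Grafana accepts query-less panels silently. A small sketch (dashboard JSON abbreviated to two panels; the helper name is illustrative):

```python
import json

dashboard_json = """
{
  "title": "OpenClaw Gateway",
  "panels": [
    {"title": "Gateway Health", "type": "stat",
     "targets": [{"expr": "probe_success{job='probe/openclaw/openclaw-http'}"}]},
    {"title": "Memory Usage", "type": "timeseries",
     "targets": [{"expr": "container_memory_working_set_bytes{namespace='openclaw'}"}]}
  ]
}
"""

def panels_missing_queries(raw: str) -> list:
    """Return titles of panels that have no non-empty PromQL expression."""
    dashboard = json.loads(raw)
    return [
        p.get("title", "<untitled>")
        for p in dashboard.get("panels", [])
        if not any(t.get("expr") for t in p.get("targets", []))
    ]

print(panels_missing_queries(dashboard_json))  # prints []
```

An empty list means every panel will render data; any titles returned point at panels that would show up blank after import.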

Common Issues

Issue 1: Blackbox exporter not reaching OpenClaw

# Verify service DNS resolution from inside the cluster
kubectl exec -n monitoring deploy/blackbox-exporter -- \
  wget -qO- http://openclaw.openclaw.svc.cluster.local:80/

# Check that no NetworkPolicy blocks cross-namespace traffic
kubectl get networkpolicy -n openclaw
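If a NetworkPolicy turns out to be the blocker, an ingress rule like the following lets the probes through (the `app: openclaw` pod label matches the manifests in this recipe; the monitoring namespace name is an assumption, adjust to your cluster):

```yaml
# Allow pods in the monitoring namespace (blackbox-exporter) to reach OpenClaw.
# The kubernetes.io/metadata.name label is set automatically on Kubernetes 1.21+.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-probes
  namespace: openclaw
spec:
  podSelector:
    matchLabels:
      app: openclaw
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
```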

Best Practices

  1. Alert on downtime, not just restarts: use blackbox-exporter HTTP probes
  2. Track PVC usage: session data grows over time; alert before it's full
  3. Monitor memory trends: right-size limits based on actual usage
  4. Set up PagerDuty/Slack alerts: critical alerts should reach you immediately
  5. Dashboard rotation: include the OpenClaw panel in your NOC dashboard

Key Takeaways

  • Blackbox-exporter provides HTTP health monitoring for OpenClaw
  • kube-state-metrics tracks pod restarts, OOM kills, and lifecycle events
  • cAdvisor provides CPU and memory usage for right-sizing
  • PVC monitoring prevents session data from filling up storage
  • Alerting ensures you know when your AI assistant goes offline
#openclaw #prometheus #grafana #monitoring #alerting #observability #dashboards

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
