OpenClaw Health Probes on Kubernetes
Configure liveness and readiness probes for OpenClaw on Kubernetes. Custom Node.js health checks against /healthz and /readyz endpoints with proper timing.
Quick Answer: OpenClaw exposes /healthz (liveness) and /readyz (readiness) on loopback. Since the gateway binds to 127.0.0.1, use exec probes with inline Node.js HTTP checks instead of httpGet probes.
The Problem
Standard Kubernetes httpGet probes don't work when the application binds to loopback. The kubelet sends the request to the pod's IP address from outside the pod's network namespace, so it never reaches a listener on 127.0.0.1. OpenClaw's default bind is loopback for security, so you need exec-based probes, which run inside the container.
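The rule above can be sketched as a small decision helper. This is only an illustration: `isLoopback` and `probeTypeFor` are hypothetical names, not part of OpenClaw or Kubernetes, and the check handles only the common spellings of loopback addresses.

```javascript
const net = require('net');

// Returns true for the usual loopback spellings.
// Assumption: fully expanded IPv6 forms such as 0:0:0:0:0:0:0:1 are not handled.
function isLoopback(address) {
  if (net.isIPv4(address)) return address.startsWith('127.');
  if (net.isIPv6(address)) return address === '::1';
  return address === 'localhost';
}

// Pick the probe mechanism that can actually reach the listener.
function probeTypeFor(bindAddress) {
  return isLoopback(bindAddress) ? 'exec' : 'httpGet';
}

console.log(probeTypeFor('127.0.0.1')); // exec
console.log(probeTypeFor('0.0.0.0'));   // httpGet
```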
The Solution
Exec-Based Health Probes
The official deployment uses inline Node.js to check health endpoints:
containers:
  - name: gateway
    livenessProbe:
      exec:
        command:
          - node
          - -e
          - >-
            require('http').get('http://127.0.0.1:18789/healthz',
            r => process.exit(r.statusCode < 400 ? 0 : 1))
            .on('error', () => process.exit(1))
      initialDelaySeconds: 60
      periodSeconds: 30
      timeoutSeconds: 10
    readinessProbe:
      exec:
        command:
          - node
          - -e
          - >-
            require('http').get('http://127.0.0.1:18789/readyz',
            r => process.exit(r.statusCode < 400 ? 0 : 1))
            .on('error', () => process.exit(1))
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
Why These Timing Values
| Parameter | Liveness | Readiness | Reason |
|---|---|---|---|
| initialDelaySeconds | 60 | 15 | Gateway needs ~45s to fully boot |
| periodSeconds | 30 | 10 | Liveness is coarse; readiness is responsive |
| timeoutSeconds | 10 | 5 | Allow for busy gateway under load |
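To see what these periods mean in practice, the sketch below estimates how long a persistent failure lasts before Kubernetes acts on it. It assumes the probes leave failureThreshold at the Kubernetes default of 3, since the manifest above does not set it. The helper name `secondsUntilAction` is illustrative, and the result is approximate because it ignores timeoutSeconds and the point within a period at which the failure begins.

```javascript
// Kubernetes default when a probe does not set failureThreshold.
const DEFAULT_FAILURE_THRESHOLD = 3;

// Rough time between the first failed check and the kubelet acting on it.
function secondsUntilAction({ periodSeconds, failureThreshold = DEFAULT_FAILURE_THRESHOLD }) {
  return periodSeconds * failureThreshold;
}

const liveness = secondsUntilAction({ periodSeconds: 30 });
const readiness = secondsUntilAction({ periodSeconds: 10 });
console.log(`liveness: ~${liveness}s, readiness: ~${readiness}s`);
// prints "liveness: ~90s, readiness: ~30s"
```

Under these assumptions, a hung gateway is restarted after roughly 90 seconds, while an unready one stops receiving traffic after roughly 30 seconds.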
Probe Endpoints
graph LR
  A[kubelet] -->|exec| B[node -e 'http.get...']
  B -->|HTTP GET| C[127.0.0.1:18789/healthz]
  B -->|HTTP GET| D[127.0.0.1:18789/readyz]
  C -->|200 OK| E[Pod is alive]
  C -->|5xx/timeout| F[Kill + restart container]
  D -->|200 OK| G[Route traffic to pod]
  D -->|5xx/timeout| H[Remove from Service endpoints]
- /healthz: Is the process alive? If this fails, the kubelet kills and restarts the container.
- /readyz: Is the gateway ready to accept requests? If this fails, the Service stops routing traffic to the pod.
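For readability, the inline one-liner used in the exec probes can be expanded into a small script. This is a sketch, not the official deployment's code: the function names `exitCodeFor` and `probe` and the 5000 ms timeout are assumptions for illustration. Unlike the one-liner, it sets its own request timeout rather than relying only on the probe's timeoutSeconds.

```javascript
const http = require('http');

// Map an HTTP status code to a probe exit code: 0 = success, 1 = failure.
function exitCodeFor(statusCode) {
  return statusCode < 400 ? 0 : 1;
}

// Resolve to the exit code for a single health check against `url`.
function probe(url, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the body; only the status matters
      resolve(exitCodeFor(res.statusCode));
    });
    // Abort a hung request ourselves instead of waiting for the kubelet.
    req.setTimeout(timeoutMs, () => req.destroy(new Error('timeout')));
    req.on('error', () => resolve(1));
  });
}

// Usage inside the container (left commented out so loading the file has no side effects):
// probe('http://127.0.0.1:18789/healthz', 5000).then((code) => process.exit(code));
```

Anything below 400 counts as healthy, matching the one-liner's `r.statusCode < 400` check. Connection errors and timeouts count as failures.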
Alternative: httpGet with Non-Loopback Bind
If you bind the gateway to 0.0.0.0 (e.g., for Ingress), you can use simpler httpGet probes:
# Only works when gateway.bind is NOT loopback
livenessProbe:
  httpGet:
    path: /healthz
    port: 18789
  initialDelaySeconds: 60
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /readyz
    port: 18789
  initialDelaySeconds: 15
  periodSeconds: 10
Startup Probe for Slow Initialization
If the gateway takes longer than 60s to start (large workspace, many skills):
startupProbe:
  exec:
    command:
      - node
      - -e
      - >-
        require('http').get('http://127.0.0.1:18789/healthz',
        r => process.exit(r.statusCode < 400 ? 0 : 1))
        .on('error', () => process.exit(1))
  failureThreshold: 30
  periodSeconds: 10
  # Allows up to 300s (5 min) for startup
Common Issues
Liveness Probe Killing Pod During Long Agent Sessions
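Liveness probes check whether the gateway process is healthy, not whether an agent session is responsive, so a long-running session alone should not trigger a restart. Restarts happen when the gateway itself becomes unresponsive (memory pressure, blocked Node.js event loop). Check for OOM kills first: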
# Check if OOM is the cause
kubectl describe pod -n openclaw -l app=openclaw | grep -A5 "Last State"
# Increase memory limits if OOM
resources:
  limits:
    memory: 4Gi
Readiness Probe Failing After Deploy
The 15s initial delay may be too short on resource-constrained clusters:
kubectl describe pod -n openclaw -l app=openclaw | grep -A10 Conditions
# If Ready=False, increase initialDelaySeconds to 30
Exec Probe Overhead
Each exec probe spawns a Node.js process (~50ms). With a 10s readiness period, that's 6 processes per minute per pod, plus 2 per minute for the 30s liveness probe. This is negligible for single pods, but worth considering for high-replica deployments.
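A back-of-the-envelope sketch of that overhead, using the ~50ms spawn figure and the probe periods from this article. The function names and the replica counts are illustrative assumptions.

```javascript
// How many times a probe with the given period runs each minute.
function probeRunsPerMinute(periodSeconds) {
  return 60 / periodSeconds;
}

// Total exec-probe spawns and spawn time per minute across all replicas.
function execOverhead({ periods, replicas, spawnMs }) {
  const runsPerPod = periods.reduce((sum, p) => sum + probeRunsPerMinute(p), 0);
  return {
    runsPerPodPerMinute: runsPerPod,
    runsPerMinute: runsPerPod * replicas,
    spawnMsPerMinute: runsPerPod * replicas * spawnMs,
  };
}

// Readiness every 10s plus liveness every 30s, one pod.
console.log(execOverhead({ periods: [10, 30], replicas: 1, spawnMs: 50 }));
// { runsPerPodPerMinute: 8, runsPerMinute: 8, spawnMsPerMinute: 400 }
```

With 100 replicas the same settings come to 800 spawns and about 40 seconds of spawn time per minute, spread across the cluster's nodes.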
Best Practices
- Use exec probes with loopback bind: the default and most secure approach
- Switch to httpGet with non-loopback bind: simpler and lower overhead
- Add a startup probe for large workspaces: prevents premature liveness kills during init
- Don't set aggressive liveness timing: a 30s period is fine; 5s causes false restarts
- Monitor probe metrics: track the Prometheus metric kube_pod_container_status_restarts_total
Key Takeaways
- OpenClaw's loopback bind requires exec-based probes, not httpGet
- Liveness checks /healthz (kill if dead), readiness checks /readyz (route if ready)
- Initial delay of 60s for liveness, 15s for readiness matches gateway boot time
- Add a startup probe if initialization exceeds 60 seconds
- Switch to httpGet probes when using non-loopback bind for Ingress
