Monitor and observe your clusters with Prometheus, Grafana, logging pipelines, distributed tracing, and alerting configurations.
19 recipes available
Implement effective logging strategies for Kubernetes containers. Configure log collection, aggregation, and analysis with various logging patterns.
Monitor cluster activity through Kubernetes events. Capture, filter, and alert on events for troubleshooting and operational visibility.
Build per-tenant GPU monitoring dashboards with queue time, utilization, thermal metrics, and GPU-hour chargeback on Kubernetes.
Define and monitor GPU tenant SLOs for queue time, inference latency, GPU utilization, and job completion rate with Prometheus alerting.
Collect and analyze OpenClaw agent logs using Elasticsearch, Fluent Bit, and Kibana for debugging and audit trails.
Set up monitoring for OpenClaw AI gateway on Kubernetes with Prometheus metrics, Grafana dashboards, and alerting for uptime, message throughput, and.
Track NCCL benchmark outcomes and GPU telemetry over time with Prometheus and Grafana dashboards to detect communication regressions early.
Detect and report node-level issues automatically with Node Problem Detector. Learn to identify kernel problems, hardware failures, and container.
Configure Alertmanager to route and manage Prometheus alerts. Set up notification channels including Slack, PagerDuty, and email with routing rules.
Configure logging for Kubernetes applications. Implement sidecar logging, log aggregation, and structured logging best practices.
Create comprehensive Grafana dashboards for Kubernetes monitoring. Learn to visualize cluster, node, pod, and application metrics effectively.
Deploy Jaeger for distributed tracing in Kubernetes. Trace requests across microservices to identify latency issues and debug complex systems.
Deploy Prometheus for Kubernetes monitoring. Collect metrics from nodes, pods, and applications with ServiceMonitors and alerting rules.
Set up Prometheus monitoring for Kubernetes clusters. Configure scraping, alerting rules, and visualize metrics with Grafana dashboards.
Set up Alertmanager to route, group, and deliver Kubernetes alerts. Learn to configure Slack, PagerDuty, and email notifications.
Learn to instrument your Kubernetes applications with Prometheus metrics. Complete guide to ServiceMonitors, scraping configuration, and custom metrics.
Deploy Jaeger for distributed tracing in Kubernetes. Learn to instrument applications, trace requests across services, and identify performance.
Deploy Elasticsearch, Fluentd, and Kibana for centralized Kubernetes logging. Learn to collect, parse, and visualize container logs at scale.
Deploy OpenTelemetry Collector for unified metrics, traces, and logs collection in Kubernetes. Learn pipelines, processors, and exporters configuration.
Our book includes an entire chapter dedicated to observability with dozens more examples.
📖 Explore All Chapters