
📊 Observability

Monitor everything: Prometheus, Grafana dashboards, EFK logging, distributed tracing with Jaeger, health probes, custom metrics, and GPU monitoring for AI workloads.

28 recipes 🟢 4 beginner 🟡 19 intermediate 🔴 5 advanced

intermediate ⏱ 15 minutes

Kubernetes Alerting Best Practices

Design effective Kubernetes alerts that reduce noise and catch real issues. Covers alert severity tiers, golden signals, runbook links, and alert fatigue prevention.

alerting prometheus alertmanager sre

beginner ⏱ 15 minutes

Kubernetes Cost Monitoring with Kubecost

Monitor and optimize Kubernetes costs with Kubecost. Track per-namespace, per-deployment, and per-label spend with cloud billing integration and savings recommendations.

kubecost cost-monitoring finops optimization

intermediate ⏱ 15 minutes

OpenTelemetry on Kubernetes: Traces, Metrics, Logs

Deploy OpenTelemetry Collector on Kubernetes for unified observability. Collect traces, metrics, and logs with auto-instrumentation and export to any backend.

opentelemetry otel tracing metrics

advanced ⏱ 15 minutes

Kubernetes Service Mesh: Istio vs Linkerd vs Cilium

Compare Kubernetes service meshes: Istio, Linkerd, and Cilium. Covers mTLS, traffic management, observability, performance overhead, and when you need a mesh.

service-mesh istio linkerd cilium

intermediate ⏱ 15 minutes

Kubernetes Logging with ELK Stack

Deploy centralized logging for Kubernetes with Elasticsearch, Fluentd, and Kibana. Covers log collection, parsing, indexing, and retention policies.

logging elasticsearch fluentd kibana

intermediate ⏱ 15 minutes

Kubernetes Monitoring with Prometheus and Grafana

Set up Kubernetes monitoring with Prometheus and Grafana. Covers kube-prometheus-stack, custom dashboards, alerting rules, and key metrics to monitor.

monitoring prometheus grafana alerting

advanced ⏱ 15 minutes

OpenTelemetry Complete Setup on Kubernetes

Deploy OpenTelemetry Collector, auto-instrumentation, and exporters on Kubernetes. Unified traces, metrics, and logs pipeline to Jaeger, Prometheus, and Loki.

opentelemetry otel tracing metrics

beginner ⏱ 15 minutes

OpenClaw Health Probes on Kubernetes

Configure liveness and readiness probes for OpenClaw on Kubernetes. Custom Node.js health checks against /healthz and /readyz endpoints with proper timing.

openclaw health-probes liveness readiness

intermediate ⏱ 20 minutes

Enable User Workload Monitoring on OpenShift

Enable user workload monitoring on OpenShift. Deploy ServiceMonitor, PodMonitor, alerting rules, and Grafana dashboards.

openshift monitoring prometheus servicemonitor

intermediate ⏱ 15 minutes

Per-Tenant GPU Monitoring and Chargeback

Build per-tenant GPU monitoring dashboards with queue time, utilization, thermal metrics, and GPU-hour chargeback on Kubernetes.

monitoring gpu chargeback prometheus

intermediate ⏱ 15 minutes

GPU Tenant SLO Observability

Define and monitor GPU tenant SLOs for queue time, inference latency, GPU utilization, and job completion rate with Prometheus alerting.

slo gpu observability prometheus

intermediate ⏱ 15 minutes

OpenClaw Logging with EFK Stack

Collect and analyze OpenClaw agent logs using Elasticsearch, Fluent Bit, and Kibana (EFK stack) for debugging and audit trails.

openclaw logging elasticsearch fluent-bit

intermediate ⏱ 20 minutes

Monitor OpenClaw with Prometheus and Grafana on Kubernetes

Set up monitoring for OpenClaw AI gateway on Kubernetes with Prometheus metrics, Grafana dashboards, and alerting for uptime, message throughput, and.

openclaw prometheus grafana monitoring

intermediate ⏱ 30 minutes

Monitor NCCL Benchmark Runs with Prometheus and Grafana

Track NCCL benchmark outcomes and GPU telemetry over time with Prometheus and Grafana dashboards to detect communication regressions early.

nccl prometheus grafana observability

intermediate ⏱ 20 minutes

How to Set Up Node Problem Detector

Detect and report node-level issues automatically with Node Problem Detector. Learn to identify kernel problems, hardware failures, and container.

node-problem-detector observability monitoring troubleshooting

intermediate ⏱ 15 minutes

How to Set Up Alertmanager for Prometheus

Configure Alertmanager to route and manage Prometheus alerts. Set up notification channels including Slack, PagerDuty, and email with routing rules.

alertmanager prometheus alerts notifications

beginner ⏱ 15 minutes

How to Set Up Container Logging

Implement effective logging strategies for Kubernetes containers. Configure log collection, aggregation, and analysis with various logging patterns.

logging observability fluentd elasticsearch

intermediate ⏱ 15 minutes

How to Implement Container Logging Patterns

Configure logging for Kubernetes applications. Implement sidecar logging, log aggregation, and structured logging best practices.

logging observability sidecar fluentd

advanced ⏱ 15 minutes

How to Implement Distributed Tracing with Jaeger

Deploy Jaeger for distributed tracing in Kubernetes. Learn to instrument applications, trace requests across services, and identify performance.

tracing jaeger opentelemetry observability

intermediate ⏱ 15 minutes

How to Monitor Kubernetes with Grafana Dashboards

Create comprehensive Grafana dashboards for Kubernetes monitoring. Learn to visualize cluster, node, pod, and application metrics effectively.

grafana monitoring dashboards prometheus

intermediate ⏱ 15 minutes

Jaeger Distributed Tracing on Kubernetes

Deploy Jaeger for distributed tracing in Kubernetes. Trace requests across microservices to identify latency issues and debug complex systems.

jaeger tracing observability opentelemetry

beginner ⏱ 15 minutes

How to Use Kubernetes Events for Monitoring

Monitor cluster activity through Kubernetes events. Capture, filter, and alert on events for troubleshooting and operational visibility.

events monitoring troubleshooting observability

advanced ⏱ 15 minutes

How to Set Up Centralized Logging with EFK Stack

Deploy Elasticsearch, Fluentd, and Kibana for centralized Kubernetes logging. Learn to collect, parse, and visualize container logs at scale.

logging elasticsearch fluentd kibana

advanced ⏱ 15 minutes

How to Collect Metrics with OpenTelemetry Collector

Deploy OpenTelemetry Collector for unified metrics, traces, and logs collection in Kubernetes. Learn pipelines, processors, and exporters configuration.

opentelemetry otel metrics observability

intermediate ⏱ 15 minutes

How to Monitor Kubernetes with Prometheus

Set up Prometheus monitoring for Kubernetes clusters. Configure scraping, alerting rules, and visualize metrics with Grafana dashboards.

prometheus monitoring metrics grafana

intermediate ⏱ 15 minutes

How to Set Up Prometheus Monitoring

Deploy Prometheus for Kubernetes monitoring. Collect metrics from nodes, pods, and applications with ServiceMonitors and alerting rules.

prometheus monitoring metrics alerting

intermediate ⏱ 30 minutes

How to Configure Alertmanager for Kubernetes Alerts

Set up Alertmanager to route, group, and deliver Kubernetes alerts. Learn to configure Slack, PagerDuty, and email notifications.

alertmanager monitoring alerts notifications

intermediate ⏱ 35 minutes

How to Set Up Prometheus Monitoring for Applications

Learn to instrument your Kubernetes applications with Prometheus metrics. Complete guide to ServiceMonitors, scraping configuration, and custom metrics.

prometheus monitoring metrics observability