⚙️ Configuration
Configure K8s right: ConfigMaps, Secrets, resource limits, node management, MachineConfigPools, GPU Operator, NicClusterPolicy, and DOCA driver builds.
kubectl Cheat Sheet: Essential Commands
Complete kubectl cheat sheet with essential commands for pods, deployments, services, logs, debugging, and cluster management. Copy-paste ready examples.
Kubernetes Node and Pod Affinity Guide
Configure node affinity, pod affinity, and anti-affinity rules for advanced Kubernetes scheduling. Control pod placement across zones, nodes, and topologies.
Kubernetes Annotations Complete Guide
Use Kubernetes annotations for metadata, automation triggers, and controller configuration. Covers common annotation patterns, ingress annotations, and Helm labels.
Kubernetes Backup and Restore with Velero
Backup and restore Kubernetes clusters with Velero. Covers namespace backups, scheduled backups, disaster recovery, and migration between clusters.
Kubernetes Cluster Upgrade Step-by-Step
Upgrade Kubernetes clusters safely with kubeadm. Covers pre-flight checks, control plane upgrade, worker node drain, and rollback procedures.
Kubernetes ConfigMap Complete Guide
Create and use ConfigMaps in Kubernetes for application configuration. Mount as files, inject as environment variables, and hot-reload without restarting pods.
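A minimal sketch of the pattern this recipe covers — the ConfigMap name, keys, and image are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config            # illustrative name
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    envFrom:
    - configMapRef:
        name: app-config      # inject all keys as environment variables
    volumeMounts:
    - name: config
      mountPath: /etc/config  # keys appear as files under this path
  volumes:
  - name: config
    configMap:
      name: app-config
```

Mounted files are refreshed when the ConfigMap changes (after the kubelet sync period); environment variables are fixed at pod start and require a restart to pick up changes.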
Kubernetes Environment Variables Guide
Set environment variables in Kubernetes pods from literals, ConfigMaps, Secrets, and the Downward API. Covers variable ordering, references, and best practices.
Kubernetes Labels and Selectors Guide
Master Kubernetes labels and selectors for organizing and querying resources. Covers label conventions, equality selectors, set-based selectors, and field selectors.
Kubernetes Pod Lifecycle and States Explained
Understand the Kubernetes pod lifecycle from Pending to Terminated. Covers pod phases, container states, restart policies, graceful shutdown, and preStop hooks.
kubectl Port-Forward: Access Pods and Services
Use kubectl port-forward to access Kubernetes pods, services, and deployments from your local machine. Debug, test, and access internal services securely.
Kubernetes Resource Requests and Limits Guide
Configure CPU and memory requests and limits in Kubernetes. Understand QoS classes, OOMKilled, CPU throttling, and right-sizing with VPA recommendations.
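A container-spec fragment showing the basic shape — the values here are placeholders, not recommendations:

```yaml
containers:
- name: app
  image: myapp:1.0            # illustrative image
  resources:
    requests:                 # what the scheduler reserves
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "500m"             # exceeding this throttles the container
      memory: "512Mi"         # exceeding this gets the container OOMKilled
```

Setting requests equal to limits yields the Guaranteed QoS class; requests lower than limits yields Burstable.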
Kubernetes Taints and Tolerations Guide
Use Kubernetes taints and tolerations to control pod scheduling. Dedicate nodes for GPU workloads, isolate teams, and prevent scheduling on specific nodes.
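A sketch of the GPU-node dedication pattern — the taint key and value are illustrative:

```yaml
# Node side (applied with: kubectl taint nodes <node> nvidia.com/gpu=present:NoSchedule)
# Pod side: only pods carrying a matching toleration may schedule there.
tolerations:
- key: "nvidia.com/gpu"
  operator: "Equal"
  value: "present"
  effect: "NoSchedule"
```

A toleration only permits scheduling; pair it with a nodeSelector or node affinity to actually steer pods onto the dedicated nodes.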
Fix ConfigMap Changes Not Applied to Pods
Debug ConfigMap updates not reflected in running pods. Covers volume mount propagation delays, env var immutability, and sidecar-based reload strategies.
Kubernetes API Deprecation Migration Guide
Migrate deprecated Kubernetes APIs before cluster upgrades. Detect deprecated resources with pluto, kubent, and kubectl convert.
Kubernetes Disaster Recovery Planning
Build a Kubernetes disaster recovery plan with etcd backups, Velero, cross-region replication, and RTO/RPO targets for production clusters.
Kubernetes etcd Operations and Maintenance
Manage etcd for Kubernetes: backup, restore, compaction, defragmentation, member management, and disaster recovery procedures.
Kubernetes Init Containers Complete Guide
Use init containers for database migrations, config loading, dependency waiting, and secret fetching. Patterns for sequential initialization in Kubernetes pods.
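A minimal dependency-waiting sketch, assuming a service named `db-service` exists — names and images are illustrative:

```yaml
spec:
  initContainers:
  - name: wait-for-db         # must exit successfully before app containers start
    image: busybox
    command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
  containers:
  - name: app
    image: myapp:1.0          # placeholder image
```

Init containers run sequentially in the order listed, so multiple initialization steps can be chained.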
Kubernetes Namespace Best Practices
Design and manage Kubernetes namespaces effectively. Covers naming conventions, resource quotas, RBAC isolation, network policies, and multi-tenancy patterns.
Kubernetes Sidecar Container Patterns
Implement sidecar containers for logging, proxying, config reload, and security. Built-in sidecar support in Kubernetes 1.28+ with restartPolicy Always.
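A sketch of the native sidecar shape introduced in Kubernetes 1.28 — image names are illustrative:

```yaml
spec:
  initContainers:
  - name: log-shipper
    image: fluent-bit:latest  # illustrative sidecar image
    restartPolicy: Always     # this makes the init container a native sidecar
  containers:
  - name: app
    image: myapp:1.0
```

With `restartPolicy: Always`, the sidecar starts before the main containers, runs alongside them, and is terminated after them — unlike a plain init container, which must exit before the app starts.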
KubeCon EU 2026 Book Giveaway Recap
Recap of the Kubernetes Recipes book giveaway at KubeCon EU 2026 Amsterdam. Photos from the signing sessions, community highlights, and how to get your copy.
Inspect MachineConfig Annotations on Nodes
Read and interpret MachineConfig annotations on OpenShift nodes. Check desired vs current config, node state, and rendered config hashes to diagnose MCP issues.
Set Kernel Parameters via MachineConfig
Tune kernel sysctl parameters on OpenShift nodes using MachineConfig. Set networking, memory, and performance sysctls on RHCOS.
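A minimal MachineConfig sketch for dropping a sysctl file onto worker nodes — the sysctl chosen is an illustrative example:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-sysctls
  labels:
    machineconfiguration.openshift.io/role: worker  # targets the worker MCP
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/sysctl.d/99-custom.conf
        mode: 0644
        contents:
          # URL-encoded "net.core.somaxconn=4096\n"
          source: data:,net.core.somaxconn%3D4096%0A
```

Applying this triggers a rolling reboot of the targeted MachineConfigPool, one node at a time by default.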
Configure NTP Chrony via MachineConfig
Set custom NTP servers on OpenShift RHCOS nodes using MachineConfig. Fix time drift, configure chrony, and verify time synchronization across your cluster.
Configure Container Registries via MachineConfig
Set up mirror registries and blocked registries on OpenShift nodes using MachineConfig to control CRI-O image pulls on RHCOS.
Configure MCP maxUnavailable for Rollouts
Control how many nodes the MachineConfig Operator updates simultaneously. Set maxUnavailable for faster rollouts or safer one-at-a-time updates in production.
Pause and Unpause MCP Rollouts
Temporarily pause MachineConfigPool rollouts to batch multiple MachineConfig changes or coordinate with maintenance windows. Unpause to resume node updates.
Automate MCP Updates with Drain Script
Bash script to automate OpenShift MachineConfigPool updates when drains are blocked by PDB violations. Auto-detects blockers, scales down, drains, and restores.
Separate Worker and Infra MachineConfigPools
Create dedicated MachineConfigPools for infrastructure and GPU nodes. Isolate MCP rollout blast radius and control update order for different node types.
Use oc adm drain Dry-Run for Diagnostics
Preview node drain impact without evicting pods. Identify PDB violations, unmanaged pods, and local storage blockers before maintenance.
OpenClaw Multi-Model Provider Setup on Kubernetes
Configure OpenClaw with multiple AI providers on Kubernetes. Anthropic, OpenAI, Gemini, OpenRouter with fallback chains and cost control.
OpenClaw Node Pairing for IoT and Edge Devices
Pair phones, Raspberry Pi, and edge devices with OpenClaw on Kubernetes. Camera, location, screen control, and remote command execution.
Cordon, Drain, and Uncordon Nodes
Safely remove workloads from OpenShift and Kubernetes nodes for maintenance. Cordon to prevent scheduling, drain to evict pods, uncordon to restore.
Configure PDBs for OpenShift Routers
Set PodDisruptionBudgets for OpenShift IngressController routers. Balance availability during maintenance with the ability to drain nodes.
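A sketch of the PDB shape involved — the name is illustrative, and the selector label shown is the one commonly found on default router pods (verify against your cluster before relying on it):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: router-pdb            # illustrative name
  namespace: openshift-ingress
spec:
  maxUnavailable: 1           # allow one router pod to be evicted at a time
  selector:
    matchLabels:
      ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
```

Note that the IngressController operator manages its own PDB for the default router, so custom budgets must be reconciled against operator behavior.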
Restore Scaled Deployments After Node Drain
Restore deployments scaled down for maintenance. Verify node health, check pod scheduling, and confirm service availability.
Scale Deployments to Unblock Node Drains
Safely scale down deployments that block node drains due to PDB violations. Record original replicas, scale to zero, drain, then restore after the node returns.
ITMS External-to-External Registry Mirroring
Configure OpenShift ImageTagMirrorSet to map external registries to your private registry. Mirror Docker Hub, GHCR, Quay.io, and NVIDIA NGC.
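A minimal ImageTagMirrorSet sketch — the mirror registry hostname is illustrative:

```yaml
apiVersion: config.openshift.io/v1
kind: ImageTagMirrorSet
metadata:
  name: mirror-dockerhub
spec:
  imageTagMirrors:
  - source: docker.io
    mirrors:
    - registry.example.com/dockerhub  # illustrative private mirror
```

ITMS applies to pulls by tag; digest-based pulls are handled by the companion ImageDigestMirrorSet (IDMS) resource.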
How ITMS Updates registries.conf via MachineConfig
How ITMS and IDMS update /etc/containers/registries.conf on immutable CoreOS nodes via MCO and MachineConfig. Full chain deep-dive.
400 Recipes Milestone: What We Built and What's Next
Kubernetes Recipes reaches 400 articles. Explore new AI/GPU infrastructure, NVIDIA networking, ArgoCD GitOps, OpenShift, and RHACS security recipes.
KubeCon EU 2026 Book Signing Events
Join Luca Berton at two KubeCon Amsterdam events: Signal Overflow at Booking.com HQ (Mon 23 Mar) and book signing at vCluster booth #521 (Tue 24 Mar).
ClusterPolicy MOFED Upgrade Strategy
Configure safe MOFED driver upgrade policies in the NVIDIA GPU Operator ClusterPolicy with rolling updates, node draining, and rollback procedures.
NVIDIA DOCA Driver Container in Kubernetes
Deploy and configure NVIDIA DOCA Driver containers via NicClusterPolicy for RDMA, NFS-RDMA, and precompiled driver builds.
DOCA Driver on OpenShift with DTK
Build and deploy precompiled NVIDIA DOCA Driver containers on OpenShift using DriverToolKit, MachineConfig, and upgrade lifecycle.
GPU Operator ClusterPolicy Complete Reference
Complete reference for the NVIDIA GPU Operator ClusterPolicy CRD covering driver, toolkit, device plugin, MOFED, GDS, MIG, and DCGM configuration options.
NVIDIA GPU Operator MOFED Driver Configuration
Configure the NVIDIA GPU Operator to deploy Mellanox OFED drivers for high-performance RDMA networking on Kubernetes GPU nodes with InfiniBand and RoCE support.
GPU Cluster Upgrade Version Matrix
Maintain a version compatibility matrix for GPU Operator, Network Operator, drivers, firmware, CUDA, and OpenShift for safe upgrades.
MOFED and DOCA Driver Building for OpenShift
Build NVIDIA MOFED and DOCA drivers for OpenShift using DriverToolKit, Buildah, and MachineConfig for RDMA and GPU networking.
NicClusterPolicy MOFED Affinity and Node Selection
Configure NicClusterPolicy node selectors and affinity rules to deploy MOFED drivers only on RDMA-capable nodes in Kubernetes clusters.
Open Kernel Modules and DMA-BUF for GPUs
Migrate from proprietary NVIDIA kernel modules and nvidia-peermem to open kernel modules with DMA-BUF for safer GPU upgrades.
OpenClaw Cron Jobs and Heartbeats on Kubernetes
Configure OpenClaw's built-in cron scheduling and heartbeat system on Kubernetes for proactive notifications, periodic checks, and automated background tasks.
Manage OpenClaw Skills on Kubernetes
Deploy and manage OpenClaw agent skills (tools, automations, integrations) on Kubernetes using ConfigMaps, PVCs, and git-sync for dynamic capability loading.
GitOps for OpenClaw Workspaces on Kubernetes
Manage OpenClaw agent workspaces (SOUL.md, skills, memory) with GitOps using Flux or ArgoCD, enabling version-controlled AI persona management on Kubernetes.
Create Custom CatalogSources for OLM Operators
Configure CatalogSource in OpenShift to serve custom operator catalogs from private registries or air-gapped environments.
OpenShift Lifecycle and Version Support
Understand OpenShift Container Platform version lifecycle, support phases, EUS releases, and upgrade planning for production clusters.
OpenShift Project Request Template for Pull Secrets
Configure an OpenShift Project Request Template so every new namespace automatically gets a ServiceAccount with imagePullSecrets for your private Quay registry.
PriorityClasses for GPU Workloads
Configure Kubernetes PriorityClasses for GPU workloads with training, serving, batch, and interactive tiers and preemption policies.
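A sketch of one tier in such a scheme — the name, value, and description are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-training          # illustrative tier name
value: 100000                 # higher value wins during scheduling and preemption
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "High-priority GPU training jobs"
```

Pods opt in via `priorityClassName: gpu-training` in their spec; a batch tier would use a lower `value`, optionally with `preemptionPolicy: Never`.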
ResourceQuota and LimitRange for GPUs
Configure ResourceQuota and LimitRange for GPU workloads with per-tenant caps on GPU, CPU, memory, and object counts in Kubernetes.
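A per-tenant quota sketch — namespace and numbers are illustrative:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a                  # illustrative tenant namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4"     # cap total GPUs requested in the namespace
    requests.cpu: "32"
    requests.memory: 128Gi
    pods: "50"
```

Because GPUs are extended resources, the request always equals the limit, so capping `requests.nvidia.com/gpu` caps total GPU consumption.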
RHCOS for OpenShift Nodes
Understand and manage Red Hat Enterprise Linux CoreOS (RHCOS) for OpenShift nodes including MachineConfig, ignition, OS updates, and node customization.
Configure ClusterPolicy kernelModuleType for GPU Operator
Understand and configure the driver.kernelModuleType field in the NVIDIA GPU Operator ClusterPolicy to choose between auto, open, and proprietary kernel modules.

Switch to Open NVIDIA Kernel Modules on OpenShift
Step-by-step guide to migrate the NVIDIA GPU Operator from proprietary to open kernel modules on OpenShift, enabling DMA-BUF and GPUDirect Storage support.
Tune NCCL Environment Variables for RDMA and Ethernet
Apply safe NCCL environment variable profiles for RDMA-capable and Ethernet-only GPU clusters to maximize collective communication throughput.
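A sketch of an Ethernet-only profile as pod environment variables — the interface name is illustrative and must match your nodes:

```yaml
env:
- name: NCCL_IB_DISABLE
  value: "1"                  # disable the InfiniBand transport on Ethernet-only nodes
- name: NCCL_SOCKET_IFNAME
  value: "eth0"               # illustrative; set to the actual data-plane interface
- name: NCCL_DEBUG
  value: "INFO"               # log transport selection to verify the profile took effect
```

An RDMA-capable profile would instead leave InfiniBand enabled and pin the HCA selection with `NCCL_IB_HCA`.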
Crossplane for Cloud Infrastructure Management
Use Crossplane to provision and manage cloud infrastructure resources like databases, storage, and networking using Kubernetes-native APIs and GitOps.
Multi-Node NVLink with ComputeDomains
Configure ComputeDomains for robust and secure Multi-Node NVLink (MNNVL) workloads on NVIDIA GB200 and similar systems using DRA.
Dynamic Resource Allocation for GPUs with NVIDIA DRA Driver
Learn to use Kubernetes Dynamic Resource Allocation (DRA) for flexible GPU allocation, sharing, and configuration with the NVIDIA DRA Driver.
MIG GPU Partitioning with DRA
Dynamically partition NVIDIA A100 and H100 GPUs using Multi-Instance GPU (MIG) technology with Dynamic Resource Allocation for flexible workload isolation.
Mixed Accelerator Workloads with DRA
Orchestrate heterogeneous accelerator workloads combining GPUs, TPUs, FPGAs, and custom AI chips using Dynamic Resource Allocation.
TPU Allocation with Dynamic Resource Allocation
Configure Google Cloud TPUs in Kubernetes using DRA for flexible allocation, multi-slice workloads, and optimized machine learning training.
Kubernetes API Aggregation Layer
Extend the Kubernetes API with custom API servers using the aggregation layer to add new resource types and functionality without modifying core components.
How to Upgrade Kubernetes Clusters Safely
Perform Kubernetes cluster upgrades with zero downtime. Learn upgrade strategies, pre-flight checks, rollback procedures, and best practices.
Kubernetes Scheduler Configuration and Tuning
Customize the Kubernetes scheduler with scheduling profiles, plugins, and advanced placement strategies for optimal pod placement and resource utilization.
How to Manage Kubernetes API Versions and Deprecations
Handle Kubernetes API version changes and deprecations. Migrate resources to stable APIs and ensure cluster upgrade compatibility.
How to Create Custom Resource Definitions (CRDs)
Extend Kubernetes API with Custom Resource Definitions. Define custom objects, configure validation schemas, and manage CRD lifecycle.
How to Use the Downward API
Expose pod and container metadata to applications using the Downward API. Access labels, annotations, resource limits, and pod information from within the container.
How to Use Downward API for Pod Metadata
Expose pod and container metadata to applications using the Downward API. Access labels, annotations, resource limits, and node information from within the container.
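A sketch of the Downward API as environment variables — the variable names are illustrative:

```yaml
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name      # the pod's own name
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName      # the node the pod landed on
- name: MEM_LIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory       # the container's own memory limit
```

Labels and annotations can also be exposed as files via a `downwardAPI` volume, which — unlike env vars — updates when they change.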
How to Configure Environment Variables and ConfigMaps
Manage application configuration with environment variables and ConfigMaps. Learn injection methods, mounting as files, and dynamic configuration updates.
How to Configure Image Pull Secrets
Pull container images from private registries using image pull secrets. Configure authentication for Docker Hub, GCR, ECR, ACR, and private registries.
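A sketch of the pod-level wiring — the secret is assumed to have been created first (e.g. with `kubectl create secret docker-registry regcred --docker-server=... --docker-username=... --docker-password=...`), and the registry hostname is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  imagePullSecrets:
  - name: regcred               # docker-registry secret in the same namespace
  containers:
  - name: app
    image: registry.example.com/team/app:1.0
```

Attaching the secret to the namespace's default ServiceAccount avoids repeating `imagePullSecrets` in every pod spec.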
How to Manage Kubernetes Contexts and Clusters
Switch between multiple clusters efficiently. Configure kubeconfig, manage contexts, and set up secure multi-cluster access.
How to Optimize Kubernetes Costs
Reduce cloud costs in Kubernetes clusters. Right-size resources, use spot instances, implement autoscaling, and monitor spending effectively.
How to Use Kubernetes Finalizers
Manage resource cleanup with Kubernetes finalizers. Implement custom cleanup logic and understand how finalizers prevent premature resource deletion.
How to Use Kubernetes Jobs and CronJobs
Run batch workloads and scheduled tasks with Jobs and CronJobs. Configure retries, parallelism, and completion tracking for reliable task execution.
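A minimal CronJob sketch — the name, schedule, and command are illustrative:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report          # illustrative name
spec:
  schedule: "0 2 * * *"         # 02:00 daily, cluster time zone
  jobTemplate:
    spec:
      backoffLimit: 3           # retry failed pods up to 3 times
      template:
        spec:
          restartPolicy: Never  # required for Jobs: Never or OnFailure
          containers:
          - name: report
            image: busybox
            command: ["sh", "-c", "echo generating report"]
```

`concurrencyPolicy` (Allow, Forbid, Replace) controls what happens when a run overlaps the previous one.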
How to Use Labels and Annotations Effectively
Organize and manage Kubernetes resources with labels and annotations. Implement labeling strategies for selection, filtering, and metadata.
How to Use Kubernetes Lease Objects
Implement leader election and distributed coordination with Kubernetes Lease objects. Build highly available controllers and prevent split-brain scenarios.
How to Use Kustomize for Configuration Management
Manage Kubernetes configurations with Kustomize overlays. Customize base manifests for different environments without template duplication.
How to Implement Kubernetes Taints and Tolerations
Control pod scheduling with taints and tolerations. Dedicate nodes for specific workloads, handle node conditions, and implement scheduling constraints.
How to Use Pod Presets and Mutations
Automatically inject configurations into pods using admission controllers. Configure environment variables, volumes, and annotations at deployment time.
How to Configure Pod Resource Management
Set CPU and memory requests and limits effectively. Understand QoS classes, resource quotas, and optimize container resource allocation.
How to Configure Resource Limits and Requests
Set CPU and memory requests and limits for containers. Understand QoS classes, resource quotas, and best practices for right-sizing workloads.
How to Configure Resource Quotas per Namespace
Implement resource quotas to limit CPU, memory, and object counts per namespace. Ensure fair resource allocation across teams and environments.
How to Configure Resource Quotas
Limit resource consumption per namespace with ResourceQuotas. Control CPU, memory, storage, and object counts to ensure fair cluster sharing.
How to Manage ConfigMaps and Secrets Effectively
Master Kubernetes ConfigMaps and Secrets for application configuration. Learn creation methods, mounting strategies, and security best practices.
How to Manage Kubernetes Namespaces Effectively
Master Kubernetes namespace organization for multi-team environments. Learn resource quotas, network policies, and RBAC per namespace.
How to Set Resource Requests and Limits Properly
Master Kubernetes resource management with proper CPU and memory requests and limits. Avoid OOMKills, throttling, and resource contention.