📚 Book signing at KubeCon EU 2026: meet us at Booking.com HQ (Mon 18:30-21:00) and at vCluster booth #521 (Tue 24 Mar, 12:30-13:30). Free book giveaway!

Kubernetes Recipes

1376 production-ready recipes for every K8s challenge

🌐 Networking advanced

Run DOCA Bench on OpenShift with SR-IOV and Privileged SCC

Run NVIDIA DOCA Bench as a Kubernetes Job on OpenShift with SR-IOV VF allocation, privileged SCC, and huge pages to benchmark a BlueField DPU from x86 pods.

⏱ 15 minutes · openshift · networking · benchmarking
🌐 Networking advanced

ib_write_bw RDMA Bandwidth Testing on Kubernetes GPU Nodes

Validate RDMA write bandwidth on Kubernetes GPU nodes with ib_write_bw and SR-IOV. Device selection, RoCE GID index, and ConnectX-7 400G expectations.

⏱ 15 minutes · networking · rdma · performance
🌐 Networking advanced

NVIDIA DOCA Bench for DPU Performance Testing on Kubernetes

Benchmark NVIDIA BlueField DPU accelerators in Kubernetes with DOCA Bench: throughput/latency modes, RDMA, compression offload, and multi-core scaling.

⏱ 15 minutes · networking · performance · rdma
🤖 AI & GPU advanced

H200 NVL 8-GPU Topology Bandwidth Tiers for Kubernetes

Map the three bandwidth tiers of 8× H200 NVL GPU nodes—NVLink (~337 GB/s), PCIe+UPI (~50 GB/s), RoCE (~35 GB/s)—for NCCL topology-aware NUMA scheduling.

⏱ 15 minutes · gpu · nccl · performance
💾 Storage advanced

Dell PowerScale NFS Access Zones for Kubernetes AI Storage

Configure Dell PowerScale (Isilon) access zones and SmartConnect pools for Kubernetes AI storage with per-environment NFS isolation and IP pool sizing.

⏱ 15 minutes · storage · nfs · networking
🚀 Deployments intermediate

Automate Kubernetes Day-2 Operations with Ansible

Use Ansible to automate Kubernetes day-2 operations — apply manifests, roll out upgrades, and reconcile cluster state with the kubernetes.core collection.

⏱ 20 minutes · ansible · automation · deployments
🤖 AI & GPU advanced

Disable GDS and Enable IOMMU Passthrough on K8s GPUs

Disable GPUDirect Storage (GDS) when not needed and configure IOMMU passthrough mode for GPU and NIC device assignment. Kernel parameters, BIOS settings, VFIO

⏱ 15 minutes · iommu · passthrough · gds
🤖 AI & GPU advanced

GPU Operator ClusterPolicy RDMA and GDS Configuration

Configure NVIDIA GPU Operator ClusterPolicy to disable RDMA and enable GPUDirect Storage (GDS). Control nvidia-peermem, nvidia-fs modules, driver

⏱ 15 minutes · gpu-operator · rdma · gds
🤖 AI & GPU advanced

GPUDirect RDMA Setup and Verification on Kubernetes

Enable and verify GPUDirect RDMA for GPU-to-NIC direct data transfer on Kubernetes. Install nvidia-peermem, configure DMA-BUF, verify RDMA paths, troubleshoot

⏱ 15 minutes · gpudirect · rdma · nvidia
🤖 AI & GPU advanced

IOMMU Kernel Parameters for Kubernetes GPU Nodes

Configure IOMMU kernel parameters for optimal GPU and RDMA performance on Kubernetes. Compare intel_iommu, amd_iommu, and iommu settings, passthrough vs off vs

⏱ 15 minutes · iommu · kernel · gpu
🤖 AI & GPU intermediate

Kubeflow MPIJob Worker SSH Setup for GPU Training

Configure the SSH daemon in Kubeflow MPIJob worker pods for multi-node GPU training. Covers SSHD setup in containers, host key generation, authorized keys from MPI

⏱ 15 minutes · mpi · ssh · openshift
🤖 AI & GPU advanced

Kubernetes Topology Manager for GPU and NUMA Alignment

Configure Kubernetes Topology Manager to align CPU, GPU, and NIC allocations on the same NUMA node. Covers policies, kubelet config, and GPU performance tuning.

⏱ 15 minutes · topology-manager · numa · gpu
🤖 AI & GPU intermediate

MPI DNS Resolution and Hostfile for Kubernetes GPU Jobs

Troubleshoot MPI hostfile DNS resolution in Kubeflow MPIJob on Kubernetes. Covers headless Service creation, subdomain configuration, DNS wait loops, FQDN

⏱ 15 minutes · mpi · dns · networking
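A minimal sketch of the hostfile side of this recipe, assuming each worker Pod sets `spec.hostname` and a `spec.subdomain` that matches a headless Service, so it resolves as `<hostname>.<subdomain>.<namespace>.svc.<cluster-domain>`. The `<job>-worker-<index>` naming, the Service name, and the namespace are illustrative assumptions, not values taken from the recipe.

```python
"""Sketch: build an OpenMPI hostfile from headless-Service DNS names.

Assumes worker Pods named "<job>-worker-<i>" whose spec.subdomain is a
headless Service. All names used here are hypothetical.
"""


def worker_fqdn(job: str, index: int, service: str, namespace: str,
                cluster_domain: str = "cluster.local") -> str:
    # Pod DNS record: <hostname>.<subdomain>.<namespace>.svc.<cluster-domain>
    return f"{job}-worker-{index}.{service}.{namespace}.svc.{cluster_domain}"


def build_hostfile(job: str, workers: int, slots: int, service: str,
                   namespace: str) -> str:
    # One OpenMPI hostfile line per worker: "<host> slots=<GPUs per worker>"
    lines = [f"{worker_fqdn(job, i, service, namespace)} slots={slots}"
             for i in range(workers)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_hostfile("nccl-test", workers=2, slots=8,
                         service="nccl-test-worker", namespace="gpu-jobs"),
          end="")
```

Resolving each name (for example with `socket.getaddrinfo`) in a retry loop before calling `mpirun` is one way to implement the DNS wait loop mentioned above.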
🤖 AI & GPU advanced

NCCL All-Reduce Benchmarking on Multi-Node GPUs

Run and interpret NCCL all_reduce_perf benchmarks on multi-node Kubernetes GPU clusters. Understand bus bandwidth results, expected throughput for H200 NVL

⏱ 15 minutes · nccl · benchmarking · all-reduce
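The nccl-tests performance notes derive bus bandwidth from algorithm bandwidth: algbw = size / time, and for all-reduce over n ranks busbw = algbw × 2(n-1)/n, which makes the number comparable to link speed. A small sketch of that arithmetic; the size, time, and rank count in the example are illustrative, not measured results.

```python
"""Sketch: convert an all_reduce_perf timing into algbw and busbw (GB/s)."""


def algbw_gbps(size_bytes: int, time_s: float) -> float:
    # Algorithm bandwidth: message size / elapsed time, in GB/s (1e9 bytes/s).
    return size_bytes / time_s / 1e9


def allreduce_busbw_gbps(size_bytes: int, time_s: float, nranks: int) -> float:
    # All-reduce bus bandwidth: algbw * 2*(n-1)/n.
    return algbw_gbps(size_bytes, time_s) * 2 * (nranks - 1) / nranks


if __name__ == "__main__":
    # Illustrative numbers only: an 8 GB all-reduce taking 0.5 s on 16 GPUs.
    print(f"algbw={algbw_gbps(8_000_000_000, 0.5):.1f} GB/s")
    print(f"busbw={allreduce_busbw_gbps(8_000_000_000, 0.5, 16):.1f} GB/s")
```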
🤖 AI & GPU advanced

NCCL Channel Routing and Transport Path Analysis

Interpret NCCL channel logs to understand GPU communication paths on Kubernetes. Decode P2P/CUMEM, SHM/direct, NET/IB/GDRDMA transport

⏱ 15 minutes · nccl · debugging · gpu-communication
🔧 Troubleshooting intermediate

NCCL Debug Subsystems for GPU Network Troubleshooting

Configure NCCL_DEBUG and NCCL_DEBUG_SUBSYS for targeted logging during multi-node GPU training. Covers INIT, NET, GRAPH subsystems, log

⏱ 15 minutes · nccl · troubleshooting · observability
🤖 AI & GPU advanced

NCCL DMABUF Enable for GPUDirect RDMA on Kubernetes

Enable NCCL DMA-BUF support for GPUDirect RDMA in Kubernetes GPU clusters. Covers NCCL_DMABUF_ENABLE=1, kernel requirements, nvidia-peermem vs dmabuf, GPU

⏱ 15 minutes · nccl · rdma · gpu
🤖 AI & GPU advanced

NCCL GPUDirect RDMA Distance Levels and PIX vs SYS

Understand NCCL GPU Direct RDMA distance-based enablement. When PIX mode disables GDRDMA for distant GPU-HCA pairs (distance 9 > 4) and when SYS mode enables

⏱ 15 minutes · nccl · gpudirect · rdma
🤖 AI & GPU advanced

NCCL GPUDirect RDMA Level Tuning PIX PXB PHB SYS

Tune NCCL_NET_GDR_LEVEL for optimal GPUDirect RDMA performance on Kubernetes. Compare PIX, PXB, PHB, and SYS distance thresholds with PCIe topology. Benchmark

⏱ 15 minutes · nccl · rdma · gpu
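The documented NCCL_NET_GDR_LEVEL values (LOC, PIX, PXB, PHB, SYS) act as a threshold on the GPU-to-NIC path: GPUDirect RDMA is used for that path type and anything closer, and LOC disables it entirely. A simplified model of those semantics, not NCCL's internal code (NCCL also considers NVLink and PXN paths and logs numeric distances):

```python
"""Sketch: simplified NCCL_NET_GDR_LEVEL decision using documented path names."""

# Documented values, ordered from nearest to farthest GPU-NIC path.
LEVELS = ["LOC", "PIX", "PXB", "PHB", "SYS"]
PATHS = LEVELS[1:]  # a GPU-to-NIC path is one of PIX, PXB, PHB, SYS


def gdr_enabled(path: str, gdr_level: str) -> bool:
    path, gdr_level = path.upper(), gdr_level.upper()
    if path not in PATHS or gdr_level not in LEVELS:
        raise ValueError(f"unknown path {path!r} or level {gdr_level!r}")
    if gdr_level == "LOC":  # LOC: never use GPUDirect RDMA
        return False
    # GPUDirect RDMA is used when the path is no farther than the threshold.
    return LEVELS.index(path) <= LEVELS.index(gdr_level)


if __name__ == "__main__":
    for level in LEVELS:
        print(level, {p: gdr_enabled(p, level) for p in PATHS})
```

With PIX, a GPU and HCA that only meet across the SMP interconnect (SYS) fall back to staging through host memory; raising the level to SYS allows GDRDMA for that pair.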
🤖 AI & GPU advanced

NCCL IB HCA Selection and QPS Tuning for RoCE

Configure NCCL_IB_HCA, NCCL_IB_GID_INDEX, NCCL_IB_QPS_PER_CONNECTION, and NCCL_IB_SPLIT_DATA_ON_QPS for optimal RoCE performance on Kubernetes GPU clusters.

⏱ 15 minutes · nccl · rdma · performance
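NCCL_IB_HCA takes comma-separated `name[:port]` entries, where a leading `^` excludes the listed HCAs and `=` requests exact rather than prefix matching. A sketch that composes the variables named above for a RoCE pod; the device names, port 1, and GID index 3 are placeholders to be replaced with values read from the node (check the NCCL docs for your version's defaults).

```python
"""Sketch: compose NCCL_IB_* environment variables for a RoCE pod.
Device names, port and GID index are placeholders, not recommendations."""


def nccl_ib_hca(devices, port=None, exact=False, exclude=False):
    # "^" excludes the listed HCAs; "=" turns off prefix matching.
    entries = [d if port is None else f"{d}:{port}" for d in devices]
    prefix = ("^" if exclude else "") + ("=" if exact else "")
    return prefix + ",".join(entries)


def nccl_roce_env(devices, gid_index, qps_per_connection=1, split_data=False):
    return {
        "NCCL_IB_HCA": nccl_ib_hca(devices, port=1, exact=True),
        "NCCL_IB_GID_INDEX": str(gid_index),
        "NCCL_IB_QPS_PER_CONNECTION": str(qps_per_connection),
        "NCCL_IB_SPLIT_DATA_ON_QPS": "1" if split_data else "0",
    }


if __name__ == "__main__":
    for key, value in nccl_roce_env(["mlx5_0", "mlx5_1"], gid_index=3,
                                    qps_per_connection=4).items():
        print(f"{key}={value}")
```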
🤖 AI & GPU advanced

NCCL Network Validation Script for OpenShift GPU Clusters

Build a comprehensive NCCL network validation script for OpenShift GPU clusters with SR-IOV. Configure NCCL_IB_GID_INDEX, NCCL_NET_GDR_LEVEL=SYS, per-rank HCA

⏱ 15 minutes · nccl · openshift · sr-iov
🔧 Troubleshooting advanced

NCCL Network Validation Troubleshooting Checklist

Complete troubleshooting checklist for NCCL multi-node GPU bandwidth validation. Covers SR-IOV VF allocation, /dev/infiniband visibility, RoCE GID

⏱ 15 minutes · nccl · troubleshooting · rdma
🤖 AI & GPU advanced

Production NCCL Network Validator for Kubeflow MPIJob

Deploy a production-ready NCCL network validation framework using Kubeflow MPIJob on OpenShift. Complete validate_network.sh script

⏱ 15 minutes · nccl · mpi · rdma
🤖 AI & GPU advanced

NCCL RoCE Validation MPIJob Complete Reference

Complete nccl-roce-validation.yaml MPIJob reference for OpenShift GPU clusters. Full launcher environment variables, OpenMPI control plane settings, NCCL

⏱ 15 minutes · nccl · mpi · roce
🤖 AI & GPU advanced

NCCL RoCE Validation with Kubeflow MPIJob on Kubernetes

Run NCCL all_reduce_perf validation tests using Kubeflow MPIJob on GPU clusters. Configure MPI launcher and workers, NCCL environment variables, test

⏱ 15 minutes · nccl · mpi · rdma
🤖 AI & GPU intermediate

Shared Memory Transport for NCCL Intra-Node GPU

Configure NCCL shared memory (SHM) transport for intra-node GPU communication on Kubernetes. Covers /dev/shm sizing with emptyDir and NVLink/PCIe P2P paths.

⏱ 15 minutes · nccl · gpu · performance
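The /dev/shm sizing step boils down to mounting a memory-backed `emptyDir` over `/dev/shm`. A sketch that emits such a Pod manifest as JSON; the Pod name, image, and 16Gi size are hypothetical placeholders. Note that a `medium: Memory` volume is a tmpfs whose contents count toward the container's memory limit.

```python
"""Sketch: a Pod manifest that mounts a memory-backed emptyDir at /dev/shm.
Pod name, image and size are hypothetical placeholders."""
import json


def nccl_pod_manifest(name: str, image: str, shm_size: str = "16Gi") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "worker",
                "image": image,
                # Replace the small default /dev/shm with the volume below.
                "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
            }],
            "volumes": [{
                "name": "dshm",
                # medium=Memory makes this a tmpfs; usage counts toward
                # the container's memory limit.
                "emptyDir": {"medium": "Memory", "sizeLimit": shm_size},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON, e.g. `python make_pod.py | kubectl apply -f -`.
    print(json.dumps(nccl_pod_manifest("nccl-worker",
                                       "example.com/nccl-tests:latest"),
                     indent=2))
```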
🤖 AI & GPU advanced

NVIDIA GPU Topology Matrix Interpretation on Kubernetes

Read and interpret nvidia-smi topo and nvidia-device-plugin topology matrices on Kubernetes GPU nodes. Understand X, NV, SYS, NODE, PIX, PXB, PHB connection

⏱ 15 minutes · nvidia · gpu-topology · nvidia-smi
🌐 Networking advanced

RDMA Configuration with NVIDIA Network Operator

Deploy and configure RDMA for GPU clusters using the NVIDIA Network Operator. NicClusterPolicy setup, MLNX_OFED driver container, shared and SR-IOV RDMA device

⏱ 15 minutes · rdma · nvidia · network-operator
🤖 AI & GPU advanced

NVLink Bridge Architecture for GPU Kubernetes Nodes

Understand NVLink Bridge logical architecture in GPU servers for Kubernetes. Dual-socket PCIe Gen5 topology, NVL4 groups, GPU-NIC-NVMe placement, PCIe switch

⏱ 15 minutes · nvlink · gpu-architecture · pcie
🤖 AI & GPU advanced

OpenMPI Control Plane Separation for NCCL RDMA

Configure OpenMPI to use eth0 for MPI control traffic while NCCL uses net1 SR-IOV for data. Covers btl_tcp_if_include, pml, routed direct, plm_rsh_agent SSH

⏱ 15 minutes · mpi · nccl · networking
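A sketch of an `mpirun` command line that pins OpenMPI's own traffic to eth0 while leaving the RDMA data path to NCCL. The MCA parameter names follow OpenMPI 4.x; the `pml ob1` / `btl tcp,self` choice, the hostfile path, and the benchmark arguments are assumptions to adapt.

```python
"""Sketch: mpirun arguments that keep OpenMPI control traffic on eth0.
MCA names follow OpenMPI 4.x; values and paths are assumptions."""
import shlex


def mpirun_argv(np: int, hostfile: str, program: list[str],
                control_if: str = "eth0") -> list[str]:
    return [
        "mpirun", "-np", str(np), "--hostfile", hostfile,
        "--mca", "plm_rsh_agent", "ssh",  # start remote daemons over SSH
        "--mca", "routed", "direct",      # no routing tree between daemons
        "--mca", "pml", "ob1",            # plain point-to-point layer, no UCX
        "--mca", "btl", "tcp,self",       # MPI messages over TCP only
        "--mca", "btl_tcp_if_include", control_if,
        "--mca", "oob_tcp_if_include", control_if,
        # NCCL selects its RDMA device separately (e.g. via NCCL_IB_HCA).
        *program,
    ]


if __name__ == "__main__":
    argv = mpirun_argv(16, "/etc/mpi/hostfile",
                       ["all_reduce_perf", "-b", "8", "-e", "8G", "-f", "2", "-g", "1"])
    print(shlex.join(argv))
```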
🌐 Networking advanced

OpenShift SR-IOV Network with NVIDIA IPAM for GPU Fabric

Configure SriovNetwork resources on OpenShift with nv-ipam for GPU fabric IP allocation. SR-IOV Network Operator setup, Mellanox NIC resource targeting, IPAM

⏱ 15 minutes · sriov · openshift · nv-ipam
🤖 AI & GPU advanced

Run:ai GPU Scheduling with Kubeflow MPIJob

Integrate Run:ai GPU scheduler with Kubeflow MPIJob for multi-node NCCL workloads. Covers Run:ai project namespaces, GPU quota annotations, pod group

⏱ 15 minutes · gpu · scheduling · openshift
🌐 Networking advanced

Shared RDMA Device Plugin for Kubernetes GPU Pods

Configure the RDMA shared device plugin to allow multiple pods to share RDMA-capable NICs on Kubernetes. Covers k8s-rdma-shared-dev-plugin setup, resource

⏱ 15 minutes · rdma · device-plugin · shared
🌐 Networking advanced

SR-IOV Multus Network Attachment for GPU RDMA Pods

Configure Multus CNI NetworkAttachmentDefinition for SR-IOV RDMA in Kubernetes GPU workloads. Covers k8s.v1.cni.cncf.io/networks annotation, IPAM

⏱ 15 minutes · networking · sriov · rdma
💾 Storage advanced

CloudNativePG PostgreSQL Operator on Kubernetes

Deploy production PostgreSQL on Kubernetes with CloudNativePG operator. Automated failover, continuous backup to S3, point-in-time recovery, connection

⏱ 15 minutes · cloudnativepg · postgresql · database
⚙️ Configuration intermediate

Crossplane Kubernetes Infrastructure Management

Manage cloud infrastructure as Kubernetes resources with Crossplane. Provision AWS, GCP, and Azure resources using custom resource

⏱ 15 minutes · crossplane · infrastructure-as-code · multi-cloud
🤖 AI & GPU intermediate

GenAI-Perf Benchmarking LLM Inference on Kubernetes

Benchmark LLM inference performance with NVIDIA GenAI-Perf on Kubernetes. Profile vLLM, TensorRT-LLM, and Triton endpoints with concurrency sweeps, token

⏱ 15 minutes · genai-perf · benchmarking · vllm
📊 Observability intermediate

Grafana Kubernetes Monitoring Dashboards Guide

Deploy and configure Grafana dashboards for Kubernetes monitoring including dashboard 6417 for pod metrics, dashboard 315 for cluster overview, and custom

⏱ 15 minutes · grafana · prometheus · monitoring
🎯 Helm intermediate

Helm Sprig Functions Complete Reference

Complete reference for Helm Sprig template functions including cat, print, join, toString, add1, trim, quote, default, and more. Examples and common patterns

⏱ 15 minutes · helm · sprig · templates
⚡ Autoscaling intermediate

KEDA Event-Driven Autoscaling on Kubernetes

Deploy KEDA for event-driven autoscaling on Kubernetes. Scale deployments to zero based on queue depth, HTTP requests, cron schedules, Prometheus

⏱ 15 minutes · keda · autoscaling · event-driven
🔒 Security advanced

Kubernetes Audit Logging Configuration

Configure Kubernetes audit logging to track API requests. Define audit policies, capture who did what and when, send logs to backends like

⏱ 15 minutes · audit-logging · security · compliance
🚀 Deployments intermediate

Kubernetes Blue-Green and Canary Deployment Strategies

Implement blue-green and canary deployment strategies on Kubernetes. Zero-downtime releases using Service label switching, traffic splitting, progressive

⏱ 15 minutes · blue-green · canary · deployment-strategy
⚙️ Configuration beginner

Kubernetes CronJob ConcurrencyPolicy Guide

Configure Kubernetes CronJob concurrencyPolicy with Allow, Forbid, and Replace options. Control concurrent job execution, prevent overlapping runs, and handle

⏱ 15 minutes · cronjob · scheduling · concurrency
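What each concurrencyPolicy value does when a schedule fires while an earlier Job is still running can be modeled in a few lines. This is a simplified sketch that ignores startingDeadlineSeconds, suspend, and history limits; the Job name in the example is illustrative.

```python
"""Sketch: CronJob controller behavior at a scheduled time, per concurrencyPolicy.
Simplified: ignores startingDeadlineSeconds, suspend and history limits."""


def on_schedule(policy: str, running_jobs: list[str]) -> tuple[list[str], bool]:
    """Return (jobs_to_delete, start_new_job)."""
    if policy == "Allow":    # default: runs may overlap
        return [], True
    if policy == "Forbid":   # skip this run while a previous one is active
        return [], not running_jobs
    if policy == "Replace":  # stop the active run(s) and start a new one
        return list(running_jobs), True
    raise ValueError(f"unknown concurrencyPolicy: {policy}")


if __name__ == "__main__":
    for policy in ("Allow", "Forbid", "Replace"):
        print(policy, on_schedule(policy, ["report-28934520"]))
```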
🚀 Deployments beginner

Kubernetes DaemonSet One Pod Per Node Guide

Deploy DaemonSets on Kubernetes to run exactly one pod per node. Configure tolerations, node selectors, affinity rules, and resource management

⏱ 15 minutes · daemonset · scheduling · node-management
📊 Observability intermediate

Kubernetes EFK Stack Centralized Logging

Deploy the EFK stack (Elasticsearch, Fluentd, Kibana) on Kubernetes for centralized log collection, processing, and visualization. DaemonSet log

⏱ 15 minutes · efk · elasticsearch · fluentd
⚙️ Configuration beginner

Kubernetes EnvFrom ConfigMap Environment Variables

Inject all ConfigMap keys as environment variables using envFrom in Kubernetes pods. Configure configMapRef, secretRef, prefix options, and selective key

⏱ 15 minutes · configmap · environment-variables · envfrom
🔧 Troubleshooting intermediate

Kubernetes Ephemeral Containers for Debugging

Debug running pods with Kubernetes ephemeral containers. Attach debug containers without restarting pods, troubleshoot distroless images, inspect network

⏱ 15 minutes · ephemeral-containers · debugging · kubectl-debug
🔧 Troubleshooting intermediate

Kubernetes Finalizers Explained and Troubleshooting

Understand Kubernetes finalizers for resource cleanup. How finalizers block deletion, common stuck resource scenarios, manual removal

⏱ 15 minutes · finalizers · resource-lifecycle · troubleshooting
🚀 Deployments intermediate

Kubernetes Graceful Shutdown and Pod Termination

Implement graceful shutdown for Kubernetes pods. Configure terminationGracePeriodSeconds, preStop hooks, SIGTERM handling, connection

⏱ 15 minutes · graceful-shutdown · pod-lifecycle · termination
🔒 Security intermediate

Kubernetes gVisor and Kata Containers RuntimeClass

Deploy sandboxed container runtimes on Kubernetes using RuntimeClass with gVisor (runsc) and Kata Containers. Isolate untrusted workloads with kernel-level

⏱ 15 minutes · gvisor · kata-containers · runtimeclass
⚡ Autoscaling advanced

Kubernetes HPA Custom Metrics Prometheus Adapter

Configure Kubernetes Horizontal Pod Autoscaler with custom Prometheus metrics via the Prometheus Adapter. Scale on request latency, queue depth, GPU

⏱ 15 minutes · hpa · autoscaling · prometheus
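Whatever the metric source, the HPA's core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). A simplified sketch that also clamps to min/max replicas; it ignores stabilization windows, scaling policies, and not-yet-ready Pods, and the default bounds are arbitrary.

```python
"""Sketch: the core HPA replica calculation for a single metric."""
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    ratio = current_value / target_value
    # Within tolerance: keep the current replica count (avoids flapping).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Respect the HPA's minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # e.g. 4 pods averaging 150 queued messages each against a target of 100
    print(desired_replicas(4, current_value=150, target_value=100))
```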
🔧 Troubleshooting beginner

Kubernetes ImagePullBackOff Troubleshooting Guide

Debug and fix ImagePullBackOff and ErrImagePull errors in Kubernetes. Resolve authentication failures, registry connectivity, image not found, TLS certificate

⏱ 15 minutes · imagepullbackoff · troubleshooting · container-registry
🌐 Networking intermediate

Kubernetes Ingress TLS Certificate with cert-manager

Automate TLS certificate management on Kubernetes with cert-manager. Let's Encrypt integration, ClusterIssuer configuration, automatic renewal, wildcard

⏱ 15 minutes · cert-manager · tls · certificates
🚀 Deployments beginner

Kubernetes Init Containers Patterns and Examples

Use Kubernetes init containers for pod initialization. Wait for dependencies, clone Git repos, set up configuration, database migrations, certificate

⏱ 15 minutes · init-containers · pod-lifecycle · patterns
⚙️ Configuration beginner

Kubernetes Kind Local Development Cluster

Create local Kubernetes clusters with kind (Kubernetes in Docker). Multi-node clusters, ingress setup, local registry, port mapping, volume mounts, and CI/CD

⏱ 15 minutes · kind · local-development · docker
⚙️ Configuration intermediate

Kubernetes Kustomize Configuration Management

Manage Kubernetes configurations with Kustomize. Build overlays for multiple environments, patch resources, generate ConfigMaps and Secrets, and integrate

⏱ 15 minutes · kustomize · configuration · overlays
⚙️ Configuration beginner

Kubernetes Labels and Annotations Best Practices

Implement Kubernetes labels and annotations following best practices. Recommended label keys, organizational conventions, selectors, annotations vs labels

⏱ 15 minutes · labels · annotations · metadata
🚀 Deployments intermediate

Kubernetes Multi-Container Pod Patterns

Implement multi-container pod patterns in Kubernetes: sidecar for logging and proxying, ambassador for outbound connections, adapter for format

⏱ 15 minutes · sidecar · ambassador · adapter
⚙️ Configuration beginner

Kubernetes Namespace Best Practices

Organize Kubernetes clusters with namespace best practices. Separation strategies, resource quotas, network policies, RBAC per namespace, naming

⏱ 15 minutes · namespaces · multi-tenancy · resource-quotas
🔒 Security intermediate

Default Deny NetworkPolicy: Zero-Trust Examples

Implement default deny network policies in Kubernetes for zero-trust pod networking. Block all ingress and egress by default, then allow only required traffic

⏱ 15 minutes · networkpolicy · security · zero-trust
🔧 Troubleshooting beginner

Kubernetes OOMKilled Troubleshooting and Prevention

Debug and prevent OOMKilled container terminations in Kubernetes. Understand memory limits, diagnose memory leaks, configure resource requests, and implement

⏱ 15 minutes · oomkilled · troubleshooting · memory
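Comparing observed memory against a container's limit requires parsing Kubernetes quantities, where binary suffixes (Ki, Mi, Gi) and decimal suffixes (k, M, G) differ. A sketch that handles only plain numbers and those two suffix families (no exponent forms or millis); the usage figures in the example are illustrative.

```python
"""Sketch: parse Kubernetes memory quantities and compare usage to a limit.
Supports plain, decimal (k, M, G...) and binary (Ki, Mi, Gi...) suffixes only."""
import math
import re
from decimal import Decimal

_FACTORS = {
    "": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}


def memory_bytes(quantity: str) -> int:
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGTPE]i|[kMGTPE]?)", quantity.strip())
    if m is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = m.groups()
    # Round fractional bytes up, as a conservative estimate.
    return math.ceil(Decimal(number) * _FACTORS[suffix])


def limit_usage(working_set: str, limit: str) -> float:
    # Fraction of the memory limit in use; close to 1.0 means little headroom.
    return memory_bytes(working_set) / memory_bytes(limit)


if __name__ == "__main__":
    print(memory_bytes("512Mi"), memory_bytes("500M"))
    print(f"{limit_usage('480Mi', '512Mi'):.2%}")
```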
🚀 Deployments intermediate

Kubernetes Pod Disruption Budget PDB Guide

Protect application availability with Kubernetes PodDisruptionBudgets. Configure minAvailable and maxUnavailable for voluntary disruptions like node

⏱ 15 minutes · pdb · high-availability · disruption
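How minAvailable and maxUnavailable translate into allowed evictions can be sketched as below, following the documented rule that percentages are rounded up to the nearest integer. This is a simplification: the real disruption controller also derives the expected Pod count from the owning workload and only counts healthy Pods.

```python
"""Sketch: how many voluntary evictions a PodDisruptionBudget currently allows."""


def _scaled(value, total: int) -> int:
    # Percentages are scaled against the expected Pod count and rounded up.
    if isinstance(value, str) and value.endswith("%"):
        return (int(value[:-1]) * total + 99) // 100
    return int(value)


def disruptions_allowed(expected: int, healthy: int,
                        min_available=None, max_unavailable=None) -> int:
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of minAvailable or maxUnavailable")
    if min_available is not None:
        desired_healthy = _scaled(min_available, expected)
    else:
        desired_healthy = expected - _scaled(max_unavailable, expected)
    return max(0, healthy - desired_healthy)


if __name__ == "__main__":
    print(disruptions_allowed(5, 5, min_available="50%"))    # 50% of 5 rounds up to 3
    print(disruptions_allowed(5, 5, max_unavailable="50%"))  # 3 may be unavailable
```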
⚙️ Configuration intermediate

Kubernetes Pod Priority and Preemption

Configure pod priority and preemption in Kubernetes for critical workloads. PriorityClass definitions, preemption behavior, protecting system

⏱ 15 minutes · priority · preemption · scheduling
🌐 Networking intermediate

Kubernetes Rate Limiting with Gateway API

Implement rate limiting for Kubernetes services using Gateway API, Istio, Kong, NGINX, and Envoy. Protect APIs from abuse

⏱ 15 minutes · rate-limiting · gateway-api · ingress
🔒 Security intermediate

Kubernetes Secrets Management Best Practices

Manage Kubernetes Secrets securely with best practices. External Secrets Operator, Sealed Secrets, RBAC restrictions, encryption at rest, secret

⏱ 15 minutes · secrets · security · external-secrets
🌐 Networking beginner

Kubernetes Service Types LoadBalancer ClusterIP NodePort

Understand Kubernetes Service types: ClusterIP, NodePort, LoadBalancer, and ExternalName. When to use each type, configuration examples, and traffic routing

⏱ 15 minutes · services · networking · loadbalancer
🚀 Deployments intermediate

Kubernetes StatefulSet Headless Service Guide

Deploy stateful applications with Kubernetes StatefulSets. Stable network identity, ordered deployment, persistent storage per pod, headless services

⏱ 15 minutes · statefulset · headless-service · persistent-storage
⚙️ Configuration intermediate

Kubernetes Taints and Tolerations Node Scheduling

Control pod scheduling with Kubernetes taints and tolerations. Dedicate nodes to specific workloads, prevent scheduling on control-plane nodes, implement GPU

⏱ 15 minutes · taints · tolerations · scheduling
⚡ Autoscaling intermediate

Kubernetes Vertical Pod Autoscaler VPA Guide

Deploy and configure the Vertical Pod Autoscaler (VPA) on Kubernetes. Auto-adjust CPU and memory requests based on actual usage, right-size

⏱ 15 minutes · vpa · autoscaling · resource-management
🌐 Networking intermediate

Kubernetes Linkerd Service Mesh mTLS Guide

Deploy Linkerd service mesh on Kubernetes for automatic mTLS, traffic observability, and reliability features. Zero-config encryption, per-route

⏱ 15 minutes · linkerd · service-mesh · mtls
🤖 AI & GPU advanced

NCCL Environment Variables Complete Reference

Complete reference for NCCL environment variables on Kubernetes. Configure network transport, InfiniBand, GPUDirect RDMA, socket

⏱ 15 minutes · nccl · gpu · rdma
⚙️ Configuration beginner

OpenShift Support Lifecycle and Version Matrix

OpenShift Container Platform support lifecycle, version EOL dates, Kubernetes version mapping, upgrade paths, and Extended Update Support (EUS). Plan upgrades

⏱ 15 minutes · openshift · lifecycle · support
💾 Storage intermediate

Velero Kubernetes Backup and Disaster Recovery

Deploy Velero for Kubernetes cluster backup and disaster recovery. Configure scheduled backups, restore namespaces, migrate workloads between

⏱ 15 minutes · velero · backup · disaster-recovery
🤖 AI & GPU advanced

Kubernetes Volcano Batch Scheduler Gang Scheduling

Deploy Volcano batch scheduler for gang scheduling on Kubernetes. Configure minAvailable for all-or-nothing pod group scheduling, queue management, and GPU job

⏱ 15 minutes · volcano · gang-scheduling · batch
🤖 AI & GPU advanced

NCCL and RCCL Networking Performance on Kubernetes

Optimize NCCL (NVIDIA) and RCCL (AMD) collective communication performance on Kubernetes GPU clusters. Network transport selection, bandwidth tuning, latency

⏱ 15 minutes · nccl · rccl · gpu
🤖 AI & GPU intermediate

Weights and Biases Experiment Tracking on Kubernetes

Deploy Weights & Biases (W&B) on Kubernetes for ML experiment tracking, model registry, and hyperparameter sweeps. Self-hosted W&B Server, agent-based

⏱ 15 minutes · wandb · mlops · experiment-tracking
🤖 AI & GPU advanced

Integrate DisaggregatedSet with llm-d on Kubernetes

Deploy disaggregated LLM inference using DisaggregatedSet and llm-d on Kubernetes. Install LWS then DS controller, model prefill/decode roles, wire llm-d

⏱ 15 minutes · leaderworkerset · disaggregated-inference · llm-d
🤖 AI & GPU advanced

DisaggregatedSet for Multi-Role LLM Inference

Deploy disaggregated LLM inference on Kubernetes with DisaggregatedSet and LeaderWorkerSet. Separate prefill and decode phases across GPU pools

⏱ 15 minutes · leaderworkerset · disaggregated-inference · llm
⚙️ Configuration advanced

Mirror OpenShift Releases to Disconnected Registry

Mirror OCP release images to an air-gapped Quay registry using oc adm release mirror. Auth setup, proxy config, ImageDigestMirrorSet, and disconnected updates.

⏱ 15 minutes · openshift · disconnected · registry
🤖 AI & GPU advanced

NCCL Topology Dump and Tuning on Kubernetes

Use NCCL_TOPO_DUMP_FILE to export and NCCL_TOPO_FILE to inject GPU topology on Kubernetes for reproducible distributed training performance. Topology XML caching, environment

⏱ 15 minutes · nccl · gpu · nvidia
🔒 Security intermediate

Container Image Security Scanning on Kubernetes

Implement container image security scanning in Kubernetes CI/CD pipelines. Trivy, Grype, and admission controllers to prevent vulnerable images from running.

⏱ 15 minutes · security · container-images · trivy
🔒 Security advanced

Container Image Signing and Verification on Kubernetes

Sign container images with Sigstore cosign and verify signatures at admission time with Kyverno or Connaisseur. Supply chain security for Kubernetes

⏱ 15 minutes · cosign · sigstore · supply-chain-security
🤖 AI & GPU intermediate

Hermes Agent Self-Hosted AI on Kubernetes

Deploy Hermes Agent (Nous Research) on Kubernetes as a persistent self-hosted AI agent with memory, automated skill creation, multi-platform

⏱ 15 minutes · hermes · ai-agent · nous-research
⚙️ Configuration intermediate

Image Pull Optimization for Kubernetes

Optimize container image pull performance in Kubernetes. Layer caching, pre-pulling with DaemonSets, image streaming, lazy pulling with stargz/nydus, registry

⏱ 15 minutes · container-images · performance · caching
🚀 Deployments intermediate

Multi-Architecture Container Images for Kubernetes

Build and deploy multi-architecture container images for mixed Kubernetes clusters. Docker buildx, manifest lists, image indexes, platform-aware

⏱ 15 minutes · multi-arch · container-images · buildx
📊 Observability advanced

NVIDIA CNS with Insight Operator for Network Diagnostics

Deploy NVIDIA Cloud-Native Stack (CNS) with the Insight Operator and NVIDIA Insight tools for deep GPU fabric diagnostics. Collect NIC firmware health, link

⏱ 15 minutes · nvidia · cns · insight
📊 Observability advanced

NVIDIA DOCA Telemetry for Network Monitoring on Kubernetes

Deploy NVIDIA DOCA Telemetry Service (DTS) to collect real-time network metrics from BlueField DPUs and ConnectX NICs. Export RoCE counters, port

⏱ 15 minutes · nvidia · doca · telemetry
🤖 AI & GPU advanced

NVIDIA Dynamo Production Tuning on Kubernetes

Tune NVIDIA Dynamo for production LLM inference: prefill/decode pool sizing, KV cache transfer optimization, NCCL backend selection, SLA-driven autoscaling

⏱ 15 minutes · nvidia-dynamo · inference-optimization · production
🤖 AI & GPU intermediate

NVIDIA OpenShell Sandboxed AI Agent Runtime on Kubernetes

Deploy NVIDIA OpenShell on Kubernetes for safe, private autonomous AI agent execution. Declarative YAML network policies, sandboxed containers

⏱ 15 minutes · nvidia · openshell · agents
📊 Observability advanced

NVIDIA Nsight Operator for GPU Profiling on Kubernetes

Deploy NVIDIA Nsight Systems and Nsight Compute on Kubernetes for GPU workload profiling. Capture kernel traces, memory bandwidth, SM occupancy, and NCCL

⏱ 15 minutes · nvidia · nsight · profiling
⚙️ Configuration intermediate

OCI Container Image Internals on Kubernetes

Understand OCI container image internals: layers as tar archive diffs, image configuration JSON, content-addressable storage with SHA-256, multi-platform image

⏱ 15 minutes · oci · container-images · registry
⚙️ Configuration intermediate

OpenShift Cluster Update Process Explained

Complete guide to OpenShift Container Platform cluster updates. CVO workflow, runlevels, Machine Config Operator node updates, update channels

⏱ 15 minutes · openshift · cluster-update · cvo
🤖 AI & GPU advanced

Poolside AI Foundation Models on Kubernetes

Deploy Poolside AI foundation models for enterprise software agents on Kubernetes. On-prem and VPC deployment, multi-agent orchestration, sandboxed

⏱ 15 minutes · poolside · foundation-models · agents
🚀 Deployments intermediate

Private Container Registry on Kubernetes

Deploy a private OCI container registry on Kubernetes with persistent storage, TLS, authentication, garbage collection, and high availability. Self-hosted

⏱ 15 minutes · registry · oci · container-images
🤖 AI & GPU intermediate

Red Hat AI Studio on OpenShift

Deploy Red Hat AI Studio on OpenShift for end-to-end LLM development. Model catalog, InstructLab fine-tuning, experiment tracking, model

⏱ 15 minutes · red-hat · openshift · ai-studio
🤖 AI & GPU intermediate

Tabnine AI Code Assistant Self-Hosted on Kubernetes

Deploy Tabnine Enterprise self-hosted on Kubernetes for private AI code completion and chat. On-prem model serving, multi-model support (Tabnine

⏱ 15 minutes · tabnine · code-assistant · enterprise-ai
🚀 Deployments intermediate

Canary Deployment with Gateway API Traffic Splitting

Implement canary deployments using Kubernetes Gateway API HTTPRoute traffic splitting. Gradually shift traffic from stable to canary version with weight-based

⏱ 15 minutes · gateway-api · canary · traffic-splitting
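With weight-based splitting, each `backendRef` in an HTTPRoute rule receives weight / sum(weights) of that rule's requests. A sketch that builds an `HTTPRoute` (gateway.networking.k8s.io/v1) for a given canary percentage and computes the resulting shares; the route, Gateway, and Service names are hypothetical.

```python
"""Sketch: HTTPRoute weights for a canary rollout step (names are hypothetical)."""


def canary_route(name: str, gateway: str, stable_svc: str, canary_svc: str,
                 canary_percent: int, port: int = 80) -> dict:
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            "parentRefs": [{"name": gateway}],
            "rules": [{
                # Weights are relative; here they sum to 100 for readability.
                "backendRefs": [
                    {"name": stable_svc, "port": port, "weight": 100 - canary_percent},
                    {"name": canary_svc, "port": port, "weight": canary_percent},
                ],
            }],
        },
    }


def traffic_share(route: dict) -> dict:
    # Fraction of the first rule's requests each backend receives.
    refs = route["spec"]["rules"][0]["backendRefs"]
    total = sum(r["weight"] for r in refs)
    return {r["name"]: r["weight"] / total for r in refs}


if __name__ == "__main__":
    for pct in (10, 25, 50, 100):  # progressive rollout steps
        print(pct, traffic_share(canary_route("app", "gw", "app-stable", "app-canary", pct)))
```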
💾 Storage intermediate

Validate CSI Storage Performance with FIO Kubernetes Job

Benchmark CSI storage performance using FIO inside a Kubernetes Job. Create a PVC backed by a CSI StorageClass, run sequential/random read/write

⏱ 15 minutes · fio · csi · storage
💾 Storage beginner

emptyDir Volumes: Sharing, Lifecycle, and Memory-Backed

Master emptyDir volumes for CKA/CKAD exam prep. Share data between containers, understand volume lifecycle across restarts vs Pod deletion, and configure

⏱ 15 minutes · emptydir · volumes · cka
🔧 Troubleshooting intermediate

Chaos Mesh Fault Injection on Kubernetes

Deploy Chaos Mesh for chaos engineering on Kubernetes. Covers PodChaos, NetworkChaos, IOChaos, StressChaos experiments, scheduling, RBAC

⏱ 15 minutes · chaos-engineering · chaos-mesh · fault-injection
🤖 AI & GPU advanced

GPUDirect Storage on Kubernetes

Configure NVIDIA GPUDirect Storage (GDS) for a direct data path between NVMe/NFS storage and GPU memory, bypassing the CPU. Covers Magnum IO, cuFile API, GDS driver

⏱ 15 minutes · gpudirect · storage · nvidia
🌐 Networking advanced

InfiniBand Subnet Manager OpenSM on Kubernetes

Deploy and manage InfiniBand Subnet Manager (OpenSM) on Kubernetes for GPU cluster fabric management. Covers SM architecture, UFM integration, partition

⏱ 15 minutes · infiniband · opensm · subnet-manager
🔧 Troubleshooting intermediate

LitmusChaos Engineering on Kubernetes

Deploy LitmusChaos for resilience testing on Kubernetes. Covers ChaosEngine, ChaosExperiment, ChaosResult CRDs, built-in experiments, GameDay planning, Litmus

⏱ 15 minutes · chaos-engineering · litmus · resilience
🌐 Networking intermediate

NMState Network Config for GPU Worker Nodes

Declaratively configure Ethernet bonding, VLANs, MTU, and static routes on GPU worker nodes using NMState on OpenShift. Covers bonding modes, LACP

⏱ 15 minutes · nmstate · bonding · vlan
🤖 AI & GPU advanced

NVIDIA PeerMem for GPU-Direct RDMA

Install and configure nvidia_peermem kernel module to enable GPU-Direct RDMA between NVIDIA GPUs and Mellanox RDMA NICs. Covers module

⏱ 15 minutes · nvidia-peermem · gpu-direct · rdma
🌐 Networking intermediate

OpenShift Multus CNI Multiple Network Interfaces

Attach multiple network interfaces to Pods using Multus CNI on OpenShift. Covers NetworkAttachmentDefinitions, SR-IOV, macvlan, IPVLAN, traffic separation

⏱ 15 minutes · multus · cni · openshift
🌐 Networking advanced

RoCE PFC and ECN Lossless Ethernet for GPU Clusters

Configure RoCE v2 with Priority Flow Control (PFC) and ECN for lossless Ethernet RDMA on GPU clusters. Covers DSCP mapping, switch configuration, NIC

⏱ 15 minutes · roce · pfc · ecn
🚀 Deployments intermediate

Strimzi Kafka Operator on Kubernetes

Deploy Apache Kafka on Kubernetes with Strimzi operator. Covers Kafka CR, KafkaTopic, KafkaUser, KafkaConnect, KafkaBridge, rack awareness, storage

⏱ 15 minutes · strimzi · kafka · operator
🤖 AI & GPU advanced

Disable PCIe ACS for GPU-Direct P2P

Disable PCIe Access Control Services (ACS) to enable GPU-Direct peer-to-peer DMA between GPUs and RDMA NICs. Covers BIOS disable, kernel override, and when

⏱ 15 minutes · acs · pcie · gpu-direct
🌐 Networking advanced

Dual-Fabric Mellanox: GPU InfiniBand + Storage Ethernet

Design and configure dual-fabric network architecture with separate Mellanox NICs for GPU communication (InfiniBand) and storage traffic (Ethernet). Covers

⏱ 15 minutes · infiniband · ethernet · mellanox
🤖 AI & GPU advanced

IOMMU BIOS and Kernel Config for NCCL GPU-Direct

Configure IOMMU at BIOS and kernel level to enable NCCL GPU-Direct RDMA on Kubernetes. Covers Intel VT-d, AMD-Vi, kernel parameters, passthrough

⏱ 15 minutes · iommu · nccl · gpu-direct
🤖 AI & GPU advanced

NCCL PXN Cross-NIC Communication via NVLink

Configure NCCL PXN (PCI × NVLink), which lets a GPU reach a NIC local to another GPU by moving data over NVLink first, for multi-node GPU training where not every GPU has a direct RDMA NIC. Covers topology

⏱ 15 minutes · nccl · pxn · nvlink
🌐 Networking advanced

NVIDIA IPAM for GPU Fabric IP Address Allocation

Configure nv-ipam (NVIDIA IPAM) to assign IP addresses on GPU fabric SR-IOV networks in Kubernetes. Covers IPPool CRDs, per-node allocation, InfiniBand IPoIB

⏱ 15 minutes · nv-ipam · ipam · gpu-fabric
🌐 Networking advanced

Fix SR-IOV 'Not Enough MMIO Resources' Error

Resolve the mlx5_core 'not enough MMIO resources for SR-IOV' error on OpenShift nodes with Mellanox ConnectX NICs. Covers BIOS settings, PCIe BAR

⏱ 15 minutes · sriov · mmio · mellanox
🤖 AI & GPU advanced

Run:ai Distributed Inference with SR-IOV RDMA

Deploy distributed vLLM inference on Run:ai using SR-IOV RDMA for NCCL inter-node communication. Covers extended-resource for Mellanox VFs, network annotation

⏱ 15 minutes · runai · sriov · rdma
🤖 AI & GPU advanced

Run:ai Distributed Inference with vLLM and NCCL

Deploy distributed LLM inference on Run:ai with vLLM tensor parallelism across multiple workers. Covers multi-node GPU splitting, NCCL configuration, PVC model

⏱ 15 minutes · runai · vllm · nccl
🌐 Networking intermediate

SR-IOV VF to Container Mapping and Lifecycle

How SR-IOV Virtual Functions are mapped to containers in Kubernetes. Covers VF allocation flow, link state management (VFs are down when unassigned), device

⏱ 15 minutes · sriov · virtual-function · containers
🌐 Networking intermediate

VT-x vs VT-d vs SR-IOV Explained

Understand the difference between CPU virtualization (VT-x/SVM), I/O virtualization (VT-d/AMD-Vi/IOMMU), and SR-IOV. Which to enable or disable for GPU

⏱ 15 minutes · virtualization · iommu · sriov
🤖 AI & GPU advanced

Debug Distributed vLLM Inference with NCCL Verbose Logging

Debug distributed vLLM inference using NCCL_DEBUG=INFO and NCCL_DEBUG_SUBSYS=ALL. Covers air-gapped deployment with TRANSFORMERS_OFFLINE, interpreting NCCL

⏱ 15 minutes · vllm · nccl · debugging
🤖 AI & GPU advanced

Kubernetes AI Infrastructure Scaling

Scale AI inference infrastructure on Kubernetes from 10K to 100K requests per second. Covers latency optimization, horizontal scaling, caching

⏱ 15 minutes · ai-infrastructure · scaling · inference
🤖 AI & GPU intermediate

Kubernetes for AI Search and Discoverability

Deploy AI-searchable services on Kubernetes: llms.txt implementation, RAG-optimized APIs, structured data for AI chatbots, and infrastructure patterns

⏱ 15 minutes · ai-search · llms-txt · rag
🔒 Security intermediate

ServiceAccount for Running Pods

Configure Kubernetes ServiceAccounts for Pods: token mounting, RBAC permissions, workload identity, automountServiceAccountToken control, and least-privilege

⏱ 15 minutes · serviceaccount · rbac · security
🌐 Networking advanced

OpenShift SR-IOV RDMA InfiniBand Device Plugin

Configure and troubleshoot SR-IOV Network Operator with Mellanox ConnectX RDMA InfiniBand devices on OpenShift. Covers SriovNetworkNodePolicy, device

⏱ 15 minutes · sriov · rdma · infiniband
🔒 Security intermediate

OpenShift User Account Management

Manage user accounts in OpenShift: create users, assign roles, configure identity providers, manage groups, and implement RBAC for multi-tenant clusters.

⏱ 15 minutes · openshift · user-management · rbac
⚙️ Configuration intermediate

Kubernetes Cost Optimization Strategies

Comprehensive cost reduction strategies for Kubernetes clusters: right-sizing, spot instances, autoscaling, idle resource detection, namespace budgets, and GPU

⏱ 15 minutes · cost-optimization · finops · autoscaling
🔧 Troubleshooting intermediate

Ephemeral Containers for Live Debugging

Use kubectl debug with ephemeral containers to troubleshoot running Pods without restarting them. Attach debugging tools to distroless containers, inspect

⏱ 15 minutes · ephemeral-containers · debugging · kubectl-debug
⚡ Autoscaling beginner

Goldilocks VPA Dashboard for Resource Optimization

Deploy Goldilocks to visualize VPA recommendations across all workloads and identify over-provisioned or under-provisioned containers with actionable

⏱ 15 minutes · goldilocks · vpa · cost-optimization
🚀 Deployments intermediate

Pod Disruption Budget (PDB) Production Guide

Configure Pod Disruption Budgets to protect application availability during voluntary disruptions: node drains, cluster upgrades, and autoscaler scale-downs.

⏱ 15 minutes · pdb · availability · disruption
⚡ Autoscaling intermediate

Vertical Pod Autoscaler (VPA) Guide

Configure Kubernetes Vertical Pod Autoscaler to automatically right-size container CPU and memory requests based on actual usage. Covers

⏱ 15 minutes · vpa · autoscaling · resource-management
🔒 Security advanced

Kyverno AI Workload Provenance Verification

Use Kyverno to verify software and content provenance for AI workloads: SBOM validation, model signing with Sigstore, dataset integrity, and supply chain

⏱ 15 minutes · kyverno · supply-chain · ai-security
🔒 Security advanced

Kyverno CEL Policy Model Migration

Migrate Kyverno policies from YAML-based rules to CEL expressions for type-safe, performant validation. Covers CEL syntax, migration patterns, and comparison

⏱ 15 minutes · kyverno · cel · policy
🔒 Security intermediate

Kyverno Drift Prevention for GitOps

Prevent configuration drift in GitOps workflows using Kyverno: block manual kubectl edits, enforce ArgoCD/Flux ownership, and detect out-of-band changes

⏱ 15 minutes · kyverno · gitops · argocd
🔒 Security advanced

Kyverno ISO 27001 Compliance Policies

Implement ISO 27001 and BSI IT-Grundschutz security controls in Kubernetes using Kyverno policies: access control, cryptography, operations security, and audit

⏱ 15 minutes · kyverno · compliance · iso27001
🔒 Security advanced

Kyverno LLM Inference Cost and Security Guardrails

Implement policy-as-code guardrails for LLM inference workloads with Kyverno: GPU quota enforcement, model size limits, cost controls, prompt injection

⏱ 15 minutes · kyverno · llm · inference
🔒 Security advanced

Kyverno ReBAC Multi-Tenant RBAC Automation

Implement Relationship-Based Access Control (ReBAC) with Kyverno to automate multi-tenant RBAC at scale: dynamic RoleBindings, namespace

⏱ 15 minutes · kyverno · rbac · multi-tenancy
🔒 Security advanced

Kyverno Webhook Topology and Admission Latency

Optimize Kyverno webhook topology for minimal admission latency: webhook configuration tuning, failure policies, timeout settings, and lessons from migrating

⏱ 15 minutes · kyverno · webhook · admission-control
🔧 Troubleshooting beginner

OpenShift oc cp File Copy Guide

Use oc cp to copy files and directories between a local machine and Pods. Covers tar-based transfer, container selection, large file handling, and comparison

⏱ 15 minutes · openshift · oc-cp · file-transfer
🔧 Troubleshooting beginner

OpenShift oc rsync File Transfer

Use oc rsync to copy files between a local machine and Pods in OpenShift. Covers upload, download, live sync, filtering, and common patterns for debugging

⏱ 15 minutes · openshift · oc-rsync · file-transfer
🤖 AI & GPU advanced

Deep Learning with Large Datasets on K8s

Optimize deep learning training with large datasets on Kubernetes. Covers data loading, caching strategies, parallel prefetch, and storage architecture

⏱ 15 minutes · training · datasets · storage
🤖 AI & GPU advanced

Distributed Multi-GPU Inference on Kubernetes

Deploy distributed inference across multiple GPUs and nodes on Kubernetes. Covers tensor parallelism, pipeline parallelism, vLLM, and NIM multi-GPU serving.

⏱ 15 minutes · inference · multi-gpu · distributed
🔒 Security intermediate

External Secrets Operator on OpenShift

Manage Kubernetes secrets from external vaults using External Secrets Operator on OpenShift. Covers ExternalSecret CRD, SecretStore configuration, and GitOps

⏱ 15 minutes · secrets · security · openshift
💾 Storage intermediate

PScale NFS and SMB Storage Benchmarking

Benchmark NFS and SMB storage performance on Kubernetes using fio clients in Pods. Covers multi-client parallel testing, bandwidth measurement, and IOPS

⏱ 15 minutes · benchmarking · storage · nfs
🤖 AI & GPU advanced

FSDP LoRA Fine-Tuning LLMs on Kubernetes

Fine-tune large language models with FSDP and LoRA on Kubernetes. Covers memory-efficient loading, checkpoint strategies, and multi-node H200 training.

⏱ 15 minutes · fsdp · lora · fine-tuning
🤖 AI & GPU advanced

NVIDIA GenAI-Perf Inference Benchmarking

Benchmark LLM inference throughput and latency on Kubernetes using NVIDIA GenAI-Perf. Covers vLLM, Run:ai, concurrency testing, and multi-location client runs.

⏱ 15 minutes · benchmarking · inference · nvidia
🤖 AI & GPU advanced

LeaderWorkerSet Multi-Node Inference on K8s

Deploy multi-node distributed inference using LeaderWorkerSet (LWS) operator on Kubernetes. Covers vLLM pipeline parallelism across nodes for 405B+ parameter

⏱ 15 minutes · inference · distributed · lws
🤖 AI & GPU advanced

Mistral FSDP LoRA Complete Accelerate Config

Complete accelerate FSDP configuration for fine-tuning Mistral-Small-4 11B with LoRA on multi-GPU H200 clusters. Covers every FSDP2 setting with explanations.

⏱ 15 minutes · fsdp · lora · mistral
🤖 AI & GPU advanced

Multi-Node Distributed Training on Kubernetes

Run distributed deep learning training across multiple GPU nodes on Kubernetes. Covers PyTorch DDP, DeepSpeed, Horovod, and MPI jobs with NCCL optimization.

⏱ 15 minutes · training · distributed · multi-node
🤖 AI & GPU advanced

NVIDIA GPUDirect Storage Benchmark on K8s

Benchmark NVIDIA GPUDirect Storage (GDS) on Kubernetes for direct NVMe-to-GPU data transfers. Covers gdsio, gds_stats, performance validation, and comparison

⏱ 15 minutes · benchmarking · nvidia · gds
🤖 AI & GPU advanced

NVIDIA GPU Operator GitOps on OpenShift

Deploy NVIDIA GPU Operator on OpenShift via GitOps with ArgoCD. Covers ClusterPolicy configuration, DCGM exporter, drain settings, tolerations, and rolling

⏱ 15 minutes · nvidia · gpu-operator · openshift
🌐 Networking advanced

NVIDIA Network Operator NicClusterPolicy

Deploy NVIDIA Network Operator on OpenShift with NicClusterPolicy for DOCA telemetry, NIC feature discovery, RDMA IPAM, and OFED drivers. GitOps-managed

⏱ 15 minutes · nvidia · network-operator · rdma
🤖 AI & GPU advanced

OpenShift GPU Node Resource Planning

Plan CPU, memory, and overhead budgets for GPU nodes running NVIDIA GPU Operator, Network Operator, Run:ai, and OpenShift infrastructure Pods. Understand what

⏱ 15 minutes · openshift · gpu · capacity-planning
🤖 AI & GPU advanced

Run:ai Backend Architecture on OpenShift

Understand the full Run:ai backend deployment on OpenShift with 40+ microservices including Keycloak, PostgreSQL, NATS, Thanos, Traefik, and workload

⏱ 15 minutes · runai · openshift · architecture
🤖 AI & GPU advanced

Run:ai Distributed PyTorch Training on OpenShift

Submit multi-node distributed PyTorch training jobs on OpenShift using Run:ai CLI. Covers DDP, FSDP, RDMA networking, and GPU scheduling.

⏱ 15 minutes · runai · openshift · distributed
🤖 AI & GPU advanced

FSDP Distributed Training on Run:ai

Run PyTorch FSDP distributed training workloads on Run:ai with GPU scheduling, event tracking, and GPU memory monitoring. Covers Mistral-class model

⏱ 15 minutes · runai · distributed-training · fsdp
🤖 AI & GPU intermediate

Run:ai GPU Metrics Pipeline with DCGM and Thanos

End-to-end GPU metrics pipeline on Run:ai: DCGM exporter collects GPU utilization, Prometheus scrapes, remote-writes to Thanos Receive, and Grafana dashboards

⏱ 15 minutes · runai · dcgm · thanos
🔒 Security intermediate

Run:ai Keycloak SSO Authentication Setup

Configure Run:ai SSO authentication with Keycloak on OpenShift: OIDC integration, user federation, role mapping, and troubleshooting login failures.

⏱ 15 minutes · runai · keycloak · sso
📊 Observability advanced

Run:ai Observability with OpenTelemetry

Configure Run:ai observability on OpenShift with OpenTelemetry Collector, Prometheus receivers, metrics enrichment, OAuth2 export, and GPU metric collection

⏱ 15 minutes · runai · opentelemetry · observability
🤖 AI & GPU intermediate

Run:ai Platform Backend Components

Overview of Run:ai backend StatefulSets and components on OpenShift: Thanos receive/query, Keycloak, NATS, Redis, PostgreSQL, workload controllers, and their

⏱ 15 minutes · runai · architecture · openshift
🤖 AI & GPU intermediate

Run:ai Training Job Submit Script Pattern

Production pattern for submitting Run:ai training jobs via shell scripts with GPU fractional allocation, NFS mounts, custom Python environments, and private

⏱ 15 minutes · runai · training · gpu
🤖 AI & GPU advanced

Run:ai Workload Controllers on OpenShift

Understand Run:ai cluster-level workload controllers on OpenShift: workload-controller, workload-overseer, workload-exporter, and status-updater components.

⏱ 15 minutes · runai · openshift · controllers
📊 Observability advanced

Thanos Receive Memory Sizing Guide

Calculate correct memory limits for Thanos Receive based on WAL segments, active series, retention, and ingestion rate. Prevent OOMKill crash loops

⏱ 15 minutes · thanos · memory · capacity-planning
🔧 Troubleshooting advanced

Thanos Receive OOMKilled CrashLoopBackOff

Debug and fix Thanos Receive StatefulSet OOMKilled CrashLoopBackOff caused by WAL replay exceeding memory limits. Covers ArgoCD conflict resolution, liveness

⏱ 15 minutes · thanos · oom · crashloopbackoff
🔧 Troubleshooting intermediate

Fix Thanos Receive OOMKilled in Run:ai

Troubleshoot and fix Thanos Receive OOMKilled (exit code 137) with 143+ restarts in Run:ai backend on OpenShift. Covers memory tuning, TSDB

⏱ 15 minutes · thanos · runai · oomkilled
🔒 Security intermediate

CVE-2026-31431 Linux Kernel Crypto Fix

Security advisory for CVE-2026-31431: Linux kernel crypto algif_aead vulnerability. Impact on Kubernetes nodes and how to patch container host kernels.

⏱ 15 minutes · security · cve · linux-kernel
🔒 Security advanced

Kubernetes 1.36 Constrained Impersonation

Use constrained impersonation in Kubernetes 1.36 to limit which identities a user can impersonate. Tighter RBAC control for multi-tenant clusters.

⏱ 15 minutes · kubernetes-1.36 · rbac · security
💾 Storage advanced

Kubernetes 1.36 CSI Differential Snapshots

Use CSI differential snapshots in Kubernetes 1.36 to track changed blocks between snapshots. Enables incremental backups and faster disaster recovery.

⏱ 15 minutes · kubernetes-1.36 · csi · snapshots
⚙️ Configuration advanced

Kubernetes 1.36 Declarative Type Validation

Kubernetes 1.36 introduces declarative validation for native API types using validation-gen. Replaces hand-written validation code with comment-tag annotations on API type fields.

⏱ 15 minutes · kubernetes-1.36 · api · validation
🤖 AI & GPU advanced

Kubernetes 1.36 DRA for GPU and TPU Management

Use Dynamic Resource Allocation in Kubernetes 1.36 for advanced GPU/TPU management with partitionable devices, device taints, and tolerations.

⏱ 15 minutes · kubernetes-1.36 · dra · gpu
🔒 Security advanced

Kubernetes 1.36 External SA Token Signing

Delegate ServiceAccount token signing to external KMS or HSM systems in Kubernetes 1.36. Improve security with hardware-backed key management.

⏱ 15 minutes · kubernetes-1.36 · service-accounts · security
🌐 Networking intermediate

Migrate from externalIPs in Kubernetes 1.36

Service externalIPs are deprecated in Kubernetes 1.36 due to CVE-2020-8554. Migrate to Gateway API, LoadBalancer services, or MetalLB for external access.

⏱ 15 minutes · kubernetes-1.36 · deprecation · networking
🤖 AI & GPU advanced

Kubernetes 1.36 Gang Scheduling

Use gang scheduling in Kubernetes 1.36 to schedule Pod groups atomically. Essential for distributed ML training, MPI jobs, and Spark workloads.

⏱ 15 minutes · kubernetes-1.36 · scheduling · gang-scheduling
⚙️ Configuration beginner

Migrate from gitRepo Volume in Kubernetes 1.36

The gitRepo volume plugin is permanently removed in Kubernetes 1.36. Migrate to init containers or OCI volumes to avoid broken deployments.

⏱ 15 minutes · kubernetes-1.36 · migration · volumes
⚙️ Configuration advanced

Kubernetes 1.36 Graceful Leader Transition

Configure graceful leader transitions in Kubernetes 1.36 control plane components. Eliminate brief outages during leader election failovers.

⏱ 15 minutes · kubernetes-1.36 · high-availability · control-plane
⚙️ Configuration advanced

Kubernetes 1.36 L3 Cache Topology in CPU Manager

Configure L3 cache topology awareness in Kubernetes 1.36 CPU Manager. Allocate CPUs sharing L3 cache for better performance in latency-sensitive workloads.

⏱ 15 minutes · kubernetes-1.36 · cpu-manager · performance
⚙️ Configuration advanced

Kubernetes 1.36 Memory QoS with cgroups v2

Configure memory quality of service with cgroups v2 in Kubernetes 1.36. Set memory.min and memory.high for guaranteed memory and throttling before OOM kills.

⏱ 15 minutes · kubernetes-1.36 · memory · cgroups
⚙️ Configuration advanced

Kubernetes 1.36 Mixed Version Proxy

Use the Mixed Version Proxy in Kubernetes 1.36 to handle API version skew during rolling upgrades. Ensures API availability across mixed control plane versions.

⏱ 15 minutes · kubernetes-1.36 · api-server · upgrades
📊 Observability intermediate

Kubernetes 1.36 Native Histogram Metrics

Enable Prometheus native histograms in Kubernetes 1.36 for higher-resolution metrics with lower storage cost. Covers all control plane components.

⏱ 15 minutes · kubernetes-1.36 · prometheus · metrics
💾 Storage intermediate

Kubernetes 1.36 OCI Volume Source

Use OCI VolumeSource in Kubernetes 1.36 to pull OCI artifacts directly into Pod volumes. No init containers needed for ML models, configs, or data.

⏱ 15 minutes · kubernetes-1.36 · oci · volumes
🔒 Security advanced

Kubernetes 1.36 Pod Certificates (mTLS)

Use Pod Certificates in Kubernetes 1.36 to authenticate Pods to the API server via mTLS. Built-in X.509 certificate provisioning without external tools.

⏱ 15 minutes · kubernetes-1.36 · security · mtls
⚙️ Configuration intermediate

Kubernetes 1.36 Pod-Level Resource Limits

Set resource requests and limits at the Pod level in Kubernetes 1.36 instead of per-container. Simplifies multi-container Pod resource management.

⏱ 15 minutes · kubernetes-1.36 · resources · pods
🤖 AI & GPU advanced

Kubernetes 1.36 RestartAllContainers for ML

Use the RestartAllContainers policy in Kubernetes 1.36 to restart all Pod containers in-place when a worker fails, avoiding costly ML training rescheduling.

⏱ 15 minutes · kubernetes-1.36 · machine-learning · gpu
🔒 Security intermediate

Kubernetes 1.36 SELinux Mount-Time Labeling

Configure SELinux mount-time volume labeling in Kubernetes 1.36 to eliminate slow recursive relabeling and speed up Pod startup times dramatically.

⏱ 15 minutes · kubernetes-1.36 · selinux · security
🌐 Networking intermediate

Kubernetes 1.36 SPDY to WebSocket Migration

Kubernetes 1.36 continues migrating kubectl exec/attach/port-forward from SPDY to WebSockets. Understand the changes and troubleshoot connection issues.

⏱ 15 minutes · kubernetes-1.36 · kubectl · websockets
🔧 Troubleshooting beginner

Kubernetes 1.36 Statusz and Flagz Endpoints

Use /statusz and /flagz debug endpoints in Kubernetes 1.36 control plane components. Inspect runtime status and effective flag values without log parsing.

⏱ 15 minutes · kubernetes-1.36 · debugging · control-plane
🤖 AI & GPU advanced

Kubernetes 1.36 Topology-Aware Scheduling

Use topology-aware workload scheduling in Kubernetes 1.36 to place Pods on nodes with optimal GPU, NUMA, and network topology for ML training.

⏱ 15 minutes · kubernetes-1.36 · scheduling · topology
💾 Storage intermediate

Kubernetes 1.36 VolumeGroupSnapshot GA

Use VolumeGroupSnapshot in Kubernetes 1.36 to take crash-consistent snapshots of multiple volumes atomically. Now GA and production-ready.

⏱ 15 minutes · kubernetes-1.36 · storage · snapshots
🔒 Security advanced

Kubernetes 1.36 User Namespaces in Pods

Enable user namespaces in Kubernetes 1.36 for rootless containers and stronger Pod isolation. Map container root to unprivileged host UIDs.

⏱ 15 minutes · kubernetes-1.36 · user-namespaces · security
🌐 Networking advanced

Cilium: eBPF-Powered K8s Networking

Deploy Cilium CNI in Kubernetes for eBPF-based networking, network policies, service mesh, and observability with Hubble.

⏱ 15 minutes · cilium · ebpf · cni
⚡ Autoscaling intermediate

KEDA: Event-Driven Autoscaling for K8s

Scale Kubernetes workloads with KEDA based on events from Kafka, RabbitMQ, AWS SQS, Prometheus metrics, and cron schedules.

⏱ 12 minutes · keda · autoscaling · event-driven
🚀 Deployments advanced

Knative: Serverless Workloads on Kubernetes

Run serverless containers with Knative Serving and Eventing on Kubernetes. Auto-scaling to zero, traffic splitting, revision management.

⏱ 15 minutes · knative · serverless · scale-to-zero
⚙️ Configuration intermediate

NATS: Lightweight Messaging for Kubernetes

Deploy NATS messaging in Kubernetes for pub/sub, request/reply, and JetStream persistent streaming. High-performance alternative to Kafka for cloud-native mi...

⏱ 10 minutes · nats · messaging · pub-sub
🔒 Security advanced

SPIFFE/SPIRE: Workload Identity for K8s

Deploy SPIRE for Kubernetes workload identity using SPIFFE standards. Automatic mTLS certificate issuance, cross-cluster identity federation.

⏱ 12 minutes · spiffe · spire · identity
🤖 AI & GPU intermediate

NVIDIA GPU Feature Discovery for Kubernetes

Deploy GPU Feature Discovery (GFD) to auto-label Kubernetes nodes with GPU model, MIG capability, CUDA version, and driver info for intelligent scheduling.

⏱ 15 minutes · nvidia · gpu · scheduling
🤖 AI & GPU advanced

OpenShift NVIDIA MIG Reconfiguration Without Reboot

Reconfigure NVIDIA MIG geometry on OpenShift without rebooting nodes. Use nvidia-mig-manager with node labels to dynamically switch GPU partitions.

⏱ 15 minutes · openshift · nvidia · mig
🤖 AI & GPU advanced

Talos Linux MIG Configuration with GPU Operator

Configure NVIDIA MIG on Talos Linux Kubernetes clusters. Install GPU Operator, set MIG strategy, and dynamically partition A100 GPUs without node reboot.

⏱ 15 minutes · talos · nvidia · mig
🤖 AI & GPU advanced

DGX H100 nvidia-smi topo -m Guide

Read nvidia-smi topo -m output on DGX H100 systems. Understand NVLink, NVSwitch, PCIe topology, GPU-to-GPU bandwidth, and NUMA affinity for Kubernetes.

⏱ 15 minutes · nvidia · dgx · h100
📊 Observability intermediate

GPU Operator Node Status Exporter Metrics

Monitor NVIDIA GPU Operator node validation with gpu_operator_node_driver_ready and status exporter metrics. Prometheus alerts for GPU node health.

⏱ 15 minutes · nvidia · gpu-operator · prometheus
📊 Observability beginner

Grafana Dashboard 6417 Kubernetes Pods

Import Grafana dashboard 6417 for Kubernetes pod monitoring. Configure Prometheus data source, visualize CPU, memory, network, and disk usage per pod.

⏱ 15 minutes · grafana · prometheus · monitoring
🎯 Helm beginner

Helm Install: Deploy Charts Guide

Install Helm charts on Kubernetes with helm install, upgrade, rollback, and values customization. Repository management, OCI registries, and release lifecycle.

⏱ 10 minutes · helm · charts · deployment
🔒 Security advanced

Kata Containers RuntimeClass Kubernetes

Deploy Kata Containers with Kubernetes RuntimeClass for hardware-isolated pods. VM-based sandboxing, microVM configuration, and multi-runtime clusters.

⏱ 20 minutes · kata-containers · runtimeclass · security
⚙️ Configuration beginner

kubectl apply vs create: Key Differences

Understand when to use kubectl apply vs kubectl create. Declarative vs imperative, last-applied annotation, server-side apply, and GitOps workflows.

⏱ 8 minutes · kubectl · configuration · gitops
⚙️ Configuration beginner

kubectl Cheat Sheet: Essential Commands

Complete kubectl cheat sheet with essential commands for pods, deployments, services, debugging, and cluster management. Copy-paste ready examples.

⏱ 15 minutes · kubectl · cheat-sheet · cka
🔧 Troubleshooting beginner

kubectl describe: Read Pod Events Guide

Use kubectl describe pod to read events, conditions, and container states. Diagnose scheduling failures, image pulls, crashes, and probe failures.

⏱ 8 minutes · kubectl · troubleshooting · events
🔧 Troubleshooting beginner

kubectl exec: Run Commands in Pods

Use kubectl exec to run commands inside running pods. Interactive shell, multi-container pods, debugging techniques, and security considerations.

⏱ 8 minutes · kubectl · troubleshooting · debugging
🚀 Deployments beginner

kubectl get pods: Output Formats Guide

Master kubectl get pods with output formats, label selectors, field selectors, and custom columns. Wide output, JSON, YAML, and jsonpath examples.

⏱ 10 minutes · kubectl · pods · cka
🚀 Deployments beginner

kubectl run: Create Pod from Command Line

Use kubectl run to create pods and deployments from the command line. Dry-run output, resource limits, environment variables, and CKA exam patterns.

⏱ 8 minutes · kubectl · pods · cka
🔒 Security advanced

K8s Admission Webhooks: Validate and Mutate

Build Kubernetes validating and mutating admission webhooks. Webhook configuration, TLS setup, failure policies, and common patterns for policy enforcement.

⏱ 15 minutes · admission-webhooks · security · policy
⚙️ Configuration beginner

kubectl explain: API Resource Reference

Use kubectl explain and api-resources to discover Kubernetes API objects. Field documentation, resource versions, short names, and API group exploration.

⏱ 6 minutes · kubectl · api · reference
🚀 Deployments intermediate

Argo Workflows: K8s-Native Pipeline Engine

Run CI/CD pipelines and data workflows with Argo Workflows in Kubernetes. DAG workflows, artifact passing, retry strategies.

⏱ 12 minutes · argo-workflows · ci-cd · pipelines
🚀 Deployments intermediate

ArgoCD GitOps: Declarative Continuous Delivery

Deploy applications with ArgoCD GitOps in Kubernetes. Application sync, auto-heal, multi-cluster management, ApplicationSets, and Helm/Kustomize integration.

⏱ 15 minutes · argocd · gitops · ci-cd
🔒 Security advanced

K8s Audit Logging: Track API Activity

Configure Kubernetes audit logging to track API requests. Audit policy levels, log backends, webhook integration, and security compliance monitoring.

⏱ 12 minutes · audit · security · logging
⚙️ Configuration advanced

Backstage: K8s Developer Portal and Catalog

Deploy the Backstage developer portal on Kubernetes for a service catalog, API docs, software templates, and TechDocs documentation.

⏱ 15 minutes · backstage · developer-portal · platform-engineering
🔒 Security intermediate

cert-manager: Automated TLS Certificates

Automate TLS certificate management with cert-manager in Kubernetes. Let's Encrypt integration, Issuer configuration, wildcard certificates, and automatic

⏱ 12 minutes · tls · certificates · cert-manager
🔒 Security advanced

K8s Certificate Rotation and Management

Manage Kubernetes cluster certificates with kubeadm. Check expiration, renew certificates, configure auto-rotation, and troubleshoot TLS errors.

⏱ 12 minutes · certificates · tls · security
⚙️ Configuration advanced

Cluster API: Declarative K8s Management

Manage Kubernetes cluster lifecycle with Cluster API. Provision, upgrade, and scale clusters declaratively using management clusters and infrastructure provi...

⏱ 15 minutes · cluster-api · cluster-management · infrastructure
⚙️ Configuration beginner

K8s ConfigMap: Create and Mount Guide

Create Kubernetes ConfigMaps from files, literals, and directories. Mount as volumes or environment variables with hot-reload and immutable ConfigMap patterns.

⏱ 10 minutes · configmap · configuration · volumes
⚙️ Configuration intermediate

K8s Container Runtimes: containerd vs CRI-O

Compare Kubernetes container runtimes containerd and CRI-O. Configuration, crictl debugging, runtime class for gVisor and Kata, and migration from Docker.

⏱ 10 minutes · container-runtime · containerd · cri-o
🔧 Troubleshooting intermediate

K8s CoreDNS: Troubleshoot DNS Issues

Troubleshoot Kubernetes CoreDNS resolution failures. Debug CoreDNS Pods, ndots settings, search domains, custom Corefile, and forward plugin configuration.

⏱ 10 minutes · coredns · dns · troubleshooting
⚙️ Configuration advanced

K8s Custom Resources: CRD Development

Create Kubernetes Custom Resource Definitions with schema validation, additional printer columns, subresources, and conversion webhooks.

⏱ 12 minutes · crd · custom-resources · api
🔧 Troubleshooting beginner

Fix CreateContainerError in Kubernetes

Troubleshoot Kubernetes CreateContainerError with step-by-step debugging. ConfigMap mounts, Secret references, volume permissions, and container runtime issues.

⏱ 8 minutes · troubleshooting · containers · errors
🚀 Deployments intermediate

K8s CronJob: Advanced Scheduling Patterns

Configure Kubernetes CronJobs with concurrency policies, deadlines, history limits, and suspend/resume. Timezone scheduling, failure handling, and monitoring.

⏱ 10 minutes · cronjob · scheduling · batch
⚙️ Configuration advanced

Crossplane: Provision Cloud from Kubernetes

Manage cloud infrastructure with Crossplane in Kubernetes. Provision AWS RDS, S3, Azure databases, and GCP resources using Kubernetes manifests and compositi...

⏱ 15 minutes · crossplane · infrastructure · cloud
💾 Storage advanced

K8s CSI Drivers: Container Storage Guide

Install and configure Kubernetes CSI drivers for persistent storage. CSI architecture, StorageClass provisioners, snapshots, and volume expansion patterns.

⏱ 15 minutes · csi · storage · persistent-volumes
🚀 Deployments beginner

K8s DaemonSet: Run Pod on Every Node

Deploy Kubernetes DaemonSets to run one pod per node. Log collectors, monitoring agents, node-level networking, tolerations, and update strategies.

⏱ 10 minutes · daemonset · deployments · monitoring
🚀 Deployments intermediate

Dapr: Microservice Building Blocks on K8s

Deploy Dapr in Kubernetes for service invocation, state management, pub/sub messaging, and secrets. Sidecar architecture that works with any language or fram...

⏱ 12 minutes · dapr · microservices · pub-sub
🚀 Deployments intermediate

K8s Deployment Rolling Update Strategy

Configure Kubernetes Deployment rolling updates with maxSurge and maxUnavailable. Rollback, revision history, blue-green, and canary deployment patterns.

⏱ 12 minutes · deployments · rolling-update · rollback
🌐 Networking intermediate

K8s DNS for Services: Resolution Guide

Understand Kubernetes DNS for Services and Pods. Service discovery patterns, FQDN format, headless services, DNS policies, ndots configuration.

⏱ 10 minutes · dns · services · networking
💾 Storage beginner

K8s Volumes: emptyDir and hostPath Guide

Configure Kubernetes emptyDir and hostPath volumes for temporary storage and host filesystem access. Memory-backed tmpfs, size limits.

⏱ 8 minutes · volumes · storage · emptydir
🌐 Networking intermediate

K8s EndpointSlice and Service Discovery

Understand Kubernetes EndpointSlice for scalable service discovery. DNS resolution, headless services, external services, and endpoint conditions.

⏱ 10 minutes · endpointslice · service-discovery · dns
💾 Storage advanced

K8s etcd Backup and Restore Commands

Backup and restore Kubernetes etcd with etcdctl snapshot save and restore. Automated CronJob backups, verification, and disaster recovery procedures.

⏱ 15 minutes · etcd · backup · disaster-recovery
⚙️ Configuration advanced

etcd Deep Dive: K8s Data Store Operations

Master etcd operations for Kubernetes. Backup and restore, compaction, defragmentation, health checks, member management, and performance tuning for production.

⏱ 15 minutes · etcd · backup · cluster-administration
🔒 Security intermediate

External Secrets Operator: Vault and Cloud

Sync secrets from HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager into Kubernetes with External Secrets Operator.

⏱ 10 minutes · secrets · vault · security
🔒 Security advanced

Falco: K8s Runtime Threat Detection

Deploy Falco for Kubernetes runtime security monitoring. Detect suspicious container behavior, privilege escalation, file access.

⏱ 12 minutes · falco · runtime-security · security
🚀 Deployments intermediate

Flux: GitOps Toolkit for Kubernetes

Deploy Flux GitOps toolkit for Kubernetes continuous delivery. Kustomization, HelmRelease, image automation, and multi-tenant GitOps with source controllers.

⏱ 12 minutes · flux · gitops · ci-cd
🌐 Networking intermediate

Gateway API: Next-Gen K8s Ingress

Replace Kubernetes Ingress with Gateway API. HTTPRoute, GRPCRoute, TLSRoute configuration. Multi-tenant gateways, traffic splitting, and header-based routing.

⏱ 12 minutes · gateway-api · networking · ingress
🚀 Deployments intermediate

Kubernetes Graceful Shutdown Guide

Implement graceful shutdown in Kubernetes pods. Configure terminationGracePeriodSeconds, preStop hooks, SIGTERM handling, and drain connections properly.

⏱ 15 minutes · graceful-shutdown · deployments · lifecycle
🔒 Security intermediate

Harbor: Private Container Registry on K8s

Deploy Harbor container registry in Kubernetes for private image hosting. Vulnerability scanning, image replication, RBAC, Helm chart repository.

⏱ 12 minutes · harbor · registry · security
⚡ Autoscaling intermediate

K8s Horizontal Scaling: Manual and Auto

Scale Kubernetes workloads horizontally with kubectl scale, HPA, and KEDA. Covers replica management and event-driven scaling strategies.

⏱ 10 minutes · autoscaling · hpa · scaling
⚡ Autoscaling intermediate

K8s HPA: Autoscale on CPU and Memory

Configure Kubernetes HorizontalPodAutoscaler to scale on CPU and memory utilization. Target utilization, minReplicas, maxReplicas, and scaling behavior.

⏱ 10 minutes · hpa · autoscaling · cpu
🔧 Troubleshooting beginner

Troubleshoot ImagePullBackOff and ErrImagePull

Troubleshoot Kubernetes ImagePullBackOff and ErrImagePull errors. Private registry auth, image pull secrets, tag verification, and network connectivity fixes.

⏱ 8 minutes · troubleshooting · image-pull · containers
🌐 Networking intermediate

K8s Ingress NGINX: Routing and TLS

Configure Kubernetes Ingress with NGINX controller. Path-based routing, TLS termination, annotations, rate limiting, and multiple hosts with examples.

⏱ 12 minutes · ingress · nginx · tls
🚀 Deployments beginner

K8s Init Containers: Setup Before Main

Use Kubernetes init containers to run setup tasks before main containers start. Database migrations, config fetching, dependency checks, and ordering.

⏱ 8 minutes · init-containers · pods · deployments
🚀 Deployments beginner

K8s Jobs and CronJobs: Complete Guide

Create Kubernetes Jobs and CronJobs for batch processing. Parallelism, backoff limits, completion counts, cron schedules, and failure handling patterns.

⏱ 10 minutes · jobs · cronjobs · batch
⚙️ Configuration advanced

kubeadm init: Bootstrap K8s Cluster

Bootstrap a Kubernetes cluster with kubeadm init and join. Control plane setup, worker node joining, pod network installation.

⏱ 20 minutes · kubeadm · cluster-setup · installation
⚙️ Configuration advanced

K8s kubeadm Upgrade: Step-by-Step Guide

Upgrade Kubernetes clusters with kubeadm from one minor version to the next. Control plane upgrade, worker node drain, kubelet upgrade, and rollback procedures.

⏱ 20 minutes · kubeadm · upgrade · cluster-management
🔧 Troubleshooting intermediate

kubectl debug: Advanced Pod Debugging

Use kubectl debug for ephemeral containers, node debugging, and pod copy debugging. Debug distroless images, share process namespaces, and get node-level access.

⏱ 10 minutes · kubectl · debugging · ephemeral-containers
⚙️ Configuration beginner

kubectl Plugins: Extend with Krew

Install kubectl plugins with Krew package manager. Essential plugins for debugging, resource management, and cluster operations. Build custom kubectl plugins.

⏱ 8 minutes · kubectl · plugins · krew
⚙️ Configuration intermediate

kubectl wait: Script K8s Operations

Use kubectl wait for scripting Kubernetes operations. Wait for pod ready, job completion, deployment rollout, and custom conditions in CI/CD pipelines.

⏱ 8 minutes · kubectl · scripting · automation
⚙️ Configuration advanced

K8s Kubelet Configuration and Tuning

Configure Kubernetes kubelet with KubeletConfiguration API. Resource reservation, eviction thresholds, image garbage collection, and node allocatable settings.

⏱ 12 minutes · kubelet · node-management · configuration
⚙️ Configuration intermediate

Kustomize: Customize K8s Manifests

Use Kustomize to customize Kubernetes manifests without templates. Overlays, patches, configMapGenerator, secretGenerator.

⏱ 12 minutes · kustomize · configuration · gitops
🔒 Security intermediate

Kyverno: K8s Policy Engine Without Code

Enforce Kubernetes policies with Kyverno. Validate, mutate, and generate resources using YAML policies. Image verification, label enforcement.

⏱ 12 minutes · kyverno · policy · security
⚙️ Configuration beginner

Kubernetes Labels Best Practices

Kubernetes labels best practices for organizing workloads. Recommended label schemas, selector patterns, naming conventions, and operational label strategies.

⏱ 10 minutes · labels · best-practices · configuration
🌐 Networking intermediate

Linkerd: Lightweight K8s Service Mesh

Deploy Linkerd service mesh in Kubernetes for mTLS, traffic splitting, retries, and observability. Lighter alternative to Istio with zero-config mTLS and min...

⏱ 12 minutes · linkerd · service-mesh · networking
📊 Observability beginner

K8s Metrics Server: kubectl top Guide

Install Kubernetes Metrics Server for kubectl top and HPA. Resource usage monitoring, troubleshooting metrics, and custom metrics integration.

⏱ 8 minutes · metrics · monitoring · kubectl
⚙️ Configuration beginner

Kubernetes Namespaces: Complete Guide

Create and manage Kubernetes namespaces for multi-tenant isolation. Resource quotas, RBAC per namespace, network policies, and LimitRange configuration.

⏱ 10 minutes · namespaces · multi-tenancy · rbac
🔧 Troubleshooting intermediate

K8s Network Debugging: Connectivity Guide

Debug Kubernetes network issues with tcpdump, netshoot, and connectivity tests. Pod-to-pod, pod-to-service, DNS, and external connectivity troubleshooting.

⏱ 12 minutes · networking · troubleshooting · debugging
🌐 Networking intermediate

K8s NetworkPolicy: Allow and Deny Rules

Configure Kubernetes NetworkPolicy for pod-to-pod traffic control. Default deny, allow by label, namespace selectors, egress rules, and CIDR blocks.

⏱ 10 minutes · networkpolicy · security · networking
🚀 Deployments intermediate

K8s Node Affinity and Pod Scheduling

Configure Kubernetes node affinity, pod affinity, and anti-affinity rules. nodeSelector, requiredDuringScheduling, preferredDuringScheduling, and topology.

⏱ 12 minutes · scheduling · node-affinity · pod-affinity
🔧 Troubleshooting beginner

Fix Untolerated Taint node-role master

Fix 'node untolerated taint node-role.kubernetes.io/master' scheduling error. Remove or tolerate control plane taints to schedule pods on master nodes.

⏱ 5 minutes · taints · tolerations · scheduling
📊 Observability advanced

OpenTelemetry in Kubernetes: Traces and Metrics

Deploy OpenTelemetry Collector in Kubernetes for distributed tracing and metrics. Auto-instrumentation, OTLP export, Jaeger integration.

⏱ 12 minutes · opentelemetry · tracing · observability
🚀 Deployments advanced

K8s Operator Pattern: Build Controllers

Build Kubernetes operators with the controller pattern. Reconciliation loops, watch events, owner references, finalizers, and operator frameworks comparison.

⏱ 15 minutes · operators · controllers · crd
💾 Storage intermediate

K8s PV and PVC: Persistent Storage Guide

Create Kubernetes PersistentVolumes and PersistentVolumeClaims. StorageClass, dynamic provisioning, access modes, reclaim policies, and volume expansion.

⏱ 12 minutes · persistent-volumes · storage · pvc
💾 Storage beginner

K8s PersistentVolumeClaimSpec Reference

Complete PersistentVolumeClaimSpec reference for Kubernetes. accessModes, storageClassName, resources, selector, volumeMode, and dataSource explained.

⏱ 10 minutes · storage · pvc · persistent-volumes
🚀 Deployments intermediate

K8s PodDisruptionBudget PDB Guide

Configure Kubernetes PodDisruptionBudgets to protect application availability during node drains. minAvailable, maxUnavailable, and drain safety patterns.

⏱ 8 minutes · pdb · availability · node-drain
🚀 Deployments intermediate

K8s Pod Lifecycle and Graceful Shutdown

Understand Kubernetes pod lifecycle phases, termination sequence, preStop hooks, SIGTERM handling, and terminationGracePeriodSeconds for zero-downtime shutdo...

⏱ 10 minutes · pod-lifecycle · termination · graceful-shutdown
🔒 Security intermediate

K8s Pod Security Admission Standards

Configure Kubernetes Pod Security Admission with enforce, audit, and warn modes. Privileged, baseline, and restricted profiles for namespace-level pod security.

⏱ 10 minutes · pod-security · security · admission-controller
🚀 Deployments intermediate

K8s PriorityClass: Pod Scheduling Priority

Configure Kubernetes PriorityClass for pod scheduling priority and preemption. System-critical pods, resource guarantees, and preemption policies.

⏱ 8 minutes · priority · preemption · scheduling
🚀 Deployments beginner

Kubernetes Liveness and Readiness Probes Guide

Configure Kubernetes liveness, readiness, and startup probes for health checks. HTTP, TCP, exec probes, timing parameters, and failure threshold tuning.

⏱ 10 minutes · probes · health-checks · liveness
⚙️ Configuration intermediate

K8s Projected Volumes: Combine Sources

Configure Kubernetes projected volumes to combine secrets, configmaps, downward API, and service account tokens into a single mount.

⏱ 8 minutes · volumes · projected · configuration
📊 Observability intermediate

Prometheus: K8s Monitoring and Alerting

Deploy Prometheus monitoring in Kubernetes with kube-prometheus-stack. ServiceMonitor, PrometheusRule, Grafana dashboards, and alerting for production clusters.

⏱ 15 minutes · prometheus · monitoring · alerting
⚙️ Configuration intermediate

K8s QoS Classes: Guaranteed vs Burstable

Understand Kubernetes QoS classes for pod eviction priority. Guaranteed, Burstable, and BestEffort resource configurations and eviction behavior under pressure.

⏱ 8 minutes · qos · resource-management · eviction
🌐 Networking intermediate

Kubernetes Rate Limiting Guide

Implement rate limiting in Kubernetes with Ingress annotations, Gateway API, Envoy filters, and application-level middleware. Protect APIs from abuse.

⏱ 15 minutes · rate-limiting · ingress · gateway-api
🔒 Security intermediate

K8s RBAC: Role and RoleBinding Guide

Configure Kubernetes RBAC with Role, ClusterRole, RoleBinding, and ClusterRoleBinding. Service account permissions, least privilege, and audit examples.

⏱ 12 minutes · rbac · security · service-accounts
🚀 Deployments beginner

K8s ReplicaSet: Maintain Pod Replicas

Understand Kubernetes ReplicaSets for maintaining desired pod count. Selector matching, scaling, ownership, and relationship to Deployments.

⏱ 8 minutes · replicaset · pods · scaling
⚡ Autoscaling intermediate

Kubernetes Right-Sizing and Cost Optimization

Optimize Kubernetes resource allocation with right-sizing, VPA recommendations, bin packing, request-to-limit ratios, and cost reduction best practices.

⏱ 20 minutes · resources · optimization · cost
⚙️ Configuration intermediate

K8s ResourceQuota and LimitRange Guide

Configure Kubernetes ResourceQuota and LimitRange for namespace resource management. CPU and memory quotas, pod count limits, and default container limits.

⏱ 10 minutes · resource-quotas · limitrange · multi-tenancy
🚀 Deployments intermediate

K8s Rolling Update: Deployment Strategies

Configure Kubernetes Deployment strategies: RollingUpdate with maxSurge and maxUnavailable, and Recreate. Blue-green and canary patterns, plus rollback procedures.

⏱ 10 minutes · rolling-update · deployment-strategy · deployments
🔒 Security beginner

K8s Secrets: Types and Usage Guide

Create and manage Kubernetes Secrets: Opaque, docker-registry, TLS, and basic-auth types. Mount as volumes, inject as env vars, and encrypt at rest.

⏱ 10 minutes · secrets · security · encryption
🔒 Security intermediate

K8s SecurityContext: Container Hardening

Configure Kubernetes SecurityContext for pods and containers. runAsNonRoot, readOnlyRootFilesystem, capabilities, seccomp profiles, and privilege escalation.

⏱ 10 minutes · security-context · security · containers
🌐 Networking advanced

Istio Service Mesh: Traffic Management

Deploy Istio service mesh in Kubernetes for traffic management, mTLS, observability, and canary deployments. VirtualService, DestinationRule.

⏱ 15 minutes · istio · service-mesh · networking
🌐 Networking beginner

K8s Service Types: ClusterIP NodePort LB

Kubernetes Service types explained: ClusterIP, NodePort, LoadBalancer, and ExternalName. When to use each type with YAML examples and traffic flow diagrams.

⏱ 10 minutes · services · networking · load-balancer
🔒 Security intermediate

K8s ServiceAccount: Pod Identity Guide

Create Kubernetes ServiceAccounts for pod authentication. Token projection, RBAC binding, workload identity, automountServiceAccountToken, and OIDC federation.

⏱ 10 minutes · service-accounts · security · rbac
🚀 Deployments intermediate

K8s Sidecar Containers: Native Support

Configure Kubernetes native sidecar containers with restartPolicy: Always in initContainers. Logging sidecars, service mesh proxies, and lifecycle management.

⏱ 10 minutes · sidecar · containers · pods
🚀 Deployments intermediate

K8s StatefulSet: Stable Identity Guide

Deploy stateful applications with Kubernetes StatefulSets. Stable network identity, ordered deployment, persistent storage, and headless service patterns.

⏱ 12 minutes · statefulset · deployments · storage
⚙️ Configuration intermediate

K8s Taints and Tolerations Explained

Configure Kubernetes taints and tolerations for pod scheduling. NoSchedule, PreferNoSchedule, NoExecute effects, GPU node taints, and drain behavior.

⏱ 10 minutes · taints · tolerations · scheduling
🚀 Deployments intermediate

Tekton: Cloud-Native CI/CD Pipelines

Build CI/CD pipelines with Tekton in Kubernetes. Tasks, Pipelines, PipelineRuns, workspaces, and Tekton Hub integration for cloud-native continuous delivery.

⏱ 12 minutes · tekton · ci-cd · pipelines
🚀 Deployments intermediate

K8s Topology Spread: Distribute Pods

Configure Kubernetes topology spread constraints to distribute pods across zones, nodes, and regions. maxSkew, whenUnsatisfiable, and scheduling strategies.

⏱ 10 minutes · topology · scheduling · high-availability
🔒 Security intermediate

Trivy: K8s Security Scanning and SBOM

Scan Kubernetes clusters with Trivy for vulnerabilities, misconfigurations, and secrets. Trivy Operator for continuous scanning, SBOM generation.

⏱ 10 minutes · trivy · vulnerability-scanning · security
💾 Storage intermediate

Velero: K8s Backup and Disaster Recovery

Back up and restore Kubernetes clusters with Velero. Schedule backups, restore namespaces, and migrate workloads between clusters.

⏱ 12 minutes · backup · disaster-recovery · velero
🌐 Networking intermediate

NGINX Ingress limit-burst-multiplier

Configure nginx.ingress.kubernetes.io/limit-burst-multiplier for rate limiting burst control. Tune burst size, rate limits, and 429 response handling.

⏱ 10 minutes · nginx · ingress · rate-limiting
🤖 AI & GPU intermediate

NVIDIA H300 GPU Setup on Kubernetes

Deploy NVIDIA H300 GPUs on Kubernetes. H300 vs H100 vs H200 specs comparison, memory bandwidth, GPU Operator setup, and AI inference optimization.

⏱ 15 minutes · nvidia · gpu · h300
🤖 AI & GPU intermediate

NVIDIA PyTorch Container on Kubernetes

Deploy nvcr.io/nvidia/pytorch containers on Kubernetes for GPU training. Version selection, CUDA compatibility, multi-node DDP, and NCCL configuration.

⏱ 15 minutes · nvidia · pytorch · gpu
⚡ Autoscaling beginner

Install VPA with hack/vpa-up.sh Script

Install Kubernetes Vertical Pod Autoscaler using hack/vpa-up.sh from the official repository. VPA components, prerequisites, and troubleshooting guide.

⏱ 10 minutes · vpa · autoscaling · installation
⚙️ Configuration advanced

Air-Gap OpenShift Upgrade oc-mirror OSUS

Upgrade air-gapped OpenShift with oc-mirror and OSUS. Mirror release payloads and Cincinnati graph, configure IDMS, and drive CVO upgrades.

⏱ 45 minutes · openshift · airgap · disconnected
⚙️ Configuration intermediate

Cincinnati Graph OpenShift Upgrades

Understand Cincinnati upgrade graph for OpenShift. Query graph endpoints, decode channels, blocked edges, conditional updates, and debug upgrade paths.

⏱ 15 minutes · openshift · cincinnati · upgrades
⚙️ Configuration intermediate

containerd certs.d Registry CA Trust

Configure containerd to trust private registry CAs using /etc/containerd/certs.d. Set up hosts.toml for custom CA certificates and mirror registries.

⏱ 15 minutes · containerd · registry · tls
🤖 AI & GPU intermediate

GenAI-Perf Benchmark LLM Kubernetes

Benchmark LLM inference with GenAI-Perf on Kubernetes. Use --service-kind openai for vLLM, NIM, and TGI. Measure TTFT, ITL, and throughput.

⏱ 20 minutes · genai-perf · benchmarking · llm
🔒 Security intermediate

GKE OIDC Issuer Workload Identity

Enable OIDC issuer on GKE with --enable-oidc-issuer. Configure workload identity federation for cross-cloud auth and external IdP integration.

⏱ 20 minutes · gke · oidc · workload-identity
🔧 Troubleshooting intermediate

Journald Verify Config Kubernetes Nodes

Validate journald configuration on Kubernetes nodes. Fix journal corruption, tune storage limits, configure persistence, and troubleshoot systemd-journald.

⏱ 15 minutes · journald · systemd · logging
⚙️ Configuration beginner

kubectl create secret docker-registry

Create Kubernetes Docker registry secrets with --docker-password-stdin. Authenticate to private registries and configure imagePullSecrets securely.

⏱ 10 minutes · kubectl · secrets · registry
🌐 Networking advanced

NMState Bond LACP Configuration OpenShift

Configure LACP bonding with NMState on OpenShift. NodeNetworkConfigurationPolicy for 802.3ad bonds, VLAN tagging, and storage network bonds.

⏱ 20 minutes · nmstate · bonding · lacp
🔧 Troubleshooting intermediate

NXDOMAIN DNS Troubleshooting Kubernetes

Fix NXDOMAIN errors in Kubernetes. Debug CoreDNS failures, ndots configuration, search domain issues, and external DNS lookup problems.

⏱ 15 minutes · dns · nxdomain · coredns
🔧 Troubleshooting advanced

oc-mirror Troubleshooting Disconnected

Troubleshoot oc-mirror failures in disconnected OpenShift. Fix archive corruption, registry auth errors, v1/v2 mismatches, and delta mirror issues.

⏱ 20 minutes · oc-mirror · disconnected · openshift
🔧 Troubleshooting advanced

OpenShift Cluster Operator Upgrade Debug

Debug degraded cluster operators during OpenShift upgrades. Identify stuck operators, decode status conditions, and unblock stalled rollouts.

⏱ 20 minutes · openshift · cluster-operators · upgrades
⚙️ Configuration intermediate

OpenShift IDMS ITMS Mirror Rules Guide

Configure IDMS and ITMS mirror rules in OpenShift for disconnected registries. NeverContactSource vs AllowContactingSource and ICSP migration.

⏱ 15 minutes · openshift · idms · itms
🚀 Deployments advanced

Convert Connected to Disconnected OCP

Convert a connected OpenShift cluster to disconnected. Mirror images, configure IDMS, update pull secrets, fix Insights Operator, and verify applications.

⏱ 45 minutes · openshift · disconnected · migration
🚀 Deployments intermediate

Disconnected Environments OpenShift

Complete guide to OpenShift disconnected and air-gapped environments. Mirror registry, oc-mirror, OLM, OSUS, IDMS, upgrades, and enclave support overview.

⏱ 15 minutes · openshift · disconnected · air-gapped
💾 Storage advanced

etcd Backup Restore Kubernetes

Back up and restore etcd in Kubernetes and OpenShift clusters. Automated snapshots, disaster recovery procedures, and cluster state restoration.

⏱ 20 minutes · etcd · backup · disaster-recovery
⚙️ Configuration intermediate

IDMS ITMS ICSP Disconnected OpenShift

Configure ImageDigestMirrorSet, ImageTagMirrorSet, and ImageContentSourcePolicy for disconnected OpenShift. Redirect image pulls to your mirror registry.

⏱ 20 minutes · idms · itms · icsp
💾 Storage intermediate

Kubernetes Backup Velero Guide

Set up Velero for Kubernetes cluster backup and restore. Schedule backups, protect namespaces, restore applications, and configure S3 storage backends.

⏱ 25 minutes · velero · backup · disaster-recovery
⚙️ Configuration beginner

Kubernetes ConfigMap Secrets Management

Manage ConfigMaps and Secrets in Kubernetes. Create, mount, update, and secure application configuration and sensitive data effectively.

⏱ 15 minutes · configmap · secrets · configuration
🚀 Deployments intermediate

Kubernetes Deployment Strategies

Compare rolling update, recreate, blue-green, and canary deployment strategies in Kubernetes. Configuration, trade-offs, and production rollback procedures.

⏱ 18 minutes · deployments · rolling-update · canary
⚡ Autoscaling intermediate

Kubernetes HPA Autoscaling Guide

Configure Horizontal Pod Autoscaler for automatic scaling based on CPU, memory, and custom metrics. HPA v2 policies, scaling behavior, and production tuning.

⏱ 18 minutes · hpa · autoscaling · scaling
🌐 Networking beginner

Kubernetes Ingress Fundamentals

Configure Kubernetes Ingress for HTTP routing, TLS termination, and path-based routing. NGINX Ingress Controller setup, annotations, and multi-service routing.

⏱ 15 minutes · ingress · nginx · tls
🌐 Networking intermediate

Kubernetes IPPool Management Guide

Configure IP address pools in Kubernetes with Whereabouts, NV-IPAM, MetalLB, and Calico IPPool for secondary networks and LoadBalancer IPs.

⏱ 20 minutes · ippool · ipam · networking
🚀 Deployments beginner

Kubernetes Jobs CronJobs Guide

Run batch workloads with Kubernetes Jobs and CronJobs. Parallel execution, completion tracking, failure handling, TTL cleanup, and scheduled tasks.

⏱ 15 minutes · jobs · cronjobs · batch
🚀 Deployments beginner

Kubernetes Probes Liveness Readiness

Configure liveness, readiness, and startup probes in Kubernetes. HTTP, TCP, exec, and gRPC probe types with real-world tuning for production workloads.

⏱ 15 minutes · probes · health-checks · liveness
📊 Observability intermediate

Kubernetes Logging Fluent Bit Guide

Deploy Fluent Bit for centralized Kubernetes logging. DaemonSet configuration, parsing, filtering, and forwarding logs to Elasticsearch, Loki, or S3.

⏱ 20 minutes · logging · fluent-bit · observability
⚙️ Configuration beginner

Kubernetes Namespace Management Guide

Create, manage, and organize Kubernetes namespaces for multi-tenancy. Resource isolation, RBAC scoping, namespace quotas, and lifecycle best practices.

⏱ 12 minutes · namespaces · multi-tenancy · rbac
🔒 Security intermediate

Kubernetes NetworkPolicy Guide

Secure pod-to-pod traffic with Kubernetes NetworkPolicies. Ingress and egress rules, namespace selectors, deny-all policies, and CNI requirements.

⏱ 18 minutes · network-policy · security · networking
🚀 Deployments beginner

Kubernetes Node Drain Cordon Guide

Safely drain and cordon Kubernetes nodes for maintenance. Graceful pod eviction, PDB-aware drains, force drain, and maintenance window procedures.

⏱ 12 minutes · node-drain · cordon · maintenance
💾 Storage beginner

Kubernetes Persistent Volumes Guide

Manage Kubernetes Persistent Volumes with PV, PVC, and StorageClass. Dynamic provisioning, access modes, reclaim policies, and volume expansion.

⏱ 18 minutes · persistent-volumes · storage · pvc
🔒 Security intermediate

Kubernetes RBAC Role ClusterRole

Configure RBAC in Kubernetes with Roles, ClusterRoles, RoleBindings, and ClusterRoleBindings. Least-privilege access for users, groups, and service accounts.

⏱ 18 minutes · rbac · security · access-control
⚙️ Configuration intermediate

Kubernetes ResourceQuota LimitRange

Configure ResourceQuota and LimitRange for Kubernetes namespace resource governance. CPU, memory, storage, and object count limits for multi-tenant clusters.

⏱ 18 minutes · resource-quota · limit-range · multi-tenancy
🚀 Deployments intermediate

Mirror Registry Disconnected OpenShift

Set up a mirror registry for disconnected OpenShift installations. Deploy mirror-registry for Red Hat OpenShift, configure storage, TLS, and credentials.

⏱ 30 minutes · openshift · disconnected · mirror-registry
🌐 Networking advanced

MOFED Driver for Kubernetes: Setup Guide

Install and manage MOFED drivers in Kubernetes. Network Operator integration, NicClusterPolicy, driver versions, and RDMA troubleshooting.

⏱ 25 minutes · mofed · mellanox · nvidia
🌐 Networking advanced

MOFED Driver Operator Build Kubernetes

Let the NVIDIA Network Operator build MOFED drivers on-node via DKMS. Kernel header detection, compile flags, and DTK integration for OpenShift.

⏱ 20 minutes · mofed · nvidia · network-operator
🚀 Deployments advanced

oc-mirror Plugin Disconnected OpenShift

Use oc-mirror to mirror OpenShift content for disconnected installations. ImageSetConfiguration, incremental mirrors, and operator catalog mirroring.

⏱ 35 minutes · oc-mirror · openshift · disconnected
🚀 Deployments advanced

OLM Disconnected OpenShift Operators

Use Operator Lifecycle Manager in disconnected OpenShift clusters. Mirror catalogs, create CatalogSources, and manage Operators without internet access.

⏱ 25 minutes · olm · operators · disconnected
🔧 Troubleshooting advanced

OpenShift MCP Validation Broken Rules

Validate MachineConfigPool rules before applying in OpenShift. Detect broken MachineConfigs, degraded MCPs, and implement pre-flight checks.

⏱ 25 minutes · openshift · machineconfig · mcp
⚙️ Configuration advanced

OSUS Direct vs Replicated OpenShift

Choose between direct and replicated OSUS graph data modes in OpenShift. Configure UpdateService for connected and disconnected environments.

⏱ 20 minutes · openshift · osus · update-service
📊 Observability intermediate

Prometheus Monitoring Kubernetes Guide

Deploy Prometheus for Kubernetes cluster monitoring. ServiceMonitor, PodMonitor, alerting rules, Grafana dashboards, and kube-prometheus-stack Helm install.

⏱ 25 minutes · prometheus · monitoring · alerting
🚀 Deployments advanced

Red Hat Quay Registry Kubernetes

Deploy and manage Quay container registry on Kubernetes. Mirror policies, robot accounts, security scanning, and integration with OpenShift.

⏱ 30 minutes · quay · registry · openshift
🔧 Troubleshooting intermediate

SELinux SSH Login Failure Troubleshoot

Fix SSH login failures caused by SELinux enforcement. Diagnose AVC denials, restore file labels, fix custom SSH ports, and resolve PAM denials.

⏱ 15 minutes · selinux · ssh · troubleshooting
🚀 Deployments intermediate

Skopeo Container Image Operations

Use skopeo to inspect, copy, sync, and delete container images across registries. Essential tool for disconnected Kubernetes and OpenShift environments.

⏱ 20 minutes · skopeo · container-images · registry
🌐 Networking advanced

SR-IOV Device Plugin PF Flag on Kubernetes

Configure SR-IOV device plugin PF flag in Kubernetes. Expose physical functions as allocatable resources for exclusive RDMA access.

⏱ 20 minutes · sriov · device-plugin · rdma
🌐 Networking beginner

cert-manager Cloudflare DNS01 K8s

Configure cert-manager with Cloudflare DNS01 challenge for wildcard TLS certificates on Kubernetes. API token secret, ClusterIssuer, and auto-renewal.

⏱ 15 minutes · cert-manager · cloudflare · dns01
🔧 Troubleshooting intermediate

Cilium Debug Pod Troubleshooting

Debug Kubernetes networking with Cilium debug pods and containers. cilium-dbg, netshoot, hubble observe, and endpoint connectivity troubleshooting.

⏱ 15 minutes · cilium · debug · netshoot
🚀 Deployments intermediate

CloudNativePG PostgreSQL Operator K8s

Deploy PostgreSQL with CloudNativePG operator on Kubernetes. Cluster setup, affinity, replication lag monitoring, backup, and high availability configuration.

⏱ 15 minutes · cloudnativepg · postgresql · operator
🤖 AI & GPU intermediate

Continuous Batching LLM Inference K8s

Configure continuous batching for LLM inference on Kubernetes. vLLM and TRT-LLM batch scheduling, max-num-seqs tuning, and throughput optimization.

⏱ 15 minutes · continuous-batching · inference · throughput
🤖 AI & GPU beginner

CUDA Version Compatibility K8s Guide

Match CUDA versions with GPU drivers and container images on Kubernetes. Forward compatibility, driver requirements, and container toolkit matrix.

⏱ 15 minutes · cuda · compatibility · driver-version
🔧 Troubleshooting intermediate

Fix CUDA Out of Memory K8s Pods

Troubleshoot CUDA out of memory errors in Kubernetes GPU pods. Memory fragmentation, batch size tuning, gradient checkpointing, and resource limits.

⏱ 15 minutes · cuda · oom · gpu-memory
🤖 AI & GPU advanced

DeepSpeed ZeRO Training Kubernetes

Deploy DeepSpeed ZeRO-1/2/3 for large model training on Kubernetes. Multi-node config, NCCL tuning, memory optimization, and 70B+ model training.

⏱ 15 minutes · deepspeed · zero · distributed-training
🤖 AI & GPU advanced

DGX H100 GPU Topology nvidia-smi

Inspect DGX H100 GPU topology with nvidia-smi topo -m. NVSwitch NV18 links, cross-socket detection, PCIe hierarchy, and NCCL performance validation.

⏱ 15 minutes · dgx · h100 · gpu-topology
📊 Observability advanced

DOCA Telemetry BlueField Kubernetes

Collect NVIDIA BlueField DPU telemetry in Kubernetes using DOCA Telemetry libraries. Monitor adaptive retransmission, PCC, diagnostics, and PCI metrics.

⏱ 30 minutes · nvidia · doca · bluefield
🔒 Security intermediate

EDR Flexera Agents Kubernetes Deploy

Deploy EDR and Flexera agents on Kubernetes with DaemonSets. Priority classes, host path access, exclusion paths, and security agent lifecycle.

⏱ 25 minutes · edr · flexera · crowdstrike
⚙️ Configuration intermediate

Flexera License Management Kubernetes

Manage software licenses in Kubernetes with Flexera. FlexNet Manager, container license tracking, GPU software metering, and compliance for enterprise K8s.

⏱ 25 minutes · flexera · licensing · compliance
🤖 AI & GPU beginner

GPU Feature Discovery Node Labels

Configure NVIDIA GPU Feature Discovery for automatic node labeling on Kubernetes. GPU model, driver version, CUDA, and MIG labels for scheduling.

⏱ 15 minutes · gpu-feature-discovery · node-labels · scheduling
🤖 AI & GPU intermediate

GPU Node Affinity Scheduling K8s

Schedule GPU workloads with node affinity and topology on Kubernetes. GPU type selection, multi-GPU locality, and NUMA-aware pod placement.

⏱ 15 minutes · node-affinity · gpu-scheduling · topology
🤖 AI & GPU beginner

K8s GPU Limits Requests Configuration

Configure GPU resource limits and requests in Kubernetes pod specs. nvidia.com/gpu resource, fractional GPUs, MIG slices, and multi-GPU allocation.

⏱ 15 minutes · gpu-limits · resource-requests · nvidia
⚡ Autoscaling intermediate

HPA Prometheus Custom Metrics K8s

Configure HPA with custom Prometheus metrics using prometheus-adapter on Kubernetes. Custom and external metrics, query mapping, and scaling on business KPIs.

⏱ 15 minutes · hpa · prometheus · custom-metrics
🌐 Networking intermediate

K8s Ingress Rate Limit NGINX Config

Configure rate limiting on Kubernetes NGINX Ingress. limit-rps, limit-burst-multiplier annotations, per-client limits, and webhook protection patterns.

⏱ 15 minutes · ingress · rate-limit · nginx
🌐 Networking advanced

LACP Storage Switch Kubernetes Guide

Configure LACP bond aggregation for NFS and iSCSI storage switches in Kubernetes clusters. 802.3ad setup, hash policies, switch config, and failure handling.

⏱ 30 minutes · lacp · bonding · storage
🤖 AI & GPU advanced

LoRA Adapter Serving vLLM on K8s

Serve multiple LoRA adapters with a single vLLM base model on Kubernetes. Dynamic loading, per-request routing, and multi-tenant fine-tuned models.

⏱ 15 minutes · lora · fine-tuning · vllm
🤖 AI & GPU advanced

Multi-GPU PyTorch DDP on Kubernetes

Run PyTorch DistributedDataParallel across multiple GPUs on Kubernetes. torchrun, NCCL backend, pod topology, and scaling to multi-node training.

⏱ 15 minutes · pytorch · ddp · multi-gpu
💾 Storage advanced

NFS Tenant Segregation Kubernetes

Implement NFS tenant segregation in Kubernetes with six-layer defense-in-depth. Exports, StorageClass, quotas, and admission policies.

⏱ 35 minutes · nfs · multi-tenancy · storage
🌐 Networking beginner

NMState Operator Install OpenShift K8s

Install and configure the NMState operator on OpenShift and Kubernetes. Enable declarative node networking with NNCP, NodeNetworkState, and enactments.

⏱ 15 minutes · nmstate · operator · openshift
🌐 Networking intermediate

NNCP NodeNetworkConfigurationPolicy

Master NodeNetworkConfigurationPolicy (NNCP) on OpenShift and Kubernetes. Configure VLANs, bonds, bridges, SR-IOV, MTU, static IPs, and DNS with NMState.

⏱ 25 minutes · nncp · nmstate · openshift
🤖 AI & GPU intermediate

NVIDIA Driver Update K8s Nodes Guide

Safely update NVIDIA GPU drivers on Kubernetes nodes. Rolling updates, drain strategy, driver compatibility matrix, and GPU Operator upgrades.

⏱ 15 minutes · nvidia-driver · upgrade · rolling-update
🔧 Troubleshooting intermediate

NVIDIA GPU Operator Troubleshooting

Fix common NVIDIA GPU Operator issues on Kubernetes. Driver pod crashes, toolkit failures, device plugin not ready, and validation pod errors.

⏱ 15 minutes · gpu-operator · nvidia · driver
🤖 AI & GPU advanced

NVIDIA PeerMem GPUDirect RDMA K8s

Configure nvidia_peermem and ib_register_peer_memory_client for GPUDirect RDMA on Kubernetes. Module loading and modprobe invalid argument fix.

⏱ 15 minutes · nvidia-peermem · gpudirect · rdma
🤖 AI & GPU beginner

nvidia-smi Monitoring in K8s Pods

Run nvidia-smi inside Kubernetes pods for GPU monitoring. Memory usage, temperature, utilization, and automated health checks with liveness probes.

⏱ 15 minutes · nvidia-smi · gpu-monitoring · health-check
🔒 Security intermediate

OpenShift ACS RHACS Security Guide

Deploy Red Hat Advanced Cluster Security (RHACS/ACS) on OpenShift. Vulnerability scanning, compliance, runtime threat detection, and policy enforcement.

⏱ 15 minutes · openshift · acs · rhacs
🚀 Deployments advanced

OpenShift Upgrade Disconnected Cluster

Step-by-step guide to upgrading OpenShift in a disconnected air-gapped environment. Mirror releases, configure ICSP/IDMS, validate, and execute the upgrade.

⏱ 45 minutes · openshift · disconnected · air-gapped
🚀 Deployments intermediate

OpenShift Upgrade Service Graph Guide

Use the OpenShift Upgrade Service (OSUS) and Cincinnati graph to plan safe upgrade paths. Channel selection, conditional edges, and air-gapped graph data.

⏱ 20 minutes · openshift · upgrade · cincinnati
🚀 Deployments advanced

OSUS Operator Disconnected OpenShift

Deploy the OpenShift Update Service (OSUS) operator for disconnected clusters. Local Cincinnati graph, graph-data image mirroring, and upgrade path serving.

⏱ 30 minutes · osus · openshift · disconnected
🤖 AI & GPU intermediate

Prefix Caching vLLM KV Cache K8s

Enable automatic prefix caching in vLLM on Kubernetes for shared-prompt workloads. KV cache reuse, memory savings, and chatbot latency optimization.

⏱ 15 minutes · prefix-caching · kv-cache · vllm
🤖 AI & GPU intermediate

Quantize LLMs AWQ GPTQ for K8s Deploy

Deploy AWQ and GPTQ quantized LLMs on Kubernetes. 4-bit inference with vLLM, model conversion, accuracy trade-offs, and GPU memory savings guide.

⏱ 15 minutes · quantization · awq · gptq
🔒 Security advanced

RHACS NFS Tenant Security Kubernetes

Enforce NFS tenant isolation with RHACS policies. Detect direct NFS mounts, wrong StorageClass usage, privilege escalation, and cross-tenant violations.

⏱ 25 minutes rhacs stackrox nfs
🤖 AI & GPU advanced

Speculative Decoding with vLLM on Kubernetes

Enable speculative decoding in vLLM on Kubernetes for up to 2-3x faster LLM inference. Draft model selection, acceptance rates, and latency optimization.

⏱ 15 minutes speculative-decoding vllm inference-optimization
🤖 AI & GPU intermediate

TensorRT-LLM vs vLLM Benchmark 2026

Compare TensorRT-LLM vs vLLM for LLM inference on Kubernetes. TTFT, throughput, GPU utilization benchmarks, and when to use each inference engine.

⏱ 15 minutes tensorrt-llm vllm benchmark
🤖 AI & GPU intermediate

vLLM Alternatives LLM Inference K8s

Compare vLLM alternatives for LLM inference on Kubernetes. TensorRT-LLM, SGLang, NVIDIA NIM, Ollama, and text-generation-inference feature comparison.

⏱ 15 minutes vllm alternatives inference
🔒 Security intermediate

Ubuntu 26.04 LTS K8s Node Hardening

Harden Kubernetes nodes with Ubuntu 26.04 LTS Resolute Raccoon. sudo-rs Rust rewrite, APT rollback, Kernel 7.0 TDX, ROCm GPU, and secure base images.

⏱ 20 minutes ubuntu hardening sudo-rs
🌐 Networking advanced

Cilium ClusterMesh Multi-Cluster

Connect multiple K8s clusters with Cilium ClusterMesh. Shared services, global service discovery, and cross-cluster network policies.

⏱ 15 minutes cilium clustermesh multi-cluster
📊 Observability intermediate

Cilium Hubble Observability Guide

Monitor Kubernetes network flows with Cilium Hubble. CLI usage, Hubble UI, flow filtering, DNS visibility, and L7 HTTP observability.

⏱ 15 minutes cilium hubble network-flows
⚙️ Configuration intermediate

crun vs runc Container Runtime 2026

Compare crun vs runc container runtimes for Kubernetes. Performance benchmarks, memory usage, cgroup v2 support, and migrating from runc to crun.

⏱ 15 minutes crun runc container-runtime
💾 Storage intermediate

CSI Snapshot and Restore K8s Guide

Create and restore volume snapshots with CSI on K8s. VolumeSnapshot, VolumeSnapshotClass, and cross-namespace clone patterns.

⏱ 15 minutes csi snapshot restore
🔧 Troubleshooting advanced

Fix etcd Leader Election Timeout

Troubleshoot etcd leader election timeouts in K8s. Disk latency, network partition, heartbeat interval, and recovery steps.

⏱ 15 minutes etcd leader-election timeout
🔧 Troubleshooting intermediate

Fix Certificate Errors Kubernetes

Troubleshoot TLS certificate errors in K8s. x509 unknown authority, expired certs, cert-manager issues, and custom CA bundles.

⏱ 15 minutes certificate tls x509
🔧 Troubleshooting intermediate

Fix DNS Resolution Issues in Kubernetes

Troubleshoot Kubernetes DNS resolution failures. ndots, search domains, CoreDNS CrashLoopBackOff, and pod-level DNS debugging steps.

⏱ 15 minutes dns resolution coredns
🔧 Troubleshooting advanced

Fix Pod cgroup Memory Errors K8s

Fix cgroup memory limit and OOM errors in Kubernetes pods. Covers cgroup v2 migration, memory.max, swap settings, and kernel tuning for stable workloads.

⏱ 15 minutes cgroup memory oom
🔧 Troubleshooting intermediate

Fix Service Not Reachable in Kubernetes

Debug Kubernetes Service connectivity issues. Endpoint selection, kube-proxy rules, DNS resolution, and NetworkPolicy blocks.

⏱ 15 minutes service connectivity endpoints
🎯 Helm intermediate

Helm Chart Dependencies: Complete Guide

Manage Helm chart dependencies and subcharts. Condition flags, tags, import-values, alias patterns, and dependency update workflow for K8s.

⏱ 15 minutes helm dependencies subcharts
🎯 Helm intermediate

Helm Hooks and Lifecycle Management Guide

Master Helm hooks for Kubernetes deployments. Pre-install, post-install, pre-upgrade, hook weights, deletion policies, and database migration patterns.

⏱ 15 minutes helm hooks lifecycle
🎯 Helm beginner

Helm Rollback and History Guide

Roll back Helm releases and manage revision history. Diagnose failed upgrades, compare revisions, and automate rollback.

⏱ 15 minutes helm rollback history
🎯 Helm beginner

Helm Values Override Patterns Explained

Master Helm values override patterns. CLI flags, multiple files, JSON values, and precedence rules for complex deployments.

⏱ 15 minutes helm values override
🔧 Troubleshooting beginner

Fix 502 Bad Gateway Kubernetes Ingress

Fix 502 Bad Gateway errors in Kubernetes Ingress. Backend not ready, timeout tuning, readiness probes, and NGINX ingress controller troubleshooting.

⏱ 15 minutes 502-bad-gateway ingress troubleshooting
⚙️ Configuration intermediate

K8s Admission Controllers List Guide

Complete list of Kubernetes admission controllers. Enabling and disabling controllers, PodSecurity, ResourceQuota, and custom validating webhooks.

⏱ 15 minutes admission-controller webhook validation
⚙️ Configuration beginner

Kubernetes API Versions Explained

Understand K8s API versions: alpha, beta, stable. API deprecation policy, migration strategy, and kubectl api-versions usage.

⏱ 15 minutes api-versions deprecation migration
🚀 Deployments intermediate

ArgoCD Sync Waves and Hooks Guide

Configure ArgoCD sync waves for ordered deployments. Wave ordering, sync hooks, resource health checks, and dependency management patterns.

⏱ 15 minutes argocd sync-waves hooks
🌐 Networking intermediate

Calico NetworkPolicy K8s Guide

Configure Calico NetworkPolicy for K8s. GlobalNetworkPolicy, host endpoints, application layer policies, and DNS policy rules.

⏱ 15 minutes calico networkpolicy global
🚀 Deployments intermediate

Canary Deployment Kubernetes Guide

Implement canary deployments on K8s without a service mesh. Native K8s strategy, traffic splitting, and automated rollback.

⏱ 15 minutes canary deployment rollout
🔒 Security intermediate

Certificate Expiration Management K8s

Monitor and manage Kubernetes certificate expiration. kubeadm cert check, cert-manager alerts, auto-renewal, and preventing expired certificate outages.

⏱ 15 minutes certificates expiration kubeadm
⚡ Autoscaling intermediate

Cluster Autoscaler Kubernetes Guide

Configure Kubernetes Cluster Autoscaler for automatic node scaling. Scale-down delay, expanders, priority, and integration with cloud providers.

⏱ 15 minutes cluster-autoscaler node-scaling cloud
🌐 Networking intermediate

CNI Comparison 2026 Kubernetes

Compare Kubernetes CNI plugins: Calico, Cilium, Flannel, Multus, and OVN-Kubernetes. Performance benchmarks, features, and selection guidance.

⏱ 15 minutes cni calico cilium
⚙️ Configuration intermediate

ConfigMap subPath Update Fix K8s

Handle ConfigMap subPath mount limitations in Kubernetes. Why subPath mounts don't auto-update, workarounds, and alternative patterns.

⏱ 15 minutes configmap subpath volume-mount
🌐 Networking intermediate

CoreDNS Custom Config Kubernetes

Customize CoreDNS on Kubernetes for advanced DNS needs. Forward zones, stub domains, custom records, caching tuning, and DNS debugging.

⏱ 15 minutes coredns dns custom-config
🌐 Networking intermediate

DNS Policy Configuration Kubernetes

Configure Kubernetes DNS policies: Default, ClusterFirst, ClusterFirstWithHostNet, and None. Custom resolv.conf, ndots tuning, and DNS performance.

⏱ 15 minutes dns dns-policy coredns
⚙️ Configuration beginner

Docker Registry Secret kubectl

Create Kubernetes docker-registry secrets with kubectl. --docker-password-stdin, .dockerconfigjson format, and automating registry authentication.

⏱ 10 minutes docker-registry secret authentication
⚙️ Configuration beginner

Kubernetes Downward API: Complete Guide

Expose pod and container metadata to applications using the Downward API. Environment variables, volume files, fieldRef, resourceFieldRef, and common patterns.

⏱ 15 minutes downward-api metadata environment-variables
📊 Observability intermediate

EFK Logging System Principles K8s

EFK logging system principles for Kubernetes. Elasticsearch, Fluentd, Kibana architecture, log pipeline design, parsing, and retention strategies.

⏱ 15 minutes efk elasticsearch fluentd
💾 Storage beginner

emptyDir tmpfs Kubernetes Guide

Configure emptyDir volumes with memory-backed tmpfs on Kubernetes. Size limits, memory accounting, sidecar sharing, and ephemeral cache patterns.

⏱ 15 minutes emptydir tmpfs ephemeral-storage
⚙️ Configuration beginner

Env Variables from ConfigMap K8s

Inject environment variables from ConfigMaps and Secrets in Kubernetes. envFrom, valueFrom, configMapKeyRef, and secretKeyRef patterns.

⏱ 15 minutes environment-variables configmap secrets
⚙️ Configuration beginner

envFrom ConfigMapRef Kubernetes

Inject all ConfigMap keys as environment variables using envFrom configMapRef in Kubernetes. Bulk injection, prefix, and selective key patterns.

⏱ 10 minutes envfrom configmapref environment-variables
⚙️ Configuration advanced

etcd Performance Tuning Kubernetes

Tune etcd for Kubernetes cluster performance. Disk IOPS requirements, compaction, defragmentation, and monitoring etcd health metrics.

⏱ 15 minutes etcd performance tuning
🔒 Security intermediate

Falco Rules for Kubernetes: Complete Guide

Write custom Falco rules for K8s runtime security. Syscall detection, container escape alerts, and cryptomining detection.

⏱ 15 minutes falco rules runtime-security
💾 Storage intermediate

fsGroupChangePolicy OnRootMismatch

Configure fsGroupChangePolicy OnRootMismatch to skip recursive chown on volume mounts. Fix slow pod startup with large persistent volumes on Kubernetes.

⏱ 10 minutes fsgroupchangepolicy onrootmismatch chown
🚀 Deployments intermediate

Flux Sources Config Kubernetes

Configure Flux source controllers for GitOps on Kubernetes. GitRepository, HelmRepository, OCIRepository, and Bucket sources for multi-source deployments.

⏱ 15 minutes flux gitops sources
📊 Observability intermediate

Grafana Dashboards for Kubernetes Guide

Import and customize Grafana dashboards for Kubernetes monitoring. Dashboards 315 and 6417, kube-prometheus-stack, and custom panel creation.

⏱ 10 minutes grafana dashboards monitoring
💾 Storage beginner

hostPath vs PVC Kubernetes Guide

Compare hostPath and PVC storage options for Kubernetes. Security risks of hostPath, node affinity constraints, and when to use each storage type.

⏱ 15 minutes hostpath pvc comparison
⚡ Autoscaling beginner

HPA Max Replicas Configuration K8s

Set max replicas for Kubernetes HPA to control autoscaling ceiling. maxReplicas tuning, scaling behavior, stabilization window, and cost protection strategies.

⏱ 10 minutes hpa max-replicas autoscaling
⚡ Autoscaling beginner

HPA Tutorial for Kubernetes Beginners

Step-by-step HPA tutorial for Kubernetes. Create, monitor, and tune Horizontal Pod Autoscalers with kubectl commands and YAML examples.

⏱ 10 minutes hpa tutorial kubectl
🔒 Security intermediate

Trivy Image Scanning Kubernetes

Scan container images with Trivy on K8s. Admission webhook, CI/CD integration, CIS benchmarks, and vulnerability reporting.

⏱ 15 minutes trivy image-scanning vulnerability
⚙️ Configuration beginner

imagePullSecrets Pod Config K8s

Configure imagePullSecrets for pulling from private container registries on Kubernetes. Docker registry secrets and default ServiceAccount configuration.

⏱ 10 minutes imagepullsecrets registry authentication
🌐 Networking beginner

Ingress Path Routing Kubernetes

Configure Kubernetes Ingress for path-based and host-based routing. PathType Prefix vs Exact, rewrite rules, and multi-service routing patterns.

⏱ 15 minutes ingress routing path-based
⚡ Autoscaling intermediate

Karpenter Node Autoscaler for Kubernetes

Scale Kubernetes nodes with Karpenter. NodePool configuration, instance selection, consolidation, and cost optimization vs Cluster Autoscaler.

⏱ 15 minutes karpenter node-autoscaling cost-optimization
⚡ Autoscaling intermediate

KEDA Scalers Guide for Kubernetes

Configure KEDA scalers for event-driven autoscaling on Kubernetes. Covers Kafka, RabbitMQ, Prometheus, and cron trigger configuration.

⏱ 15 minutes keda scalers event-driven
🚀 Deployments beginner

KIND Local Kubernetes Dev Guide

Use KIND for local Kubernetes development. Multi-node clusters, ingress setup, load balancer, persistent storage, and CI/CD integration.

⏱ 15 minutes kind local development
🔧 Troubleshooting beginner

kubectl exec Into Pods: Complete Guide

Use kubectl exec to debug running pods. Interactive shells, non-interactive commands, multi-container pods, and ephemeral debug containers.

⏱ 15 minutes kubectl exec debug
🤖 AI & GPU advanced

Kubeflow PyTorchJob Training K8s

Run distributed PyTorch training on Kubernetes with Kubeflow PyTorchJob. ElasticPolicy, nproc_per_node, RDMA configuration, and multi-GPU scaling.

⏱ 15 minutes kubeflow pytorchjob distributed-training
⚙️ Configuration beginner

K8s Labels vs Annotations Explained

The differences between Kubernetes labels and annotations: when to use each, recommended labels, label selectors, and annotation best practices for K8s.

⏱ 15 minutes labels annotations metadata
🌐 Networking beginner

Let's Encrypt Ingress Kubernetes

Set up Let's Encrypt TLS certificates for Kubernetes Ingress with cert-manager. HTTP-01 challenge, automatic renewal, and HTTPS redirect configuration.

⏱ 10 minutes letsencrypt ingress tls
💾 Storage intermediate

Local Persistent Volumes Kubernetes

Configure local persistent volumes on Kubernetes for high-performance storage. Node affinity, local-path-provisioner, and SSD-backed database workloads.

⏱ 15 minutes local-pv persistent-volume node-affinity
🚀 Deployments intermediate

K8s Multi-Cluster Management Guide

Kubernetes multi-cluster management guide. Federation, Cluster API, Rancher, and GitOps patterns for fleet management across production environments.

⏱ 10 minutes multi-cluster federation fleet
🔧 Troubleshooting beginner

Fix Namespace Stuck Terminating K8s

Fix Kubernetes namespaces stuck in Terminating state. Finalizer removal, API resource cleanup, and force deletion of stuck namespaces.

⏱ 15 minutes namespace terminating finalizers
🔒 Security beginner

NetworkPolicy Examples Cookbook K8s

Copy-paste Kubernetes NetworkPolicy examples. Default deny all, allow DNS, allow specific namespace, database access, and external egress patterns.

⏱ 10 minutes networkpolicy examples deny
🔧 Troubleshooting intermediate

Fix Node NotReady Status in Kubernetes

Troubleshoot Kubernetes nodes in NotReady state. Kubelet issues, disk pressure, network problems, certificate expiration, and recovery procedures.

⏱ 15 minutes node-notready kubelet troubleshooting
🔧 Troubleshooting beginner

Fix node-role.kubernetes.io/master

Remove the node-role.kubernetes.io/master taint to schedule pods on control plane nodes. Single-node clusters, tolerations, and fixing the untolerated taint error.

⏱ 10 minutes taint master control-plane
🔒 Security advanced

K8s OIDC Authentication Login Guide

Configure OIDC authentication for Kubernetes API server. --enable-oidc-issuer with GKE, Keycloak, Dex, kubelogin plugin, and RBAC SSO integration.

⏱ 15 minutes oidc authentication sso
🔧 Troubleshooting beginner

Fix OOMKilled Kubernetes Guide

Troubleshoot and fix OOMKilled errors in Kubernetes. Memory limit tuning, Java heap sizing, memory leak detection, and VPA recommendations.

⏱ 15 minutes oomkilled memory troubleshooting
🚀 Deployments intermediate

Pod Disruption Budget Best Practices

Configure PodDisruptionBudgets for high availability on Kubernetes. minAvailable vs maxUnavailable, voluntary disruptions, and upgrade coordination.

⏱ 15 minutes pdb disruption-budget availability
🔧 Troubleshooting beginner

Fix Pending Pods Kubernetes Guide

Troubleshoot Kubernetes pods stuck in Pending state. Insufficient resources, node selector mismatch, PVC binding, taints, and scheduling failures.

⏱ 15 minutes pending scheduling troubleshooting
💾 Storage beginner

PersistentVolumeClaim PVC Guide K8s

Create and manage PersistentVolumeClaims on Kubernetes. Access modes, storage classes, volume expansion, and namespace-scoped PVC lifecycle.

⏱ 15 minutes pvc persistent-volume storage
🔧 Troubleshooting intermediate

Fix Pod Eviction Kubernetes Guide

Troubleshoot Kubernetes pod evictions. DiskPressure, MemoryPressure, ephemeral storage limits, and eviction threshold configuration.

⏱ 15 minutes eviction disk-pressure memory-pressure
🔧 Troubleshooting beginner

Pod Lifecycle and States Guide

Understand Kubernetes pod lifecycle phases and container states. Pending, Running, Succeeded, Failed, Unknown, and troubleshooting stuck pods.

⏱ 15 minutes pod-lifecycle pod-states pending
🔒 Security intermediate

RBAC Audit Review Kubernetes Guide

Audit Kubernetes RBAC permissions for security compliance. Identify over-permissioned roles, service account privileges, and least-privilege enforcement.

⏱ 15 minutes rbac audit compliance
🚀 Deployments beginner

Readiness Liveness Startup Probes

Configure Kubernetes health probes correctly. When to use each probe type, common mistakes, and production-ready probe configurations.

⏱ 15 minutes probes readiness liveness
🚀 Deployments beginner

Readiness Probe Kubernetes Guide

Configure readiness probes correctly on Kubernetes. HTTP, TCP, exec probes, failure threshold tuning, and why readiness probes should never check databases.

⏱ 10 minutes readiness-probe health-check http-get
⚙️ Configuration beginner

Resource Format 200m 256Mi Syntax

Understand Kubernetes resource format: CPU millicores (200m, 500m, 1) and memory units (256Mi, 1Gi). Syntax reference for requests and limits.

⏱ 10 minutes resources cpu memory
🔒 Security intermediate

RuntimeClass gVisor Kubernetes

Deploy gVisor as a sandboxed container runtime on Kubernetes using RuntimeClass. Covers installation, runsc configuration, and workload isolation.

⏱ 10 minutes runtimeclass gvisor runsc
🔒 Security intermediate

K8s Secrets Management Best Practices

Kubernetes secrets management best practices. Encryption at rest, external secrets operator, rotation strategies, and RBAC for secure secret handling.

⏱ 15 minutes secrets encryption best-practices
🔒 Security intermediate

K8s Security Checklist 2026 Guide

Complete Kubernetes security checklist for 2026. RBAC audit, network policies, pod security standards, image scanning, and compliance hardening steps.

⏱ 15 minutes security checklist hardening
🌐 Networking beginner

Service DNS Discovery Kubernetes

How Kubernetes DNS service discovery works. Service FQDN format, headless services, SRV records, and cross-namespace DNS resolution patterns.

⏱ 10 minutes dns service-discovery fqdn
💾 Storage beginner

Kubernetes StorageClass Complete Guide

Configure StorageClasses for dynamic provisioning on Kubernetes. Covers reclaim policies, volume binding modes, and cloud provider examples.

⏱ 15 minutes storageclass provisioning dynamic
🚀 Deployments beginner

terminationGracePeriodSeconds Guide

Configure terminationGracePeriodSeconds for Kubernetes pods. SIGTERM vs SIGKILL timing, connection draining, long-running tasks, and graceful shutdown.

⏱ 15 minutes termination graceful-shutdown sigterm
💾 Storage intermediate

Velero Snapshot Locations on Kubernetes

Configure Velero snapshot locations for Kubernetes backup. Volume snapshots, file system backup, cross-region copies, and backup verification.

⏱ 15 minutes velero snapshots backup
⚡ Autoscaling intermediate

VPA Recommender Setup Kubernetes

Configure the VPA Recommender for Kubernetes resource right-sizing. Off mode recommendations, memory-only mode, and interpreting VPA suggestions.

⏱ 15 minutes vpa recommender right-sizing
⚙️ Configuration beginner

Kustomize vs Helm Comparison Guide

Kustomize vs Helm comparison for Kubernetes. When to use each tool, complexity trade-offs, GitOps compatibility, and combined workflow patterns.

⏱ 15 minutes kustomize helm comparison
🤖 AI & GPU advanced

NCCL Environment Variables Reference

Complete NCCL environment variables reference for Kubernetes GPU training. NCCL_IB_DISABLE, NCCL_SOCKET_IFNAME, NCCL_DEBUG, and network tuning guide.

⏱ 15 minutes nccl environment-variables gpu
🤖 AI & GPU advanced

NCCL Test Benchmark Kubernetes

Run NCCL tests on Kubernetes for GPU communication benchmarking. all_reduce_perf, all_gather_perf, multi-node bandwidth, and latency validation.

⏱ 10 minutes nccl benchmark gpu
📊 Observability intermediate

NVIDIA DCGM Exporter GPU Monitoring

Monitor GPU metrics with DCGM Exporter on K8s. Prometheus integration, Grafana dashboards, and alerting on utilization and temperature.

⏱ 15 minutes dcgm gpu-monitoring prometheus
🤖 AI & GPU intermediate

GPU Time-Slicing vs MIG Comparison

Compare NVIDIA GPU time-slicing and MIG for K8s workloads. When to use each, performance trade-offs, and configuration examples.

⏱ 15 minutes gpu time-slicing mig
⚙️ Configuration intermediate

OpenShift Lifecycle Versions Guide

OpenShift Container Platform lifecycle, version support, and upgrade planning. EUS versions, support timelines, K8s version mapping, and EOL dates.

⏱ 15 minutes openshift lifecycle versions
🔒 Security intermediate

OpenShift OAuth Proxy Sidecar Guide

Protect K8s services with OpenShift OAuth proxy sidecar. Authentication, RBAC delegation, and SSO for internal dashboards.

⏱ 15 minutes openshift oauth proxy
🌐 Networking beginner

OpenShift Routes vs Ingress Guide

Compare OpenShift Routes and Kubernetes Ingress. Covers edge, passthrough, and re-encrypt TLS termination, and when to use each option.

⏱ 15 minutes openshift routes ingress
🔒 Security intermediate

OpenShift SCC Security Context Guide

Configure OpenShift Security Context Constraints for pods. Restricted, anyuid, privileged SCCs, custom SCC, and migration to PSA.

⏱ 15 minutes openshift scc security-context
🤖 AI & GPU advanced

TensorRT-LLM Kubernetes Deployment

Deploy TensorRT-LLM on K8s for optimized inference. Engine building, model conversion, and serving with Triton Inference Server.

⏱ 15 minutes tensorrt-llm inference triton
⚡ Autoscaling intermediate

VPA Setup hack/vpa-up.sh Guide

Install the Vertical Pod Autoscaler on Kubernetes with hack/vpa-up.sh. Recommender, Updater, and Admission Controller components, plus production configuration.

⏱ 10 minutes vpa vertical-pod-autoscaler installation
🤖 AI & GPU intermediate

vLLM Deployment Kubernetes Guide

Deploy vLLM inference engine on K8s. Model loading, tensor parallelism, continuous batching, and OpenAI-compatible API setup.

⏱ 15 minutes vllm inference llm
🔒 Security advanced

AI ML Security and Compliance Kubernetes

Secure AI and ML workloads on Kubernetes with model encryption, data governance, audit logging, and network isolation for training jobs.

⏱ 20 minutes ai-security ml-compliance model-encryption
🤖 AI & GPU intermediate

AI Resource Allocation Optimization

Optimize GPU and memory allocation for AI workloads on Kubernetes. Right-size GPU requests, apply bin-packing strategies, and use gang scheduling.

⏱ 20 minutes gpu resource-optimization bin-packing
🤖 AI & GPU beginner

CNCF AI Projects Landscape Kubernetes

Navigate the CNCF AI project landscape for Kubernetes. Kubeflow, KServe, KAITO, Volcano, and emerging projects for training, serving, and scheduling.

⏱ 20 minutes cncf ai-landscape cloud-native
🌐 Networking advanced

Dell Switch RoCEv2 PFC ECN DSCP

Configure Dell OS10 switches for lossless RoCEv2 with PFC, ECN, WRED, and DSCP-to-traffic-class mapping. Priority 3 for RDMA traffic marked with DSCP values 24 and 26.

⏱ 20 minutes dell switch rocev2
🤖 AI & GPU advanced

Distributed Training TensorFlow PyTorch

Run distributed training jobs on Kubernetes with TensorFlow and PyTorch. Training Operator, multi-worker strategies, and NCCL configuration.

⏱ 20 minutes distributed-training tensorflow pytorch
🌐 Networking advanced

ECN MachineConfig OpenShift Nodes

Enable ECN (Explicit Congestion Notification) on OpenShift nodes via MachineConfig for lossless RoCEv2 RDMA networking. Sysctl and Mellanox NIC configuration.

⏱ 15 minutes ecn machineconfig openshift
🤖 AI & GPU intermediate

Feast Feature Store Kubernetes

Deploy the Feast feature store on Kubernetes for ML feature management. Offline and online stores, feature serving, and point-in-time joins.

⏱ 20 minutes feast feature-store ml-features
🚀 Deployments intermediate

GitLab Runner Helm Kubernetes Executor

Deploy GitLab Runner on Kubernetes with Helm. Configure concurrent jobs, an internal registry, PodMonitor metrics, scale-to-zero, and security contexts.

⏱ 20 minutes gitlab runner helm
🤖 AI & GPU intermediate

GPU Sharing MIG and Time-Slicing Kubernetes

Share GPUs across multiple pods with NVIDIA MIG and time-slicing on Kubernetes. MIG profiles for A100/H100 and time-slicing configuration.

⏱ 20 minutes gpu mig time-slicing
🤖 AI & GPU intermediate

KAITO AI Model Inference Kubernetes

Deploy AI models with KAITO (Kubernetes AI Toolchain Operator) for automated GPU provisioning, model serving, and inference workload management.

⏱ 20 minutes kaito inference gpu-provisioning
🤖 AI & GPU advanced

Katib Hyperparameter Tuning Kubernetes

Automate hyperparameter tuning with Katib on Kubernetes. Bayesian optimization, random search, grid search, and early stopping.

⏱ 20 minutes katib hyperparameter automl
🤖 AI & GPU advanced

KnativeServing for AI Inference OpenShift

Configure KnativeServing with scale-to-zero, GPU scheduling features, Kourier ingress, and custom domain templates for AI inference workloads on OpenShift.

⏱ 20 minutes knative serverless inference
🤖 AI & GPU intermediate

KServe Model Serving Kubernetes

Deploy ML models with KServe for serverless inference on Kubernetes. InferenceService, scale-to-zero, canary rollouts, and model transformers.

⏱ 20 minutes kserve model-serving inference
🤖 AI & GPU advanced

Kubeflow ML Platform Setup Kubernetes

Deploy Kubeflow as a production-ready ML platform on Kubernetes. Notebooks, pipelines, training operators, and model serving with KServe for end-to-end MLOps.

⏱ 20 minutes kubeflow mlops machine-learning
🤖 AI & GPU intermediate

AI Cost Management on Kubernetes

Control AI infrastructure costs on Kubernetes with GPU utilization tracking, per-team chargeback, spot instance strategies, and right-sizing recommendations.

⏱ 20 minutes cost-management gpu-cost chargeback
🤖 AI & GPU advanced

AI Inference Optimization Kubernetes

Optimize AI inference performance on Kubernetes. Request batching, KV cache tuning, speculative decoding, and continuous batching.

⏱ 20 minutes inference optimization batching
📊 Observability intermediate

AI Workload Monitoring Kubernetes

Monitor AI and GPU workloads on Kubernetes with DCGM Exporter, Prometheus, and Grafana. GPU utilization, memory usage, and inference latency.

⏱ 20 minutes gpu-monitoring dcgm prometheus
⚙️ Configuration advanced

API Priority and Fairness K8s Guide

Configure Kubernetes API Priority and Fairness to protect the API server. Covers FlowSchemas, PriorityLevelConfigurations, and request concurrency tuning.

⏱ 15 minutes api-priority fairness flow-schema
🚀 Deployments intermediate

Argo Rollouts Canary Blue-Green K8s

Progressive delivery with Argo Rollouts on Kubernetes. Canary, blue-green, analysis templates, and experiment-based promotion for safe deployments.

⏱ 15 minutes argo-rollouts canary progressive-delivery
🚀 Deployments advanced

Canary Deployments with Flagger

Automate canary deployments in Kubernetes using Flagger with Istio, Linkerd, or NGINX ingress. Progressive traffic shifting and metric analysis.

⏱ 20 minutes canary flagger progressive-delivery
🔒 Security intermediate

cert-manager Advanced Configuration

Advanced cert-manager patterns for Kubernetes. Wildcard certificates, DNS-01 challenges, certificate rotation, and cross-namespace sharing.

⏱ 20 minutes cert-manager tls certificates
🔧 Troubleshooting intermediate

LitmusChaos Chaos Engineering K8s

Run chaos experiments on Kubernetes with LitmusChaos. Pod kill, network latency, disk fill, and CPU stress experiments for resilience testing.

⏱ 15 minutes chaos-engineering litmus resilience
🔒 Security intermediate

Cilium Network Policies Kubernetes

Advanced network policies with Cilium on Kubernetes. L7 HTTP-aware policies, DNS-based egress, identity-based security, and cluster-wide policies.

⏱ 20 minutes cilium network-policy ebpf
⚙️ Configuration beginner

ConfigMap Best Practices K8s Guide

ConfigMap best practices for Kubernetes applications. Size limits, binary data, environment variables vs volume mounts, and hot-reload patterns.

⏱ 15 minutes configmap configuration best-practices
⚙️ Configuration intermediate

ConfigMap Reload Patterns Kubernetes

Implement automatic ConfigMap reload in Kubernetes using volume projection, Reloader operator, checksum annotations, and inotify sidecars.

⏱ 15 minutes configmap reload reloader
⚙️ Configuration beginner

Immutable ConfigMaps and Secrets

Use immutable ConfigMaps and Secrets for performance and safety in Kubernetes. Reduce API server load and prevent accidental changes.

⏱ 15 minutes configmap secret immutable
⚙️ Configuration intermediate

Container Runtime Comparison K8s

Compare Kubernetes container runtimes: containerd vs CRI-O vs Kata Containers. Performance, security, and use cases for each runtime in production.

⏱ 15 minutes container-runtime containerd cri-o
🌐 Networking intermediate

CoreDNS Customization Guide Kubernetes

Customize CoreDNS with forward zones, rewrite rules, cache tuning, and stub domains. Troubleshoot DNS resolution failures and optimize query performance.

⏱ 15 minutes coredns dns networking
🔒 Security intermediate

Cosign Image Signing Kubernetes

Verify container image signatures with Cosign and Sigstore on Kubernetes. Policy enforcement with Kyverno, supply chain security, and SBOM attestation.

⏱ 15 minutes cosign sigstore image-signing
⚙️ Configuration advanced

CRD Development Kubernetes Guide

Design and implement Kubernetes Custom Resource Definitions. Schema validation, status subresource, printer columns, and conversion webhooks.

⏱ 15 minutes crd custom-resource development
🚀 Deployments beginner

CronJob Best Practices Kubernetes

Configure Kubernetes CronJobs with concurrency policies, failure handling, timezone scheduling, resource limits, and job history cleanup.

⏱ 10 minutes cronjob scheduling best-practices
⚙️ Configuration advanced

Crossplane Infrastructure as Code

Manage cloud infrastructure from Kubernetes with Crossplane. Covers Composite Resources, Compositions, and provider configuration for AWS and GCP.

⏱ 20 minutes crossplane infrastructure-as-code cloud
💾 Storage advanced

Build Custom CSI Drivers Kubernetes

Develop custom Container Storage Interface drivers for Kubernetes. CSI spec, controller and node plugins, volume lifecycle, and testing with csi-sanity.

⏱ 15 minutes csi storage-driver development
⚡ Autoscaling advanced

Custom Metrics with Prometheus Adapter

Expose application metrics to Kubernetes HPA via Prometheus Adapter. Configure custom.metrics.k8s.io for metrics such as HTTP requests per second and queue depth.

⏱ 20 minutes prometheus custom-metrics hpa
🚀 Deployments advanced

Custom Scheduler Kubernetes Guide

Build and deploy custom Kubernetes schedulers for specialized workloads. Scheduler profiles, extender webhooks, and scoring plugins.

⏱ 20 minutes scheduler custom-scheduler scheduling
🚀 Deployments intermediate

DaemonSet Update Strategies Kubernetes

Configure DaemonSet rolling updates with maxUnavailable, OnDelete strategy, partition rollouts, and canary updates for node-level workloads like log collectors.

⏱ 15 minutes daemonset rolling-update ondelete
🔧 Troubleshooting beginner

Debug Containers and Ephemeral Pods

Use kubectl debug with ephemeral containers to troubleshoot running pods without a restart. Debug distroless images and nodes.

⏱ 10 minutes debug ephemeral-containers troubleshooting
🔧 Troubleshooting intermediate

DNS Debugging Kubernetes Guide

Debug Kubernetes DNS issues systematically. CoreDNS troubleshooting, ndots configuration, search domains, and resolving slow DNS lookups.

⏱ 15 minutes dns coredns debugging
🌐 Networking advanced

EndpointSlices and Service Topology

Understand EndpointSlices for scalable service discovery in Kubernetes. Covers topology-aware routing and traffic localization for large clusters.

⏱ 15 minutes endpointslice service-topology routing
💾 Storage intermediate

Ephemeral Storage Management Guide

Manage ephemeral storage in Kubernetes with emptyDir size limits, ephemeral-storage requests and limits, and eviction thresholds.

⏱ 15 minutes ephemeral-storageemptydireviction
⚙️ Configuration advanced

etcd Backup and Restore Kubernetes

Back up and restore etcd for Kubernetes disaster recovery. Covers automated snapshots, S3 upload, and point-in-time restore procedures.

⏱ 20 minutes etcdbackuprestore
💾 Storage advanced

etcd Maintenance Operations Kubernetes

Perform etcd maintenance for Kubernetes clusters. Defragmentation, compaction, snapshot backup, member health checks, and performance monitoring with etcdctl.

⏱ 15 minutes etcdmaintenancebackup
🌐 Networking intermediate

ExternalDNS Automation Kubernetes

Automate DNS record management with ExternalDNS on Kubernetes. Route53, CloudDNS, and Azure DNS integration for Ingress, Service, and Gateway resources.

⏱ 20 minutes external-dnsdnsautomation
⚙️ Configuration advanced

Finalizers and Ownership Guide

Understand Kubernetes finalizers and owner references for resource lifecycle management. Prevent resource leaks, implement cleanup logic.

⏱ 15 minutes finalizersowner-referencesgarbage-collection
🌐 Networking intermediate

Gateway API HTTPRoute Kubernetes

Configure HTTPRoute for Kubernetes Gateway API. Path matching, header-based routing, traffic splitting, URL rewriting, and request mirroring.

⏱ 15 minutes gateway-apihttprouterouting
🤖 AI & GPU intermediate

GPU Node Provisioning Kubernetes

Automate GPU node provisioning for Kubernetes with Karpenter, Cluster Autoscaler, and cloud-specific node pools for AI and ML workloads.

⏱ 20 minutes gpukarpenterautoscaler
🤖 AI & GPU advanced

GPU Operator Advanced Configuration

Advanced NVIDIA GPU Operator configuration on Kubernetes. Driver containers, CUDA toolkit, GDS, GPUDirect RDMA, MIG manager, DCGM Exporter.

⏱ 20 minutes gpu-operatornvidiadriver
🎯 Helm intermediate

Helm Chart Testing CI/CD Guide

Test Helm charts with helm test, helm lint, chart-testing, and conftest. Unit tests, integration tests, and CI/CD pipeline integration for chart quality.

⏱ 15 minutes helmtestingchart-testing
🎯 Helm advanced

Helm Library Charts Reusable Guide

Create reusable Helm library charts for Kubernetes. Shared templates, named templates, and standardizing deployments across teams with common patterns.

⏱ 15 minutes helmlibrary-charttemplates
🎯 Helm intermediate

Helm OCI Registry Push Pull Guide

Push and pull Helm charts from OCI registries. Harbor, ECR, ACR, and GCR integration for Helm chart distribution and versioning.

⏱ 15 minutes helmociregistry
🌐 Networking advanced

DNS Autoscaling and CoreDNS Scaling

Scale CoreDNS horizontally with dns-autoscaler and proportional autoscaling. Tune cache size, configure node-local DNS cache.

⏱ 15 minutes dnscorednsautoscaling
⚡ Autoscaling intermediate

HPA Custom Metrics Scaling Guide

Scale Kubernetes workloads on custom Prometheus metrics with HPA. Prometheus Adapter, external metrics, and request-rate-based scaling for web services.

⏱ 15 minutes hpacustom-metricsprometheus-adapter
🚀 Deployments intermediate

Image Pull Optimization Kubernetes

Optimize container image pulls with pre-pulling DaemonSets, registry mirrors, image caching, and pull-through proxies for faster pod startup.

⏱ 15 minutes image-pullregistrycache
🚀 Deployments beginner

Init Container Patterns Kubernetes

Use init containers for dependency waiting, database migration, config generation, certificate fetching, and permission setup.

⏱ 10 minutes init-containerspatternsdependency
🌐 Networking advanced

Istio Traffic Management Kubernetes

Advanced Istio traffic management on Kubernetes. VirtualService routing, DestinationRule load balancing, traffic mirroring, fault injection.

⏱ 15 minutes istiotraffic-managementvirtual-service
📊 Observability intermediate

Jaeger Tracing Kubernetes Guide

Deploy Jaeger for distributed tracing on Kubernetes. Collector, storage backends, sampling strategies, and trace analysis for microservice debugging.

⏱ 15 minutes jaegertracingdistributed-tracing
🚀 Deployments intermediate

Job Completion Patterns Kubernetes

Configure Kubernetes Jobs with indexed completions, work queues, parallel processing, backoff limits, and TTL cleanup for batch workloads.

⏱ 15 minutes jobbatchparallel
🚀 Deployments beginner

Job TTL Cleanup Kubernetes Guide

Automate Kubernetes Job cleanup with TTL controller. ttlSecondsAfterFinished, CronJob history limits, and preventing completed Job accumulation.

⏱ 15 minutes jobttlcleanup
⚡ Autoscaling intermediate

KEDA Event-Driven Pod Autoscaling Guide

Scale Kubernetes workloads on external events with KEDA. Covers Kafka queue length, Prometheus metrics, and cron schedule trigger patterns.

⏱ 20 minutes kedaautoscalingevent-driven
⚙️ Configuration intermediate

Kustomize Advanced Patterns Kubernetes

Advanced Kustomize patterns for Kubernetes configuration management. Strategic merge patches, JSON patches, components, replacements.

⏱ 20 minutes kustomizeconfigurationoverlays
⚙️ Configuration intermediate

Kustomize Overlays Guide Kubernetes

Manage Kubernetes manifests with Kustomize overlays. Base and overlay patterns, strategic merge patches, JSON patches, ConfigMap generators.

⏱ 15 minutes kustomizeoverlaysconfiguration
📊 Observability intermediate

Loki Log Aggregation Kubernetes

Deploy Grafana Loki for log aggregation on Kubernetes. Promtail DaemonSet, LogQL queries, structured logging, retention policies, and Grafana integration.

⏱ 20 minutes lokiloggingpromtail
💾 Storage intermediate

Longhorn Distributed Storage K8s

Deploy Longhorn for distributed block storage on Kubernetes. Replicated volumes, snapshots, backups, and disaster recovery for bare-metal clusters.

⏱ 15 minutes longhorndistributed-storagereplication
🌐 Networking intermediate

MetalLB Bare Metal Load Balancer

Deploy MetalLB for LoadBalancer services on bare-metal Kubernetes. L2 mode, BGP mode, IP address pools, and integration with Cilium and Gateway API.

⏱ 20 minutes metallbload-balancerbare-metal
🌐 Networking advanced

Multi-Cluster Service Mesh Kubernetes

Connect multiple Kubernetes clusters with service mesh federation. Istio multi-cluster, Linkerd multi-cluster, cross-cluster service discovery.

⏱ 20 minutes multi-clusterservice-meshistio
⚙️ Configuration advanced

Multi-Cluster K8s Mgmt Patterns

Manage multiple Kubernetes clusters with kubectx, Cluster API, Fleet, and federation patterns. Context switching, workload distribution.

⏱ 20 minutes multi-clusterkubectxfleet
🔒 Security intermediate

Multi-Tenancy Namespaces Kubernetes

Implement multi-tenancy on Kubernetes with namespaces. Resource quotas, network policies, RBAC isolation, and hierarchical namespaces for team separation.

⏱ 15 minutes multi-tenancynamespacesisolation
🔧 Troubleshooting intermediate

Network Debugging Tools Kubernetes

Debug Kubernetes networking with tcpdump, netshoot, iptables tracing, conntrack inspection, and DNS resolution testing techniques.

⏱ 15 minutes networkingtcpdumpdebug
🔒 Security beginner

NetworkPolicy Recipes Cookbook K8s

Common Kubernetes NetworkPolicy recipes. Default deny, allow DNS, namespace isolation, database access, and external egress patterns for zero-trust networking.

⏱ 15 minutes network-policysecurityfirewall
🔒 Security intermediate

NetworkPolicy Zero Trust Kubernetes

Implement zero-trust networking with Kubernetes NetworkPolicies. Default-deny ingress and egress, namespace isolation, DNS egress rules, and Cilium L7 policies.

⏱ 15 minutes networkpolicyzero-trustsecurity
💾 Storage intermediate

NFS Dynamic Provisioner Kubernetes

Deploy NFS dynamic provisioner for ReadWriteMany storage on Kubernetes. NFS CSI driver, StorageClass configuration, and performance tuning with nconnect.

⏱ 15 minutes nfsstoragereadwritemany
🚀 Deployments beginner

Node Affinity Scheduling Kubernetes

Configure node affinity rules for Kubernetes pod scheduling. Required vs preferred affinity, label selectors, and combining with taints and tolerations.

⏱ 15 minutes node-affinityschedulinglabels
🚀 Deployments beginner

Node Maintenance and Drain Operations

Safely drain Kubernetes nodes for maintenance with cordon, drain, and uncordon. Handle PodDisruptionBudgets, DaemonSets, and local storage.

⏱ 15 minutes draincordonmaintenance
🔒 Security intermediate

OPA Gatekeeper Policy Enforcement

Enforce policies with OPA Gatekeeper on Kubernetes. ConstraintTemplates, Constraints, dry-run mode, audit, and common policies for security compliance.

⏱ 20 minutes opagatekeeperpolicy
📊 Observability intermediate

OpenTelemetry Collector Kubernetes

Deploy the OpenTelemetry Collector on Kubernetes for unified observability. Traces, metrics, and logs pipeline configuration, auto-instrumentation.

⏱ 20 minutes opentelemetrytracingobservability
🚀 Deployments advanced

Build Operators with Operator SDK

Build Kubernetes operators with Operator SDK. Controller reconciliation, custom resources, status subresource, leader election, and testing patterns.

⏱ 15 minutes operatorsdkcontroller
🚀 Deployments intermediate

PDB Rolling Update Coordination K8s

Coordinate PodDisruptionBudgets with rolling updates on Kubernetes. minAvailable vs maxUnavailable, voluntary disruptions, and upgrade-safe configurations.

⏱ 15 minutes pdbrolling-updatedisruption-budget
💾 Storage beginner

Persistent Volume Expansion Kubernetes

Expand PersistentVolumeClaims online without downtime. allowVolumeExpansion, filesystem resize, StatefulSet PVC expansion.

⏱ 15 minutes pvcexpansionstorage
🚀 Deployments intermediate

Pod Affinity and Anti-Affinity Guide

Configure pod affinity and anti-affinity rules for Kubernetes scheduling. Co-locate cache with app, spread replicas across nodes.

⏱ 15 minutes affinityanti-affinityscheduling
🚀 Deployments intermediate

Pod Disruption Budget Strategies

Configure PodDisruptionBudgets for zero-downtime maintenance. MinAvailable vs maxUnavailable strategies for stateful workloads, GPU training.

⏱ 10 minutes pdbdisruptionmaintenance
🔒 Security intermediate

Kubernetes Pod Security Standards Guide

Implement Pod Security Standards with Pod Security Admission. Privileged, baseline, and restricted profiles, namespace labels.

⏱ 15 minutes pod-securitypsastandards
🚀 Deployments advanced

Pod Topology Spread Advanced Patterns

Advanced topology spread constraints for Kubernetes. Multi-zone HA, GPU rack awareness, combined with affinity rules, and minDomains for scaling clusters.

⏱ 15 minutes topology-spreadschedulinghigh-availability
🚀 Deployments intermediate

Priority and Preemption Scheduling

Configure PriorityClasses for Kubernetes workload scheduling. System-critical pods, GPU training preemption, and preemptionPolicy Never for batch workloads.

⏱ 15 minutes prioritypreemptionscheduling
📊 Observability intermediate

Prometheus Alerting Rules Kubernetes

Write effective Prometheus alerting rules for Kubernetes. Alertmanager routing, inhibition, silence, and production-ready alert templates for CPU, memory.

⏱ 20 minutes prometheusalertingalertmanager
💾 Storage beginner

PV Reclaim Policy Retain vs Delete

Understand Kubernetes PersistentVolume reclaim policies. Retain vs Delete vs Recycle, recovering data from released PVs.

⏱ 15 minutes persistent-volumereclaim-policystorage
🔒 Security intermediate

RBAC Least Privilege Kubernetes

Configure Kubernetes RBAC with least-privilege Roles, ClusterRoles, and service account bindings. Audit permissions, restrict secrets access.

⏱ 15 minutes rbacsecurityleast-privilege
🔧 Troubleshooting intermediate

Fix RBAC Permission Errors K8s

Debug Kubernetes RBAC permission errors. kubectl auth can-i, impersonation testing, ClusterRole aggregation, and common permission mistakes.

⏱ 15 minutes rbactroubleshootingpermissions
⚙️ Configuration beginner

Resource Limits and Requests Guide

Configure CPU and memory requests and limits for Kubernetes pods. Guaranteed vs Burstable vs BestEffort QoS classes, OOMKill prevention.

⏱ 15 minutes resourceslimitsrequests
⚙️ Configuration advanced

CPU and Memory Limits Deep Dive

Deep dive into Kubernetes CPU and memory management. CFS bandwidth throttling, OOMKill scoring, cgroup v2 behavior, memory.high vs memory.max.

⏱ 15 minutes cpumemorycgroups
💾 Storage advanced

Rook Ceph Storage Kubernetes Guide

Deploy Rook-Ceph for enterprise storage on Kubernetes. Block, file, and object storage, erasure coding, and multi-site replication for production workloads.

⏱ 15 minutes rookcephblock-storage
🔒 Security intermediate

Sealed Secrets Management Kubernetes

Manage secrets securely with Bitnami Sealed Secrets on Kubernetes. Encrypt secrets for Git storage, cluster-scoped and namespace-scoped sealing.

⏱ 20 minutes sealed-secretssecretsencryption
🔒 Security intermediate

External Secrets Management Kubernetes

Integrate Kubernetes with external secret stores using External Secrets Operator. Sync secrets from HashiCorp Vault, AWS Secrets Manager, Azure Key Vault.

⏱ 15 minutes secretsvaultexternal-secrets
🔒 Security intermediate

Service Account Tokens Kubernetes

Manage Kubernetes service account tokens securely. Projected volumes, bound tokens, the TokenRequest API, and eliminating long-lived tokens for zero-trust authentication.

⏱ 15 minutes service-accounttokensauthentication
🔒 Security intermediate

Service Accounts and Workload Identity

Configure Kubernetes service accounts with cloud workload identity for AWS IRSA, GCP Workload Identity, and Azure AD pod federation.

⏱ 15 minutes service-accountworkload-identityirsa
🌐 Networking intermediate

Service Mesh Comparison Kubernetes

Compare Istio, Linkerd, and Cilium service mesh for Kubernetes. mTLS, observability, traffic management, resource overhead.

⏱ 15 minutes service-meshistiolinkerd
🚀 Deployments intermediate

Kubernetes StatefulSet Management Guide

Manage stateful applications on Kubernetes with StatefulSets. Ordered deployment, stable network identity, persistent storage.

⏱ 15 minutes statefulsetstatefuldatabases
💾 Storage intermediate

Storage Classes and Provisioners

Configure Kubernetes StorageClasses for dynamic volume provisioning. CSI drivers, reclaim policies, volume expansion, topology-aware provisioning.

⏱ 20 minutes storage-classcsipersistent-volume
📊 Observability intermediate

Grafana Tempo Tracing Kubernetes

Deploy Grafana Tempo for cost-effective distributed tracing on Kubernetes. Object storage backend, TraceQL queries, and Grafana integration.

⏱ 15 minutes tempotracinggrafana
📊 Observability advanced

Thanos HA Prometheus Kubernetes

Scale Prometheus with Thanos for high availability and long-term storage on Kubernetes. Sidecar, Store, Compactor, and Query frontend for multi-cluster metrics.

⏱ 15 minutes thanosprometheushigh-availability
🌐 Networking intermediate

Topology-Aware Routing Kubernetes

Enable topology-aware routing for cost optimization on Kubernetes. Zone-local traffic, EndpointSlice hints, and reducing cross-zone data transfer costs.

⏱ 15 minutes topologyroutingzone-aware
💾 Storage intermediate

Velero Backup and Restore Kubernetes

Back up and restore Kubernetes applications with Velero. Scheduled backups, cross-cluster migration, selective restore, and disaster recovery workflows.

⏱ 20 minutes velerobackuprestore
⚡ Autoscaling intermediate

Vertical Pod Autoscaler Deep Dive

Configure VPA for automatic memory and CPU right-sizing in Kubernetes. Recommendation modes, update policies, VPA with HPA coexistence, and GPU workload tuning.

⏱ 15 minutes vpaautoscalingright-sizing
⚡ Autoscaling intermediate

VPA Resource Right-Sizing Kubernetes

Use Vertical Pod Autoscaler to right-size Kubernetes resource requests and limits. Off mode for recommendations, Auto mode for live adjustment.

⏱ 20 minutes vpavertical-pod-autoscalerright-sizing
🤖 AI & GPU intermediate

Kueue Job Queuing Fair Sharing Kubernetes

Implement fair-share GPU job queuing with Kueue on Kubernetes. ClusterQueues, LocalQueues, ResourceFlavors, and cohort-based borrowing for multi-team AI clusters.

⏱ 20 minutes kueuejob-queuingfair-sharing
🤖 AI & GPU advanced

LLM Deployment Challenges Kubernetes

Address common LLM deployment challenges on Kubernetes. GPU memory management, model loading optimization, inference latency tuning, batch scheduling.

⏱ 20 minutes llmdeploymentgpu-memory
🌐 Networking advanced

Mellanox RoCE DSCP QoS DaemonSet

Deploy a DaemonSet that configures DSCP trust, PFC priority 3, and RoCE ToS 106 on all Mellanox PFs. Uses DOCA driver image with ibdev2netdev, mlnx_qos.

⏱ 15 minutes mellanoxrocedscp
🤖 AI & GPU intermediate

ML Pipeline Automation Kubernetes

Automate ML pipelines on Kubernetes with Kubeflow Pipelines, Argo Workflows, and Tekton. Data preprocessing, training, evaluation, model registration.

⏱ 20 minutes ml-pipelinekubeflow-pipelinesargo-workflows
🤖 AI & GPU advanced

ModelMesh Multi-Model Serving Kubernetes

Deploy hundreds of ML models on shared GPU infrastructure with ModelMesh. Intelligent model loading and unloading, memory management, routing.

⏱ 20 minutes modelmeshmulti-modelinference
🤖 AI & GPU advanced

Multi-Cloud AI Workloads Kubernetes

Run AI workloads across multiple cloud providers with Kubernetes. GPU instance availability, spot pricing arbitrage, model portability.

⏱ 20 minutes multi-cloudgpu-availabilityspot-instances
🤖 AI & GPU advanced

NCCL SR-IOV GDS PyTorch Configuration

Configure NCCL with SR-IOV RDMA and GPUDirect Storage on Kubernetes. PyTorch 25.11 container with NCCL 2.28, CUDA 13, MOFED 5.4, GDRCopy 2.

⏱ 30 minutes ncclsriovgds
🌐 Networking advanced

RDMA Network QoS Traffic Classes DCQCN

Complete RDMA network QoS architecture with traffic classes TC0-TC6, DSCP and dot1p mappings, PFC, ECN, WRED, and DCQCN congestion control for lossless RoCE.

⏱ 20 minutes rdmaqostraffic-class
🌐 Networking advanced

RoCEv2 End-to-End Lossless Stack

Complete RoCEv2 lossless fabric configuration from GPU node to switch and back. Dell OS10 switches, Mellanox NICs, OpenShift MachineConfig, PFC, ECN.

⏱ 30 minutes rocev2losslesspfc
🤖 AI & GPU advanced

Volcano Job minAvailable Gang Schedule

Volcano batch scheduling with minAvailable gang scheduling on Kubernetes. Job configuration, queue policies, and AI training workload scheduling.

⏱ 20 minutes volcanobatch-schedulinggang-scheduling
🤖 AI & GPU advanced

AIPerf Offline vLLM Benchmarking

Benchmark vLLM inference with AIPerf in air-gapped Kubernetes clusters. Use dummy tokenizers, offline mode, custom endpoints.

⏱ 15 minutes aiperfvllmbenchmarking
🌐 Networking advanced

ib_write_bw RDMA Bandwidth Testing

Run ib_write_bw from perftest on Kubernetes to measure RDMA write bandwidth between GPU nodes. Full CLI reference, bidirectional tests, HugePages.

⏱ 18 minutes ib-write-bwperftestrdma
⚙️ Configuration intermediate

Disable OperatorHub Default Sources

Disable default OperatorHub catalog sources in OpenShift for air-gapped clusters. Use the OperatorHub CR to disable individual or all sources with Ansible automation.

⏱ 10 minutes openshiftoperatorhubair-gapped
🤖 AI & GPU advanced

Run:ai Distributed vLLM with NCCL

Deploy distributed vLLM inference on Run:ai with NCCL over NVLink and RDMA. Tensor parallelism across GPUs with NCCL debug logging, SR-IOV networking.

⏱ 25 minutes runaivllmnccl
🤖 AI & GPU advanced

AIPerf LLM Benchmarking on K8s

Benchmark generative AI inference on Kubernetes with NVIDIA AIPerf. Measure TTFT, ITL, throughput, and latency across vLLM, NIM.

⏱ 20 minutes aiperfbenchmarkingllm
⚙️ Configuration advanced

Databases on K8s: Memory Overcommit

Why vm.overcommit_memory must be disabled for production databases on Kubernetes. Configure guaranteed QoS, disable swap.

⏱ 18 minutes databasesmemoryovercommit
🤖 AI & GPU advanced

DOCA Perftest RDMA Benchmarking

Run NVIDIA DOCA perftest on Kubernetes to benchmark RDMA bandwidth and latency between GPU nodes. Traffic patterns, GPUDirect memory modes.

⏱ 25 minutes docaperftestrdma
🌐 Networking advanced

mlnx_qos QoS on MOFED Containers

Configure RDMA QoS with mlnx_qos from MOFED containers on Kubernetes. Set PFC, ETS, DSCP trust mode, and validate lossless RoCE traffic classes on ConnectX.

⏱ 20 minutes mlnx-qosmofedpfc
🤖 AI & GPU advanced

RetinaNet GPU Training on Kubernetes

Train RetinaNet object detection models on Kubernetes with unlimited memlock for RDMA, CRI-O ulimit configuration, and multi-GPU distributed training.

⏱ 20 minutes retinanetgpu-trainingmemlock
🔒 Security advanced

Kubernetes Certificate Signing Requests

Use the Kubernetes CSR API to issue, approve, and manage TLS certificates. Automate certificate workflows for services, users, and kubelet rotation.

⏱ 15 minutes certificatescsrtls
⚙️ Configuration beginner

Kubernetes startupProbe Configuration Guide

Configure startupProbe for slow-starting containers to prevent premature kills. Understand interaction with liveness and readiness probes.

⏱ 8 minutes startup-probeprobeshealth-check
🚀 Deployments intermediate

Kubernetes DaemonSet Update Strategies

Configure DaemonSet rolling updates with maxUnavailable and maxSurge. Understand OnDelete vs RollingUpdate strategies for node-level workloads.

⏱ 10 minutes daemonsetupdate-strategyrolling-update
🌐 Networking intermediate

EndpointSlice Service Discovery

Understand Kubernetes EndpointSlices for scalable service discovery. Compare with legacy Endpoints and configure topology-aware routing.

⏱ 10 minutes endpointsliceservice-discoverynetworking
🚀 Deployments intermediate

Kubernetes preStop Hooks for Graceful Shutdown

Configure preStop hooks and terminationGracePeriodSeconds for zero-downtime pod termination. Handle SIGTERM correctly in your applications.

⏱ 12 minutes graceful-shutdownprestopsigterm
⚡ Autoscaling advanced

HPA v2 Multiple Metrics Scaling Guide

Configure HorizontalPodAutoscaler v2 with CPU, memory, custom, and external metrics. Control scaling behavior with stabilization windows.

⏱ 15 minutes hpaautoscalingmetrics
⚙️ Configuration beginner

Kubernetes imagePullPolicy Guide

Configure imagePullPolicy correctly: Always, Never, and IfNotPresent behavior. Understand digest pinning and tag mutability implications.

⏱ 8 minutes image-pull-policycontainer-imagesregistry
⚙️ Configuration intermediate

Kubernetes Job Parallelism Guide

Configure Kubernetes Jobs with parallelism, completions, and indexed completion mode for efficient batch processing and parallel workloads.

⏱ 12 minutes jobsbatchparallelism
⚙️ Configuration beginner

Kubernetes LimitRange Defaults

Set default resource requests and limits per namespace with LimitRange. Enforce min/max constraints and prevent unbounded resource consumption.

⏱ 8 minutes limitrangeresource-defaultsnamespace
🚀 Deployments intermediate

Multi-Container Pod Patterns in Kubernetes

Implement sidecar, ambassador, and adapter patterns in Kubernetes pods. Share volumes and network namespace between containers for modular architectures.

⏱ 12 minutes sidecarmulti-containerambassador
🌐 Networking intermediate

Kubernetes Egress Network Policies

Control outbound traffic from pods with egress NetworkPolicies. Allow DNS, block internet access, and restrict pod-to-pod communication by namespace.

⏱ 12 minutes network-policyegresssecurity
⚙️ Configuration intermediate

Kubernetes Node Affinity Guide

Schedule pods to specific nodes with requiredDuringScheduling and preferredDuringScheduling node affinity. Control placement with expressions and weights.

⏱ 10 minutes node-affinityschedulingnode-selector
💾 Storage intermediate

PersistentVolume Reclaim Policies

Understand Retain, Delete, and Recycle reclaim policies for PersistentVolumes. Manage the PV lifecycle after PVC deletion and recover data from Released volumes.

⏱ 10 minutes persistent-volumereclaim-policystorage
⚙️ Configuration intermediate

Pod Priority Preemption Kubernetes

Configure PriorityClasses to ensure critical workloads get resources by preempting lower-priority pods. Understand preemption mechanics and safeguards.

⏱ 12 minutes prioritypreemptionscheduling
⚙️ Configuration advanced

Pod Topology Spread Constraints Guide

Use topologySpreadConstraints to distribute pods evenly across zones, nodes, and failure domains for high availability in Kubernetes.

⏱ 15 minutes topologyschedulinghigh-availability
💾 Storage intermediate

Kubernetes Projected Volumes Explained

Combine Secrets, ConfigMaps, Downward API, and ServiceAccount tokens into a single projected volume mount for cleaner pod configuration.

⏱ 10 minutes projected-volumessecretsconfigmap
🚀 Deployments intermediate

Kubernetes Rolling Update Strategy

Configure rolling update deployments with maxSurge and maxUnavailable to control rollout speed, minimize downtime, and enable safe progressive delivery.

⏱ 10 minutes rolling-updatedeployment-strategyrollout
🌐 Networking advanced

Topology-Aware Service Routing

Enable zone-aware traffic routing in Kubernetes to reduce cross-zone latency and egress costs. Configure topology hints and traffic distribution.

⏱ 12 minutes topology-routingzone-awaretraffic-distribution
🚀 Deployments intermediate

StatefulSet Headless Service DNS

Configure StatefulSets with headless services for stable network identities. Understand pod DNS, ordered deployment, and persistent storage patterns.

⏱ 12 minutes statefulsetheadless-servicedns
🔒 Security advanced

ValidatingAdmissionPolicy with CEL

Replace admission webhooks with ValidatingAdmissionPolicy and CEL expressions for in-process, low-latency Kubernetes policy enforcement.

⏱ 15 minutes admission-policycelpolicy
🌐 Networking advanced

SR-IOV NetworkNodePolicy for RDMA

Configure SriovNetworkNodePolicy on OpenShift to create RDMA-capable VFs on Mellanox ConnectX NICs for GPUDirect RDMA and high-performance AI networking.

⏱ 20 minutes sriovrdmamellanox
🔒 Security intermediate

cert-manager OVH DNS-01 Wildcard TLS

Configure cert-manager with the OVH DNS-01 challenge for automated wildcard TLS certificates on k3s. Let's Encrypt production certificates with zero-downtime renewal.

⏱ 20 minutes cert-managerovhdns-01
🌐 Networking intermediate

Cilium eBPF Gateway API Hubble k3s

Install Cilium with eBPF dataplane, Gateway API support, and Hubble observability on k3s. Replace kube-proxy with eBPF, configure GatewayClass.

⏱ 30 minutes ciliumebpfgateway-api
🚀 Deployments intermediate

CloudNativePG PostgreSQL on Kubernetes

Deploy PostgreSQL on Kubernetes with CloudNativePG operator. Cluster setup, affinity, backups to S3, connection pooling, and high availability configuration.

⏱ 20 minutes cloudnativepgpostgresqldatabase
🔧 Troubleshooting intermediate

Fix 502 Bad Gateway in Kubernetes

Troubleshoot and fix 502 Bad Gateway errors in Kubernetes. Causes include pod readiness timing, ingress misconfiguration, upstream timeouts.

⏱ 15 minutes 502bad-gatewayingress
🚀 Deployments advanced

Full GitOps Pipeline k3s to Production

End-to-end GitOps pipeline: git push triggers Gitea Actions build, pushes to quay.io, Octopus Deploy creates release with ephemeral preview.

⏱ 60 minutes gitopsargocdoctopus-deploy
🌐 Networking intermediate

Gateway API HTTPRoutes TLS on k3s

Configure Gateway API HTTPRoutes with TLS termination on k3s using Cilium. Route traffic to multiple services with wildcard certificates and HTTP-to-HTTPS redirects.

⏱ 20 minutes gateway-apihttproutetls
🚀 Deployments intermediate

Gitea Actions Runner Push to Quay

Deploy Gitea Actions runner on k3s to build container images and push to quay.io. DinD-less builds with Kaniko, automated CI pipelines for every git push.

⏱ 25 minutes giteaactionsci-cd
🚀 Deployments intermediate

Gitea PostgreSQL Valkey on k3s

Deploy self-hosted Gitea with PostgreSQL and Valkey (Redis fork) on k3s. Complete Git forge with Actions CI runner, container registry, and package management.

⏱ 30 minutes giteapostgresqlvalkey
🎯 Helm intermediate

Helm Hook Delete Policy Explained

Configure Helm hook delete policies: before-hook-creation, hook-succeeded, hook-failed. Control Job cleanup after install, upgrade, and test hooks.

⏱ 10 minutes helmhooksdelete-policy
🎯 Helm intermediate

Helm OCI Registry for Charts Explained

Store and manage Helm charts in OCI-compliant registries like GHCR, ECR, ACR, and Quay. Push, pull, and version charts using standard container registries.

⏱ 15 minutes helmociregistry
🚀 Deployments beginner

Hugo nginx Static Site on a k3s Cluster

Deploy a Hugo static site with nginx on k3s. Multi-stage build, Brotli compression, security headers, and automated redeployment on git push via Gitea Actions.

⏱ 15 minutes hugonginxstatic-site
⚙️ Configuration intermediate

Install Kubernetes on Fedora with kubeadm

Step-by-step guide to install Kubernetes on Fedora Linux using kubeadm. Disable swap, configure containerd, install kubeadm kubelet kubectl.

⏱ 30 minutes fedorakubeadminstall
🚀 Deployments intermediate

Kairos k3s on Hetzner CPX42: Immutable Bootstrap

Deploy an immutable Kairos-based k3s cluster on Hetzner Cloud CPX42. Automated provisioning with cloud-init, immutable OS upgrades.

⏱ 45 minutes kairosk3shetzner
🔧 Troubleshooting beginner

kubectl cp Copy Files to and from Pods

Copy files between local machine and Kubernetes pods with kubectl cp. Supports containers, namespaces, tar-based transfer, and common troubleshooting.

⏱ 5 minutes kubectlcpcopy
🔧 Troubleshooting beginner

kubectl logs View Pod Logs Guide

View and stream Kubernetes pod logs with kubectl logs. Multi-container pods, previous crashes, label selectors, timestamps, and log aggregation patterns.

⏱ 10 minutes kubectllogsdebugging
🚀 Deployments beginner

kubectl rollout restart Deployment

Restart Kubernetes Deployments, StatefulSets, and DaemonSets with kubectl rollout restart. Zero-downtime rolling restart without changing pod spec.

⏱ 5 minutes kubectlrolloutrestart
⚡ Autoscaling intermediate

Kubernetes Cluster Autoscaler Configuration

Configure Kubernetes Cluster Autoscaler: scale-down delay, node group settings, priority expander, GPU scaling, and cloud provider integration for EKS, GKE.

⏱ 15 minutes cluster-autoscalerautoscalingnode-scaling
⚙️ Configuration beginner

Create ConfigMap from File in Kubernetes

Create Kubernetes ConfigMaps from files, directories, and env files with kubectl. Mount as volumes or inject as environment variables in pods.

⏱ 10 minutes configmapkubectlconfiguration
📊 Observability intermediate

Continuous Profiling with Pyroscope

Deploy Pyroscope on Kubernetes for continuous CPU and memory profiling. Identify performance bottlenecks in production with low overhead.

⏱ 20 minutes profilingpyroscopeperformance
💾 Storage intermediate

CSI Volume Snapshots and Restore

Create and restore volume snapshots using CSI VolumeSnapshot API. Configure VolumeSnapshotClass, take point-in-time backups, and clone PVCs from snapshots.

⏱ 15 minutes csisnapshotsstorage
🌐 Networking intermediate

Kubernetes DNS Policy ClusterFirstWithHostNet

Configure Kubernetes DNS policies: ClusterFirst, ClusterFirstWithHostNet, Default, and None. Fix DNS resolution for hostNetwork pods and custom nameservers.

⏱ 10 minutes dnsdnspolicyhostnetwork
⚙️ Configuration beginner

Kubernetes Downward API: Pod Metadata in Env

Expose pod metadata to containers using Kubernetes Downward API. Access pod name, namespace, node name, labels, annotations.

⏱ 10 minutes downward-apienvironment-variablesmetadata
💾 Storage intermediate

Generic Ephemeral Volumes in Kubernetes

Use generic ephemeral volumes for per-pod temporary storage with CSI driver features. Scratch space, caching, and temp data without pre-provisioned PVCs.

⏱ 10 minutes ephemeral-volumesstoragecsi
⚙️ Configuration intermediate

Kubernetes Finalizers Explained

How Kubernetes finalizers work: prevent resource deletion until cleanup completes. Custom finalizer patterns, stuck resource recovery.

⏱ 10 minutes finalizersdeletioncontrollers
🌐 Networking advanced

Gateway API gRPC Routes on Kubernetes

Configure Kubernetes Gateway API GRPCRoute for gRPC traffic routing. Service-level matching, header-based routing, and traffic splitting for gRPC services.

⏱ 15 minutes gateway-apigrpcnetworking
💾 Storage intermediate

Kubernetes hostPath Volume Guide

Use hostPath volumes to mount node filesystem paths into pods. Types, security risks, use cases for DaemonSets, and safer alternatives like local PVs.

⏱ 10 minutes hostpathvolumesstorage
⚡ Autoscaling advanced

HPA Behavior and Scaling Policies

Configure HPA scaling behavior with stabilization windows, scaling policies, and rate limiting. Fine-tune scale-up and scale-down speed.

⏱ 20 minutes hpaautoscalingscaling-policies
⚡ Autoscaling intermediate

HPA Container Resource Metrics

Configure HPA to scale based on individual container metrics instead of pod-level averages. Target specific containers in multi-container pods.

⏱ 10 minutes hpaautoscalingcontainer-metrics
⚡ Autoscaling advanced

Kubernetes HPA Custom Metrics with Prometheus

Configure Kubernetes HPA with custom Prometheus metrics. Prometheus Adapter setup, custom and external metrics, scaling on request latency, queue depth.

⏱ 20 minutes hpaautoscalingprometheus
⚙️ Configuration intermediate

Kubernetes kustomization.yaml Guide

Write kustomization.yaml files for Kubernetes resource management. Overlays, patches, generators, transformers, and multi-environment deployment patterns.

⏱ 15 minutes kustomizekustomizationconfiguration
🔒 Security intermediate

K8s Let's Encrypt Ingress with cert-manager

Automate TLS certificates for Kubernetes Ingress using cert-manager and Let's Encrypt. ClusterIssuer setup, HTTP-01 and DNS-01 challenges, and auto-renewal.

⏱ 20 minutes cert-managerletsencrypttls
⚙️ Configuration intermediate

Kubernetes Liveness Probe Best Practices

Configure Kubernetes liveness probes correctly. Best practices for httpGet, exec, and tcpSocket probes. Avoid database checks, thundering herd.

⏱ 10 minutes livenessprobeshealth-check
⚡ Autoscaling advanced

Multidimensional Pod Autoscaler (MPA)

Configure Google's Multidimensional Pod Autoscaler to scale both horizontally and vertically simultaneously. Combines HPA and VPA logic in one controller.

⏱ 25 minutes autoscalingmpahpa
🔒 Security intermediate

Kubernetes NetworkPolicy Default Deny Egress

Implement Kubernetes NetworkPolicy default deny egress rules. Block all outbound traffic, then allow specific destinations: DNS, external APIs.

⏱ 10 minutes networkpolicyegressdeny
🔧 Troubleshooting beginner

Check Kubernetes Node Status with kubectl

Check and troubleshoot Kubernetes node status with kubectl. Node conditions (Ready, MemoryPressure, DiskPressure), NotReady debugging, and capacity monitoring.

⏱ 10 minutes nodestatuskubectl
📊 Observability intermediate

OpenTelemetry Auto-Instrumentation

Configure OpenTelemetry Operator auto-instrumentation to inject tracing into pods without code changes. Supports Java, Python, Node.js, .NET, and Go.

⏱ 15 minutes opentelemetrytracingauto-instrumentation
⚙️ Configuration intermediate

K8s PriorityClass and Missing Pod Priority

Fix missing pod priority in Kubernetes. PriorityClass configuration, preemption behavior, system-critical classes, and scheduling order for GPU workloads.

⏱ 10 minutes prioritypriorityclassscheduling
⚙️ Configuration beginner

Kubernetes Release Cycle and Version Support

Kubernetes release cycle explained: 3 releases per year, 14-month support window, patch cadence, version skew policy, and upgrade planning timeline.

⏱ 10 minutes release-cycleversioningupgrade
🔒 Security intermediate

Kubernetes Service Account Token Guide

Create and manage Kubernetes service account tokens. TokenRequest API, projected volumes, long-lived tokens, and RBAC binding for pod-to-API authentication.

⏱ 15 minutes service-accounttokenrbac
🌐 Networking beginner

Kubernetes Service DNS Resolution

How Kubernetes Service DNS works: naming conventions, FQDN format, headless services, cross-namespace resolution, and DNS debugging with nslookup.

⏱ 10 minutes dnsservicecoredns
⚙️ Configuration beginner

terminationGracePeriodSeconds Default

Configure Kubernetes terminationGracePeriodSeconds for graceful pod shutdown. Default 30s, SIGTERM handling, preStop hooks, and per-container settings.

⏱ 10 minutes terminationgraceful-shutdownsigterm
🚀 Deployments advanced

Multi-Cluster Fleet Management on Kubernetes

Manage multiple Kubernetes clusters with kubectl contexts, federation, GitOps fleet patterns, and tools like Rancher, ArgoCD, and Cluster API.

⏱ 20 minutes multi-clusterfederationfleet
🚀 Deployments intermediate

Mutagen Kubernetes File Sync Guide

Sync files between a local machine and Kubernetes pods with Mutagen. Real-time bidirectional sync for development, hot-reload workflows.

⏱ 15 minutes mutagenfile-syncdevelopment
🤖 AI & GPU advanced

NCCL Topology Dump File for GPU Debugging

Use NCCL_TOPO_DUMP_FILE to capture and analyze GPU interconnect topology in Kubernetes. Debug NVLink, NVSwitch, and PCIe connection paths.

⏱ 15 minutes nccltopologygpu
🚀 Deployments advanced

Octopus Deploy 2025.4 on Kubernetes

Deploy Octopus Deploy 2025.4 with MSSQL and Kubernetes agent on k3s. Release orchestration with ephemeral preview environments, approval gates.

⏱ 40 minutes octopus-deploymssqlrelease-management
⚙️ Configuration beginner

Record kubectl Sessions for Kubernetes

Record and replay kubectl sessions for auditing, documentation, and training. Terminal recording with asciinema, script, and kubectl plugins for OpenShift.

⏱ 10 minutes kubectlrecordingaudit
🤖 AI & GPU advanced

Run:ai Distributed vLLM Inference for Multimodal LLMs

Deploy multimodal LLMs with Run:ai distributed inference and vLLM on Kubernetes. Tensor parallelism, NCCL over NVLink, GPUDirect RDMA.

⏱ 30 minutes runaivllmdistributed-inference
🌐 Networking advanced

DCB on Mellanox ConnectX: Lossless Ethernet...

Configure Data Center Bridging (DCB) on Mellanox ConnectX NICs. DCBX negotiation, PFC, ETS, and CN for lossless RoCE Ethernet in Kubernetes AI clusters.

⏱ 30 minutes dcbdcbxpfc
🌐 Networking intermediate

ETS Queue, PFC, DSCP Trust on Mellanox Quic...

Quick reference for enabling ETS queues, PFC, DSCP trust, and DSCP-to-priority mapping on Mellanox ConnectX NICs. Three commands for lossless RoCE Ethernet.

⏱ 10 minutes etspfcdscp
🚀 Deployments beginner

Kubernetes Day 2: Where the Leverage Kicks In

Why Kubernetes pays off after initial setup. Day 2 operations leverage: auto-scaling, self-healing, rolling updates, observability.

⏱ 15 minutes day-2-operationsplatform-engineeringautoscaling
🚀 Deployments beginner

Deploy a New App in 5 Minutes on Kubernetes

Deploy a production-ready application in 5 minutes on an existing Kubernetes cluster. Deployment, Service, Ingress, TLS, autoscaling.

⏱ 15 minutes quick-startdeploymentdeveloper-experience
⚙️ Configuration beginner

Namespace Templates: Instant Envs in K8s

Create production-ready namespace templates for instant environment provisioning. One command deploys namespace, RBAC, quotas, network policies, and monitoring.

⏱ 15 minutes namespacetemplatesonboarding
⚙️ Configuration beginner

Platform Engineering: Golden Paths in K8s

Build golden paths for developers on Kubernetes. Internal developer platform with Backstage, self-service namespaces, pre-built Helm charts.

⏱ 15 minutes platform-engineeringgolden-pathbackstage
🚀 Deployments beginner

Reusable CI/CD Pipeline Templates for K8s

Build once, deploy anything. Reusable CI/CD pipeline templates for Kubernetes using GitHub Actions, GitLab CI, and Tekton.

⏱ 15 minutes cicdpipelinegithub-actions
🌐 Networking intermediate

NMState & nmstatectl: Node Network Management

Manage node networking with NMState declarative API and nmstatectl CLI. Create NodeNetworkConfigurationPolicy manifests, verify with nmstatectl.

⏱ 20 minutes nmstatenmstatectlnncp
🌐 Networking advanced

PFC Configuration on Mellanox ConnectX NICs

Enable Priority Flow Control on Mellanox ConnectX-6/7 NICs for lossless RoCE. mlnx_qos, cma_roce_mode, DSCP trust, ECN, and firmware-level PFC verification.

⏱ 25 minutes pfcmellanoxconnectx
💾 Storage advanced

Access Zones on Scale-Out NAS for Kubernetes

Configure access zones on scale-out NAS (Dell PowerScale/Isilon) for Kubernetes persistent storage. Multi-tenant isolation, CSI driver setup.

⏱ 30 minutes access-zonesscale-out-naspowerscale
🌐 Networking advanced

Extended Resources & RDMA Shared Device Plugin

Kubernetes extended resources for RDMA devices using the shared device plugin. Advertise and schedule InfiniBand and RoCE NICs without SR-IOV using k8s-rdm...

⏱ 25 minutes extended-resourcesrdmashared-device-plugin
🌐 Networking advanced

Kubernetes Route and Ingress Management Guide

Manage OpenShift Routes and Kubernetes Ingress resources. TLS termination, path-based routing, weighted traffic splitting.

⏱ 25 minutes routeingressopenshift-route
🔒 Security advanced

Automate Secret and Key Rotation in Kubernetes

Automate TLS certificate and secret key rotation in Kubernetes. CronJob-based rotation, external-secrets-operator, cert-manager auto-renewal.

⏱ 25 minutes secret-rotationcert-managerexternal-secrets
🔒 Security advanced

Automate User Onboarding & Offboarding in K8s

Automate Kubernetes user onboarding and offboarding. RBAC provisioning, namespace creation, quota assignment, OIDC group sync, and access revocation scripts.

⏱ 25 minutes rbaconboardingoffboarding
⚙️ Configuration advanced

IOMMU on K8s: GPU Passthrough and SR-IOV

Enable and configure IOMMU for GPU passthrough, SR-IOV, and VFIO on Kubernetes. Kernel parameters, IOMMU groups, device isolation, and troubleshooting guide.

⏱ 35 minutes iommuvfiogpu-passthrough
🚀 Deployments intermediate

Kubernetes and OpenShift Major Version Upgrade

Upgrade Kubernetes minor versions (1.31→1.32) and OpenShift (4.16→4.17, or 4.16→4.18 EUS-to-EUS). API deprecation migration, etcd backup.

⏱ 30 minutes major-upgrademinor-upgradeapi-deprecation
🚀 Deployments intermediate

Kubernetes and OpenShift Patch Updates

Apply patch updates to Kubernetes and OpenShift clusters safely. Patch version upgrades for control plane, kubeadm, kubelet.

⏱ 30 minutes patch-updateupgradekubeadm
🚀 Deployments intermediate

Kubernetes and OpenShift Upgrade Strategy

Complete upgrade strategy for Kubernetes and OpenShift clusters. Understand patch, minor, and major versions, upgrade paths.

⏱ 30 minutes upgradeopenshiftkubernetes
🚀 Deployments intermediate

Deploy MariaDB on OpenShift with SCC

Deploy MariaDB on OpenShift with proper Security Context Constraints. Configure anyuid SCC, persistent storage, custom my.

⏱ 25 minutes mariadbopenshiftscc
🚀 Deployments intermediate

OpenShift 4.20: New Features and Upgrade Guide

OpenShift 4.20 (EUS) new features, Kubernetes 1.33 alignment, the upgrade path from 4.18, and what administrators need to know before upgrading.

⏱ 20 minutes openshiftopenshift-4.20eus
🚀 Deployments intermediate

OpenShift 4.21: New Features and Upgrade Guide

OpenShift 4.21 new features, K8s 1.34 alignment, upgrade from 4.20. Non-EUS release with latest innovations: in-place pod resize GA, DRA improvements.

⏱ 20 minutes openshiftopenshift-4.21upgrade
⚙️ Configuration intermediate

OpenShift MachineConfig and MCP Deep Dive

Master MachineConfig and MachineConfigPool on OpenShift. Configure kernel args, files, systemd units, and manage rolling node updates with MCP strategies.

⏱ 25 minutes machineconfigmcpopenshift
🔒 Security intermediate

OpenShift SCC: Security Context Constraints

Configure Security Context Constraints on OpenShift. Manage SCCs for pods requiring privileged access, host networking, custom UID/GID, and volume types.

⏱ 25 minutes sccopenshiftsecurity-context
🌐 Networking advanced

Configure PFC with NMState on Kubernetes

Enable Priority Flow Control (PFC) for lossless RDMA using NMState and NodeNetworkConfigurationPolicy. Configure DSCP-to-priority mapping, ECN, and RoCEv2 QoS.

⏱ 40 minutes pfcnmstatenncp
🤖 AI & GPU advanced

Inter-Node Tensor Parallelism on Kubernetes

Split a single LLM across multiple physical servers using tensor parallelism. Configure vLLM, NIM, and Ray for inter-node TP with NCCL over RDMA or TCP.

⏱ 45 minutes tensor-parallelismdistributed-inferencemulti-node
⚙️ Configuration intermediate

kubectl Config: Manage Contexts and Clusters

Manage kubectl contexts with kubectl config commands. Switch clusters, delete contexts, rename entries, and merge multiple kubeconfig files safely.

⏱ 15 minutes kubectlkubeconfigcontext
⚙️ Configuration intermediate

K8s imagePullSecrets: Private Registry Auth

Configure imagePullSecrets for pulling container images from private registries. Create docker-registry secrets, attach to pods and ServiceAccounts.

⏱ 15 minutes imagepullsecretsprivate-registrydocker-registry
🤖 AI & GPU intermediate

Triton Inference Server vs vLLM: Which to C...

Compare NVIDIA Triton Inference Server vs vLLM for LLM serving on Kubernetes. Performance, multi-model support, batching, GPU utilization.

⏱ 15 minutes tritonvllminference
🤖 AI & GPU advanced

Verify NCCL RDMA Traffic with Debug Logging

Prove NCCL uses RDMA for GPU communication on Kubernetes. Use NCCL_DEBUG and NCCL_DEBUG_SUBSYS=ALL to verify InfiniBand, RoCE.

⏱ 30 minutes ncclrdmainfiniband
🚀 Deployments intermediate

Cluster API on AWS: Provision EKS Clusters

Use Cluster API (CAPI) to provision and manage EKS clusters declaratively. Install clusterctl, configure CAPA provider, and automate cluster lifecycle on AWS.

⏱ 25 minutes cluster-apicapiaws
🚀 Deployments intermediate

ClusterClass: Reusable Cluster Templates in...

Define reusable ClusterClass templates in Cluster API for consistent multi-cluster provisioning. Variables, patches, and topology-based cluster creation.

⏱ 25 minutes cluster-apicapiclusterclass
🚀 Deployments intermediate

Cluster API on vSphere: On-Prem K8s Clusters

Provision on-premises Kubernetes clusters on vSphere using Cluster API (CAPV). VM templates, control plane HA, node scaling, and day-2 operations.

⏱ 25 minutes cluster-apicapivsphere
🔒 Security intermediate

Hardware Attestation for Kubernetes Workloads

Implement remote attestation for Kubernetes workloads. Verify TEE integrity with attestation services, release secrets to verified enclaves.

⏱ 25 minutes attestationconfidential-computingzero-trust
🔒 Security intermediate

Confidential Containers with Kata

Deploy confidential containers using Kata Containers and TEEs on Kubernetes. Hardware attestation, encrypted container images.

⏱ 25 minutes confidential-containerskata-containerstee
🔒 Security intermediate

CVE-2026-3865: CSI SMB Driver Path Traversa...

Fix CVE-2026-3865 Kubernetes CSI SMB driver path traversal vulnerability. Upgrade to v1.20.1, detect malicious PersistentVolumes.

⏱ 15 minutes cvecsismb
📊 Observability intermediate

Alertmanager Routing, Grouping, and Silences

Configure Alertmanager routing trees, receiver integrations, inhibition rules, silences, and alert grouping for production Kubernetes monitoring stacks.

⏱ 20 minutes alertmanagerroutingsilences
📊 Observability intermediate

K8s Golden Signals: SLI and SLO Monitoring

Implement Google SRE golden signals on Kubernetes. Define SLIs, set SLO targets, configure error budgets, and build SLO dashboards with Prometheus and Sloth.

⏱ 20 minutes slislogolden-signals
🔒 Security intermediate

gVisor RuntimeClass on K8s: Sandbox Pods

Deploy gVisor sandbox containers on Kubernetes using RuntimeClass. Install runsc, configure containerd, and isolate untrusted workloads with application-le...

⏱ 20 minutes gvisorruntimeclasssandbox
📊 Observability intermediate

Kubernetes Log Aggregation with Grafana Loki

Aggregate Kubernetes logs with Grafana Loki and Promtail. Install Loki stack, LogQL queries, label-based filtering, and Grafana log exploration dashboards.

⏱ 20 minutes lokiloggingpromtail
📊 Observability beginner

K8s Metrics Server: Install and Configure

Install and configure Kubernetes Metrics Server for kubectl top, HPA autoscaling, and resource monitoring. Troubleshoot common metrics-server errors and TL...

⏱ 15 minutes metrics-servermonitoringkubectl-top
📊 Observability intermediate

Network Observability with Cilium Hubble

Monitor Kubernetes network traffic with Cilium Hubble. Service maps, DNS visibility, HTTP flow logs, network policy auditing, and Hubble UI dashboards.

⏱ 20 minutes ciliumhubblenetwork-observability
📊 Observability intermediate

K8s Pod Resource Monitoring with Grafana

Monitor Kubernetes pod CPU and memory with Grafana dashboards. Prometheus queries for resource usage, request vs limit tracking.

⏱ 20 minutes grafanaprometheusresource-monitoring
🤖 AI & GPU intermediate

NCCL_IB_DISABLE Environment Variable

NCCL_IB_DISABLE environment variable explained. Set NCCL_IB_DISABLE=1 for Ethernet-only clusters, debug InfiniBand errors, and tune GPU communication.

⏱ 20 minutes ncclinfinibandrdma
🤖 AI & GPU advanced

vLLM on Huawei Ascend NPU: K8s Deployment

Deploy vLLM inference on Huawei Ascend NPUs in Kubernetes. Atlas 300I/910B device plugin, vllm-ascend container image, tensor parallelism, and model serving.

⏱ 30 minutes vllmascendnpu
🤖 AI & GPU intermediate

Deploy vLLM OpenAI Container on Kubernetes

Deploy the vLLM OpenAI-compatible server container on Kubernetes. Pull ghcr.io/vllm-project/vllm-openai, configure GPU resources, model loading.

⏱ 20 minutes vllmopenai-apiinference
🤖 AI & GPU advanced

AI-Native Development Platforms on Kubernetes

Build AI-native development platforms on Kubernetes. AI coding agents, automated testing, Copilot infrastructure, dev containers, and AI-driven CI/CD pipelines.

⏱ 20 minutes ai-nativedevelopment-platformscopilot
🤖 AI & GPU advanced

Agentic AI and Multi-Agent Systems

Deploy autonomous AI agents and multi-agent orchestration on Kubernetes. LangGraph, CrewAI, AutoGen, tool-calling agents, agent-to-agent communication.

⏱ 25 minutes agentic-aimulti-agentlangchain
🤖 AI & GPU intermediate

AI Infrastructure Cost Optimization

Optimize AI infrastructure costs on Kubernetes. GPU sharing, spot instances, inference batching, model quantization, token economics.

⏱ 20 minutes cost-optimizationgpu-sharingspot-instances
🤖 AI & GPU advanced

AI Content Watermarking on Kubernetes

Deploy AI-generated content watermarking on Kubernetes. Invisible watermarks, SynthID integration, detection APIs, image and text watermarking pipelines.

⏱ 20 minutes watermarkingsynthidai-generated-content
🔒 Security advanced

AI Security Platforms on Kubernetes

Secure AI workloads on Kubernetes. Model supply chain security, prompt injection defense, LLM output filtering, AI RBAC, GPU isolation.

⏱ 25 minutes ai-securityllm-securityprompt-injection
🤖 AI & GPU advanced

AI Supercomputing on Kubernetes GPU Clusters

Build AI supercomputing platforms on Kubernetes. Multi-node GPU training, NVIDIA DGX SuperPOD, InfiniBand RDMA, NCCL tuning, Blackwell clusters.

⏱ 30 minutes supercomputinggpu-clustersnvidia-dgx
🤖 AI & GPU advanced

Autonomous Industrial Systems on Kubernetes

Orchestrate autonomous factories and logistics with Kubernetes. Digital twins, robot fleet coordination, industrial IoT pipelines, predictive maintenance.

⏱ 25 minutes industrial-aidigital-twiniot
🌐 Networking advanced

Cilium Service Mesh: eBPF-Powered Kubernetes

Deploy Cilium service mesh on Kubernetes with eBPF. Sidecar-free mTLS, L7 traffic management, network policies, Hubble observability, and Gateway API support.

⏱ 25 minutes ciliumservice-meshebpf
🔒 Security advanced

Confidential Computing: SGX and SEV-SNP

Deploy confidential containers on Kubernetes with Intel SGX and AMD SEV-SNP. Encrypted memory, attestation, confidential VMs, Kata Containers.

⏱ 25 minutes confidential-computingsgxsev-snp
🚀 Deployments advanced

Crossplane K8s Infrastructure Management

Manage cloud infrastructure from Kubernetes with Crossplane. Providers, Compositions, Claims, XRDs, and GitOps-driven infrastructure as code for AWS, GCP.

⏱ 30 minutes crossplaneinfrastructure-as-codecloud-providers
🚀 Deployments advanced

Data Monetization Platforms on Kubernetes

Build data monetization platforms on Kubernetes. Data marketplace APIs, usage-based billing, data mesh architecture, secure data sharing, and catalog services.

⏱ 20 minutes data-monetizationdata-meshdata-marketplace
🔒 Security advanced

Data Sovereignty and Geopatriation

Implement data sovereignty and geopatriation on Kubernetes. Multi-region clusters, data residency policies, sovereign cloud, GDPR compliance.

⏱ 20 minutes data-sovereigntygeopatriationgdpr
🔒 Security advanced

Digital Provenance and Content Authenticity

Implement digital provenance on Kubernetes with C2PA content credentials. Verify AI-generated content, sign media pipelines.

⏱ 20 minutes digital-provenancec2pacontent-authenticity
🤖 AI & GPU advanced

Domain-Specific Language Models on Kubernetes

Deploy and fine-tune domain-specific LLMs on Kubernetes. Legal, healthcare, finance, and code models with LoRA fine-tuning, NIM serving, and RAG pipelines.

⏱ 25 minutes domain-specific-llmfine-tuninglora
🚀 Deployments intermediate

Flux vs ArgoCD: Kubernetes GitOps Compared

Compare Flux and ArgoCD for Kubernetes GitOps. Architecture, multi-tenancy, Helm support, UI, scalability, and when to choose each for production GitOps de...

⏱ 20 minutes fluxargocdgitops
🤖 AI & GPU advanced

GitOps for AI Workloads on Kubernetes

Deploy AI models with GitOps on Kubernetes. Version ML models in Git, ArgoCD for model rollouts, Flux for GPU cluster sync.

⏱ 25 minutes gitopsai-workloadsargocd
📊 Observability beginner

Grafana Dashboard 6417: Node Exporter Setup

Import Grafana Dashboard 6417 for Kubernetes pod monitoring. Node Exporter Full setup with Prometheus, CPU, memory, disk, and network metrics.

⏱ 10 minutes grafanadashboard-6417node-exporter
🎯 Helm beginner

Helm Sprig add1 trim merge Functions

Helm Sprig add1 function increments integers in templates. Also covers trim for whitespace removal and merge for combining dictionaries in Helm charts.

⏱ 10 minutes helmsprigtemplating
🎯 Helm beginner

Helm Sprig print quote default Functions

Helm print function concatenates its arguments, inserting a space only between adjacent operands that are both non-strings; quote wraps values in double quotes; default provides fallback values. Template examples and patterns.

⏱ 10 minutes helmsprigtemplating
⚡ Autoscaling intermediate

KEDA vs HPA: Event-Driven Autoscaling Expla...

Compare KEDA and HPA for Kubernetes autoscaling. Scale on Kafka lag, Prometheus metrics, queue depth, cron, and custom events. KEDA vs HPA decision guide.

⏱ 20 minutes kedahpaevent-driven
⚙️ Configuration intermediate

Kubernetes 1.35 and 1.36 Upgrade Checklist

Kubernetes 1.35 and 1.36 upgrade checklist with deprecated APIs, removed features, new GA capabilities, and step-by-step migration guide for production clu...

⏱ 25 minutes kubernetes-upgradedeprecated-apismigration
🤖 AI & GPU advanced

K8s AI Gateway: Inference Extension Guide

Use the Kubernetes AI Gateway and Inference Extension to route LLM traffic. Model-aware routing, load balancing across inference backends.

⏱ 25 minutes ai-gatewaygateway-apiinference
⚙️ Configuration intermediate

K8s ConfigMap Hot Reload Without Restart

Reload Kubernetes ConfigMaps without pod restarts. Volume-mounted auto-update, Reloader controller, checksum annotations.

⏱ 10 minutes configmaphot-reloadconfiguration
⚙️ Configuration beginner

Kubernetes CronJob concurrencyPolicy Explained

Configure Kubernetes CronJob concurrencyPolicy: Allow, Forbid, and Replace. Control overlapping job execution, prevent duplicate runs, and handle slow jobs.

⏱ 10 minutes cronjobconcurrencyscheduling
🌐 Networking intermediate

Kubernetes dnsPolicy and dnsConfig Explained

Configure Kubernetes dnsPolicy: ClusterFirst, Default, None, ClusterFirstWithHostNet. Custom dnsConfig with nameservers, searches, and ndots options.

⏱ 10 minutes dnsdnspolicycoredns
🤖 AI & GPU advanced

Dynamic Resource Allocation for GPUs

Use Kubernetes Dynamic Resource Allocation to schedule GPUs. DRA ResourceClaims, partitionable devices, GPU sharing, and structured parameters for accelerators.

⏱ 25 minutes dragpu-schedulingresource-allocation
⚙️ Configuration intermediate

K8s Finalizers: Prevent Premature Deletion

How Kubernetes finalizers work to prevent premature resource deletion. Add, remove, and troubleshoot stuck finalizers on PVCs, namespaces, and custom resources.

⏱ 10 minutes finalizersdeletioncontrollers
💾 Storage intermediate

K8s fsGroupChangePolicy: Fix Slow Mounts

Configure fsGroupChangePolicy OnRootMismatch to skip recursive chown on volume mounts. Fix slow pod starts caused by large persistent volumes with millions...

⏱ 10 minutes fsgroupchangepolicysecurity-contextpersistent-volumes
🚀 Deployments intermediate

Kubernetes Job Completions and Parallelism

Configure Kubernetes Job completions, parallelism, backoffLimit, and indexed jobs. Parallel batch processing, work queue patterns, and job failure handling.

⏱ 10 minutes jobsbatch-processingparallelism
🚀 Deployments intermediate

Native Sidecar Containers in K8s: Complete ...

Use native sidecar containers in Kubernetes v1.33+. InitContainer restartPolicy Always, lifecycle ordering, logging sidecars, service mesh.

⏱ 20 minutes sidecarinit-containersservice-mesh
🔒 Security intermediate

Kubernetes NetworkPolicy Default Deny Examples

Create Kubernetes NetworkPolicy default deny rules for ingress and egress. Block all traffic, allow specific pods, DNS exceptions, and namespace isolation.

⏱ 15 minutes networkpolicydefault-denynetwork-security
🚀 Deployments intermediate

Kubernetes Pod Priority and Preemption Guide

Configure Kubernetes PriorityClasses for pod scheduling priority. Preemption, system-critical pods, resource guarantee hierarchy, and non-preempting priority.

⏱ 10 minutes prioritypreemptionscheduling
🚀 Deployments intermediate

Kubernetes topologySpreadConstraints Guide

Configure pod topology spread constraints for even distribution across zones, nodes, and racks. maxSkew, topologyKey, whenUnsatisfiable.

⏱ 15 minutes topology-spreadschedulinghigh-availability
🚀 Deployments intermediate

Kubernetes PodDisruptionBudget (PDB) Guide

Configure PodDisruptionBudgets to protect workloads during node drains, upgrades, and maintenance. minAvailable, maxUnavailable, and eviction policies.

⏱ 10 minutes poddisruptionbudgetpdbavailability
⚙️ Configuration beginner

Kubernetes Resource Limits CPU Memory Format

Kubernetes container resource limits and requests syntax. CPU units (200m, 500m, 1), memory units (256Mi, 1Gi), QoS classes, and YAML format examples.

⏱ 10 minutes resource-limitscpumemory
🚀 Deployments intermediate

Kubernetes Rolling Update Zero Downtime Guide

Configure Kubernetes rolling updates for zero-downtime deployments. maxSurge, maxUnavailable, readiness probes, preStop hooks, and graceful shutdown strategies.

⏱ 15 minutes rolling-updatezero-downtimedeployment-strategy
🌐 Networking beginner

Kubernetes Service Types Comparison

Compare Kubernetes Service types: ClusterIP for internal access, NodePort for direct port exposure, LoadBalancer for external traffic.

⏱ 10 minutes servicesclusteripnodeport
⚙️ Configuration beginner

Kubernetes Startup Probes for Slow Containers

Configure Kubernetes startup probes for containers with long initialization. Separate startup from liveness checks, failureThreshold tuning.

⏱ 10 minutes startup-probehealth-checksliveness
🤖 AI & GPU advanced

Kueue for Batch Jobs and GPU Queues

Use Kueue to manage batch job queues on Kubernetes. GPU quota, fair sharing, priority queues, ML training workloads, and multi-tenant cluster scheduling.

⏱ 25 minutes kueuebatch-jobsgpu-scheduling
🤖 AI & GPU intermediate

Llama 2 70B FP16 Model Size 140GB Guide

Llama 2 70B in FP16 needs about 140GB for the weights alone (70B parameters × 2 bytes). Complete GPU memory requirements for FP16, FP8, INT4 quantization, and multi-GPU tensor parallelism on Kubernetes.

⏱ 15 minutes llamamodel-sizinggpu-requirements
🌐 Networking advanced

NCCL_SOCKET_IFNAME Environment Variable Guide

Configure NCCL_SOCKET_IFNAME for multi-node GPU training on Kubernetes. Network interface selection, bonding, InfiniBand, and troubleshooting NCCL timeouts.

⏱ 15 minutes ncclgpu-trainingdistributed-training
🚀 Deployments intermediate

OpenShift Support Lifecycle: Versions, EOL,...

OpenShift lifecycle: version support matrix, EOL dates for OCP 4.14-4.18, EUS upgrade paths, and end-of-life schedule. Updated for 2026.

⏱ 15 minutes openshiftlifecyclesupport-matrix
⚙️ Configuration intermediate

OpenShift Upgrade Planning for 2026

Plan OpenShift upgrades for 2026. EUS-to-EUS paths, operator compatibility, pre-upgrade checks, canary node pools, and rollback strategy for OCP 4.14 to 4.18.

⏱ 25 minutes openshiftupgradeseus
🤖 AI & GPU advanced

Physical AI and Robotics Orchestration

Orchestrate physical AI and robotics fleets with Kubernetes. ROS 2 on K8s, robot fleet management, edge-cloud hybrid, NVIDIA Isaac.

⏱ 25 minutes physical-airoboticsros2
🚀 Deployments advanced

Platform Engineering on K8s: Build an IDP

Build an internal developer platform on Kubernetes. Backstage, Crossplane, ArgoCD, self-service templates, golden paths.

⏱ 30 minutes platform-engineeringdeveloper-experiencebackstage
🔒 Security advanced

Post-Quantum Cryptography on Kubernetes

Prepare Kubernetes clusters for post-quantum cryptography. NIST PQC standards, hybrid TLS certificates, quantum-safe mTLS, Istio/Cilium integration.

⏱ 20 minutes post-quantumcryptographypqc
🔒 Security advanced

Preemptive Cybersecurity on Kubernetes

Implement preemptive cybersecurity on Kubernetes. Threat prediction, automated vulnerability patching, runtime behavior analysis, CNAPP.

⏱ 20 minutes preemptive-securitythreat-detectioncnapp
🤖 AI & GPU advanced

Quantum Computing on K8s: Hybrid Workflows

Run quantum computing workloads on Kubernetes. Qiskit, Cirq, PennyLane hybrid classical-quantum pipelines, quantum job scheduling, and QPU integration patterns.

⏱ 25 minutes quantum-computingqiskithybrid-workflows
🔒 Security advanced

Sovereign Air-Gapped Kubernetes Clusters

Deploy sovereign and air-gapped Kubernetes clusters. Offline installation, private registry mirrors, disconnected GitOps, sovereign cloud.

⏱ 30 minutes air-gappedsovereignoffline
🔧 Troubleshooting intermediate

Troubleshooting Pods with GPU Devices

Fix GPU device issues in Kubernetes pods. Troubleshoot device plugin errors, DRA claims, CUDA failures, driver mismatches.

⏱ 20 minutes gpu-troubleshootingdevice-pluginnvidia
🤖 AI & GPU advanced

Run:ai Topology-Aware Scheduling Deep Dive

Configure Run:ai topology-aware scheduling for distributed AI workloads. Multi-level hierarchies, required vs preferred placement, LeaderWorkerSet.

⏱ 30 minutes run-aitopology-awaregang-scheduling
🤖 AI & GPU intermediate

NIM Model Profiles and Selection on Kubernetes

Configure NIM_MODEL_PROFILE for NVIDIA NIM deployments on Kubernetes. List profiles, select by ID or name, tune VRAM, and override with vLLM CLI args.

⏱ 25 minutes nvidia-nimmodel-profilesgpu
🤖 AI & GPU advanced

NIM Multi-Node Deployment with Helm on K8s

Deploy NVIDIA NIM across multiple Kubernetes nodes using Helm, LeaderWorkerSet, Ray, and vLLM. Run Llama 405B and DeepSeek-R1 on 16+ GPUs.

⏱ 45 minutes nvidia-nimmulti-nodeleaderworkerset
🤖 AI & GPU intermediate

NIM LLM Support Matrix and GPU Compatibility

Complete NVIDIA NIM support matrix for Kubernetes. Supported models, profiles, precision formats, GPU compatibility, and hardware requirements per model.

⏱ 15 minutes nvidia-nimgpu-compatibilitysupport-matrix
🤖 AI & GPU advanced

NVIDIA Dynamo Distributed Inference

Deploy NVIDIA Dynamo on Kubernetes for disaggregated LLM inference. KV-aware routing, prefill/decode splitting, Grove operator, and zero-config deployment.

⏱ 45 minutes nvidia-dynamodistributed-inferencedisaggregated-serving
🤖 AI & GPU advanced

Rebuild NIM with Custom Model on Kubernetes

Step-by-step guide to deploy custom, fine-tuned, or self-hosted models with NVIDIA NIM on Kubernetes. Model-free NIM from HuggingFace, S3, NGC, or local path.

⏱ 40 minutes nvidia-nimcustom-modelfine-tuning
🤖 AI & GPU advanced

Run:ai + Dynamo Multi-Node Scheduling on K8s

Deploy NVIDIA Dynamo with Run:ai v2.23 for gang scheduling and topology-aware placement. Atomic pod launches, zone co-location, and disaggregated inference.

⏱ 40 minutes · nvidia-dynamo · run-ai · gang-scheduling
🤖 AI & GPU intermediate

Copy NVIDIA NIM Images to Internal Quay Reg...

Pull NIM container images from nvcr.io and push to an internal Quay registry. Covers authentication, tagging, air-gapped workflows, and curl token issues.

⏱ 20 minutes · nvidia-nim · quay-registry · container-images
🔒 Security intermediate

CVE-2026-4342: ingress-nginx Code Execution...

Patch CVE-2026-4342 in ingress-nginx — a CVSS 8.8 configuration injection vulnerability enabling arbitrary code execution. Upgrade to v1.13.9, v1.14.

⏱ 15 minutes · cve · ingress-nginx · security
🤖 AI & GPU advanced

Deploy Multinode NIM Models on Kubernetes

Run large language models across multiple GPU nodes with NVIDIA NIM. Tensor parallelism, NCCL, InfiniBand, and Kubernetes Job orchestration.

⏱ 45 minutes · nvidia-nim · multinode · tensor-parallelism
🤖 AI & GPU advanced

Distributed Inference with Run:ai

Deploy distributed AI inference with NVIDIA Run:ai on Kubernetes. Single-node Knative, multinode LeaderWorkerSet, NIM, autoscaling, and observability.

⏱ 40 minutes · nvidia-runai · distributed-inference · knative
💾 Storage intermediate

K8s-IO Benchmark CLI for fio and HammerDB

Run distributed fio and HammerDB storage benchmarks on Kubernetes with K8s-IO, a lightweight Go CLI tool that replaces heavy benchmark operators.

⏱ 15 minutes · k8s-io · fio · hammerdb
🔒 Security advanced

K8s Audit Logging for Enterprise Compliance

Configure API server audit logging for SOC2, HIPAA, and PCI-DSS compliance. Structured audit policies, log shipping, and alerting on suspicious activity.

⏱ 35 minutes · audit-logging · compliance · soc2
⚙️ Configuration intermediate

K8s Change Mgmt for Enterprise Operations

Implement ITIL-aligned change management for Kubernetes with approval gates, maintenance windows, rollback procedures, and change audit trails.

⏱ 35 minutes · change-management · itil · maintenance-windows
⚙️ Configuration advanced

Kubernetes Disaster Recovery for Enterprise

Kubernetes disaster recovery with Velero backup and restore. Cross-region replication, etcd snapshots, multi-cluster failover, and RTO/RPO strategies.

⏱ 60 minutes · disaster-recovery · velero · etcd-backup
⚡ Autoscaling intermediate

K8s Capacity Planning for Enterprise Clusters

Right-size enterprise clusters with data-driven capacity planning. Forecast resource needs, optimize bin-packing, and plan for growth with metrics.

⏱ 35 minutes · capacity-planning · resource-optimization · cluster-sizing
🚀 Deployments advanced

Enterprise GitOps at Scale with Fleet Mgmt

Manage hundreds of Kubernetes clusters with ArgoCD ApplicationSets, Flux multi-cluster, and fleet-wide policy enforcement using GitOps principles.

⏱ 45 minutes · gitops · argocd · fleet-management
🔒 Security advanced

Enterprise Container Image Governance

Enforce image policies with admission controllers. Require signed images, block public registries, and automate vulnerability scanning gates.

⏱ 40 minutes · image-governance · admission-controllers · cosign
🔒 Security advanced

Automated Secret Rotation on Kubernetes

Implement zero-downtime secret rotation with External Secrets Operator, HashiCorp Vault dynamic secrets, and rolling restarts for enterprise compliance.

⏱ 40 minutes · secret-rotation · vault · external-secrets
🌐 Networking advanced

Enterprise Service Mesh mTLS & Observability

Deploy Istio service mesh for enterprise mTLS, traffic management, circuit breaking, and distributed tracing across microservices on Kubernetes.

⏱ 50 minutes · istio · service-mesh · mtls
🔒 Security advanced

Kubernetes Multi-Tenancy for Enterprise Teams

Implement secure multi-tenancy with namespace isolation, ResourceQuotas, NetworkPolicies, hierarchical namespaces, and vCluster for strong isolation.

⏱ 50 minutes · multi-tenancy · namespace-isolation · resource-quotas
🔒 Security advanced

K8s OIDC Integration with Enterprise SSO

Configure Kubernetes API server OIDC authentication with Keycloak, Azure AD, or Okta for enterprise single sign-on and group-based RBAC.

⏱ 45 minutes · oidc · enterprise-sso · keycloak
🤖 AI & GPU advanced

Run:ai NIM Distributed Inference Tutorial

Step-by-step guide to deploy DeepSeek-R1 distributed inference on Run:ai with LeaderWorkerSet, SGLang, PVC caching, and OpenShift security.

⏱ 60 minutes · nvidia-runai · nvidia-nim · distributed-inference
🚀 Deployments intermediate

Argo Workflows on Kubernetes: CI/CD Guide

Run CI/CD pipelines and data workflows with Argo Workflows on Kubernetes. Create DAG-based workflows, parallel steps, artifact passing, and cron workflows.

⏱ 15 minutes · argo-workflows · ci-cd · pipeline
💾 Storage advanced

Distributed fio Storage Benchmark K8s

Run distributed fio benchmarks on Kubernetes and OpenShift to test storage performance at scale. Covers fio-distributed with k8s Jobs, Red Hat dbench.

⏱ 15 minutes · fio · storage-benchmark · openshift
🌐 Networking intermediate

External DNS for Kubernetes: Setup Guide

Automate DNS record management with ExternalDNS for Kubernetes. Sync Service and Ingress hostnames to Route53, Cloudflare, Google Cloud DNS, and 30+ providers.

⏱ 15 minutes · external-dns · dns · route53
🔒 Security intermediate

Falco Runtime Security for Kubernetes

Deploy Falco for Kubernetes runtime threat detection. Detect shell spawns in containers, privilege escalation, sensitive file access, and suspicious network

⏱ 15 minutes · falco · runtime-security · threat-detection
🎯 Helm intermediate

Helm Chart Testing & CI/CD Pipeline Integra...

Test Helm charts automatically with ct (chart-testing), helm unittest, and GitHub Actions. Validate templates, lint values.

⏱ 15 minutes · helm · testing · ci-cd
🎯 Helm intermediate

Helm Hooks Database Migrations & Lifecycle ...

Use Helm hooks to run database migrations, backups, and validation jobs during install, upgrade, and rollback. Control execution order with hook weights an.

⏱ 15 minutes · helm · hooks · database-migration
🎯 Helm advanced

Helm Library Charts for Reusable Templates

Create Helm library charts to share common templates across multiple charts. DRY up deployments, services, and config patterns with reusable library functions.

⏱ 15 minutes · helm · library-chart · templates
🎯 Helm intermediate

Helm OCI Registry for Chart Distribution

Store and distribute Helm charts using OCI registries like GHCR, ECR, ACR, and Harbor. Migrate from ChartMuseum to OCI-native chart management.

⏱ 15 minutes · helm · oci · registry
🎯 Helm intermediate

Helm Secrets Mgmt with SOPS & Age Encryption

Encrypt Helm values files using SOPS with Age or GPG keys. Manage secrets in Git safely with helm-secrets plugin for transparent encrypt/decrypt workflows.

⏱ 15 minutes · helm · secrets · sops
💾 Storage intermediate

OpenShift Storage Benchmark fio Config Prof...

Benchmark OpenShift and Kubernetes storage using fio with reusable YAML config profiles for random and sequential read/write I/O patterns.

⏱ 15 minutes · fio · openshift · storage-benchmark
⚡ Autoscaling advanced

Karpenter Node Autoscaling for K8s on AWS

Deploy Karpenter for fast, flexible node autoscaling on AWS EKS. Configure NodePools, EC2NodeClasses, and consolidation for real cost savings.

⏱ 15 minutes · karpenter · aws · eks
🤖 AI & GPU advanced

Kubeflow Operator: Full ML Platform

Deploy the complete Kubeflow platform on Kubernetes with the Kubeflow Operator. Covers Pipelines, Notebooks, KServe, Katib, and multi-tenant ML workflows.

⏱ 15 minutes · kubeflow · mlops · operator
⚙️ Configuration intermediate

Kubernetes Affinity and Anti-Affinity Guide

Schedule pods with Kubernetes node affinity, pod affinity, and anti-affinity rules. Spread across zones, co-locate related services, and optimize

⏱ 15 minutes · affinity · anti-affinity · scheduling
⚡ Autoscaling advanced

Advanced Cluster Autoscaler Config & Tuning

Fine-tune the Kubernetes Cluster Autoscaler with expanders, priority-based scaling, mixed instance policies, and GPU node pool autoscaling for production c.

⏱ 15 minutes · cluster-autoscaler · node-scaling · gpu
🌐 Networking intermediate

Kubernetes ClusterIP Service Explained

Understand Kubernetes ClusterIP services for internal communication. How kube-proxy routes traffic, DNS resolution, and when ClusterIP is the right service

⏱ 15 minutes · clusterip · service · internal
⚙️ Configuration intermediate

Essential Kubernetes Commands Reference

Master the most used Kubernetes commands for daily operations. Complete kubectl reference for pods, deployments, services, debugging, and cluster management.

⏱ 15 minutes · kubectl · commands · reference
⚙️ Configuration intermediate

ConfigMap Patterns in Kubernetes

Create and use Kubernetes ConfigMaps for application configuration. Mount as files, inject as environment variables, and manage config updates without

⏱ 15 minutes · configmap · configuration · environment-variables
🚀 Deployments intermediate

Kubernetes CronJob Scheduling Guide

Schedule recurring tasks with Kubernetes CronJobs. Covers cron syntax, timezone support, concurrency policies, job history, manual triggers, and monitoring.

⏱ 15 minutes · cronjob · scheduling · cron
🚀 Deployments intermediate

Kubernetes DaemonSet Complete Guide

Deploy DaemonSets in Kubernetes to run one pod per node. Covers monitoring agents, log collectors, CNI plugins, node affinity, and rolling update strategies.

⏱ 15 minutes · daemonset · per-node · monitoring
🌐 Networking intermediate

Kubernetes DNS and CoreDNS Guide

Understand Kubernetes DNS resolution with CoreDNS. Debug DNS issues, configure custom DNS, and optimize DNS performance for large clusters.

⏱ 15 minutes · dns · coredns · service-discovery
🌐 Networking intermediate

Kubernetes Ingress Complete Guide

Configure Kubernetes Ingress for HTTP routing, TLS termination, and path-based routing. Covers NGINX Ingress Controller, cert-manager, and Ingress vs Gateway

⏱ 15 minutes · ingress · nginx-ingress · tls
🚀 Deployments intermediate

Kubernetes Jobs and CronJobs Guide

Run batch workloads with Kubernetes Jobs and CronJobs. Covers one-shot tasks, parallel processing, scheduled jobs, failure handling, and cleanup policies.

⏱ 15 minutes · job · cronjob · batch
⚙️ Configuration intermediate

Kubernetes Labels and Selectors Explained

Use Kubernetes labels and selectors to organize and query resources. Covers label conventions, selector types, recommended labels, and label-based operations.

⏱ 15 minutes · labels · selectors · organization
🌐 Networking intermediate

Kubernetes LoadBalancer Service Guide

Expose Kubernetes services with LoadBalancer type for production traffic. Covers cloud providers, MetalLB for bare-metal, health checks, and cost optimization.

⏱ 15 minutes · loadbalancer · service · external-access
🌐 Networking intermediate

Kubernetes NodePort Service Explained

Expose Kubernetes services externally with NodePort. Understand port ranges, security implications, and when to use NodePort vs LoadBalancer vs Ingress.

⏱ 15 minutes · nodeport · service · external-access
💾 Storage intermediate

Persistent Volume NFS iSCSI Guide

Master Kubernetes PersistentVolumes: static and dynamic provisioning, reclaim policies, volume modes, and lifecycle. From PV creation to pod mounting and data

⏱ 15 minutes · persistent-volume · pv · storage
⚙️ Configuration intermediate

Kubernetes Pod Lifecycle Explained

Understand the Kubernetes pod lifecycle from creation to termination. Covers pod phases, container states, init containers, hooks, and graceful shutdown

⏱ 15 minutes · pod-lifecycle · phases · hooks
💾 Storage intermediate

PVC Storage Provisioning in Kubernetes

Create and manage Kubernetes PersistentVolumeClaims and PersistentVolumes. Covers dynamic provisioning, StorageClasses, access modes, volume

⏱ 15 minutes · pvc · persistent-volume · storage
🚀 Deployments intermediate

Kubernetes Rolling Update Strategy Guide

Configure Kubernetes rolling update strategy for zero-downtime deployments. Tune maxSurge, maxUnavailable, minReadySeconds, and rollback procedures.

⏱ 15 minutes · rolling-update · deployment-strategy · zero-downtime
🔒 Security intermediate

Secrets Encryption Rotation K8s Guide

Manage Kubernetes Secrets for passwords, tokens, and certificates. Covers creation, encryption at rest, external secret operators, and security best practices.

⏱ 15 minutes · secrets · encryption · security
🌐 Networking intermediate

Kubernetes Service Types Explained

Compare all Kubernetes service types: ClusterIP, NodePort, LoadBalancer, ExternalName, and headless. Choose the right type for internal, external, and hybrid

⏱ 15 minutes · service-types · clusterip · nodeport
⚙️ Configuration intermediate

Taints and Tolerations in Kubernetes

Control pod scheduling with Kubernetes taints and tolerations. Dedicate nodes for specific workloads, prevent scheduling on control plane nodes, and handle GPU

⏱ 15 minutes · taints · tolerations · scheduling
🚀 Deployments intermediate

KubeVirt: Run VMs on Kubernetes

Run virtual machines alongside containers on Kubernetes with KubeVirt. Covers VM creation, live migration, GPU passthrough, and VM-to-container networking.

⏱ 15 minutes · kubevirt · virtual-machines · vm
🚀 Deployments intermediate

Tekton Pipelines on Kubernetes

Build cloud-native CI/CD pipelines with Tekton on Kubernetes. Create reusable Tasks, Pipelines, triggers, and integrate with Git webhooks for automated builds.

⏱ 15 minutes · tekton · ci-cd · pipeline
🚀 Deployments intermediate

WebAssembly Runtime with Spin and SpinKube

Deploy WebAssembly workloads on Kubernetes using SpinKube and the Spin Operator. Run Wasm components alongside containers with sub-millisecond cold starts.

⏱ 15 minutes · wasm · spinkube · spin
🚀 Deployments advanced

WASI and containerd Wasm Shims on Kubernetes

Run WebAssembly workloads using containerd Wasm shims with WASI support on Kubernetes. Configure runwasi, wasmtime, and WasmEdge as container runtimes.

⏱ 15 minutes · wasm · wasi · containerd
🚀 Deployments intermediate

Serverless Functions with WebAssembly

Build serverless functions using WebAssembly on Kubernetes with Fermyon Cloud, KEDA, and SpinKube. Scale to zero with sub-millisecond Wasm cold starts.

⏱ 15 minutes · wasm · serverless · keda
⚡ Autoscaling intermediate

Kubernetes Cluster Autoscaler Setup Guide

Configure the Cluster Autoscaler to automatically add and remove nodes based on pod scheduling demands. Covers AWS, GKE, Azure, and bare-metal setups.

⏱ 15 minutes · cluster-autoscaler · node-scaling · cloud
⚡ Autoscaling intermediate

KEDA: Event-Driven Autoscaling for Kubernetes

Scale Kubernetes workloads with KEDA based on external events: queue depth, cron schedules, Prometheus metrics, HTTP traffic, and 60+ event sources.

⏱ 15 minutes · keda · event-driven · autoscaling
📊 Observability intermediate

Kubernetes Alerting Best Practices

Design effective Kubernetes alerts that reduce noise and catch real issues. Covers severity tiers, golden signals, runbook links, and fatigue prevention.

⏱ 15 minutes · alerting · prometheus · alertmanager
🚀 Deployments intermediate

Blue-Green Deployment in Kubernetes

Implement blue-green deployments in Kubernetes for instant rollback. Covers Service selector switching, Argo Rollouts blue-green, and comparison with canary

⏱ 15 minutes · blue-green · deployment-strategy · zero-downtime
🚀 Deployments intermediate

Canary Deployment in Kubernetes

Implement canary deployments in Kubernetes to gradually roll out changes. Covers native K8s, Argo Rollouts, Istio traffic splitting, and automated analysis.

⏱ 15 minutes · canary · deployment-strategy · progressive-delivery
⚙️ Configuration intermediate

Kubernetes Cordon, Drain, and Uncordon Nodes

Safely manage Kubernetes nodes with cordon, drain, and uncordon. Prepare nodes for maintenance, upgrades, and decommissioning without disrupting workloads.

⏱ 15 minutes · cordon · drain · node-maintenance
📊 Observability beginner

Kubernetes Cost Monitoring with Kubecost

Monitor and optimize Kubernetes costs with Kubecost. Track per-namespace and per-deployment spend with cloud billing integration and savings tips.

⏱ 15 minutes · kubecost · cost-monitoring · finops
⚡ Autoscaling advanced

Custom Metrics Autoscaling in Kubernetes

Scale Kubernetes pods on custom application metrics with Prometheus Adapter. Configure HPA with custom and external metrics beyond CPU and memory.

⏱ 15 minutes · custom-metrics · prometheus-adapter · hpa
🔧 Troubleshooting intermediate

Debug Kubernetes Pods: Complete Guide

Debug Kubernetes pods with kubectl debug, ephemeral containers, and netshoot. Troubleshoot distroless images, network issues, and crashed pods step by step.

⏱ 15 minutes · debug · kubectl-debug · ephemeral-containers
🌐 Networking intermediate

Kubernetes EndpointSlices Explained

Understand Kubernetes EndpointSlices for scalable service endpoint management. How they improve on Endpoints objects for large clusters with thousands of pods.

⏱ 15 minutes · endpointslices · endpoints · service-discovery
🚀 Deployments intermediate

Graceful Shutdown Pod Termination

Implement graceful shutdown in Kubernetes pods. Handle SIGTERM, drain connections, use preStop hooks, and configure terminationGracePeriodSeconds correctly.

⏱ 15 minutes · graceful-shutdown · sigterm · prestop
🌐 Networking intermediate

Kubernetes Headless Service Explained

Create Kubernetes headless services for StatefulSet DNS, direct pod addressing, and service discovery. Understand when clusterIP: None is the right choice.

⏱ 15 minutes · headless-service · statefulset · dns
⚙️ Configuration intermediate

Kubernetes Health Checks Best Practices

Design effective Kubernetes health checks with liveness, readiness, and startup probes. Avoid common anti-patterns like database checks in liveness probes.

⏱ 15 minutes · health-checks · probes · liveness
⚙️ Configuration intermediate

Kubernetes Init Containers Guide

Use Kubernetes init containers to run setup tasks before your main application starts. Covers database migrations, config generation, dependency

⏱ 15 minutes · init-containers · startup · migrations
⚙️ Configuration intermediate

Kubernetes LimitRange and ResourceQuota

Configure LimitRange and ResourceQuota in Kubernetes namespaces. Set default resource requests, enforce limits, and prevent resource exhaustion across teams.

⏱ 15 minutes · limitrange · resourcequota · resource-management
💾 Storage advanced

Rook-Ceph: Distributed Storage for Kubernetes

Deploy Rook-Ceph on Kubernetes for distributed block, file, and object storage. Covers installation, CephCluster configuration, StorageClasses, and monitoring.

⏱ 15 minutes · rook · ceph · distributed-storage
🔒 Security intermediate

Kubernetes Service Accounts Guide

Create and manage Kubernetes service accounts for pod identity. Covers RBAC binding, token projection, workload identity, and least-privilege access

⏱ 15 minutes · service-account · rbac · tokens
⚙️ Configuration intermediate

Kubernetes Sidecar Containers Pattern

Implement the sidecar pattern in Kubernetes for logging, proxying, syncing, and monitoring alongside your main application container. Covers native K8s 1.28+

⏱ 15 minutes · sidecar · multi-container · logging
💾 Storage intermediate

K8s Storage Best Practices for Production

Production storage best practices for Kubernetes. StorageClass selection, backup strategies, volume expansion, data migration, and performance tuning.

⏱ 15 minutes · storage · best-practices · production
🔧 Troubleshooting intermediate

Kubernetes Troubleshooting Flowchart

Systematic Kubernetes troubleshooting guide with flowcharts. Debug pods, services, networking, storage, and node issues step by step with kubectl commands.

⏱ 15 minutes · troubleshooting · debugging · flowchart
🚀 Deployments intermediate

Zero-Downtime Deployment in Kubernetes

Achieve zero-downtime deployments in Kubernetes. Covers readiness probes, PDBs, preStop hooks, rolling update tuning, and connection draining best practices.

⏱ 15 minutes · zero-downtime · rolling-update · graceful-shutdown
⚡ Autoscaling advanced

Virtual Kubelet for Serverless K8s Scaling

Deploy Virtual Kubelet to burst Kubernetes workloads to serverless backends like Azure ACI, AWS Fargate, and HashiCorp Nomad for elastic burst capacity.

⏱ 15 minutes · virtual-kubelet · serverless · burst-scaling
🚀 Deployments beginner

Deployment vs StatefulSet in Kubernetes

Choose between Deployment and StatefulSet for your Kubernetes workloads. Compare identity, storage, ordering, scaling, and use cases for each controller.

⏱ 15 minutes · deployment · statefulset · comparison
⚙️ Configuration intermediate

Kubernetes Node and Pod Affinity Guide

Configure node affinity, pod affinity, and anti-affinity rules for advanced Kubernetes scheduling. Control pod placement across zones, nodes, and topologies.

⏱ 15 minutes · affinity · anti-affinity · scheduling
⚙️ Configuration beginner

Kubernetes Annotations Complete Guide

Use Kubernetes annotations for metadata, automation, and controller config. Common patterns for ingress annotations, Helm labels, and triggers.

⏱ 15 minutes · annotations · metadata · ingress
⚙️ Configuration intermediate

Kubernetes Backup and Restore with Velero

Back up and restore Kubernetes clusters with Velero. Covers namespace backups, scheduled backups, disaster recovery, and migration between clusters.

⏱ 15 minutes · backup · restore · velero
🚀 Deployments intermediate

Kubernetes CI/CD Pipeline with GitHub Actions

Build a complete CI/CD pipeline for Kubernetes with GitHub Actions. Covers Docker build, image push, Helm deploy, and automated rollback on failure.

⏱ 15 minutes · ci-cd · github-actions · pipeline
⚙️ Configuration advanced

Kubernetes Cluster Upgrade Step-by-Step

Upgrade Kubernetes clusters safely with kubeadm. Covers pre-flight checks, control plane upgrade, worker node drain, and rollback procedures.

⏱ 15 minutes · upgrade · kubeadm · cluster-management
🚀 Deployments beginner

Kubernetes Deployment Complete Guide

Create and manage Kubernetes Deployments for stateless applications. Covers replicas, selectors, rolling updates, rollback, and deployment strategies.

⏱ 15 minutes · deployment · replicas · rolling-update
🌐 Networking intermediate

Kubernetes DNS: How Service Discovery Works

Understand Kubernetes DNS resolution with CoreDNS. Service discovery, pod DNS, headless services, custom DNS policies, and troubleshooting DNS failures.

⏱ 15 minutes · dns · coredns · service-discovery
💾 Storage beginner

Kubernetes emptyDir Volume Explained

Use emptyDir volumes in Kubernetes for temporary storage, shared data between containers, and cache. Covers medium types, size limits, and tmpfs backing.

⏱ 15 minutes · emptydir · volumes · temporary-storage
⚙️ Configuration beginner

Kubernetes Environment Variables Guide

Set Kubernetes environment variables with envFrom, configMapRef, secretKeyRef, and the Downward API. Variable ordering, fieldRef, and best practices.

⏱ 15 minutes · environment-variables · env · configmap
🔧 Troubleshooting beginner

kubectl exec: Run Commands Inside K8s Pods

Use kubectl exec to run commands inside Kubernetes pods. Covers interactive sessions, multi-container pods, and ephemeral container debugging.

⏱ 15 minutes · kubectl-exec · debugging · shell
🎯 Helm beginner

Helm vs Kustomize: Which to Use

Compare Helm and Kustomize for Kubernetes configuration management. Covers templating vs overlays, use cases, pros and cons, and when to use both together.

⏱ 15 minutes · helm · kustomize · comparison
🔧 Troubleshooting beginner

Fix ImagePullBackOff in Kubernetes

Debug and fix ImagePullBackOff errors in Kubernetes. Covers wrong image names, private registry auth, rate limits, and network connectivity issues.

⏱ 15 minutes · imagepullbackoff · troubleshooting · registry
🌐 Networking beginner

K8s Ingress: Routing, TLS, and Controllers

Configure Kubernetes Ingress for HTTP routing, TLS termination, and path-based routing. Covers NGINX, Traefik, and HAProxy ingress controllers.

⏱ 15 minutes · ingress · routing · tls
⚙️ Configuration beginner

Kubernetes Labels and Selectors Guide

Master Kubernetes labels and selectors for organizing and querying resources. Label conventions, equality selectors, set-based selectors, and field selectors.

⏱ 15 minutes · labels · selectors · organization
🌐 Networking intermediate

Kubernetes Load Balancing Strategies

Configure Kubernetes load balancing with Services, Ingress, and Gateway API. Round-robin, session affinity, weighted routing, and traffic policies.

⏱ 15 minutes · load-balancing · service · ingress
🚀 Deployments beginner

K8s Local Development with Minikube and Kind

Set up local Kubernetes clusters for development with Minikube, Kind, and k3d. Covers installation, configuration, local registries, and hot-reload workflows.

⏱ 15 minutes · minikube · kind · k3d
📊 Observability intermediate

EFK Stack: Kubernetes Centralized Logging

Deploy the EFK stack for Kubernetes centralized logging. Elasticsearch, Fluentd, and Kibana setup, log collection, parsing, and retention policies.

⏱ 15 minutes · logging · elasticsearch · fluentd
📊 Observability intermediate

K8s Monitoring with Prometheus and Grafana

Set up Kubernetes monitoring with Prometheus and Grafana. Covers kube-prometheus-stack, custom dashboards, alerting rules, and key metrics to monitor.

⏱ 15 minutes · monitoring · prometheus · grafana
🔒 Security advanced

Kubernetes Multi-Tenancy Patterns

Implement multi-tenancy in Kubernetes with namespaces, RBAC, quotas, network policies, and virtual clusters. Covers soft and hard tenancy models.

⏱ 15 minutes · multi-tenancy · namespaces · isolation
🔒 Security intermediate

Kubernetes Security Checklist for Production

Production security checklist for Kubernetes clusters. Covers RBAC, network policies, pod security, secrets encryption, audit logging, and image scanning.

⏱ 15 minutes · security-checklist · hardening · production
🔧 Troubleshooting beginner

Debug and Fix OOMKilled Errors in Kubernetes

Debug and fix OOMKilled errors in Kubernetes. Find memory leaks, set correct limits, use VPA for right-sizing, and prevent container OOM kills.

⏱ 15 minutes · oomkilled · memory · out-of-memory
🚀 Deployments advanced

Kubernetes Operator Pattern Explained

Build and use Kubernetes Operators for automated application management. Covers the operator pattern, CRDs, controller-runtime, and Operator SDK.

⏱ 15 minutes · operator · crd · custom-resource
🔧 Troubleshooting intermediate

Kubernetes Pod Eviction: Causes and Prevention

Understand why Kubernetes evicts pods and how to prevent it. Covers resource pressure, priority classes, PDBs, and eviction policies.

⏱ 15 minutes · eviction · resource-pressure · priority-class
⚙️ Configuration beginner

Kubernetes Pod Lifecycle and States Explained

Understand the Kubernetes pod lifecycle from Pending to Succeeded or Failed. Covers pod phases, container states, restart policies, graceful shutdown, and preStop hooks.

⏱ 15 minutes · pod-lifecycle · phases · graceful-shutdown
⚙️ Configuration beginner

kubectl Port-Forward: Access Pods and Services

Use kubectl port-forward to access Kubernetes pods, services, and deployments from your local machine. Debug, test, and access internal services securely.

⏱ 15 minutes · port-forward · kubectl · debugging
🔒 Security intermediate

K8s RBAC: Roles, ClusterRoles, and Bindings

Configure Kubernetes RBAC with Roles, ClusterRoles, RoleBindings, and service accounts. Least privilege access control for users, groups, and applications.

⏱ 15 minutes · rbac · roles · clusterrole
🚀 Deployments beginner

Kubernetes ReplicaSet Explained

Understand ReplicaSets in Kubernetes for maintaining pod replicas. Covers selectors, scaling, ownership, and why you should use Deployments instead.

⏱ 15 minutes · replicaset · replicas · scaling
⚙️ Configuration beginner

Kubernetes Resource Requests and Limits Guide

Configure CPU and memory requests and limits in Kubernetes. Understand QoS classes, OOMKilled, CPU throttling, and right-sizing with VPA recommendations.

⏱ 15 minutes · resources · requests · limits
🔒 Security beginner

Kubernetes Secrets: Create, Use, and Secure

Create and manage Kubernetes Secrets for sensitive data. Covers types, encoding, mounting, external secrets operators, and encryption at rest best practices.

⏱ 15 minutes · secrets · security · encryption
⚙️ Configuration intermediate

Kubernetes Taints and Tolerations Guide

Use Kubernetes taints and tolerations to control pod scheduling. Dedicate nodes for GPU workloads, isolate teams, and prevent scheduling on specific nodes.

⏱ 15 minutes · taints · tolerations · scheduling
💾 Storage beginner

Kubernetes Volume Types Explained

Compare all Kubernetes volume types: emptyDir, hostPath, PVC, ConfigMap, Secret, NFS, CSI, and projected volumes. When to use each type with examples.

⏱ 15 minutes · volumes · emptydir · hostpath
🚀 Deployments intermediate

Air-Gapped Image Import for OpenShift Clusters

Import container images into disconnected OpenShift clusters. Use podman save/load and internal registries when DNS and TLS block external pulls.

⏱ 15 minutes · air-gapped · disconnected · podman
🔧 Troubleshooting advanced

Fix API Server Timeout and Overload

Debug kubectl timeouts, API server overload, and connection refused errors. Covers etcd latency, webhook timeouts, and rate limiting.

⏱ 15 minutes · api-server · timeout · connectivity
🚀 Deployments advanced

Backstage Developer Portal on Kubernetes

Deploy Spotify Backstage on Kubernetes as an internal developer portal. Covers Helm install, PostgreSQL backend, catalog entities, and TechDocs integration.

⏱ 15 minutes · backstage · developer-portal · idp
🔒 Security advanced

Fix Kubernetes Certificate Expiry Issues

Debug and renew expired Kubernetes certificates for API server, kubelet, and etcd. Covers kubeadm cert renewal, OpenShift auto-rotation, and monitoring expiry.

⏱ 15 minutes · certificates · tls · expiry
🚀 Deployments advanced

Cluster API for K8s Lifecycle Management

Manage Kubernetes cluster lifecycle with Cluster API. Declarative cluster creation, upgrades, scaling, and multi-cloud infrastructure provisioning as code.

⏱ 15 minutes · cluster-api · capi · infrastructure
🔒 Security advanced

Confidential Computing on Kubernetes

Deploy confidential containers with encrypted memory using Intel SGX, AMD SEV-SNP, and Kata Containers. Protect data in use even from the cluster admin.

⏱ 15 minutes · confidential-computing · sgx · sev-snp
⚙️ Configuration intermediate

Fix ConfigMap Changes Not Applied to Pods

Debug ConfigMap updates not reflected in running pods. Covers volume mount propagation delays, env var immutability, and sidecar-based reload strategies.

⏱ 15 minutes · configmap · hot-reload · volumes
🔧 Troubleshooting intermediate

Fix CoreDNS Resolution Failures in Kubernetes

Debug DNS resolution failures in Kubernetes pods. Covers CoreDNS crashes, NXDOMAIN errors, ndots configuration, and upstream DNS timeouts.

⏱ 15 minutes · coredns · dns · networking
🔧 Troubleshooting beginner

How to Fix CrashLoopBackOff in Kubernetes

Fix CrashLoopBackOff in Kubernetes with step-by-step troubleshooting. Debug OOMKilled, failed probes, missing configs, and image errors causing pod crash loops.

⏱ 15 minutes · crashloopbackoff · pods · debugging
🔧 Troubleshooting advanced

Fix etcd High Latency and Slow API Server

Debug etcd performance issues causing slow kubectl responses and API server timeouts. Covers disk I/O, compaction, defragmentation, and leader elections.

⏱ 15 minutes · etcd · performance · api-server
🔧 Troubleshooting advanced

Fix fio libaio Silent Exit on OpenShift cru...

Debug fio instantly exiting with no output on crun-based OpenShift nodes. The root cause is seccomp blocking libaio syscalls — fix with psync or unconfined.

⏱ 15 minutes · fio · libaio · seccomp
🎯 Helm intermediate

Helm Chart Development from Scratch

Build production-ready Helm charts with templates, values, helpers, hooks, tests, and CI validation. Complete guide from chart create to publishing.

⏱ 15 minutes · helm · chart-development · templates
🎯 Helm intermediate

Fix Helm Upgrade Failed and Rollback

Debug failed Helm releases stuck in pending-upgrade or failed state. Covers atomic upgrades, manual rollback, secret storage cleanup, and history limits.

⏱ 15 minutes · helm · upgrade · rollback
🔧 Troubleshooting beginner

ImagePullBackOff Troubleshooting Guide

Debug and resolve ImagePullBackOff errors including auth failures, wrong tags, private registry access, and rate limiting from Docker Hub and Quay.

⏱ 15 minutes · imagepullbackoff · registry · pull-secret
🌐 Networking intermediate

Fix Ingress 502 and 503 Gateway Errors

Debug 502 Bad Gateway and 503 Service Unavailable from Kubernetes ingress controllers. Fix backend health and timeout issues.

⏱ 15 minutes · ingress · nginx · 502
🚀 Deployments beginner

Install ArgoCD on AlmaLinux: Step-by-Step

Deploy ArgoCD on Kubernetes running on AlmaLinux. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Amazon Linux

Deploy ArgoCD on Kubernetes running on Amazon Linux. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Arch Linux: Step-by-Step

Deploy ArgoCD on Kubernetes running on Arch Linux. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on CentOS Stream

Deploy ArgoCD on Kubernetes running on CentOS Stream. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Debian: Step-by-Step Guide

Deploy ArgoCD on Kubernetes running on Debian. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Fedora: Step-by-Step Guide

Deploy ArgoCD on Kubernetes running on Fedora. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on openSUSE: Step-by-Step

Deploy ArgoCD on Kubernetes running on openSUSE. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Oracle Linux

Deploy ArgoCD on Kubernetes running on Oracle Linux. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on RHEL: Step-by-Step Guide

Deploy ArgoCD on Kubernetes running on RHEL. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Rocky Linux: Step-by-Step

Deploy ArgoCD on Kubernetes running on Rocky Linux. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on SUSE SLES: Step-by-Step

Deploy ArgoCD on Kubernetes running on SUSE SLES. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🚀 Deployments beginner

Install ArgoCD on Ubuntu: Step-by-Step Guide

Deploy ArgoCD on Kubernetes running on Ubuntu. GitOps continuous delivery with automated sync, self-healing, and multi-cluster support.

⏱ 15 minutes · argocd · gitops · installation
🎯 Helm beginner

Install Helm on AlmaLinux: Setup Guide

Install Helm 3 on AlmaLinux and configure chart repositories. Covers package manager install, script install, and shell completion for AlmaLinux 8/9.

⏱ 15 minutes · helm · installation · alma-linux
🎯 Helm beginner

Install Helm on Amazon Linux: Setup Guide

Install Helm on Amazon Linux 2023 and AL2. Three install methods, chart repository setup, shell completion, and troubleshooting for Amazon Linux environments.

⏱ 15 minutes · helm · installation · amazon-linux
🎯 Helm beginner

Install Helm on Arch Linux: Setup Guide

Install Helm 3 on Arch Linux and configure chart repositories. Covers package manager install, script install, and shell completion for Arch Linux (rolling release).

⏱ 15 minutes · helm · installation · arch-linux
🎯 Helm beginner

Install Helm on CentOS Stream: Setup Guide

Install Helm 3 on CentOS Stream and configure chart repositories. Covers package manager install, script install, and shell completion for CentOS Stream 9.

⏱ 15 minutes · helm · installation · centos-stream
🎯 Helm beginner

Install Helm on Debian: Setup Guide

Install Helm 3 on Debian and configure chart repositories. Covers package manager install, script install, and shell completion for Debian 11/12.

⏱ 15 minutes · helm · installation · debian
🎯 Helm beginner

Install Helm on Fedora: Setup Guide

Install Helm 3 on Fedora and configure chart repositories. Covers package manager install, script install, and shell completion for Fedora 39/40.

⏱ 15 minutes · helm · installation · fedora
🎯 Helm beginner

Install Helm on openSUSE: Setup Guide

Install Helm 3 on openSUSE with package manager or script. Configure chart repos and shell completion for openSUSE Leap 15 / Tumbleweed.

⏱ 15 minutes · helm · installation · opensuse
🎯 Helm beginner

Install Helm on Oracle Linux: Setup Guide

Install Helm 3 on Oracle Linux and configure chart repositories. Covers package manager install, script install, and shell completion for Oracle Linux 8/9.

⏱ 15 minutes · helm · installation · oracle-linux
🎯 Helm beginner

Install Helm on RHEL: Complete Setup Guide

Install Helm 3 on RHEL and configure chart repositories. Covers package manager install, script install, and shell completion for RHEL 8/9.

⏱ 15 minutes · helm · installation · rhel
🎯 Helm beginner

Install Helm on Rocky Linux: Setup Guide

Install Helm 3 on Rocky Linux and configure chart repositories. Covers package manager install, script install, and shell completion for Rocky Linux 8/9.

⏱ 15 minutes · helm · installation · rocky-linux
🎯 Helm beginner

Install Helm on SUSE SLES: Setup Guide

Install Helm 3 on SUSE SLES and configure chart repositories. Covers package manager install, script install, and shell completion for SLES 15.

⏱ 15 minutes · helm · installation · suse-sles
🎯 Helm beginner

Install Helm on Ubuntu: Setup Guide

Install Helm 3 on Ubuntu and configure chart repositories. Covers package manager install, script install, and shell completion for Ubuntu 22.04/24.04.

⏱ 15 minutes · helm · installation · ubuntu
🚀 Deployments beginner

Install Kubernetes on AlmaLinux

Step-by-step guide to install Kubernetes on AlmaLinux with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for AlmaLinux 8/9.

⏱ 15 minutes · kubernetes · installation · alma-linux
🚀 Deployments beginner

Install Kubernetes on Amazon Linux

Install Kubernetes on Amazon Linux with kubeadm. Covers containerd setup, kubeadm init, Calico CNI, and worker node joining for Amazon Linux 2023.

⏱ 15 minutes · kubernetes · installation · amazon-linux
🚀 Deployments beginner

Install Kubernetes on Arch Linux

Step-by-step guide to install Kubernetes on Arch Linux with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for Arch Linux (rolling release).

⏱ 15 minutes · kubernetes · installation · arch-linux
🚀 Deployments beginner

Install Kubernetes on CentOS Stream

Step-by-step guide to install Kubernetes on CentOS Stream with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for CentOS Stream 9.

⏱ 15 minutes · kubernetes · installation · centos-stream
🚀 Deployments beginner

Install Kubernetes on Debian: Setup Guide

Step-by-step guide to install Kubernetes on Debian with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for Debian 11/12.

⏱ 15 minutes · kubernetes · installation · debian
🚀 Deployments beginner

Install Kubernetes on Fedora: Setup Guide

Step-by-step guide to install Kubernetes on Fedora with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for Fedora 39/40.

⏱ 15 minutes · kubernetes · installation · fedora
🚀 Deployments beginner

Install Kubernetes on openSUSE

Install Kubernetes on openSUSE with kubeadm. Covers containerd setup, kubeadm init, Calico CNI, and worker node joining for openSUSE Leap 15 / Tumbleweed.

⏱ 15 minutes · kubernetes · installation · opensuse
🚀 Deployments beginner

Install Kubernetes on Oracle Linux

Step-by-step guide to install Kubernetes on Oracle Linux with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for Oracle Linux 8/9.

⏱ 15 minutes · kubernetes · installation · oracle-linux
🚀 Deployments beginner

Install Kubernetes on RHEL: Setup Guide

Step-by-step guide to install Kubernetes on RHEL with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for RHEL 8/9.

⏱ 15 minutes · kubernetes · installation · rhel
🚀 Deployments beginner

Install Kubernetes on Rocky Linux

Step-by-step guide to install Kubernetes on Rocky Linux with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for Rocky Linux 8/9.

⏱ 15 minutes · kubernetes · installation · rocky-linux
🚀 Deployments beginner

Install Kubernetes on SUSE SLES

Step-by-step guide to install Kubernetes on SUSE SLES with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for SLES 15.

⏱ 15 minutes · kubernetes · installation · suse-sles
🚀 Deployments beginner

Install Kubernetes on Ubuntu: Setup Guide

Step-by-step guide to install Kubernetes on Ubuntu with kubeadm. Covers containerd, kubeadm init, CNI setup, and worker node joining for Ubuntu 22.04/24.04.

⏱ 15 minutes · kubernetes · installation · ubuntu
🔧 Troubleshooting intermediate

Fix Kubernetes Job Failures and Retries

Debug Kubernetes Jobs stuck in backoff or hitting retry limits. Covers backoffLimit, activeDeadlineSeconds, and CronJob overlap.

⏱ 15 minutes · jobs · cronjob · backoff
⚡ Autoscaling advanced

Karpenter Node Autoscaling for Kubernetes

Replace Cluster Autoscaler with Karpenter for faster, smarter node provisioning. Right-sized instances, spot fallback, consolidation, and GPU-aware scaling.

⏱ 15 minutes · karpenter · autoscaling · nodes
🔧 Troubleshooting intermediate

Fix Kubelet NotReady and Node Pressure Issues

Debug kubelet NotReady status, node pressure conditions, and eviction issues. Covers disk pressure, memory pressure, PID pressure, and network not ready.

⏱ 15 minutes · kubelet · node · notready
🔒 Security advanced

Kubernetes Admission Controllers and Webhooks

Build validating and mutating admission webhooks for Kubernetes. Policy enforcement with OPA Gatekeeper, Kyverno, and custom webhooks.

⏱ 15 minutes · admission-controllers · webhooks · opa
⚙️ Configuration beginner

Kubernetes API Deprecation Migration Guide

Migrate deprecated Kubernetes APIs before cluster upgrades. Detect deprecated resources with pluto and kubent, then rewrite manifests with kubectl convert.

⏱ 15 minutes · api-deprecation · migration · upgrade
🌐 Networking intermediate

Kubernetes CNI Plugins Compared

Compare Calico, Cilium, Flannel, and Multus CNI plugins for Kubernetes. Performance benchmarks, features, and selection criteria for your cluster.

⏱ 15 minutes · cni · calico · cilium
🔧 Troubleshooting beginner

Kubernetes Debugging Toolkit and Commands

Essential kubectl debugging commands and tools for Kubernetes troubleshooting. Covers ephemeral containers, debug pods, network debugging, and log analysis.

⏱ 15 minutes · debugging · kubectl · troubleshooting
⚙️ Configuration advanced

Kubernetes Disaster Recovery Planning

Build a Kubernetes disaster recovery plan with etcd backups, Velero, cross-region replication, and RTO/RPO targets for production clusters.

⏱ 15 minutes · disaster-recovery · backup · velero
⚙️ Configuration advanced

Kubernetes etcd Operations and Maintenance

Manage etcd for Kubernetes: backup, restore, compaction, defragmentation, member management, and disaster recovery procedures.

⏱ 15 minutes · etcd · backup · restore
🤖 AI & GPU advanced

GPU Sharing with MPS and MIG on Kubernetes

Share NVIDIA GPUs across multiple pods using MPS (Multi-Process Service) and MIG hardware partitioning. Maximize GPU utilization for inference workloads.

⏱ 15 minutes · gpu-sharing · mps · mig
🚀 Deployments advanced

Multi-Cluster Management Strategies for K8s

Manage multiple Kubernetes clusters with federation, service mesh, and GitOps. Covers Admiralty, Liqo, Skupper, and ArgoCD ApplicationSets.

⏱ 15 minutes · multi-cluster · federation · gitops
🔒 Security intermediate

Kubernetes Secrets Management Patterns

Kubernetes secrets management best practices 2026: External Secrets Operator, Vault, Sealed Secrets, SOPS, encryption at rest, and rotation.

⏱ 15 minutes · secrets · vault · external-secrets
🔒 Security intermediate

K8s Service Accounts and Token Management

Configure service accounts, bound tokens, OIDC federation, and workload identity for Kubernetes. Migrate from legacy tokens to projected volumes.

⏱ 15 minutes · service-accounts · tokens · oidc
⚙️ Configuration intermediate

Kubernetes Sidecar Container Patterns

Implement sidecar containers for logging, proxying, config reload, and security. Native sidecars (init containers with restartPolicy: Always) arrived as alpha in Kubernetes 1.28 and are enabled by default since 1.29.

⏱ 15 minutes · sidecar · patterns · logging
🚀 Deployments advanced

Kubernetes StatefulSet Advanced Patterns

Advanced StatefulSet patterns for databases, message queues, and distributed systems. Covers ordered deployment, persistent identity, and headless services.

⏱ 15 minutes · statefulset · databases · ordered-deployment
🚀 Deployments intermediate

Run Windows Containers on Kubernetes

Deploy Windows workloads on Kubernetes with mixed Linux and Windows node pools. Covers taints, node selectors, and Windows-specific networking.

⏱ 15 minutes · windows · mixed-os · node-selector
💾 Storage intermediate

Longhorn Distributed Storage on Kubernetes

Install Longhorn for distributed block storage on Kubernetes. Replicated volumes, snapshots, backups to S3, and disaster recovery across nodes.

⏱ 15 minutes · longhorn · storage · distributed
🤖 AI & GPU intermediate

Node Feature Discovery Operator for Kubernetes

Install and configure Node Feature Discovery (NFD) Operator to auto-detect hardware features like GPUs, NICs, CPU flags, and USB devices on Kubernetes nodes.

⏱ 15 minutes · nfd · node-feature-discovery · operator
🔧 Troubleshooting intermediate

Fix OOMKilled Containers in Kubernetes

Debug and resolve OOMKilled container terminations. Understand memory limits, kernel OOM killer behavior, and right-sizing strategies for Kubernetes pods.

⏱ 15 minutes · oomkilled · memory · resources
🔧 Troubleshooting advanced

OpenShift crun vs runc Runtime Differences

Understand why pods behave differently on GPU vs CPU nodes in OpenShift. Compare crun and runc container runtimes, seccomp profiles, and syscall filtering.

⏱ 15 minutes · crun · runc · openshift
📊 Observability advanced

OpenTelemetry Complete Setup on Kubernetes

Deploy OpenTelemetry Collector, auto-instrumentation, and exporters on Kubernetes. Unified traces, metrics, and logs pipeline to Jaeger, Prometheus, and Loki.

⏱ 15 minutes · opentelemetry · otel · tracing
💾 Storage intermediate

Fix PVC Resize Stuck or Failed

Debug PVC expansion failures in Kubernetes. Covers allowVolumeExpansion, filesystem resize, and offline vs online expansion.

⏱ 15 minutes · pvc · resize · expansion
🔧 Troubleshooting intermediate

Fix Unexpected Pod Evictions in Kubernetes

Debug pods being evicted due to node pressure, preemption, or taint-based eviction. Understand eviction priorities, QoS classes, and PodDisruptionBudgets.

⏱ 15 minutes · eviction · preemption · pdb
🔧 Troubleshooting beginner

Fix Pod Stuck in Pending State

Debug pods stuck in Pending status. Covers insufficient resources, node affinity mismatches, taint/toleration issues, and PVC binding failures.

⏱ 15 minutes · pending · scheduling · resources
🔧 Troubleshooting intermediate

Fix Podman TLS x509 Behind Corporate Proxy

Resolve podman pull "x509: certificate signed by unknown authority" errors caused by corporate TLS-intercepting proxies. Extract and install the proxy CA.

⏱ 15 minutes · podman · tls · x509
🔧 Troubleshooting intermediate

Fix PVC Stuck in Pending State

Debug PersistentVolumeClaims stuck in Pending status. Covers storage class issues, provisioner failures, capacity problems, and access mode mismatches.

⏱ 15 minutes · pvc · storage · persistent-volume
🔒 Security intermediate

Fix RBAC Permission Denied Errors

Debug RBAC forbidden and unauthorized errors in Kubernetes. Covers ClusterRole vs Role scope and service account permissions.

⏱ 15 minutes · rbac · forbidden · permissions
🚀 Deployments intermediate

Fix Deployment Rollout Stuck at Partial Progress

Debug deployments stuck with unavailable replicas during rollout. Covers readiness probes, resource constraints, and rollback.

⏱ 15 minutes · deployment · rollout · stuck
💾 Storage advanced

Rook Ceph Storage Cluster on Kubernetes

Deploy Rook Ceph for enterprise-grade distributed storage on Kubernetes. Block, file, and object storage with self-healing and automatic rebalancing.

⏱ 15 minutes · rook · ceph · storage
🔧 Troubleshooting advanced

Fix Service Mesh Sidecar Injection Failures

Debug Istio and Envoy sidecar injection issues. Covers missing sidecars, port conflicts, init container failures, and mTLS connection errors.

⏱ 15 minutes · istio · envoy · sidecar
🚀 Deployments advanced

Run WebAssembly Workloads on Kubernetes

Deploy WASM workloads on Kubernetes using SpinKube and containerd-shim. Sub-millisecond cold starts, polyglot runtimes, and sandboxed edge computing.

⏱ 15 minutes · wasm · webassembly · spinkube
💾 Storage intermediate

Fio NFS Benchmark on OpenShift Nodes

Run fio NFS storage benchmarks on OpenShift using parallel pods with hostPath mounts. Measure IOPS, bandwidth, and latency across multiple NFS endpoints.

⏱ 30 minutes · fio · nfs · benchmark
💾 Storage intermediate

MachineConfig NFS Mount on OpenShift Nodes

Mount NFS shares on OpenShift worker nodes using MachineConfig systemd mount units. This is the recommended way to persist NFS mounts on RHCOS nodes in production.

⏱ 25 minutes · openshift · machineconfig · nfs
🔧 Troubleshooting intermediate

OpenShift oc debug Mount Limitation

Why NFS and other filesystem mounts made via oc debug node disappear after the debug pod exits. Understand mount namespace isolation and use MachineConfig instead.

⏱ 10 minutes · openshift · oc-debug · mount
⚙️ Configuration beginner

KubeCon EU 2026 Book Giveaway Recap

Recap of the Kubernetes Recipes book giveaway at KubeCon EU 2026 Amsterdam. Photos from the signing sessions, community highlights, and how to get your copy.

⏱ 5 minutes · kubecon · book · community
🌐 Networking intermediate

Configure Knative Ingress Networking

Set up Knative Serving ingress with Kourier, Istio, or Contour. Custom domains, TLS, path routing, and external visibility.

⏱ 25 minutes · knative · ingress · kourier
🚀 Deployments intermediate

Detect ArgoCD Shadow Updates Out-of-Band

Detect and prevent ArgoCD shadow updates where manual kubectl changes bypass GitOps. Configure self-heal, sync, and drift detection.

⏱ 20 minutes · argocd · gitops · drift-detection
🌐 Networking intermediate

Migrate Ingress to Gateway API with ingress2gateway

Migrate Ingress to Gateway API using ingress2gateway. Convert Ingress resources to HTTPRoute and TLSRoute with a zero-downtime parallel migration.

⏱ 30 minutes · gateway-api · ingress · migration
🚀 Deployments advanced

Build a K8s Operator with Docker Testing

Build a Kubernetes operator with Operator SDK and Kubebuilder. Test with Docker, Kind, and envtest. Full TDD workflow to OLM bundle.

⏱ 60 minutes · operator · operator-sdk · kubebuilder
🔧 Troubleshooting beginner

Fix the Kubernetes ConfigMap Too Large Error

Resolve the 1 MiB ConfigMap size limit error. Split configs, use Secrets for binary data, mount volumes, or use external stores.

⏱ 15 minutes · configmap · size-limit · configuration
🔧 Troubleshooting intermediate

Debug CRI-O Container Runtime Errors

Troubleshoot CRI-O issues on OpenShift nodes. Fix image pull failures, container start errors, storage driver problems, and CNI networking plugin failures.

⏱ 15 minutes · cri-o · container-runtime · openshift
🔧 Troubleshooting advanced

Debug Degraded MachineConfigPool Nodes

Fix nodes stuck Degraded after MachineConfig updates. Check MCD logs, on-disk validation, and recovery for degraded workers.

⏱ 15 minutes · openshift · machineconfig · degraded
🔧 Troubleshooting intermediate

Debug Kubernetes Pod Eviction Reasons

Investigate why pods were evicted from Kubernetes nodes. Check node pressure conditions, resource limits, priority classes, and preemption events.

⏱ 15 minutes · eviction · node-pressure · resources
🔧 Troubleshooting intermediate

Debug DNS Resolution Failures in Pods

Troubleshoot pods unable to resolve DNS names. Check CoreDNS health, ndots configuration, search domains, and NetworkPolicies blocking UDP port 53 DNS traffic.

⏱ 15 minutes · dns · coredns · resolution
🔧 Troubleshooting advanced

Debug etcd Performance Issues in Kubernetes

Diagnose slow etcd causing API latency and leader election storms. Check disk IOPS, compaction, defrag, and network latency.

⏱ 15 minutes · etcd · performance · latency
🔧 Troubleshooting advanced

Fix Expired Certificates in Kubernetes

Renew expired certificates causing API server failures and kubelet disconnections. Manual and automatic renewal for kubeadm and OpenShift.

⏱ 15 minutes · certificates · tls · expiration
🤖 AI & GPU advanced

Enable GPUDirect Storage in ClusterPolicy

Enable NVIDIA GPUDirect Storage (GDS) in the GPU Operator ClusterPolicy for direct GPU-to-NVMe data paths. Driver module configuration and verification.

⏱ 20 minutes · nvidia · gds · gpu-operator
🤖 AI & GPU intermediate

GPU Time-Slicing on Kubernetes

Share GPUs across multiple workloads using NVIDIA time-slicing on Kubernetes. Configure the device plugin, set replica counts, and manage fairness.

⏱ 20 minutes · nvidia · gpu · time-slicing
🎯 Helm intermediate

Helm before-hook-creation Delete Policy

Use the Helm before-hook-creation delete policy for database migration and pre-install check hooks. Complete hook lifecycle, delete policies, and ordering.

⏱ 15 minutes · helm · hooks · before-hook-creation
🎯 Helm beginner

Helm Sprig cat Function: Concatenate Strings

Helm Sprig cat function concatenates strings with spaces between arguments. Syntax, why cat inserts spaces, conditionals, and template examples.

⏱ 10 minutes · helm · sprig · cat
🎯 Helm beginner

Helm Sprig join Function: List to String

Helm Sprig join function converts lists to delimited strings. Examples cover CSV output, label values, and multi-value template patterns.

⏱ 10 minutes · helm · sprig · join
🎯 Helm beginner

Helm Sprig toString Function Guide

Helm Sprig toString function converts values to strings in templates. Handle integers, booleans, lists, and nil values safely in Helm charts.

⏱ 10 minutes · helm · sprig · tostring
🔧 Troubleshooting intermediate

Fix OpenShift ImageStream Import Errors

Debug ImageStream import failures in OpenShift. Resolve DNS errors, auth issues, TLS problems, and registry rate limiting.

⏱ 15 minutes · openshift · imagestream · import
🔧 Troubleshooting advanced

ITMS Race Condition with Ingress Controllers

Resolve the ITMS race condition where ImageTagMirrorSet rollouts deadlock with hostNetwork ingress controllers during MCO drain.

⏱ 25 minutes · openshift · itms · ingress
🚀 Deployments advanced

Kubernetes Resiliency Patterns Guide

Build resilient Kubernetes apps with PDBs, topology spread, anti-affinity, health probes, and graceful shutdown patterns.

⏱ 30 minutes · resiliency · high-availability · pdb
⚡ Autoscaling intermediate

K8s Resource Optimization Strategies

Kubernetes resource optimization strategies and best practices. Right-size pods with VPA, Goldilocks dashboards, and resource allocation techniques.

⏱ 30 minutes · resources · optimization · vpa
🔒 Security advanced

Harden Kubernetes Security Posture

Kubernetes security hardening: Pod Security Standards, RBAC least-privilege, network policies, secret encryption, and audit logging.

⏱ 30 minutes · security · hardening · pss
⚙️ Configuration intermediate

Inspect MachineConfig Annotations on Nodes

Read and interpret MachineConfig annotations on OpenShift nodes. Check desired vs current config, node state, and rendered config hashes to diagnose MCP issues.

⏱ 15 minutes · openshift · machineconfig · annotations
⚙️ Configuration intermediate

Configure NTP Chrony via MachineConfig

Set custom NTP servers on OpenShift RHCOS nodes using MachineConfig. Fix time drift, configure chrony, and verify time synchronization across your cluster.

⏱ 15 minutes · openshift · machineconfig · chrony
⚙️ Configuration intermediate

Set Kernel Parameters via MachineConfig

Tune kernel sysctl parameters on OpenShift nodes using MachineConfig. Set networking, memory, and performance sysctls on RHCOS.

⏱ 15 minutes · openshift · machineconfig · kernel
⚙️ Configuration intermediate

Configure Container Registries via MachineC...

Set up mirror registries and blocked registries on OpenShift nodes using MachineConfig to control CRI-O image pull on RHCOS.

⏱ 15 minutes · openshift · machineconfig · registries
🔧 Troubleshooting advanced

Fix Stale MachineConfigPool Updates

Debug and resolve stale OpenShift MachineConfigPool updates. Identify blocked nodes, check MachineConfigDaemon logs, and unblock stuck MCP rollouts.

⏱ 20 minutes · openshift · machineconfig · mcp
🔧 Troubleshooting advanced

MCP Drain Blocked by PDB: Workaround

Resolve OpenShift MachineConfigPool drain failures caused by PodDisruptionBudget violations. Scale down the blocking workloads and restore them after the update.

⏱ 15 minutes · openshift · pdb · drain
⚙️ Configuration intermediate

Configure MCP maxUnavailable for Rollouts

Control how many nodes the MachineConfig Operator updates simultaneously. Set maxUnavailable for faster rollouts or safer one-at-a-time updates in production.

⏱ 15 minutes · openshift · machineconfig · mcp
⚙️ Configuration intermediate

Pause and Unpause MCP Rollouts

Temporarily pause MachineConfigPool rollouts to batch multiple MachineConfig changes or coordinate with maintenance windows. Unpause to resume node updates.

⏱ 15 minutes · openshift · machineconfig · mcp
⚙️ Configuration advanced

Automate MCP Updates with Drain Script

Bash script to automate OpenShift MachineConfigPool updates when drains are blocked by PDB violations. Auto-detects blockers, scales down, drains, and restores.

⏱ 30 minutes · openshift · machineconfig · automation
⚙️ Configuration intermediate

Separate Worker and Infra MachineConfigPools

Create dedicated MachineConfigPools for infrastructure and GPU nodes. Isolate MCP rollout blast radius and control update order for different node types.

⏱ 15 minutes · openshift · machineconfig · mcp
🔧 Troubleshooting beginner

Fix Namespace Stuck in Terminating

Remove Kubernetes namespaces stuck in Terminating state. Identify blocking finalizers and orphaned API resources, and safely force namespace cleanup.

⏱ 15 minutes · namespace · terminating · finalizer
🔧 Troubleshooting intermediate

Debug NetworkPolicy Connectivity Issues

Troubleshoot pods unable to communicate despite correct Services. Verify NetworkPolicy rules, label selectors, and default-deny policies.

⏱ 15 minutes · networkpolicy · connectivity · debugging
🔧 Troubleshooting advanced

Node Drain Blocked by hostNetwork Port Conf...

Debug and fix OpenShift node drains that fail because hostNetwork pods cannot schedule replacements due to port exhaustion across the cluster.

⏱ 15 minutes · openshift · hostnetwork · drain
🔧 Troubleshooting intermediate

Debug Node NotReady Status in Kubernetes

Diagnose Kubernetes nodes stuck in NotReady state. Check kubelet logs, container runtime, network, disk pressure, and certificates.

⏱ 15 minutes · node · not-ready · kubelet
🤖 AI & GPU intermediate

NVIDIA GPU Operator Setup on Kubernetes

Install and configure NVIDIA GPU Operator on Kubernetes. Driver containers, toolkit, device plugin, DCGM monitoring, and ClusterPolicy setup.

⏱ 30 minutes · nvidia · gpu-operator · gpu
🤖 AI & GPU advanced

NVIDIA Open GPU + GPUDirect RDMA + DOCA-OFE...

Deploy NVIDIA AI networking on Kubernetes: Open GPU driver with DMA-BUF, GPUDirect RDMA, DOCA-OFED, and SR-IOV VF isolation.

⏱ 45 minutes · nvidia · gpu-operator · gpudirect
⚙️ Configuration beginner

Use oc adm drain Dry-Run for Diagnostics

Preview node drain impact without evicting pods. Identify PDB violations, unmanaged pods, and local storage blockers before maintenance.

⏱ 15 minutes · drain · dry-run · maintenance
🚀 Deployments advanced

OpenClaw GitOps Deployment with ArgoCD

Deploy OpenClaw on Kubernetes using ArgoCD for GitOps automation. Application definition, sync policies, drift detection, and secrets.

⏱ 25 minutes · openclaw · argocd · gitops
🔒 Security advanced

OpenClaw API Keys External Secrets Operator

Manage OpenClaw API keys and gateway tokens using External Secrets Operator with AWS Secrets Manager, Vault, or GCP Secret Manager on Kubernetes.

⏱ 30 minutes · openclaw · external-secrets · vault
🚀 Deployments beginner

OpenClaw Local Development with Kind

Set up a local Kind cluster for OpenClaw development and testing. Auto-detect Docker or Podman, create a single-node cluster, and deploy OpenClaw in minutes.

⏱ 15 minutes · openclaw · kind · local-development
🎯 Helm intermediate

OpenClaw Helm Chart with Chromium Sidecar

Deploy OpenClaw using the community Helm chart with Chromium browser sidecar for web automation, declarative skill installation, and custom values overlays.

⏱ 25 minutes · openclaw · helm · chromium
🌐 Networking intermediate

Expose OpenClaw via K8s Ingress with TLS

Configure Kubernetes Ingress with TLS to expose OpenClaw gateway securely. Covers cert-manager, NGINX Ingress, and allowed origins.

⏱ 25 minutes · openclaw · ingress · tls
🚀 Deployments intermediate

OpenClaw Multi-Env Deploy with Kustomize

Deploy OpenClaw across dev, staging, and production Kubernetes environments using Kustomize overlays for configs and secrets.

⏱ 30 minutes · openclaw · kustomize · multi-environment
📊 Observability beginner

OpenClaw Health Probes on Kubernetes

Configure liveness and readiness probes for OpenClaw on Kubernetes. Custom Node.js health checks against /healthz and /readyz endpoints with proper timing.

⏱ 15 minutes · openclaw · health-probes · liveness
🚀 Deployments advanced

OpenClaw Multi-Agent Team Deployment

Deploy multiple specialized OpenClaw agents as Kubernetes pods. Dedicated DevOps, security, and writing agents with a shared workspace.

⏱ 35 minutes · openclaw · multi-agent · team
⚙️ Configuration intermediate

OpenClaw Multi-Model Provider Setup

Configure OpenClaw with multiple AI providers on Kubernetes. Anthropic, OpenAI, Gemini, OpenRouter with fallback chains and cost control.

⏱ 20 minutes · openclaw · ai-models · multi-provider
⚙️ Configuration advanced

OpenClaw Node Pairing for IoT and Edge Devices

Pair phones, Raspberry Pi, and edge devices with OpenClaw on Kubernetes. Camera, location, screen control, and remote command execution.

⏱ 30 minutes · openclaw · iot · edge
🚀 Deployments intermediate

OpenClaw on OpenShift with SCCs and Routes

Deploy OpenClaw on OpenShift with Security Context Constraints, Routes for TLS termination, and OpenShift-specific considerations for non-root containers.

⏱ 20 minutes · openclaw · openshift · scc
🚀 Deployments intermediate

OpenClaw Operator for Kubernetes

Deploy OpenClaw AI agents on Kubernetes using the official operator. CRD-based lifecycle, Chromium sidecar, auto-update, and backup.

⏱ 25 minutes · openclaw · operator · ai-agents
💾 Storage intermediate

OpenClaw Persistent State Management

Manage OpenClaw agent state and workspace data with Kubernetes PVCs. Init container config seeding, backups, and storage classes.

⏱ 20 minutes · openclaw · persistent-volumes · state-management
⚡ Autoscaling intermediate

OpenClaw Resource Limits and Tuning

Size CPU, memory, and storage for OpenClaw on Kubernetes. Tuning profiles for light usage, browser automation, and production deployments.

⏱ 15 minutes · openclaw · resource-limits · tuning
🔒 Security intermediate

OpenClaw Pod Security Hardening on Kubernetes

Harden OpenClaw pods with read-only filesystem, dropped capabilities, non-root user, seccomp profiles, and resource limits.

⏱ 20 minutes · openclaw · pod-security · hardening
🚀 Deployments advanced

OpenClaw Webhook Automation on Kubernetes

Configure OpenClaw webhooks on Kubernetes for GitHub, Jira, and PagerDuty event-driven automation. Ingress routing, HMAC validation, and hook handler patterns.

⏱ 35 minutes · openclaw · webhooks · automation
🔧 Troubleshooting intermediate

OpenShift Ingress Router Troubleshooting

Debug OpenShift HAProxy router issues: pods stuck Pending, hostPort conflicts, PDB violations during maintenance, and custom router deployment scaling problems.

⏱ 20 minutes · openshift · ingress · haproxy
🔧 Troubleshooting intermediate

Debug MachineConfigDaemon Logs

Read and interpret OpenShift MachineConfigDaemon logs to diagnose node update failures. Common error patterns, drain issues, and config application problems.

⏱ 15 minutes · openshift · machineconfig · mcd
⚙️ Configuration beginner

Cordon, Drain, and Uncordon Nodes

Safely remove workloads from OpenShift and Kubernetes nodes for maintenance. Cordon to prevent scheduling, drain to evict pods, uncordon to restore.

⏱ 10 minutes · maintenance · node-management · drain
🔧 Troubleshooting intermediate

Debug OpenShift OAuth Login Failures

Troubleshoot OpenShift console and CLI login failures. Check OAuth server pods, identity provider config, and expired tokens.

⏱ 15 minutes · openshift · oauth · authentication
⚙️ Configuration intermediate

Configure PDBs for OpenShift Routers

Set PodDisruptionBudgets for OpenShift IngressController routers, balancing router availability during maintenance against the ability to drain nodes.

⏱ 15 minutes · openshift · pdb · ingress
📊 Observability intermediate

Enable User Workload Monitoring on OpenShift

Enable user workload monitoring on OpenShift. Deploy ServiceMonitor, PodMonitor, alerting rules, and Grafana dashboards.

⏱ 20 minutes · openshift · monitoring · prometheus
🔧 Troubleshooting intermediate

Fix Stuck OLM Operator Subscriptions

Debug Operator Lifecycle Manager subscriptions stuck in pending or failed state. Resolve catalog source issues, approval policies, and CSV dependency conflicts.

⏱ 15 minutes · openshift · olm · operator
🔧 Troubleshooting intermediate

PDB Allowed Disruptions Zero: Debugging

Debug PodDisruptionBudgets stuck at zero allowed disruptions. Understand minAvailable vs maxUnavailable, fix eviction failures, and plan for maintenance.

⏱ 15 minutes · pdb · disruption-budget · eviction
🔧 Troubleshooting intermediate

Fix PV Stuck in Terminating State

Resolve PVs and PVCs stuck in Terminating status. Remove finalizers safely, check volume detachment, and handle storage issues.

⏱ 15 minutes · pv · pvc · terminating
🌐 Networking intermediate

Manage hostNetwork Pod Port Allocation

Plan and manage host port usage for hostNetwork pods. Prevent port conflicts, track allocations, and handle port exhaustion.

⏱ 15 minutes · hostnetwork · ports · scheduling
🔧 Troubleshooting beginner

Fix ResourceQuota Exceeded Errors

Debug resource quota violations that prevent pods from being created. Understand LimitRange defaults, ResourceQuota, and namespace management.

⏱ 15 minutes · resourcequota · limitrange · scheduling
⚙️ Configuration beginner

Restore Scaled Deployments After Node Drain

Restore deployments scaled down for maintenance. Verify node health, check pod scheduling, and confirm service availability.

⏱ 15 minutes · scaling · restore · maintenance
⚙️ Configuration intermediate

Scale Deployments to Unblock Node Drains

Safely scale down deployments that block node drains due to PDB violations. Record original replicas, scale to zero, drain, then restore after the node returns.

⏱ 15 minutes · scaling · drain · pdb
🔧 Troubleshooting beginner

Debug Service with No Ready Endpoints

Troubleshoot Services showing zero endpoints. Verify label selectors, readiness probes, pod status, and port configuration.

⏱ 15 minutes · service · endpoints · readiness
🔧 Troubleshooting beginner

Fix Node Untolerated Taint Scheduling Errors

Fix node untolerated taint errors causing pods stuck in Pending. NoSchedule, PreferNoSchedule, NoExecute effects, and toleration syntax guide.

⏱ 15 minutes · taints · tolerations · scheduling
🔧 Troubleshooting intermediate

Fix Admission Webhook Timeout Errors

Debug admission webhook failures blocking pod creation. Identify failing webhooks, check timeouts, and set failurePolicy.

⏱ 15 minutes · webhook · admission · timeout
⚙️ Configuration intermediate

ITMS External-to-External Registry Mirroring

Configure OpenShift ImageTagMirrorSet to map external registries to your private registry. Mirror Docker Hub, GHCR, Quay.io, and NVIDIA NGC.

⏱ 20 minutes · openshift · itms · imagetagmirrorset
⚙️ Configuration advanced

How ITMS Updates registries.conf via Machin...

How ITMS and IDMS update /etc/containers/registries.conf on immutable CoreOS nodes via MCO and MachineConfig. Full chain deep-dive.

⏱ 25 minutes · openshift · itms · idms
⚙️ Configuration beginner

400 Recipes Milestone: What We Built & What...

Kubernetes Recipes reaches 400 articles. Explore new AI/GPU infrastructure, NVIDIA networking, ArgoCD GitOps, OpenShift, and RHACS security recipes.

⏱ 10 minutes · community · milestone · kubernetes
🤖 AI & GPU intermediate

AI Model Storage: hostPath vs PVC Inference

Deploy AI models on Kubernetes using hostPath and PVC storage. Compare performance, security trade-offs, and production patterns for model serving.

⏱ 30 minutes · model-serving · storage · hostpath
🔒 Security intermediate

Quay Default Permissions for Robot Accounts

Configure Quay Registry default permissions to auto-grant read access to robot accounts on every new repository. API and team patterns.

⏱ 15 minutes · quay · robot-account · permissions
⚙️ Configuration beginner

KubeCon EU 2026 Book Signing Events

Join Luca Berton at two KubeCon Amsterdam events: Signal Overflow at Booking.com HQ (Mon 23 Mar) and book signing at vCluster booth #521 (Tue 24 Mar).

⏱ 15 minutes · kubecon · book · community
🤖 AI & GPU advanced

Volcano Job minAvailable Gang Scheduling

Configure Volcano job minAvailable for gang scheduling on Kubernetes. Batch AI training, fair-share queues, job plugins, and GPU preemption guide.

⏱ 35 minutes · volcano · batch · gang-scheduling
🌐 Networking intermediate

Configure SR-IOV agent-config.yaml Device b...

Use agent-config.yaml to select network devices by PCI path for SR-IOV VF creation, ensuring deterministic NIC targeting across OpenShift nodes.

⏱ 25 minutes · sr-iov · networking · openshift
🤖 AI & GPU intermediate

Benchmark LLMs with AIPerf on Kubernetes

Deploy NVIDIA AIPerf to benchmark LLM inference performance on Kubernetes. Measure TTFT, ITL, and throughput with a real-time dashboard and GPU telemetry.

⏱ 20 minutes · aiperf · benchmarking · nvidia
🤖 AI & GPU advanced

AIPerf Concurrency Sweep on K8s

Run AIPerf concurrency sweeps on Kubernetes to find optimal LLM serving capacity. Automate 1-128 concurrent user benchmarks with batch Jobs.

⏱ 30 minutes · aiperf · benchmarking · concurrency
🤖 AI & GPU advanced

AIPerf Goodput and SLO Benchmarks

Measure LLM goodput with AIPerf on Kubernetes. Define SLOs for TTFT and ITL, calculate effective throughput, and benchmark with timeslice analysis.

⏱ 25 minutes · aiperf · benchmarking · goodput
🤖 AI & GPU advanced

AIPerf Multi-Model Benchmark on K8s

Compare multiple LLM models and backends with AIPerf on Kubernetes. Benchmark vLLM vs TGI vs Triton with automated multi-run confidence intervals.

⏱ 30 minutes · aiperf · benchmarking · comparison
🤖 AI & GPU advanced

AIPerf Trace Replay Benchmarks on K8s

Replay production traffic traces with AIPerf on Kubernetes. Use moon_cake format, ShareGPT datasets, and fixed schedules for realistic LLM benchmarks.

⏱ 25 minutes · aiperf · benchmarking · trace-replay
🚀 Deployments advanced

Air-Gapped OpenShift with Quay Mirror

Deploy OpenShift in air-gapped environments with local Quay registry mirror, ImageDigestMirrorSet, and custom CatalogSources.

⏱ 15 minutes · air-gap · openshift · quay
🎯 Helm intermediate

ArgoCD App of Apps with Helm Values

Use the ArgoCD App of Apps pattern with Helm value overrides per environment, enabling templated Application manifests and DRY multi-environment configurations.

⏱ 20 minutes · argocd · gitops · helm
🚀 Deployments intermediate

ArgoCD App of Apps Pattern Explained

Implement the ArgoCD App of Apps pattern to manage multiple applications from a parent Application for cluster bootstrapping.

⏱ 20 minutes · argocd · gitops · app-of-apps
🚀 Deployments advanced

ArgoCD App of Apps with Sync Waves

Combine the ArgoCD App of Apps pattern with sync waves to bootstrap entire clusters in dependency order, from CRDs and operators to application workloads.

⏱ 25 minutes · argocd · gitops · app-of-apps
🚀 Deployments intermediate

ArgoCD ApplicationSets for Multi-Tenant GPUs

Use ArgoCD ApplicationSets to auto-discover and provision GPU tenant overlays from Git directories with per-tenant sync policies.

⏱ 15 minutes · argocd · applicationsets · multi-tenant
🚀 Deployments beginner

ArgoCD Declarative Application Setup

Define ArgoCD Applications, Projects, and repository credentials declaratively using Kubernetes manifests for reproducible GitOps configuration.

⏱ 15 minutes · argocd · gitops · declarative
🚀 Deployments advanced

ArgoCD Multi-Cluster App of Apps

Manage multiple Kubernetes clusters with ArgoCD App of Apps, deploying shared infrastructure and cluster-specific workloads from a single GitOps repository.

⏱ 25 minutes · argocd · gitops · multi-cluster
🚀 Deployments intermediate

Manage OperatorGroups with ArgoCD

Deploy and manage OLM OperatorGroup resources via ArgoCD for GitOps-driven operator lifecycle management in OpenShift namespaces.

⏱ 20 minutes · operatorgroup · olm · argocd
🚀 Deployments intermediate

ArgoCD PreSync and PostSync Hooks

Use ArgoCD PreSync hooks for database migrations and PostSync hooks for smoke tests, with SyncFail hooks for automated rollback and cleanup.

⏱ 15 minutes · argocd · gitops · hooks
🚀 Deployments advanced

ArgoCD Sync Waves for Canary Deployments

Use ArgoCD sync waves for canary deployments with Istio traffic splitting, automated validation, and progressive rollout strategies.

⏱ 20 minutes · argocd · gitops · canary
🚀 Deployments intermediate

ArgoCD Sync Waves for CRD & Operator Ordering

Use ArgoCD sync waves to deploy Custom Resource Definitions before operators and custom resources, preventing CRD race conditions in GitOps pipelines.

⏱ 15 minutes · argocd · gitops · crds
🚀 Deployments intermediate

ArgoCD Sync Waves for Ordered Deployments

Use ArgoCD sync waves to control the order of Kubernetes resource deployment, ensuring dependencies like namespaces and CRDs are created before workloads.

⏱ 15 minutes · argocd · gitops · sync-waves
🚀 Deployments intermediate

ArgoCD Sync Waves for Database Migrations

Use ArgoCD sync waves and PreSync hooks to run database migrations before deploying application code, with rollback strategies.

⏱ 20 minutes · argocd · gitops · database
⚙️ Configuration advanced

ClusterPolicy MOFED Upgrade Strategy

Configure safe MOFED driver upgrade policies in the NVIDIA GPU Operator ClusterPolicy with rolling updates, node draining, and rollback procedures.

⏱ 20 minutes · nvidia · gpu-operator · mofed
💾 Storage advanced

CNPG Disaster Recovery and Replication

Set up cross-region PostgreSQL disaster recovery with CloudNativePG using replica clusters, WAL shipping, and automated failover.

⏱ 15 minutes · cnpg · postgresql · disaster-recovery
🚀 Deployments intermediate

CloudNativePG PostgreSQL Operator

Deploy highly available PostgreSQL clusters on Kubernetes using CloudNativePG operator with automated failover and backups.

⏱ 15 minutes · cnpg · postgresql · database
🚀 Deployments advanced

CNPG Cluster Scaling and Upgrades

Scale CloudNativePG clusters, perform rolling PostgreSQL major upgrades, and manage storage expansion without downtime in Kubernetes.

⏱ 15 minutes · cnpg · postgresql · scaling
🔒 Security intermediate

Add Custom CA Certificates in Kubernetes

Configure custom Certificate Authority trust in vanilla Kubernetes using ConfigMap mounts, node-level trust stores, and containerd registry configuration.

⏱ 20 minutes · certificates · ca · tls
🔒 Security intermediate

Add Custom CA in OpenShift and Kubernetes

Configure custom Certificate Authority trust in both OpenShift and vanilla Kubernetes for private registries, internal services, and corporate PKI.

⏱ 25 minutes · certificates · ca · tls
🔒 Security intermediate

Add Custom CA Certificates in OpenShift

Configure custom Certificate Authority trust across an OpenShift cluster using proxy config, image config, and automatic CA bundle injection into pods.

⏱ 20 minutes · openshift · certificates · ca
🔧 Troubleshooting beginner

Decode and Inspect Kubernetes Docker Secrets

Decode base64-encoded dockerconfigjson secrets to verify registry credentials, troubleshoot ImagePullBackOff errors, and audit pull secret configurations.

⏱ 10 minutes · secrets · base64 · troubleshooting
🤖 AI & GPU advanced

Dell PowerEdge XE7740 GPU Node Setup

Configure Dell PowerEdge XE7740 GPU nodes with H200 GPUs for OpenShift and Kubernetes including BIOS, power, cooling, and network setup.

⏱ 15 minutes · dell · poweredge · xe7740
🤖 AI & GPU intermediate

Deploy Fish Audio TTS on Kubernetes

Deploy Fish Audio S2-Pro 5B text-to-speech model on Kubernetes for high-quality voice synthesis with multi-speaker support and streaming audio.

⏱ 20 minutes · fish-audio · text-to-speech · tts
🤖 AI & GPU advanced

Deploy GLM-5 754B on Kubernetes

Deploy Zhipu AI GLM-5 754B model on Kubernetes with vLLM. One of the largest open-weight models with multi-node tensor parallelism across 8+ GPUs.

⏱ 45 minutes · glm-5 · zhipu · llm
🤖 AI & GPU beginner

Deploy Granite 4.0 Speech on Kubernetes

Deploy the IBM Granite 4.0 1B Speech model on Kubernetes for automatic speech recognition. The lightweight model runs on a CPU or a small GPU for STT workloads.

⏱ 15 minutes · granite · ibm · speech-recognition
🤖 AI & GPU advanced

Deploy Kimi K2.5 1.1T MoE on Kubernetes

Deploy Moonshot AI Kimi-K2.5 1.1T MoE multimodal model on Kubernetes. The largest open MoE model with 2.69M downloads for frontier AI tasks.

⏱ 45 minutes · kimi · moonshot · mixture-of-experts
🤖 AI & GPU advanced

Deploy Llama 2 70B on Kubernetes

Deploy Meta Llama 2 70B on Kubernetes with multi-GPU tensor parallelism, vLLM serving, and production-ready health checks and resource limits.

⏱ 30 minutes · llama · llm · vllm
🤖 AI & GPU intermediate

Deploy Llama 3.1 8B Instruct on K8s

Deploy Meta Llama 3.1 8B Instruct on Kubernetes with vLLM. Production-ready single-GPU deployment with 128K context, tool calling, and autoscaling.

⏱ 15 minutes · llama · llama-3.1 · meta
🤖 AI & GPU advanced

Deploy LTX Video Generation on K8s

Deploy Lightricks LTX-2.3 image-to-video model on Kubernetes for AI video generation with batch processing and S3 output storage.

⏱ 25 minutes · ltx · video-generation · image-to-video
🤖 AI & GPU advanced

Deploy MiniMax M2.5 229B on Kubernetes

Deploy MiniMax M2.5 229B model on Kubernetes with vLLM. High-performance LLM with 485K downloads, optimized for multi-turn conversation and long context.

⏱ 30 minutes · minimax · llm · multi-gpu
🤖 AI & GPU advanced

Deploy NVIDIA Nemotron 120B MoE on K8s

Deploy NVIDIA Nemotron-3-Super-120B-A12B MoE model on Kubernetes. 120B total parameters with 12B active for enterprise-grade inference.

⏱ 25 minutes · nemotron · nvidia · mixture-of-experts
🤖 AI & GPU intermediate

Deploy Microsoft Phi-4 on Kubernetes

Deploy Microsoft Phi-4 small language model on Kubernetes with vLLM. Efficient 14B model with GPT-4 level reasoning on a single GPU.

⏱ 20 minutes · phi-4 · microsoft · small-language-model
🤖 AI & GPU intermediate

Deploy Phi-4 Reasoning Vision on K8s

Deploy Microsoft Phi-4-reasoning-vision-15B on Kubernetes for multimodal chain-of-thought reasoning with visual understanding on a single GPU.

⏱ 20 minutes · phi-4 · microsoft · reasoning
🤖 AI & GPU advanced

Deploy Qwen3 235B MoE on Kubernetes

Deploy Alibaba Qwen3-235B-A22B mixture-of-experts model on Kubernetes. Only 22B parameters active per token for efficient 235B-class inference.

⏱ 30 minutes · qwen3 · mixture-of-experts · moe
🤖 AI & GPU advanced

Deploy Qwen3 Coder 80B on Kubernetes

Deploy Qwen3-Coder-Next 80B on Kubernetes for code generation, review, and refactoring. Production-ready AI coding assistant with multi-GPU serving.

⏱ 25 minutes · qwen3 · code-generation · coding-assistant
🤖 AI & GPU intermediate

Deploy Qwen3 TTS on Kubernetes

Deploy Qwen3-TTS-12Hz-1.7B-CustomVoice on Kubernetes for text-to-speech with custom voice cloning. 1.13M downloads, lightweight single-GPU deployment.

⏱ 15 minutes · qwen3 · text-to-speech · tts
🤖 AI & GPU intermediate

Deploy Qwen3.5 35B MoE on Kubernetes

Deploy Alibaba Qwen3.5-35B-A3B mixture-of-experts multimodal model on Kubernetes. 35B total parameters with only 3B active for ultra-efficient inference.

⏱ 20 minutes · qwen3.5 · mixture-of-experts · moe
🤖 AI & GPU advanced

Deploy Qwen3.5 397B MoE on Kubernetes

Deploy Alibaba Qwen3.5-397B-A17B MoE multimodal model on Kubernetes. 397B total parameters with only 17B active per token for frontier VLM inference.

⏱ 30 minutes · qwen3.5 · mixture-of-experts · moe
🤖 AI & GPU intermediate

Deploy Qwen3.5 9B Multimodal on K8s

Deploy Alibaba Qwen3.5-9B vision-language model on Kubernetes with vLLM. Process images and text with a single GPU deployment.

⏱ 20 minutes · qwen3.5 · multimodal · vision-language
🤖 AI & GPU advanced

RetinaNet Object Detection on K8s

Deploy RetinaNet object detection model on Kubernetes with Triton Inference Server, TensorRT optimization, and batch processing pipelines.

⏱ 25 minutes · retinanet · object-detection · computer-vision
🤖 AI & GPU advanced

Deploy Sarvam 105B on Kubernetes

Deploy Sarvam 105B multilingual LLM on Kubernetes with vLLM. India's largest open language model with native support for 10+ Indic languages.

⏱ 25 minutes · sarvam · multilingual · indic-languages
🤖 AI & GPU advanced

Stable Diffusion XL on Kubernetes

Deploy Stable Diffusion XL for image generation on Kubernetes with TensorRT acceleration, queued batch processing, and S3 output storage.

⏱ 30 minutes · stable-diffusion · sdxl · image-generation
🤖 AI & GPU intermediate

Deploy Whisper Speech-to-Text on K8s

Deploy OpenAI Whisper for speech-to-text on Kubernetes with faster-whisper, batch transcription Jobs, and real-time streaming endpoints.

⏱ 20 minutes · whisper · speech-to-text · transcription
🤖 AI & GPU advanced

Distributed Inference on Kubernetes

Deploy distributed LLM inference with tensor parallelism across multiple GPUs and pipeline parallelism across nodes on Kubernetes.

⏱ 15 minutes · distributed-inference · tensor-parallelism · pipeline-parallelism
⚙️ Configuration advanced

NVIDIA DOCA Driver Container in Kubernetes

Deploy and configure NVIDIA DOCA Driver containers via NicClusterPolicy for RDMA, NFS-RDMA, and precompiled driver builds.

⏱ 15 minutes · nvidia · doca · rdma
⚙️ Configuration advanced

DOCA Driver on OpenShift with DTK

Build and deploy precompiled NVIDIA DOCA Driver containers on OpenShift using the Driver Toolkit (DTK) and MachineConfig, and manage their upgrade lifecycle.

⏱ 15 minutes · nvidia · doca · openshift
💾 Storage advanced

GPU Operator GDS with NVMe and NFS RDMA

Configure GPUDirect Storage for local NVMe drives and NFS over RDMA in Kubernetes, including cuFile verification and performance benchmarking.

⏱ 25 minutes · nvidia · gds · nvme
🤖 AI & GPU intermediate

GenAI-Perf Benchmark LLM Serving

Benchmark LLM inference endpoints with NVIDIA GenAI-Perf for throughput, latency percentiles, time-to-first-token, and ITL metrics.

⏱ 15 minutes · genai-perf · benchmark · llm
🤖 AI & GPU intermediate

GenAI-Perf Benchmark Triton on K8s

Benchmark NVIDIA Triton Inference Server performance on Kubernetes using GenAI-Perf. Measure TTFT, inter-token latency, throughput, and GPU telemetry.

⏱ 25 minutes · genai-perf · triton · benchmarking
🚀 Deployments advanced

GitOps Bootstrap for Bare-Metal GPU Clusters

Bootstrap bare-metal GPU clusters with ArgoCD and Kustomize in air-gapped environments with NVIDIA GPU and Network Operators.

⏱ 15 minutes · gitops · argocd · bare-metal
💾 Storage advanced

GPU Operator GPUDirect Storage GDS Module

Enable the GPUDirect Storage GDS module in the NVIDIA GPU Operator ClusterPolicy for direct GPU-to-storage data transfers bypassing CPU and system memory.

⏱ 25 minutes · nvidia · gpu-operator · gds
⚙️ Configuration advanced

GPU Operator ClusterPolicy Complete Reference

Complete reference for the NVIDIA GPU Operator ClusterPolicy CRD covering driver, toolkit, device plugin, MOFED, GDS, MIG, and DCGM configuration options.

⏱ 20 minutes · nvidia · gpu-operator · clusterpolicy
⚙️ Configuration advanced

NVIDIA GPU Operator MOFED Driver Configuration

Configure the NVIDIA GPU Operator to deploy Mellanox OFED drivers for high-performance RDMA networking on Kubernetes GPU nodes with InfiniBand and RoCE support.

⏱ 30 minutes · nvidia · gpu-operator · mofed
🚀 Deployments advanced

GPU Operator Canary Upgrade Strategy

Safely upgrade NVIDIA GPU Operator using canary node pools, 48-hour bake periods, validation gates, and Git-based rollback.

⏱ 15 minutes · gpu-operator · upgrade · canary
🔒 Security intermediate

GPU Tenant Bootstrap Bundle for Kubernetes

Provision GPU tenants with a single Kustomize bundle containing namespace, RBAC, NetworkPolicy, quotas, and HAProxy VIP config.

⏱ 15 minutes · multi-tenant · kustomize · gpu
📊 Observability intermediate

Per-Tenant GPU Monitoring and Chargeback

Build per-tenant GPU monitoring dashboards with queue time, utilization, thermal metrics, and GPU-hour chargeback on Kubernetes.

⏱ 15 minutes · monitoring · gpu · chargeback
📊 Observability intermediate

GPU Tenant SLO Observability on Kubernetes

Define and monitor GPU tenant SLOs for queue time, inference latency, GPU utilization, and job completion rate with Prometheus alerting.

⏱ 15 minutes · slo · gpu · observability
⚙️ Configuration advanced

GPU Cluster Upgrade Version Matrix

Maintain a version compatibility matrix for GPU Operator, Network Operator, drivers, firmware, CUDA, and OpenShift for safe upgrades.

⏱ 15 minutes · upgrade · version-matrix · gpu-operator
🌐 Networking advanced

GPUDirect RDMA via DMA-BUF on Kubernetes

Configure GPUDirect RDMA using the DMA-BUF kernel subsystem for zero-copy GPU-to-GPU transfers over InfiniBand and RoCE networks.

⏱ 15 minutes · gpudirect · rdma · dma-buf
🌐 Networking advanced

HAProxy Keepalived Multi-Tenant GPU Ingress

Configure HAProxy with Keepalived VIPs for per-tenant GPU cluster ingress with Jinja2 templates and per-tenant access logging.

⏱ 15 minutes · haproxy · keepalived · multi-tenant
🌐 Networking advanced

InfiniBand vs Ethernet for AI on Kubernetes

Compare InfiniBand and Ethernet networking for GPU AI workloads on Kubernetes, including RDMA, RoCE, latency, and throughput considerations.

⏱ 15 minutes · infiniband · ethernet · rdma
🤖 AI & GPU advanced

Distributed Training with Kubeflow Training Operator

Run multi-node distributed PyTorch and TensorFlow training jobs using Kubeflow Training Operator with NCCL, RDMA, and shared storage.

⏱ 15 minutes · kubeflow · distributed-training · pytorch
🤖 AI & GPU intermediate

Kubeflow Training Operator on Kubernetes

Install Kubeflow Training Operator for distributed ML training with PyTorchJob, TFJob, and MPIJob on GPU-enabled Kubernetes clusters.

⏱ 15 minutes · kubeflow · training-operator · distributed-training
🤖 AI & GPU advanced

LeaderWorkerSet Operator for AI Workloads

Deploy distributed AI training with LeaderWorkerSet Operator on Kubernetes and OpenShift for leader-worker topology with gang scheduling.

⏱ 15 minutes · leaderworkerset · lws · distributed-training
🤖 AI & GPU advanced

Llama Stack on Kubernetes with NVIDIA NIM

Deploy Meta Llama Stack on Kubernetes for unified inference, RAG, agents, and safety APIs using NVIDIA NIM as the inference backend.

⏱ 15 minutes · llama-stack · nvidia-nim · llama
🚀 Deployments intermediate

MariaDB Operator on Kubernetes

Deploy highly available MariaDB clusters on Kubernetes using MariaDB Operator with Galera replication, automated backups, and connection pooling.

⏱ 15 minutes · mariadb · operator · database
🤖 AI & GPU advanced

MLPerf Benchmarking on Kubernetes

Run MLPerf inference and training benchmarks on Kubernetes GPU clusters to validate AI workload performance and compare hardware configurations.

⏱ 15 minutes · mlperf · benchmarking · inference
🤖 AI & GPU intermediate

Shared Model Caching Across Pods on Kubernetes

Optimize LLM inference startup and reduce storage costs by sharing model weights across pods using emptyDir, hostPath, ReadWriteMany PVCs, and init containers.

⏱ 25 minutes · model-caching · shared-memory · pvc
⚙️ Configuration advanced

MOFED and DOCA Driver Building for OpenShift

Build NVIDIA MOFED and DOCA drivers for OpenShift using the Driver Toolkit, Buildah, and MachineConfig for RDMA and GPU networking.

⏱ 15 minutes · mofed · doca · openshift
🤖 AI & GPU advanced

MPI Operator for Distributed Training

Deploy MPI Operator on Kubernetes for distributed GPU training with Horovod and NCCL. Run multi-node MPI jobs natively in Kubernetes pods.

⏱ 30 minutes · mpi · mpi-operator · distributed-training
🔒 Security intermediate

Multi-Tenant GPU Namespace Isolation

Isolate GPU workloads across tenants using namespaces, RBAC, NetworkPolicy, and ResourceQuotas on OpenShift and Kubernetes.

⏱ 15 minutes · multi-tenant · gpu · namespace
🔒 Security intermediate

NetworkPolicy Deny-Default for GPU Tenants

Implement deny-by-default NetworkPolicy for GPU tenant namespaces with NCCL port exceptions and DNS egress on Kubernetes.

⏱ 15 minutes · networkpolicy · multi-tenant · gpu
🌐 Networking advanced

NFSoRDMA Bond with Access Mode Switch

Configure bonded NICs for NFS over RDMA, using switch access mode for VLAN assignment and aggregating untagged interfaces for RDMA redundancy.

⏱ 25 minutes · nfsordma · rdma · bonding
🌐 Networking advanced

NFSoRDMA Dedicated NIC Configuration

Configure dedicated NICs for NFS over RDMA on Kubernetes worker nodes. NFSoRDMA requires untagged interfaces; VLAN tagging is not supported.

⏱ 25 minutes · nfsordma · rdma · nfs
🌐 Networking advanced

NFSoRDMA Jumbo Frames MTU Configuration

Configure 9000 MTU jumbo frames for NFSoRDMA interfaces using NNCP to maximize RDMA throughput on Kubernetes worker nodes.

⏱ 15 minutes · nfsordma · rdma · mtu
🌐 Networking advanced

NFSoRDMA Multi-VLAN Switch Access Mode

Design multi-VLAN NFSoRDMA networks using switch access mode ports. Separate storage, replication, and backup traffic with dedicated NICs per VLAN.

⏱ 30 minutes · nfsordma · rdma · vlan
💾 Storage intermediate

NFSoRDMA Persistent Volume for Kubernetes

Create PersistentVolumes and StorageClasses for NFSoRDMA storage with RDMA transport, optimized mount options, and ReadWriteMany access.

⏱ 15 minutes · nfsordma · rdma · persistent-volume
🌐 Networking advanced

NFSoRDMA Troubleshooting and Performance

Troubleshoot NFS over RDMA connectivity issues, diagnose TCP fallback, tune performance, and benchmark RDMA throughput on Kubernetes workers.

⏱ 20 minutes · nfsordma · rdma · troubleshooting
🌐 Networking advanced

NFSoRDMA Worker Node Setup Guide

Complete worker node setup for NFS over RDMA including kernel modules, NFS client configuration, PersistentVolume mounts, and RDMA transport verification.

⏱ 30 minutes · nfsordma · rdma · nfs
⚙️ Configuration intermediate

NicClusterPolicy MOFED Affinity & Node Sele...

Configure NicClusterPolicy node selectors and affinity rules to deploy MOFED drivers only on RDMA-capable nodes in Kubernetes clusters.

⏱ 15 minutes · nvidia · mofed · node-selection
🌐 Networking intermediate

NNCP Bond Interfaces on Worker Nodes

Create bonded network interfaces on Kubernetes worker nodes using NodeNetworkConfigurationPolicy for NIC redundancy and link aggregation.

⏱ 20 minutes · nncp · nmstate · bonding
🌐 Networking intermediate

NNCP DNS and Static Routes on Workers

Configure static routes, DNS servers, and policy-based routing on worker nodes using NodeNetworkConfigurationPolicy for multi-network setups.

⏱ 15 minutes · nncp · nmstate · dns
🌐 Networking intermediate

NNCP Linux Bridge on Worker Nodes

Create Linux bridges on Kubernetes worker nodes using NodeNetworkConfigurationPolicy for KubeVirt VM networking and pod bridging.

⏱ 20 minutes · nncp · nmstate · linux-bridge
🌐 Networking intermediate

NNCP MTU and Jumbo Frames on Workers

Set MTU and enable jumbo frames on worker node interfaces using NodeNetworkConfigurationPolicy for high-throughput storage and AI networking.

⏱ 15 minutes · nncp · nmstate · mtu
🌐 Networking advanced

NNCP Multi-NIC Architecture for Workers

Design a complete multi-NIC worker node architecture with NNCP for separated management, storage, tenant, and GPU traffic using bonds, VLANs, and bridges.

⏱ 30 minutes · nncp · nmstate · multi-nic
🌐 Networking advanced

NNCP OVS Bridge on Worker Nodes

Configure Open vSwitch bridges on Kubernetes worker nodes using NodeNetworkConfigurationPolicy for advanced SDN and DPDK networking.

⏱ 25 minutes · nncp · nmstate · ovs
🌐 Networking intermediate

NNCP Rollback and Troubleshooting

Troubleshoot NodeNetworkConfigurationPolicy failures, monitor enactments, configure rollback timeouts, and recover from bad network configurations.

⏱ 15 minutes · nncp · nmstate · troubleshooting
🌐 Networking advanced

NNCP SR-IOV and Macvlan on Workers

Configure SR-IOV virtual functions and macvlan interfaces on worker nodes using NodeNetworkConfigurationPolicy for high-performance networking.

⏱ 25 minutes · nncp · nmstate · sriov
🌐 Networking intermediate

NNCP Static IP Assignment on Worker Nodes

Use NodeNetworkConfigurationPolicy to assign static IPv4 and IPv6 addresses to worker node interfaces with nodeSelector targeting.

⏱ 15 minutes · nncp · nmstate · networking
🌐 Networking intermediate

NNCP VLAN Tagging on Worker Nodes

Configure VLAN interfaces on Kubernetes worker nodes using NodeNetworkConfigurationPolicy for network segmentation and traffic isolation.

⏱ 15 minutes · nncp · nmstate · vlan
🌐 Networking intermediate

NodePort Raw Traffic vs HTTPS Ingress

Route raw GPU inference traffic through NodePort for low-latency gRPC, and serve HTTPS model endpoints through the OpenShift ingress controller.

⏱ 15 minutes · nodeport · ingress · grpc
🤖 AI & GPU advanced

Deploy NVIDIA Clara on Kubernetes

Deploy NVIDIA Clara medical AI and drug discovery platform on Kubernetes. Run digital biology and medtech inference workloads with GPU acceleration.

⏱ 30 minutes · nvidia · clara · medical-ai
🤖 AI & GPU advanced

NVIDIA H200 GPU Workloads on Kubernetes

Deploy and optimize AI workloads on NVIDIA H200 GPUs with 141GB HBM3e memory for large model inference and training on Kubernetes.

⏱ 15 minutes · nvidia · h200 · gpu
🤖 AI & GPU advanced

NVIDIA NeMo Training on Kubernetes

Deploy NVIDIA NeMo framework on Kubernetes for large language model pre-training, fine-tuning, and RLHF with multi-node GPU clusters.

⏱ 15 minutes · nvidia · nemo · training
🌐 Networking advanced

NVIDIA NIC Driver Container Entrypoint

Understand and customize the NVIDIA NIC driver container entrypoint for MOFED and DOCA driver lifecycle on Kubernetes and OpenShift.

⏱ 15 minutes · nvidia · mofed · doca
🤖 AI & GPU advanced

NVIDIA Pyxis and Enroot for SLURM

Use NVIDIA Pyxis and Enroot to run GPU containers in SLURM jobs. Bridge SLURM HPC scheduling with container-native AI workloads and NGC images.

⏱ 30 minutes · pyxis · enroot · slurm
⚙️ Configuration advanced

Open Kernel Modules and DMA-BUF for GPUs

Migrate from proprietary NVIDIA kernel modules and nvidia-peermem to open kernel modules with DMA-BUF for safer GPU upgrades.

⏱ 15 minutes · nvidia · kernel-modules · dma-buf
⚡ Autoscaling advanced

OpenClaw Auto-Scaling with KEDA

Scale OpenClaw agents based on message queue depth using KEDA event-driven autoscaling for Discord, Telegram, and Slack.

⏱ 15 minutes · openclaw · keda · autoscaling
💾 Storage intermediate

OpenClaw Backup Restore Command Guide

OpenClaw backup and restore command guide. VolumeSnapshots, CronJobs to S3, disaster recovery procedures, and session state management on Kubernetes.

⏱ 20 minutes · openclaw · backup · restore
⚙️ Configuration intermediate

OpenClaw Cron Jobs and Heartbeats

Configure OpenClaw's built-in cron scheduling and heartbeat system on Kubernetes for proactive notifications, periodic checks, and automated background tasks.

⏱ 20 minutes · openclaw · cron · heartbeat
🚀 Deployments advanced

OpenClaw Blue-Green Deployment

Implement zero-downtime OpenClaw upgrades using blue-green deployments with traffic switching and rollback in Kubernetes.

⏱ 15 minutes · openclaw · blue-green · zero-downtime
🚀 Deployments beginner

Build a Custom OpenClaw Docker Image for K8s

Create an optimized Docker image for OpenClaw with pre-installed dependencies, custom skills, and workspace files for faster Kubernetes deployments.

⏱ 15 minutes · openclaw · docker · container-image
🚀 Deployments beginner

Run an OpenClaw Discord Bot on Kubernetes

Deploy OpenClaw as a Discord bot on Kubernetes with channel routing, mention handling, group chat rules, and persistent conversation memory.

⏱ 15 minutes · openclaw · discord · bot
🚀 Deployments intermediate

High Availability OpenClaw with Kubernetes

Run OpenClaw in a high-availability configuration on Kubernetes with health checks, automatic restarts, backup strategies, and monitoring.

⏱ 25 minutes · openclaw · high-availability · health-checks
🚀 Deployments intermediate

Deploy OpenClaw AI Gateway on Kubernetes

Deploy the OpenClaw multi-channel AI gateway on Kubernetes with persistent storage, TLS ingress, and high availability for WhatsApp, Telegram, and Discord.

⏱ 25 minutes · openclaw · ai-gateway · deployment
📊 Observability intermediate

OpenClaw Logging with EFK Stack

Collect and analyze OpenClaw agent logs using Elasticsearch, Fluent Bit, and Kibana (EFK stack) for debugging and audit trails.

⏱ 15 minutes · openclaw · logging · elasticsearch
📊 Observability intermediate

Monitor OpenClaw with Prometheus and Grafana

Set up monitoring for the OpenClaw AI gateway on Kubernetes with Prometheus metrics, Grafana dashboards, and alerting for uptime and message throughput.

⏱ 20 minutes · openclaw · prometheus · grafana
🚀 Deployments advanced

Multi-Agent Routing with OpenClaw

Configure multiple isolated AI agents in a single OpenClaw gateway on Kubernetes with per-agent workspaces, channel bindings, and session isolation.

⏱ 30 minutes · openclaw · multi-agent · routing
🔒 Security intermediate

Network Policies for OpenClaw on Kubernetes

Secure OpenClaw deployments with Kubernetes NetworkPolicies to restrict egress to messaging APIs, block unauthorized ingress, and isolate the gateway.

⏱ 15 minutes · openclaw · network-policy · security
💾 Storage intermediate

OpenClaw with Persistent Storage

Configure persistent storage for OpenClaw workspaces using PVCs, StorageClasses, and backup strategies in Kubernetes clusters.

⏱ 15 minutes · openclaw · persistent-storage · pvc
🔒 Security advanced

OpenClaw RBAC and Multi-Tenant Isolation

Configure OpenClaw RBAC policies and namespace isolation for multi-tenant Kubernetes clusters with per-team agent access controls.

⏱ 15 minutes · openclaw · rbac · multi-tenancy
🔒 Security intermediate

Secure Secrets Management for OpenClaw

Manage API keys, bot tokens, and credentials for OpenClaw on Kubernetes using Kubernetes Secrets, External Secrets Operator, and Sealed Secrets.

⏱ 20 minutes · openclaw · secrets · security
🚀 Deployments intermediate

Deploy an OpenClaw Signal Messenger Bot

Run OpenClaw as a Signal messenger AI assistant on Kubernetes with linked device pairing, end-to-end encryption, and persistent sessions.

⏱ 20 minutes · openclaw · signal · messaging
⚙️ Configuration intermediate

Manage OpenClaw Skills on Kubernetes

Deploy and manage OpenClaw agent skills (tools, automations, integrations) on Kubernetes using ConfigMaps, PVCs, and git-sync to update agent capabilities dynamically.

⏱ 20 minutes · openclaw · skills · tools
🚀 Deployments beginner

Deploy an OpenClaw Telegram Bot on Kubernetes

Run OpenClaw as a Telegram bot on Kubernetes with BotFather setup, webhook configuration, inline commands, and persistent conversation history.

⏱ 15 minutes · openclaw · telegram · bot
🚀 Deployments intermediate

Self-Host an OpenClaw WhatsApp AI Assistant

Deploy OpenClaw on Kubernetes to run a personal WhatsApp AI assistant with QR code pairing, persistent sessions, media support, and allow-list security.

⏱ 20 minutes · openclaw · whatsapp · ai-assistant
⚙️ Configuration intermediate

GitOps for OpenClaw Workspaces on Kubernetes

Manage OpenClaw agent workspaces (SOUL.md, skills, memory) with GitOps using Flux or ArgoCD, enabling version-controlled AI persona management on Kubernetes.

⏱ 25 minutes · openclaw · gitops · workspace
🔒 Security advanced

OpenShift ACS Security for Kubernetes

Deploy and configure Red Hat Advanced Cluster Security (ACS/RHACS) for vulnerability scanning, compliance, network policies, and runtime threat detection.

⏱ 15 minutes · openshift · acs · rhacs
🚀 Deployments intermediate

OpenShift BuildConfig with ImageStream

Build container images on OpenShift using BuildConfig with ImageStream triggers, pushing to the internal registry or a local Quay registry.

⏱ 15 minutes · openshift · buildconfig · imagestream
🚀 Deployments intermediate

OpenShift BuildConfig with Local Quay Registry

Build container images on OpenShift and push to a local Quay registry using BuildConfig, ImageStream, and robot account credentials.

⏱ 15 minutes · openshift · buildconfig · quay
⚙️ Configuration intermediate

Create Custom CatalogSources for OLM Operators

Configure CatalogSource in OpenShift to serve custom operator catalogs from private registries or air-gapped environments.

⏱ 20 minutes · catalogsource · olm · operators
🔒 Security intermediate

Filter CatalogSource Operators by Package

Curate a minimal CatalogSource with only approved operators using opm index pruning and file-based catalog filtering for security and compliance.

⏱ 25 minutes · catalogsource · olm · operators
🔧 Troubleshooting intermediate

Troubleshoot CatalogSource and OLM Issues

Debug CatalogSource failures including pod crashes, gRPC errors, stale caches, and operator install problems in OpenShift OLM environments.

⏱ 15 minutes · catalogsource · olm · troubleshooting
🔒 Security intermediate

OpenShift Cluster-Wide Pull Secret Robot Ac...

Replace admin credentials in the OpenShift cluster-wide pull secret with a Quay robot account for secure, auditable container image pulls across all namespaces.

⏱ 20 minutes · openshift · quay · pull-secret
🔒 Security intermediate

OpenShift Custom CA for Private Registries

Configure OpenShift to trust a custom Certificate Authority for private container registries using additionalTrustedCA and image.config.openshift.io settings.

⏱ 15 minutes · openshift · certificates · tls
🚀 Deployments intermediate

Kustomize Deployments with OpenShift GitOps

Use Kustomize overlays with the OpenShift GitOps Operator (ArgoCD) to manage environment-specific configurations across dev, staging, and production clusters.

⏱ 25 minutes · kustomize · gitops · argocd
🚀 Deployments advanced

OpenShift IDMS & install-config.yaml Mirror...

Configure ImageDigestMirrorSet and install-config.yaml imageContentSources for OpenShift disconnected installations with mirror registries.

⏱ 30 minutes · openshift · idms · mirror-registry
🚀 Deployments advanced

OpenShift ITMS ImageTagMirrorSet

Configure ImageTagMirrorSet in OpenShift 4.13+ for tag-based image mirroring. Mirror container images by tag instead of digest for disconnected clusters.

⏱ 25 minutes · openshift · itms · image-mirroring
⚙️ Configuration intermediate

OpenShift Lifecycle and Version Support

OpenShift support lifecycle guide covering version support phases, EUS releases, end-of-life dates, and upgrade planning for production clusters.

⏱ 15 minutes · openshift · lifecycle · upgrades
🚀 Deployments advanced

OpenShift MachineConfigPool After ITMS

Monitor and manage MachineConfigPool rollouts after applying ImageTagMirrorSet in OpenShift. Handle node restarts, paused pools, and degraded states.

⏱ 20 minutes · openshift · machineconfigpool · mcp
⚙️ Configuration intermediate

OpenShift Project Request Template Pull Sec...

Configure an OpenShift Project Request Template so every new namespace automatically gets a ServiceAccount with imagePullSecrets for your private Quay registry.

⏱ 15 minutes · openshift · templates · namespaces
🚀 Deployments intermediate

OpenShift Serverless KnativeServing

Deploy and configure OpenShift Serverless Operator with KnativeServing for autoscaling, scale-to-zero, and traffic splitting on Kubernetes.

⏱ 15 minutes · openshift · serverless · knative
⚙️ Configuration intermediate

PriorityClasses for GPU Workloads

Configure Kubernetes PriorityClasses for GPU workloads with training, serving, batch, and interactive tiers and preemption policies.

⏱ 15 minutes · priorityclass · gpu · scheduling
🚀 Deployments beginner

Quay Robot Accounts for Kubernetes Image Pulls

Create Quay robot accounts and configure Kubernetes imagePullSecrets for automated container image pulls from private registries.

⏱ 20 minutes · quay · container-registry · security
⚙️ Configuration intermediate

ResourceQuota and LimitRange for GPUs

Configure ResourceQuota and LimitRange for GPU workloads with per-tenant caps on GPU, CPU, memory, and object counts in Kubernetes.

⏱ 15 minutes · resourcequota · limitrange · gpu
🔒 Security intermediate

RHACS Compliance Scanning in OpenShift

Run CIS, NIST, PCI DSS, and HIPAA compliance scans with Red Hat Advanced Cluster Security and automate reporting for audits.

⏱ 15 minutes · openshift · acs · rhacs
🔒 Security advanced

RHACS Custom Security Policies Guide

Create and manage custom security policies in Red Hat Advanced Cluster Security for image scanning, deployment config, and runtime enforcement.

⏱ 15 minutes · openshift · acs · rhacs
🔒 Security advanced

RHACS Multi-Cluster Management

Manage security across multiple Kubernetes clusters with RHACS Central hub, secured cluster registration, and unified policy enforcement.

⏱ 15 minutes · openshift · acs · rhacs
🔒 Security advanced

RHACS Network Segmentation Policies

Use Red Hat Advanced Cluster Security network graph to discover traffic flows, generate NetworkPolicies, and enforce micro-segmentation.

⏱ 15 minutes · openshift · acs · rhacs
⚙️ Configuration advanced

RHCOS Node Management for OpenShift

Understand and manage Red Hat Enterprise Linux CoreOS (RHCOS) for OpenShift nodes including MachineConfig, ignition, OS updates, and node customization.

⏱ 15 minutes · openshift · rhcos · coreos
🔒 Security intermediate

RHACS CI/CD Pipeline Integration

Integrate Red Hat Advanced Cluster Security into CI/CD pipelines with roxctl for image scanning, policy checks, and deployment validation.

⏱ 15 minutes · openshift · acs · rhacs
🔒 Security intermediate

Rotate Quay Robot Tokens in Kubernetes

Automate Quay robot account token rotation across Kubernetes namespaces with zero-downtime credential updates and validation scripts.

⏱ 15 minutes · quay · security · secrets
🤖 AI & GPU advanced

Run:AI GPU Quotas on OpenShift

Configure Run:AI scheduler quotas for fair GPU sharing with guaranteed quotas, over-quota borrowing, and per-tenant GPU allocation policies.

⏱ 15 minutes · runai · gpu · quotas
🤖 AI & GPU advanced

SLURM and Kubernetes Integration

Integrate SLURM HPC workload manager with Kubernetes for hybrid AI and scientific computing. Bridge HPC batch scheduling with container orchestration.

⏱ 45 minutes · slurm · hpc · batch-scheduling
🌐 Networking advanced

SR-IOV Mixed NICs for GPU Nodes

Configure SR-IOV with mixed ConnectX-7 and ConnectX-6 NICs for RDMA data plane and management traffic on GPU worker nodes.

⏱ 15 minutes · sriov · connectx-7 · connectx-6
🌐 Networking advanced

SR-IOV NicClusterPolicy for VF Configuration

Configure SR-IOV Virtual Functions on Mellanox ConnectX NICs using the NVIDIA Network Operator NicClusterPolicy for high-performance Kubernetes networking.

⏱ 25 minutes · sriov · networking · nvidia
🌐 Networking advanced

SR-IOV VF Networking for AI Workloads

Deploy SR-IOV Virtual Functions with RDMA support for distributed AI training on Kubernetes, including multi-NIC pod configuration and NCCL tuning.

⏱ 30 minutes · sriov · rdma · ai
🔧 Troubleshooting advanced

SR-IOV VF Troubleshooting on Kubernetes

Diagnose and fix SR-IOV Virtual Function issues including VF creation failures, device plugin errors, RDMA problems, and network attachment failures.

⏱ 20 minutes · sriov · troubleshooting · networking
🤖 AI & GPU intermediate

Time-Slicing vs MIG vs Full GPU Allocation

Compare GPU sharing strategies: time-slicing for notebooks, MIG for isolated inference, and full GPU for training workloads.

⏱ 15 minutes · time-slicing · mig · gpu-sharing
🤖 AI & GPU advanced

Triton Autoscaling with GPU Metrics

Autoscale Triton Inference Server on Kubernetes using GPU utilization, request queue depth, and inference latency metrics with KEDA and HPA.

⏱ 30 minutes · triton · autoscaling · gpu-metrics
🤖 AI & GPU advanced

Triton Multi-Model Serving on Kubernetes

Serve multiple LLMs simultaneously on Triton Inference Server using TensorRT-LLM and vLLM backends with model routing and GPU scheduling.

⏱ 35 minutes · triton · multi-model · tensorrt-llm
🤖 AI & GPU advanced

Triton TensorRT-LLM on Kubernetes

Deploy NVIDIA Triton Inference Server with TensorRT-LLM backend on Kubernetes for optimized large language model serving with GPU acceleration.

⏱ 45 minutes · triton · tensorrt-llm · nvidia
🤖 AI & GPU intermediate

TensorRT-LLM vs vLLM on Triton

Compare TensorRT-LLM and vLLM backends on Triton Inference Server. When to use each, performance benchmarks, and migration strategies.

⏱ 20 minutes · triton · tensorrt-llm · vllm
🤖 AI & GPU advanced

Triton with vLLM Backend on Kubernetes

Deploy NVIDIA Triton Inference Server with vLLM backend on Kubernetes for flexible LLM serving with PagedAttention and continuous batching.

⏱ 30 minutes · triton · vllm · nvidia
🔒 Security advanced

Update CA Certificates in Kubernetes

Rotate and update Certificate Authority (CA) certificates in Kubernetes clusters including kube-apiserver, etcd, kubelet, and custom CA bundles for TLS.

⏱ 45 minutes · certificates · ca · tls
🤖 AI & GPU intermediate

Deploying Vector Databases on Kubernetes

Deploy and operate vector databases (Milvus, Weaviate, Qdrant) on Kubernetes for RAG pipelines, semantic search, and AI applications.

⏱ 30 minutes · vector-database · milvus · weaviate
⚙️ Configuration intermediate

Configure ClusterPolicy kernelModuleType GP...

Understand and configure the driver.kernelModuleType field in the NVIDIA GPU Operator ClusterPolicy to choose between auto, open, and proprietary kernel modules.

⏱ 20 minutes · nvidia · gpu-operator · clusterpolicy
🌐 Networking advanced

Configure GPUDirect RDMA the NVIDIA GPU Ope...

Set up GPUDirect RDMA on Kubernetes using the NVIDIA GPU Operator with either DMA-BUF or legacy nvidia-peermem, including Network Operator integration.

⏱ 60 minutes · nvidia · gpu · rdma
🔧 Troubleshooting advanced

Diagnose NVIDIA Memory-Only Kernel Modules ...

Understand why lsmod shows NVIDIA modules loaded but modinfo fails, and how the GPU Operator's proprietary driver container inserts the modules from inside the container.

⏱ 15 minutes · nvidia · gpu · kernel-modules
💾 Storage advanced

Enable GPUDirect Storage on OpenShift

Configure GPUDirect Storage (GDS) with the NVIDIA GPU Operator on OpenShift, including the Open Kernel Module requirement and nvidia-fs verification.

⏱ 45 minutes · nvidia · gpu · gds
🔧 Troubleshooting advanced

Fix NVIDIA Peer Memory Driver Not Detected

Diagnose and resolve the 'NVIDIA peer memory driver not detected' error when running GPU workloads with RDMA on Kubernetes and OpenShift.

⏱ 30 minutes · nvidia · gpu · rdma
🔒 Security intermediate

SELinux and SCC Config for GPU Operator

Understand SELinux device relabeling and Security Context Constraints (SCC) requirements for the NVIDIA GPU Operator driver pods on OpenShift.

⏱ 20 minutes · nvidia · gpu-operator · selinux
🌐 Networking advanced

Switch GPUDirect RDMA from nvidia-peermem t...

Migrate from the legacy nvidia-peermem kernel module to the recommended DMA-BUF GPUDirect RDMA path using the NVIDIA GPU Operator.

⏱ 45 minutes · nvidia · gpu · rdma
⚙️ Configuration advanced

Switch to Open NVIDIA Kernel Modules on Ope...

Step-by-step guide to migrate the NVIDIA GPU Operator from proprietary to open kernel modules on OpenShift, enabling DMA-BUF and GPUDirect Storage support.

⏱ 60 minutes · nvidia · gpu-operator · kernel-modules
🔧 Troubleshooting advanced

Fix nvidia-fs Module Conflict on OpenShift

Diagnose and fix the 'insmod: ERROR: could not insert module nvidia-fs.ko: File exists' error when enabling GPUDirect Storage with the NVIDIA GPU Operator.

⏱ 30 minutes · nvidia · gpu · gds
🌐 Networking advanced

Validate GPUDirect RDMA Performance with DMA-BUF

Run ib_write_bw with CUDA DMA-BUF to verify GPUDirect RDMA data transfer rates between GPU pods and validate network operator configuration.

⏱ 30 minutes · nvidia · gpu · rdma
🚀 Deployments advanced

Automate NCCL Preflight Checks in CI/CD Pipelines

Run NCCL smoke benchmarks automatically in CI/CD pipelines before promoting GPU cluster changes to production, catching regressions early.

⏱ 30 minutes · nccl · ci-cd · preflight
🤖 AI & GPU intermediate

Compare NCCL Intra-Node vs Inter-Node Perfo...

Build a repeatable comparison between local and cross-node NCCL throughput to validate GPU cluster interconnect scaling and identify bottlenecks early.

⏱ 20 minutes · nccl · intra-node · inter-node
🔧 Troubleshooting advanced

Debug NCCL Timeouts and Hangs in Kubernetes

Systematically troubleshoot NCCL runs that stall or time out across multi-GPU and multi-node Kubernetes jobs with step-by-step diagnostic commands.

⏱ 30 minutes · nccl · timeout · hang
📊 Observability intermediate

Monitor NCCL Benchmark Runs Prometheus & Gr...

Track NCCL benchmark outcomes and GPU telemetry over time with Prometheus and Grafana dashboards to detect communication regressions early.

⏱ 30 minutes · nccl · prometheus · grafana
🤖 AI & GPU intermediate

Run NCCL AllGather Benchmarks Model Paralle...

Use all-gather NCCL tests to evaluate GPU communication behavior and throughput for tensor-parallel and model-parallel distributed AI workloads on Kubernetes.

⏱ 20 minutes · nccl · allgather · ai
🤖 AI & GPU intermediate

Benchmark NCCL AllReduce Performance

Measure NCCL AllReduce bandwidth and latency on Kubernetes to validate distributed training network performance across multi-GPU clusters.

⏱ 20 minutes · nccl · allreduce · gpu
🔧 Troubleshooting advanced

Diagnose GPU Peer-to-Peer Latency with NCCL Tests

Use NCCL point-to-point and collective tests to isolate GPU peer-to-peer latency issues between GPU pairs in multi-node Kubernetes clusters.

⏱ 25 minutes · nccl · latency · p2p
🤖 AI & GPU intermediate

Run NCCL Tests for GPU Network Validation

Benchmark GPU-to-GPU communication using NVIDIA nccl-tests on Kubernetes or OpenShift to validate bandwidth and latency.

⏱ 25 minutes · nccl · nccl-tests · gpu
🚀 Deployments advanced

Run NCCL Tests with MPIJob on Kubernetes

Launch multi-pod NCCL benchmarks using MPIJob on Kubernetes for repeatable, automated distributed GPU communication testing across nodes.

⏱ 35 minutes · nccl · mpijob · kubeflow
⚙️ Configuration advanced

Tune NCCL Env Variables for RDMA & Ethernet

Apply safe NCCL environment variable profiles for RDMA-capable and Ethernet-only GPU clusters to maximize collective communication throughput.

⏱ 20 minutes · nccl · rdma · ethernet
🔧 Troubleshooting intermediate

Validate GPU & NIC Topology Before NCCL Ben...

Inspect node-level GPU, NIC, and PCI topology on Kubernetes workers to predict and explain NCCL benchmark performance before running tests.

⏱ 15 minutes · nccl · topology · pci
🔧 Troubleshooting intermediate

Check Bonding and Interface Status for SR-IOV

Inspect bond membership, interface state, and link aggregation to confirm which NICs can be correctly targeted by SR-IOV network policies on Kubernetes.

⏱ 15 minutes · bonding · networking · sriov
🌐 Networking advanced

Configure SriovNetwork with NVIDIA nv-ipam

Create a SriovNetwork resource that auto-generates a Multus NetworkAttachmentDefinition using nv-ipam for high-performance SR-IOV secondary interfaces.

⏱ 20 minutes · sriovnetwork · nv-ipam · multus
🌐 Networking advanced

Create an NVIDIA nv-ipam IPPool SR-IOV Netw...

Define a valid nv-ipam IPPool and node-aware sizing strategy so SR-IOV workloads can reliably obtain secondary interface IP addresses on Kubernetes.

⏱ 15 minutes · nv-ipam · ippool · sriov
🤖 AI & GPU advanced

Deploy Mistral 7B with NVIDIA NIM

Step-by-step guide to deploy Mistral-7B using NVIDIA NIM with TensorRT-LLM backend on Kubernetes for optimized GPU inference.

⏱ 30 minutes · nvidia-nim · tensorrt-llm · mistral
🤖 AI & GPU intermediate

Deploy Mistral 7B with vLLM on Kubernetes

Step-by-step guide to deploy Mistral-7B-v0.1 using vLLM as an OpenAI-compatible inference server on Kubernetes with GPU fractioning.

⏱ 30 minutes · vllm · mistral · llm
🌐 Networking intermediate

Enable NIC Feature Discovery in NVIDIA Netw...

Enable NIC Feature Discovery through NicClusterPolicy and verify the node labels required by SR-IOV and RDMA GPU networking workflows on Kubernetes.

⏱ 20 minutes · nvidia · network-operator · nic-feature-discovery
🔧 Troubleshooting intermediate

Identify Mellanox Interface Models from Lin...

Map interface names to PCI addresses and Mellanox model generations to build accurate SR-IOV policies and GPU networking configurations on Kubernetes.

⏱ 15 minutes · mellanox · connectx · pci
🤖 AI & GPU advanced

Autoscale LLM Inference on Kubernetes

Configure Horizontal Pod Autoscaling and KEDA for LLM workloads using GPU utilization, request queue depth, and custom metrics.

⏱ 30 minutes · autoscaling · hpa · keda
🤖 AI & GPU intermediate

Quantize LLMs for Efficient GPU Inference

Run quantized LLM models (GPTQ, AWQ, GGUF) on Kubernetes to reduce GPU memory requirements and serve models on smaller GPUs.

⏱ 20 minutes · quantization · gptq · awq
🤖 AI & GPU intermediate

Kubernetes LLM Serving Frameworks Compared

Compare vLLM, NVIDIA NIM, Triton, Ollama, and llama.cpp for serving LLMs on Kubernetes — features, performance, and when to use each.

⏱ 15 minutes · vllm · nvidia-nim · triton
🚀 Deployments beginner

Push a Podman-Saved Image to Local Quay

Load a Podman image tar archive, tag it for your Local Quay registry, authenticate with robot accounts, and push it safely to your private repo.

⏱ 15 minutes · quay · podman · container-registry
🚀 Deployments beginner

Retag and Push an Image in Local Quay

Pull an existing container image from Local Quay, retag it for a new repository path or version, and push the updated tag back to the registry.

⏱ 10 minutes · quay · podman · retag
🤖 AI & GPU advanced

Multi-GPU and Tensor Parallel LLM Inference

Deploy large language models across multiple GPUs using tensor parallelism with vLLM and NVIDIA NIM on Kubernetes for high-throughput inference serving.

⏱ 30 minutes · multi-gpu · tensor-parallelism · pipeline-parallelism
🤖 AI & GPU intermediate

Install NVIDIA GPU Operator on Kubernetes

Deploy the NVIDIA GPU Operator to automate GPU driver, container toolkit, and device plugin management across your Kubernetes cluster.

⏱ 25 minutes · nvidia · gpu-operator · gpu
🔒 Security intermediate

Deploy a New Certificate for Each OpenShift Tenant

Replace and activate new TLS certificates tenant by tenant in OpenShift IngressController deployments with verification steps and rollback guidance.

⏱ 30 minutes · openshift · tls · certificates
🔒 Security intermediate

OpenShift Multi-Tenant TLS per IngressContr...

Set up tenant-isolated TLS in OpenShift by assigning a dedicated certificate Secret to each IngressController for multi-tenant routing security.

⏱ 20 minutes · openshift · multi-tenant · ingress
🌐 Networking intermediate

Create SR-IOV VFs on OpenShift SriovNetwork...

Use the OpenShift SR-IOV Network Operator to create and manage Virtual Functions from selected Physical Functions on GPU worker nodes.

⏱ 25 minutes · openshift · sriov · vf
🔒 Security intermediate

Rotate OpenShift Tenant Secrets Safely

Implement low-risk secret rotation in OpenShift multi-tenant environments using versioned Secrets and controlled rollouts.

⏱ 25 minutes · openshift · multi-tenant · secrets
🤖 AI & GPU advanced

Build a RAG Pipeline on Kubernetes

Deploy a Retrieval-Augmented Generation pipeline on Kubernetes using a vector database, embedding model, and LLM inference server.

⏱ 45 minutes · rag · retrieval-augmented-generation · vector-database
💾 Storage beginner

Configure S3 Storage Permissions for ML Models

Set up S3 bucket ACLs, IAM roles, and PVC permissions so Kubernetes inference pods can securely read large ML model weights from object storage.

⏱ 15 minutes · s3 · storage · permissions
🤖 AI & GPU beginner

Test LLM Inference Endpoints with curl

Validate Kubernetes-hosted LLM inference services using curl against OpenAI-compatible /v1/models, /v1/completions, and /v1/chat/completions endpoints.

⏱ 10 minutes · llm · inference · curl
🔧 Troubleshooting advanced

Fix NVIDIA NIM TensorRT-LLM Initialization ...

Diagnose and fix common NIM TensorRT-LLM executor failures including DecoderState mismatch, version incompatibilities, and engine build errors.

⏱ 20 minutes · nvidia-nim · tensorrt-llm · troubleshooting
🔧 Troubleshooting advanced

Fix 'No Supported NIC Is Selected' in SR-IOV

Diagnose SR-IOV operator webhook rejections by validating node state, label selectors, PF eligibility, and SriovNetworkNodePolicy configuration.

⏱ 30 minutes · sriov · troubleshooting · webhook
🔧 Troubleshooting advanced

Fix nv-ipam 'Pool Not Found' Errors in Multus

Fix nv-ipam IPPool lookup failures in Multus by aligning SriovNetwork, NetworkAttachmentDefinition, and IPPool names and namespaces correctly.

⏱ 20 minutes · nv-ipam · multus · sriov
🔧 Troubleshooting intermediate

Validate SR-IOV Operator Health Across Mult...

Run a full checklist to confirm SR-IOV discovery, VF creation, scheduler resources, and pod attachment on multiple nodes.

⏱ 30 minutes · sriov · validation · multinode
🌐 Networking intermediate

Verify Which Interface Carries OVN Underlay...

Confirm the actual OVN underlay network path by checking ovn-encap-ip, bridge port ownership, and physical route associations on Kubernetes nodes.

⏱ 15 minutes · ovn · underlay · openshift
🚀 Deployments intermediate

How to Configure CronJob Concurrency Policy

Master Kubernetes CronJob concurrency policies to control parallel execution. Learn when to use Allow, Forbid, and Replace, with real-world examples.

⏱ 15 minutes · cronjob · concurrency · scheduling
🚀 Deployments intermediate

How to Implement GitOps with Argo CD

Deploy and manage Kubernetes applications declaratively with Argo CD GitOps. Learn application deployment, sync strategies, and multi-cluster management.

⏱ 35 minutes · argocd · gitops · continuous-deployment
⚙️ Configuration advanced

Crossplane for Cloud Infrastructure Management

Use Crossplane to provision and manage cloud infrastructure resources like databases, storage, and networking using Kubernetes-native APIs and GitOps.

⏱ 55 minutes · crossplane · infrastructure-as-code · cloud-resources
⚙️ Configuration advanced

Multi-Node NVLink with ComputeDomains

Configure ComputeDomains for robust and secure Multi-Node NVLink (MNNVL) workloads on NVIDIA GB200 and similar systems using DRA.

⏱ 50 minutes · dra · computedomains · nvlink
⚙️ Configuration advanced

Dynamic Resource Allocation GPUs NVIDIA DRA...

Learn to use Kubernetes Dynamic Resource Allocation (DRA) for flexible GPU allocation, sharing, and configuration with the NVIDIA DRA Driver.

⏱ 40 minutes · dra · gpu · nvidia
⚙️ Configuration advanced

MIG GPU Partitioning with DRA on Kubernetes

Dynamically partition NVIDIA A100 and H100 GPUs using Multi-Instance GPU (MIG) technology with Dynamic Resource Allocation for flexible workload isolation.

⏱ 40 minutes · dra · gpu · mig
⚙️ Configuration advanced

Mixed Accelerator Workloads with DRA

Orchestrate heterogeneous accelerator workloads combining GPUs, TPUs, FPGAs, and custom AI chips using Dynamic Resource Allocation.

⏱ 50 minutes · dra · gpu · tpu
⚙️ Configuration advanced

TPU Allocation with Dynamic Resource Allocation

Configure Google Cloud TPUs in Kubernetes using DRA for flexible allocation, multi-slice workloads, and optimized machine learning training.

⏱ 45 minutes · dra · tpu · google-cloud
💾 Storage advanced

How to Backup and Restore etcd

Protect your Kubernetes cluster with etcd backup strategies. Learn to create snapshots, automate backups, and restore etcd data for disaster recovery.

⏱ 30 minutes · etcd · backup · restore
🚀 Deployments intermediate

GitOps with Flux CD for Continuous Delivery

Implement GitOps workflows using Flux CD to automate Kubernetes deployments, manage infrastructure as code, and maintain desired cluster state from Git.

⏱ 45 minutes · gitops · flux · continuous-delivery
🔒 Security advanced

gVisor Runtime for Sandboxed Containers on K8s

Deploy gVisor with Kubernetes RuntimeClass for sandboxed containers. Configure runsc runtime, pod isolation, and security hardening for untrusted code.

⏱ 45 minutes · gvisor · container-runtime · sandbox
🔒 Security advanced

How to Integrate HashiCorp Vault with K8s

Securely manage secrets with HashiCorp Vault in Kubernetes. Learn to inject secrets into pods using the Vault Agent Injector and CSI Provider.

⏱ 40 minutes · vault · secrets · security
🌐 Networking advanced

Istio Traffic Management and Routing

Implement advanced traffic management with Istio service mesh including traffic splitting, fault injection, circuit breaking, and intelligent routing.

⏱ 55 minutes · istio · service-mesh · traffic-management
🤖 AI & GPU advanced

GPU Sharing and Bin Packing with KAI Scheduler

Maximize GPU utilization with KAI Scheduler GPU sharing, fractional GPUs, and bin packing strategies for Kubernetes AI workloads.

⏱ 35 minutes · kai-scheduler · nvidia · gpu
🤖 AI & GPU intermediate

Installing NVIDIA KAI Scheduler for AI Workloads

Deploy KAI Scheduler for optimized GPU resource allocation in Kubernetes AI/ML clusters with hierarchical queues and batch scheduling.

⏱ 30 minutes · kai-scheduler · nvidia · gpu
🤖 AI & GPU intermediate

Hierarchical Queues & Resource Fairness KAI...

Configure hierarchical queues in KAI Scheduler for multi-tenant GPU clusters with quotas, limits, and Dominant Resource Fairness (DRF).

⏱ 35 minutes · kai-scheduler · nvidia · gpu
🤖 AI & GPU advanced

Batch Scheduling with PodGroups in KAI Scheduler

Implement gang scheduling for distributed training jobs using KAI Scheduler PodGroups to ensure all-or-nothing pod scheduling.

⏱ 40 minutes · kai-scheduler · nvidia · gpu
🤖 AI & GPU advanced

Topology-Aware Scheduling with KAI Scheduler

Optimize GPU workload placement using KAI Scheduler's Topology-Aware Scheduling (TAS) for NVLink, NVSwitch, and disaggregated serving architectures.

⏱ 45 minutes · kai-scheduler · nvidia · gpu
⚙️ Configuration advanced

Kubernetes API Aggregation Layer

Extend the Kubernetes API with custom API servers using the aggregation layer to add new resource types and functionality without modifying core components.

⏱ 60 minutes · api-aggregation · api-server · extension-apiserver
⚙️ Configuration advanced

How to Upgrade Kubernetes Clusters Safely

Perform Kubernetes cluster upgrades with zero downtime. Learn upgrade strategies, pre-flight checks, rollback procedures, and best practices.

⏱ 45 minutes · upgrade · cluster-management · maintenance
🌐 Networking intermediate

Kubernetes Gateway API: HTTPRoute Guide

Deploy Kubernetes Gateway API for HTTP routing. GatewayClass, Gateway, HTTPRoute, TLSRoute, traffic splitting, and migration from Ingress resources.

⏱ 30 minutes · gateway-api · networking · ingress
🔧 Troubleshooting intermediate

How to Troubleshoot Kubernetes Networking

Debug and resolve Kubernetes networking issues systematically. Learn to diagnose DNS problems, service connectivity, network policies, and CNI issues.

⏱ 30 minutes · networking · troubleshooting · dns
🚀 Deployments advanced

How to Create and Use Kubernetes Operators

Learn to build Kubernetes Operators for automating application management. Understand custom controllers, the Operator pattern, and Operator frameworks.

⏱ 45 minutes · operators · controllers · crd
🔒 Security intermediate

Kyverno Policy Management and Enforcement

Implement Kubernetes-native policy management using Kyverno to validate, mutate, and generate resources with declarative policies written in YAML.

⏱ 45 minutes · kyverno · policy-as-code · admission-control
🌐 Networking intermediate

Linkerd Service Mesh: mTLS and Observability

Deploy Linkerd service mesh on Kubernetes. Automatic mTLS, traffic management, observability dashboards, service profiles, and traffic splitting.

⏱ 35 minutes · linkerd · service-mesh · mtls
🚀 Deployments intermediate

How to Use Multi-Container Pod Patterns

Master Kubernetes multi-container pod patterns including sidecar, ambassador, and adapter. Learn when and how to use each pattern for microservices.

⏱ 25 minutes · multi-container · sidecar · ambassador
📊 Observability intermediate

How to Set Up Node Problem Detector

Detect and report node-level issues automatically with Node Problem Detector. Learn to identify kernel problems and hardware failures.

⏱ 20 minutes · node-problem-detector · observability · monitoring
🔒 Security advanced

OIDC Authentication for Kubernetes

Configure OpenID Connect (OIDC) authentication to integrate Kubernetes with identity providers like Keycloak, Okta, Azure AD, and Google.

⏱ 50 minutes · oidc · authentication · identity-provider
🚀 Deployments intermediate

Pod Priority and Preemption Scheduling Guide

Control Kubernetes scheduling with Pod Priority and Preemption. Learn to prioritize critical workloads and ensure important pods get scheduled first.

⏱ 20 minutes · priority · preemption · scheduling
🚀 Deployments intermediate

Pod Readiness Gates for Custom Conditions

Implement Pod Readiness Gates to add custom conditions that must be satisfied before a pod is considered ready for traffic, enabling integration with external systems such as load balancers.

⏱ 35 minutes · readiness-gates · pod-conditions · load-balancer
🔒 Security intermediate

Pod Security Context and Admission Standards

Configure Pod Security Context and Admission labels. Privileged, Baseline, Restricted standards, runAsUser, fsGroup, capabilities, and seccomp profiles.

⏱ 20 minutes · security-context · security · pod-security
⚙️ Configuration advanced

Kubernetes Scheduler Configuration and Tuning

Customize the Kubernetes scheduler with scheduling profiles, plugins, and advanced placement strategies for optimal pod placement and resource utilization.

⏱ 50 minutes · scheduler · scheduling-profiles · custom-scheduler
🔒 Security intermediate

How to Use Sealed Secrets for GitOps

Encrypt Kubernetes secrets for safe Git storage with Sealed Secrets. Learn to seal, manage, and rotate secrets in GitOps workflows securely.

⏱ 25 minutes · sealed-secrets · gitops · security
💾 Storage intermediate

K8s Backup and Disaster Recovery with Velero

Implement comprehensive backup and disaster recovery strategies for Kubernetes clusters using Velero to protect workloads, configurations, and persistent data.

⏱ 45 minutes · velero · backup · disaster-recovery
🔒 Security intermediate

How to Use Workload Identity for Cloud Access

Securely access cloud services from Kubernetes pods without static credentials. Configure Workload Identity for AWS (IRSA), Azure, and GCP.

⏱ 30 minutes · workload-identity · iam · cloud-security
🔒 Security advanced

How to Create Admission Webhooks

Build validating and mutating admission webhooks to enforce policies and modify resources. Implement custom admission controllers for Kubernetes.

⏱ 15 minutes · admission-webhooks · security · validation
🚀 Deployments advanced

How to Implement A/B Testing with Kubernetes

Route traffic between application versions for A/B testing. Use service mesh, ingress, and custom routing rules to validate features with real users.

⏱ 15 minutes · a-b-testing · traffic-routing · feature-flags
📊 Observability intermediate

How to Set Up Alertmanager for Prometheus

Configure Alertmanager to route and manage Prometheus alerts. Set up notification channels including Slack, PagerDuty, and email with routing rules.

⏱ 15 minutes · alertmanager · prometheus · alerts
🔒 Security advanced

How to Configure Kubernetes API Access Control

Set up secure API server access with authentication and authorization. Configure RBAC, API groups, and audit logging for cluster security.

⏱ 15 minutes · api-server · authentication · authorization
⚙️ Configuration intermediate

Manage K8s API Versions and Deprecations

Handle Kubernetes API version changes and deprecations. Migrate resources to stable APIs and ensure cluster upgrade compatibility.

⏱ 15 minutes · api · deprecation · migration
🚀 Deployments intermediate

How to Deploy with Argo CD GitOps

Implement GitOps continuous deployment with Argo CD. Sync Kubernetes manifests from Git repositories automatically with declarative application management.

⏱ 15 minutes · argocd · gitops · continuous-deployment
🚀 Deployments advanced

How to Implement Canary Deployments

Learn to implement canary deployments in Kubernetes for gradual rollouts. Use native features and Ingress-based traffic splitting for safe releases.

⏱ 15 minutes · canary · deployments · rollout
🔒 Security intermediate

Manage K8s Certificates with cert-manager

Automate TLS certificate management with cert-manager. Configure issuers, request certificates from Let's Encrypt, and enable automatic renewal.

⏱ 15 minutes · cert-manager · tls · certificates
🔒 Security intermediate

How to Implement Container Security Scanning

Scan container images for vulnerabilities before deployment. Integrate Trivy and other tools into CI/CD pipelines and runtime admission control.

⏱ 15 minutes · security · scanning · vulnerabilities
📊 Observability intermediate

How to Implement Container Logging Patterns

Configure logging for Kubernetes applications. Implement sidecar logging, log aggregation, and structured logging best practices.

⏱ 15 minutes · logging · observability · sidecar
🌐 Networking intermediate

How to Configure Kubernetes Cluster DNS

Customize CoreDNS configuration for your cluster. Add custom DNS entries, configure forwarding, and optimize DNS resolution.

⏱ 15 minutes · coredns · dns · networking
💾 Storage intermediate

How to Configure CSI Drivers for Storage

Install and configure Container Storage Interface (CSI) drivers for cloud and on-premises storage. Set up dynamic provisioning with drivers such as AWS EBS and GCP PD.

⏱ 15 minutes · csi · storage · ebs
🌐 Networking intermediate

How to Customize DNS Configuration in K8s

Configure custom DNS settings in Kubernetes. Learn CoreDNS customization, stub domains, upstream servers, and pod DNS policies.

⏱ 15 minutes · dns · coredns · networking
⚙️ Configuration advanced

Create Custom Resource Definitions (CRDs)

Extend Kubernetes API with Custom Resource Definitions. Define custom objects, configure validation schemas, and manage CRD lifecycle.

⏱ 15 minutes · crd · custom-resources · api
🔧 Troubleshooting beginner

How to Debug ImagePullBackOff Errors

Troubleshoot Kubernetes ImagePullBackOff and ErrImagePull errors. Learn to diagnose registry authentication, image tags, and network connectivity issues.

⏱ 15 minutes · imagepull · troubleshooting · registry
🔧 Troubleshooting intermediate

How to Debug Kubernetes Node Issues

Diagnose and troubleshoot node problems in Kubernetes clusters. Identify resource pressure, connectivity issues, and component failures.

⏱ 15 minutes · nodes · debugging · troubleshooting
🔧 Troubleshooting intermediate

Fix OOMKilled in Kubernetes Pods

Fix OOMKilled errors in Kubernetes pods (exit code 137). Debug memory leaks, set correct memory limits, and prevent OOM kills in containers.

⏱ 15 minutes · oomkilled · oom · memory
🔧 Troubleshooting intermediate

How to Debug Pod Networking Issues

Diagnose and fix Kubernetes networking problems. Troubleshoot connectivity, DNS resolution, service discovery, and network policies with practical tools.

⏱ 15 minutes · networking · debugging · troubleshooting
🔧 Troubleshooting intermediate

Debug Pod Scheduling Failures in K8s

Fix pods stuck in Pending from scheduling failures. Diagnose resource constraints, node affinity, taints, tolerations, and topology spread conflicts.

⏱ 15 minutes · scheduling · pending · troubleshooting
🚀 Deployments intermediate

Implement Blue-Green and Canary Deployments

Deploy applications with zero downtime using blue-green and canary strategies. Configure traffic splitting, rollbacks, and progressive delivery.

⏱ 15 minutes · blue-green · canary · deployment
📊 Observability advanced

Implement Distributed Tracing with Jaeger

Deploy Jaeger for distributed tracing in Kubernetes. Learn to instrument applications, trace requests across services, and identify performance bottlenecks.

⏱ 15 minutes · tracing · jaeger · opentelemetry
🌐 Networking intermediate

How to Configure Kubernetes DNS Policies

Control pod DNS resolution with DNS policies and configs. Configure custom nameservers, search domains, and optimize DNS for your workloads.

⏱ 15 minutes · dns · networking · coredns
⚙️ Configuration beginner

K8s Downward API: Pod Metadata Access

Use Kubernetes Downward API to expose pod metadata to containers. Access labels, annotations, resource limits, and node information as env vars or files.

⏱ 15 minutes · downward-api · metadata · environment
💾 Storage intermediate

How to Configure Dynamic Volume Provisioning

Set up dynamic volume provisioning in Kubernetes with StorageClasses. Learn to configure provisioners for AWS EBS, GCP PD, Azure Disk, and NFS.

⏱ 15 minutes · storage · pv · pvc
🔧 Troubleshooting intermediate

Ephemeral Containers: Debug Running Pods

Debug running pods with ephemeral containers using kubectl debug. Attach debug containers without restart for production troubleshooting on Kubernetes.

⏱ 15 minutes · debugging · ephemeral · kubectl
⚙️ Configuration beginner

Configure Environment Variables and ConfigMaps

Manage application configuration with environment variables and ConfigMaps. Learn injection methods, mounting as files, and dynamic configuration updates.

⏱ 15 minutes · configmap · environment-variables · configuration
🔒 Security intermediate

How to Use External Secrets Operator

Sync secrets from external providers like AWS Secrets Manager, HashiCorp Vault, and Azure Key Vault into Kubernetes using External Secrets Operator.

⏱ 15 minutes · secrets · external-secrets · vault
🚀 Deployments intermediate

How to Deploy with Flux GitOps

Implement GitOps continuous deployment with Flux CD. Automatically sync Kubernetes manifests and Helm releases from Git repositories.

⏱ 15 minutes · flux · gitops · continuous-deployment
🚀 Deployments intermediate

How to Implement Graceful Shutdown

Ensure zero-downtime deployments with proper graceful shutdown. Handle SIGTERM signals, drain connections, and configure termination settings.

⏱ 15 minutes · graceful-shutdown · zero-downtime · SIGTERM
📊 Observability intermediate

Grafana Dashboard 6417: K8s Pod Monitoring

Set up Grafana dashboard 6417 for Kubernetes pod monitoring. Import, customize panels, PromQL queries, and cluster-wide resource visualization.

⏱ 15 minutes · grafana · monitoring · dashboards
🎯 Helm intermediate

How to Create Helm Charts from Scratch

Build custom Helm charts for your applications. Learn chart structure, templates, values, dependencies, and best practices for packaging Kubernetes applications.

⏱ 15 minutes · helm · charts · packaging
🎯 Helm intermediate

How to Create Helm Chart Repositories

Set up and manage Helm chart repositories. Learn to host charts on GitHub Pages, S3, GCS, and OCI registries for team distribution.

⏱ 15 minutes · helm · repository · charts
🎯 Helm intermediate

How to Manage Helm Chart Dependencies

Learn to manage Helm chart dependencies effectively. Configure subcharts, override values, and build complex applications with reusable components.

⏱ 15 minutes · helm · dependencies · subcharts
🎯 Helm intermediate

How to Use Helm Hooks for Lifecycle Management

Master Helm hooks for pre-install, post-install, pre-upgrade, and post-delete operations. Learn to run database migrations, backups, and cleanup tasks.

⏱ 15 minutes · helm · hooks · lifecycle
🎯 Helm advanced

Helm Sprig Functions: cat, print, toString

Master Helm Sprig functions: cat, print, toString, add1, join, and quote. String manipulation, conditionals, and advanced templating patterns.

⏱ 15 minutes · helm · templating · sprig
⚡ Autoscaling advanced

HPA Custom Metrics: Scale on Queue Depth

Configure Kubernetes HPA with custom and external metrics. Scale pods on queue depth, request latency, and Prometheus metrics via autoscaling/v2.

⏱ 15 minutes · hpa · autoscaling · custom-metrics
⚙️ Configuration beginner

How to Configure Image Pull Secrets

Pull container images from private registries using image pull secrets. Configure authentication for Docker Hub, GCR, ECR, ACR, and private registries.

⏱ 15 minutes · image-pull-secrets · registries · docker
🌐 Networking intermediate

How to Implement Request Routing with Ingress

Configure advanced routing rules with Kubernetes Ingress. Implement path-based routing, host-based routing, and traffic management.

⏱ 15 minutes · ingress · routing · traffic
🌐 Networking intermediate

Secure Ingress with SSL/TLS Certificates

Configure TLS termination for Kubernetes Ingress using cert-manager and Let's Encrypt. Automate certificate issuance and renewal.

⏱ 15 minutes · tls · ssl · certificates
🌐 Networking advanced

How to Implement Service Mesh with Istio

Deploy Istio service mesh for traffic management, security, and observability. Learn to configure virtual services, destination rules, and mTLS.

⏱ 15 minutes · istio · service-mesh · traffic
📊 Observability intermediate

Jaeger Distributed Tracing on Kubernetes

Deploy Jaeger for distributed tracing in Kubernetes. Trace requests across microservices to identify latency issues and debug complex systems.

⏱ 15 minutes · jaeger · tracing · observability
🔧 Troubleshooting beginner

How to Run Kubernetes in Docker (kind)

Create local Kubernetes clusters using kind (Kubernetes in Docker). Set up multi-node clusters, configure networking, and test applications locally.

⏱ 15 minutes · kind · local-development · docker
⚙️ Configuration beginner

How to Manage Kubernetes Contexts and Clusters

Switch between multiple clusters efficiently. Configure kubeconfig, manage contexts, and set up secure multi-cluster access.

⏱ 15 minutes · kubeconfig · contexts · clusters
🔧 Troubleshooting beginner

Essential kubectl Commands for Debugging

Master kubectl debugging commands to troubleshoot Kubernetes issues. Learn to inspect pods, view logs, debug networking, and diagnose cluster problems.

⏱ 15 minutes · kubectl · debugging · troubleshooting
🔧 Troubleshooting beginner

How to Extend kubectl with Plugins

Enhance kubectl with custom plugins using the Krew plugin manager. Discover, install, and create plugins to boost K8s productivity.

⏱ 15 minutes · kubectl · krew · plugins
🔒 Security advanced

How to Configure Kubernetes Audit Logging

Enable and configure Kubernetes API audit logging. Track who did what, when, and to which resources for security compliance and troubleshooting.

⏱ 15 minutes · audit · logging · security
⚙️ Configuration intermediate

How to Optimize Kubernetes Costs

Reduce cloud costs in Kubernetes clusters. Right-size resources, use spot instances, implement autoscaling, and monitor spending effectively.

⏱ 15 minutes · cost · optimization · resources
🌐 Networking intermediate

How to Configure DNS in Kubernetes

Understand and configure Kubernetes DNS with CoreDNS. Customize DNS policies, configure external DNS resolution, and troubleshoot DNS issues.

⏱ 15 minutes · dns · coredns · networking
🌐 Networking intermediate

How to Use Kubernetes EndpointSlices

Understand and manage EndpointSlices for scalable service discovery. Configure endpoint slicing, troubleshoot connectivity, and optimize large clusters.

⏱ 15 minutes · endpointslices · services · networking
📊 Observability beginner

How to Use Kubernetes Events for Monitoring

Monitor cluster activity through Kubernetes events. Capture, filter, and alert on events for troubleshooting and operational visibility.

⏱ 15 minutes · events · monitoring · troubleshooting
⚙️ Configuration advanced

How to Use Kubernetes Finalizers

Manage resource cleanup with Kubernetes finalizers. Implement custom cleanup logic and understand how finalizers prevent premature resource deletion.

⏱ 15 minutes · finalizers · cleanup · deletion
⚙️ Configuration beginner

How to Use Labels and Annotations Effectively

Organize and manage Kubernetes resources with labels and annotations. Implement labeling strategies for selection, filtering, and metadata.

⏱ 15 minutes · labels · annotations · organization
🚀 Deployments advanced

How to Use K8s Leases for Leader Election

Implement distributed coordination with Kubernetes Leases. Configure leader election, distributed locks, and high availability patterns.

⏱ 15 minutes · leases · leader-election · coordination
🚀 Deployments beginner

K8s Probes: Liveness, Readiness, Startup

Configure Kubernetes probes for reliable apps. Complete guide to liveness, readiness, and startup probes with httpGet, tcpSocket, exec, and gRPC examples.

⏱ 15 minutes · probes · health-checks · liveness
🔒 Security advanced

K8s RuntimeClass: gVisor and Kata Containers

Configure different container runtimes for workloads. Use gVisor, Kata Containers, or other runtimes for enhanced security and isolation.

⏱ 15 minutes · runtimeclass · gvisor · kata
⚙️ Configuration intermediate

Use Kustomize for Configuration Management

Manage Kubernetes configurations with Kustomize overlays. Customize base manifests for different environments without template duplication.

⏱ 15 minutes · kustomize · configuration · overlays
💾 Storage intermediate

How to Configure Local Persistent Volumes

Use local persistent volumes for high-performance storage with node-local SSDs. Configure local storage classes and handle node affinity constraints.

⏱ 15 minutes · local-storage · persistent-volumes · ssd
📊 Observability advanced

Set Up Centralized Logging with EFK Stack

Deploy Elasticsearch, Fluentd, and Kibana for centralized Kubernetes logging. Learn to collect, parse, and visualize container logs at scale.

⏱ 15 minutes · logging · elasticsearch · fluentd
🔒 Security advanced

How to Implement Advanced NetworkPolicies

Master advanced Kubernetes NetworkPolicies for fine-grained traffic control. Learn egress rules, CIDR blocks, namespace isolation, and common security patterns.

⏱ 15 minutes · networkpolicy · security · networking
🌐 Networking intermediate

How to Implement Network Policies

Secure pod-to-pod communication with Kubernetes Network Policies. Learn to create ingress and egress rules, isolate namespaces, and implement zero-trust networking.

⏱ 15 minutes · network-policies · security · networking
⚙️ Configuration intermediate

How to Implement K8s Taints and Tolerations

Control pod scheduling with taints and tolerations. Dedicate nodes for specific workloads, handle node conditions, and implement scheduling constraints.

⏱ 15 minutes · taints · tolerations · scheduling
📊 Observability advanced

Collect Metrics with OpenTelemetry Collector

Deploy OpenTelemetry Collector for unified metrics, traces, and logs collection in Kubernetes. Learn pipelines, processors, and exporters configuration.

⏱ 15 minutes · opentelemetry · otel · metrics
🚀 Deployments intermediate

Configure Pod Affinity and Anti-Affinity

Control pod placement using affinity and anti-affinity rules. Co-locate related pods or spread them across nodes and zones for high availability.

⏱ 15 minutes · affinity · scheduling · placement
🚀 Deployments intermediate

How to Configure Pod Disruption Budgets

Protect application availability during voluntary disruptions. Configure PDBs to ensure minimum replicas during node drains, upgrades, and maintenance.

⏱ 15 minutes · pdb · availability · disruption
🚀 Deployments intermediate

How to Implement Pod Disruption Budgets

Configure Pod Disruption Budgets (PDB) for high availability during voluntary disruptions. Ensure minimum availability during node maintenance and upgrades.

⏱ 15 minutes · pdb · disruption · availability
🚀 Deployments intermediate

How to Configure Pod Lifecycle Hooks

Execute custom actions during pod startup and shutdown with lifecycle hooks. Implement graceful shutdown, initialization tasks, and cleanup operations.

⏱ 15 minutes · lifecycle · hooks · preStop
⚙️ Configuration advanced

How to Use Pod Presets and Mutations

Automatically inject configurations into pods using admission controllers. Configure environment variables, volumes, and annotations at deployment time.

⏱ 15 minutes · admission-controller · mutation · injection
🚀 Deployments intermediate

How to Configure Pod Priority and Preemption

Set pod priorities to ensure critical workloads get scheduled first. Configure preemption to evict lower-priority pods when resources are scarce.

⏱ 15 minutes · priority · preemption · scheduling
⚙️ Configuration beginner

How to Configure Pod Resource Management

Set CPU and memory requests and limits effectively. Understand QoS classes, resource quotas, and optimize container resource allocation.

⏱ 15 minutes · resources · cpu · memory
🔒 Security intermediate

How to Configure Pod Security Admission

Enforce security standards with Pod Security Admission. Configure Privileged, Baseline, and Restricted policies per namespace to enforce them consistently across the cluster.

⏱ 15 minutes · pod-security · psa · security
🚀 Deployments intermediate

How to Use Pod Topology Spread Constraints

Distribute pods evenly across failure domains using topology spread constraints. Ensure high availability across zones, nodes, and custom topologies.

⏱ 15 minutes · topology · scheduling · high-availability
📊 Observability intermediate

How to Monitor Kubernetes with Prometheus

Set up Prometheus monitoring for Kubernetes clusters. Configure scraping, alerting rules, and visualize metrics with Grafana dashboards.

⏱ 15 minutes · prometheus · monitoring · metrics
🌐 Networking intermediate

Kubernetes Rate Limiting with NGINX and Istio

Implement Kubernetes rate limiting for API protection. Ingress NGINX annotations, Istio rate limits, Kong plugins, and per-service rate limiting patterns.

⏱ 15 minutes · rate-limiting · ingress · api-gateway
⚙️ Configuration beginner

K8s Resource Limits: CPU 500m Memory 256Mi

Configure Kubernetes container resource limits and requests. CPU 200m/500m, memory 256Mi syntax and format explained with QoS classes and right-sizing.

⏱ 15 minutes · resources · limits · requests
⚙️ Configuration intermediate

How to Configure Resource Quotas per Namespace

Implement resource quotas to limit CPU, memory, and object counts per namespace. Ensure fair resource allocation across teams and environments.

⏱ 15 minutes · resourcequota · limits · namespaces
⚙️ Configuration intermediate

How to Configure Resource Quotas

Limit resource consumption per namespace with ResourceQuotas. Control CPU, memory, storage, and object counts to ensure fair cluster sharing.

⏱ 15 minutes · resource-quota · limits · multi-tenancy
🔒 Security advanced

How to Encrypt Secrets at Rest with KMS

Configure Kubernetes secrets encryption at rest using external KMS providers. Learn to set up AWS KMS, GCP KMS, and Azure Key Vault encryption.

⏱ 15 minutes · encryption · kms · secrets
🔒 Security intermediate

How to Manage Kubernetes Secrets Securely

Best practices for managing secrets in Kubernetes. Learn encryption at rest, secret rotation, and integration with external secret stores.

⏱ 15 minutes · secrets · security · encryption
🔒 Security intermediate

How to Configure Service Accounts and RBAC

Secure your Kubernetes workloads with service accounts and role-based access control. Create roles, bindings, and implement least-privilege access.

⏱ 15 minutes · rbac · service-accounts · security
🚀 Deployments intermediate

How to Use Sidecar Containers Effectively

Implement sidecar containers for logging, monitoring, proxying, and configuration management. Learn common sidecar patterns for microservices.

⏱ 15 minutes · sidecar · patterns · containers
💾 Storage intermediate

How to Deploy Stateful Applications

Run stateful workloads on Kubernetes with StatefulSets. Manage stable identities, persistent storage, and ordered deployment for databases and caches.

⏱ 15 minutes · statefulset · databases · persistence
🚀 Deployments intermediate

How to Manage Kubernetes StatefulSets

Deploy stateful applications with StatefulSets. Configure stable network identities, persistent storage, ordered deployment, and graceful scaling.

⏱ 15 minutes · statefulset · stateful · storage
🔧 Troubleshooting intermediate

Fix K8s Stuck Resources and Finalizers

Fix Kubernetes resources stuck in Terminating state by managing finalizers. Remove stuck namespaces, PVs, and CRDs with force-delete procedures.

⏱ 15 minutes · finalizers · deletion · cleanup
🚀 Deployments intermediate

How to Use Taints and Tolerations

Control pod scheduling with taints and tolerations. Dedicate nodes for specific workloads, handle node conditions, and implement advanced scheduling.

⏱ 15 minutes · taints · tolerations · scheduling
🚀 Deployments intermediate

Topology Spread Constraints for HA Workloads

Distribute pods across nodes, zones, and regions using topology spread constraints. Ensure high availability and fault tolerance for your workloads.

⏱ 15 minutes · topology · scheduling · availability
💾 Storage intermediate

How to Set Up Volume Snapshots

Create and restore volume snapshots for persistent data backup. Learn to configure VolumeSnapshotClass and automate snapshot schedules.

⏱ 15 minutes · snapshots · backup · storage
📊 Observability intermediate

How to Configure Alertmanager for K8s Alerts

Set up Alertmanager to route, group, and deliver Kubernetes alerts. Learn to configure Slack, PagerDuty, and email notifications.

⏱ 30 minutes · alertmanager · monitoring · alerts
🚀 Deployments intermediate

How to Implement Blue-Green Deployments

Learn how to implement blue-green deployments in Kubernetes for instant rollbacks and zero-downtime releases. Complete guide to switching traffic through the Service.

⏱ 25 minutes · deployment · blue-green · zero-downtime
⚡ Autoscaling intermediate

Kubernetes Cluster Autoscaler Setup

Configure Kubernetes Cluster Autoscaler for automatic node scaling. AWS, GCP, and Azure setup, scaling policies, and pod priority integration.

⏱ 30 minutes · autoscaling · cluster-autoscaler · nodes
⚙️ Configuration beginner

Manage ConfigMaps and Secrets Effectively

Master Kubernetes ConfigMaps and Secrets for application configuration. Learn creation methods, mounting strategies, and security best practices.

⏱ 20 minutes · configmap · secrets · configuration
🔧 Troubleshooting beginner

CrashLoopBackOff: How to Fix in Kubernetes

Fix CrashLoopBackOff in Kubernetes pods. Learn why pods crash loop, systematic debugging with kubectl logs and describe, and solutions for common causes.

⏱ 15 minutes · troubleshooting · crashloopbackoff · debugging
🔧 Troubleshooting intermediate

How to Debug DNS Issues in Kubernetes

Troubleshoot and resolve DNS problems in Kubernetes. Learn to diagnose CoreDNS issues, test resolution, and fix common DNS failures.

⏱ 20 minutes · dns · coredns · troubleshooting
🎯 Helm beginner

How to Create and Use Helm Charts

Master Helm, the Kubernetes package manager. Learn to create charts, manage releases, and template your deployments for reusability.

⏱ 30 minutes · helm · charts · package-manager
🚀 Deployments beginner

How to Use Init Containers for Dependencies

Master Kubernetes init containers to handle dependencies, setup tasks, and pre-flight checks before your main application starts.

⏱ 15 minutes · init-containers · dependencies · startup
🚀 Deployments beginner

How to Deploy Jobs and CronJobs

Master Kubernetes Jobs and CronJobs for batch processing and scheduled tasks. Learn completion modes, parallelism, and failure handling.

⏱ 20 minutes · jobs · cronjobs · batch
⚙️ Configuration beginner

How to Manage K8s Namespaces Effectively

Master Kubernetes namespace organization for multi-team environments. Learn resource quotas, network policies, and RBAC per namespace.

⏱ 20 minutes · namespaces · multi-tenancy · organization
🔒 Security intermediate

How to Implement Pod Security Standards

Secure your Kubernetes workloads using Pod Security Standards (PSS). Learn to enforce Privileged, Baseline, and Restricted policies at the namespace level.

⏱ 25 minutes · security · pod-security · pss
📊 Observability intermediate

Set Up Prometheus Monitoring for Applications

Learn to instrument your Kubernetes applications with Prometheus metrics. Complete guide to ServiceMonitors, scraping configuration, and custom metrics.

⏱ 35 minutes · prometheus · monitoring · metrics
🔒 Security intermediate

How to Configure RBAC and Service Accounts

Master Kubernetes RBAC (Role-Based Access Control) to secure your cluster. Learn to create Roles, ClusterRoles, and bind them to ServiceAccounts.

⏱ 30 minutes · rbac · security · service-account
⚙️ Configuration beginner

Set Resource Requests and Limits Properly

Master Kubernetes resource management with proper CPU and memory requests and limits. Avoid OOMKills, throttling, and resource contention.

⏱ 20 minutes · resources · cpu · memory
🚀 Deployments beginner

Perform Rolling Updates with Zero Downtime

Master Kubernetes rolling updates to deploy new application versions without service interruption. Learn update strategies and rollback procedures.

⏱ 15 minutes · deployment · rolling-update · zero-downtime
🌐 Networking beginner

Expose Services with LoadBalancer and NodePort

Learn different ways to expose Kubernetes services externally using LoadBalancer, NodePort, and ExternalIPs. Compare options for various environments.

⏱ 15 minutes · service · loadbalancer · nodeport
💾 Storage intermediate

How to Deploy MySQL with StatefulSet

Deploy a production-ready MySQL database on Kubernetes using StatefulSet. Learn persistent storage, headless services, and backup strategies.

⏱ 30 minutes · statefulset · mysql · database
⚡ Autoscaling intermediate

Kubernetes VPA: Vertical Pod Autoscaler

Install and configure Kubernetes Vertical Pod Autoscaler. VPA updateMode Off, Initial, and Auto with recommendations and HPA coexistence strategies.

⏱ 25 minutes · autoscaling · vpa · resources
⚡ Autoscaling intermediate

Kubernetes HPA: Set Max Replicas and Scale

Configure Kubernetes HPA with autoscaling/v2, averageUtilization targets, and max replica settings. CPU, memory, and custom metrics scaling policies.

⏱ 20 minutes · hpa · autoscaling · metrics
🚀 Deployments beginner

K8s Readiness Probe: Complete YAML Guide

Kubernetes readiness probe explained with YAML examples. Configure HTTP, TCP, exec, and gRPC readiness probes with liveness and startup probe comparison.

⏱ 15 minutes · probes · health-checks · liveness
🌐 Networking beginner

K8s NetworkPolicy: Default Deny All Traffic

Implement zero-trust network security in Kubernetes with a default deny-all NetworkPolicy. Block all ingress and egress traffic by default, then permit required flows with allow-list rules.

⏱ 10 minutes · networkpolicy · security · zero-trust
🌐 Networking intermediate

Configure NGINX Ingress TLS using cert-manager

Learn how to set up the NGINX Ingress Controller with automatic TLS certificates from Let's Encrypt using cert-manager. Includes complete YAML examples.

⏱ 20 minutes · ingress · nginx · tls
💾 Storage beginner

PersistentVolumeClaims with StorageClasses

Learn how to provision persistent storage for your Kubernetes workloads using PersistentVolumeClaims and StorageClasses. Includes examples for dynamic provisioning.

⏱ 15 minutes · storage · pvc · persistentvolume
🔧 Troubleshooting intermediate

Fix Pending PVC Status in Kubernetes

Fix PersistentVolumeClaims stuck in Pending status. Diagnose StorageClass issues, capacity problems, node affinity conflicts, and provisioner failures.

⏱ 15 minutes · troubleshooting · pvc · storage
Luca Berton · Ansible Pilot · Ansible by Example · Open Empower · K8s Recipes · Terraform Pilot · CopyPasteLearn · ProteinLens