Troubleshooting • intermediate • 15 minutes • K8s 1.28+

Kubernetes Troubleshooting Flowchart

Systematic Kubernetes troubleshooting guide with flowcharts. Debug pods, services, networking, storage, and node issues step by step with kubectl commands.

By Luca Berton • 5 min read

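💡 Quick Answer: run kubectl describe pod <name> first and read the Events section, then check kubectl logs <name> (add --previous after a crash).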

The Problem

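When a workload fails, the visible symptom (a Pending pod, a Service with no endpoints, a node that is not ready) rarely points straight at the cause. Working through a fixed sequence of checks, from status to events to logs, narrows the problem down faster than trial and error.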

The Solution

Pod Troubleshooting Flowchart

# Step 1: What's the pod status?
kubectl get pod <name> -o wide

# Step 2: Check events and conditions
kubectl describe pod <name>

# Step 3: Check logs
kubectl logs <name>
kubectl logs <name> --previous    # Previous crash
kubectl logs <name> -c <init-container>
| Pod Status | Cause | Fix |
|---|---|---|
| Pending | No node has enough resources | Check requests, add nodes |
| Pending | PVC not bound | Check StorageClass, PV availability |
| Pending | Node selector/taint mismatch | Fix nodeSelector or add toleration |
| ImagePullBackOff | Wrong image or missing secret | Fix image name, create pull secret |
| CrashLoopBackOff | App crashes on startup | Check logs, fix config/code |
| OOMKilled | Memory limit exceeded | Increase memory limit |
| Evicted | Node disk/memory pressure | Set resource requests, check node |
| Running but not ready | Readiness probe failing | Fix probe or app health endpoint |
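The table above can be turned into a small lookup so the hint shows up next to the status. This is a minimal sketch: the function name pod_hint and the wording of its messages are illustrative assumptions, not part of kubectl.

```shell
# pod_hint STATUS
# Print the likely cause and first fix for a pod status, following the table.
# pod_hint is a hypothetical helper; the messages paraphrase the table rows.
pod_hint() {
  case "$1" in
    Pending)
      echo "Scheduling blocked: check resource requests, PVC binding, nodeSelector/taints" ;;
    ImagePullBackOff)
      echo "Image pull failed: check the image name and the pull secret" ;;
    CrashLoopBackOff)
      echo "App crashes on startup: check logs, fix config/code" ;;
    OOMKilled)
      echo "Memory limit exceeded: increase the memory limit" ;;
    Evicted)
      echo "Node disk/memory pressure: set resource requests, check the node" ;;
    Running)
      echo "If not ready: check the readiness probe or app health endpoint" ;;
    *)
      echo "No hint for this status: start with kubectl describe pod" ;;
  esac
}
```

One way to feed it is from the STATUS column, which is the third field of plain kubectl get pod output: pod_hint "$(kubectl get pod <name> --no-headers | awk '{ print $3 }')".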

Service Troubleshooting

# Is the service selecting the right pods?
kubectl get endpoints <service-name>
# Empty endpoints = selector doesn't match any pods

# Check pod labels match service selector
kubectl get pods --show-labels
kubectl get svc <service-name> -o yaml | grep selector -A5

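# Test DNS (busybox:1.28 is pinned because nslookup in newer busybox images is unreliable)
kubectl run test --rm -it --restart=Never --image=busybox:1.28 -- nslookup <service-name>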

# Test connectivity
kubectl run test --rm -it --image=nicolaka/netshoot -- curl http://<service>:<port>
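Empty endpoints usually come down to a selector pair that is missing from the pod labels or has a different value. The sketch below checks that by hand, assuming both inputs are comma-separated key=value lists as printed in the LABELS column of kubectl get pods --show-labels. The helper name selector_matches is hypothetical.

```shell
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists.
# Returns 0 when every selector pair appears among the pod labels, 1 otherwise.
selector_matches() {
  local selector="$1" labels=",$2," pair
  local IFS=','                  # split the selector on commas
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;            # pair present with the same value, keep checking
      *) return 1 ;;             # key missing or value differs
    esac
  done
  return 0
}
```

For example, selector_matches "app=web" "app=web,pod-template-hash=abc" succeeds, while a selector of app=web against a pod labeled app=webserver fails. On a live cluster, kubectl get pods -l app=web asks the API server the same question directly.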

Node Troubleshooting

# Node not ready?
kubectl describe node <name> | grep -A10 Conditions
# MemoryPressure, DiskPressure, PIDPressure → resource exhaustion

# Check kubelet
ssh <node> "systemctl status kubelet"
ssh <node> "journalctl -u kubelet --since '10 minutes ago'"

# Resource usage
kubectl top nodes
kubectl top pods --sort-by=memory
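On clusters with many nodes it helps to list only the nodes that need attention before describing them one by one. A minimal sketch, assuming the default column order of kubectl get nodes (NAME, STATUS, ROLES, AGE, VERSION); the function name not_ready_nodes is hypothetical.

```shell
# not_ready_nodes
# Read `kubectl get nodes --no-headers` output on stdin and print the name of
# every node whose STATUS does not start with "Ready". Cordoned nodes report
# "Ready,SchedulingDisabled" and are treated as ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

Usage: kubectl get nodes --no-headers | not_ready_nodes, then run kubectl describe node on each name it prints.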

Quick Reference

# Most useful debug commands
kubectl get events --sort-by='.lastTimestamp' -A
kubectl describe pod <name>                     # Events section!
kubectl logs <name> -f --tail=100
kubectl exec -it <name> -- sh
kubectl debug <name> -it --image=nicolaka/netshoot
kubectl get pods -A | grep -v Running
kubectl top pods --sort-by=cpu
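Filtering with grep -v Running keeps the header line and also lists Completed pods from finished Jobs. The sketch below is one possible stricter filter, assuming the column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical.

```shell
# unhealthy_pods
# Read `kubectl get pods -A` output (with header) on stdin and print
# "namespace/name status" for pods that are neither Running nor Completed.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Usage: kubectl get pods -A | unhealthy_pods.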
graph TD
    A[Pod not working] --> B{kubectl get pod}
    B -->|Pending| C{kubectl describe pod}
    C -->|Insufficient CPU/memory| D[Reduce requests or add nodes]
    C -->|PVC unbound| E[Check StorageClass]
    C -->|Taints/affinity| F[Fix scheduling rules]
    B -->|CrashLoopBackOff| G{kubectl logs}
    G -->|OOMKilled| H[Increase memory limit]
    G -->|App error| I[Fix application code/config]
    B -->|Running but errors| J{kubectl logs -f}
    J -->|Connection refused| K[Check Service endpoints]
    J -->|DNS failure| L[Check CoreDNS pods]
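The first branch of the flowchart can be scripted so the matching kubectl command runs automatically. This is a rough sketch rather than a complete triage tool: triage_pod is a hypothetical name, the pod is assumed to be in the current namespace, and --tail=100 stands in for the flowchart's logs -f so the function returns instead of streaming.

```shell
# triage_pod NAME
# Read the pod status, then run the command the flowchart points to.
triage_pod() {
  local name="$1" status
  # STATUS is the third column of plain `kubectl get pod` output.
  status="$(kubectl get pod "$name" --no-headers | awk '{ print $3 }')"
  case "$status" in
    Pending)          kubectl describe pod "$name" ;;     # scheduling events
    CrashLoopBackOff) kubectl logs "$name" --previous ;;  # output of the crashed run
    Running)          kubectl logs "$name" --tail=100 ;;  # recent application errors
    *)                kubectl describe pod "$name" ;;     # fall back to events
  esac
}
```

Usage: triage_pod <name>. The later branches (Service endpoints, CoreDNS) still need a human to read the output and decide.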

Frequently Asked Questions

What's the first thing to check?

Start with kubectl describe pod <name>. The Events section at the bottom of the output usually names the cause directly: a failed scheduling attempt, an image pull error, or a failing probe.

How do I debug a pod with no shell?

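Use an ephemeral debug container: kubectl debug <pod> -it --image=nicolaka/netshoot --target=<container>. The debug container joins the running pod and shares its network namespace; --target also places it in the process namespace of the named container, so you can inspect that container's processes.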

Best Practices

  • Start with the simplest configuration that meets your needs
  • Test changes in staging before production
  • Use kubectl describe and events for troubleshooting
  • Document your decisions for the team

Key Takeaways

  • Work from status to events to logs: kubectl get pod, kubectl describe pod, then kubectl logs
  • Empty Service endpoints mean the selector does not match any pod labels
  • MemoryPressure, DiskPressure, and PIDPressure on a node point to resource exhaustion
  • Ephemeral debug containers let you inspect pods whose images have no shell
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
