Debug Node NotReady Status
Diagnose Kubernetes nodes stuck in NotReady state. Check kubelet logs, container runtime, network, disk pressure, and certificates.
π‘ Quick Answer: A NotReady node means the kubelet isnβt reporting health to the API server. Check
systemctl status kubeleton the node, thenjournalctl -u kubelet -ffor errors. Common causes: kubelet crash, container runtime down, expired certificates, disk pressure, or network partition.
The Problem
One or more nodes show NotReady status. Pods on those nodes are unreachable, new pods wonβt schedule there, and if the condition persists, pods are evicted. You need to identify why the kubelet stopped reporting.
The Solution
Step 1: Identify NotReady Nodes
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master-1 Ready master 30d v1.28.6
# worker-1 Ready worker 30d v1.28.6
# worker-2 NotReady worker 30d v1.28.6 β Problem node
# Check conditions
kubectl describe node worker-2 | grep -A20 "Conditions:"Step 2: Check kubelet on the Node
# SSH or debug into the node
# OpenShift:
oc debug node/worker-2
# Inside the debug pod:
chroot /host
systemctl status kubelet
# If kubelet is down, check why:
journalctl -u kubelet --since "10 minutes ago" --no-pager | tail -50Step 3: Common Causes and Fixes
Cause 1: Container runtime down
systemctl status crio # OpenShift
systemctl status containerd # Kubernetes
# Restart if needed
systemctl restart crioCause 2: Disk pressure
df -h /
# If root filesystem is >85% full:
# Clean up old containers and images
crictl rmi --prune
journalctl --vacuum-size=500MCause 3: Certificate expired
# Check kubelet certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
# notAfter=Mar 19 00:00:00 2026 GMT β Expired!Cause 4: Network partition
# Can the node reach the API server?
curl -k https://<api-server>:6443/healthz
# If unreachable: check network, firewall rules, DNSCause 5: kubelet OOMKilled
dmesg | grep -i "oom\|killed"
journalctl -k | grep -i oomgraph TD
A[Node NotReady] --> B{Can you SSH/debug into node?}
B -->|No| C[Network issue or node down]
B -->|Yes| D{kubelet running?}
D -->|No| E[Check journalctl -u kubelet]
D -->|Yes| F{Container runtime running?}
F -->|No| G[Restart CRI-O/containerd]
F -->|Yes| H{Disk full?}
H -->|Yes| I[Clean up images/logs]
H -->|No| J{Certs expired?}
J -->|Yes| K[Renew certificates]
J -->|No| L[Check kubelet logs for specific error]Common Issues
Node Flapping Between Ready and NotReady
Usually indicates intermittent network issues or the node is under heavy load and kubelet canβt respond to heartbeats in time.
# Check node events for flapping
kubectl get events --field-selector involvedObject.name=worker-2 --sort-by='.lastTimestamp'All Nodes NotReady Simultaneously
Likely an API server or etcd issue, not individual node problems:
kubectl get pods -n kube-system | grep -E "api|etcd"Best Practices
- Set up node health monitoring β alert on NotReady within 2 minutes
- Use node problem detector for proactive issue detection
- Ensure certificate auto-rotation is enabled in kubelet config
- Monitor disk usage β set alerts at 80% to prevent pressure
- Keep container runtime updated β runtime bugs cause kubelet failures
Key Takeaways
- NotReady = kubelet canβt report to API server (heartbeat timeout is ~40s default)
- Always start with
systemctl status kubeletandjournalctl -u kubeleton the node - Top causes: runtime down, disk full, certs expired, network partition
- On OpenShift, use
oc debug node/<name>since you canβt SSH to RHCOS - Fix the root cause, donβt just restart kubelet β the problem will return

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses β