Troubleshooting intermediate ⏱ 15 minutes K8s 1.28+

Debug Node NotReady Status

Diagnose Kubernetes nodes stuck in NotReady state. Check kubelet logs, container runtime, network, disk pressure, and certificates.

By Luca Berton • 5 min read

💡 Quick Answer: A NotReady node means the kubelet isn't reporting health to the API server. Check systemctl status kubelet on the node, then journalctl -u kubelet -f for errors. Common causes: kubelet crash, container runtime down, expired certificates, disk pressure, or network partition.

The Problem

One or more nodes show NotReady status. Pods on those nodes may be unreachable, new pods won't schedule there, and if the condition persists past the default 5-minute toleration (tolerationSeconds=300), the pods are evicted. You need to identify why the kubelet stopped reporting.

The Solution

Step 1: Identify NotReady Nodes

kubectl get nodes
# NAME       STATUS     ROLES    AGE   VERSION
# master-1   Ready      master   30d   v1.28.6
# worker-1   Ready      worker   30d   v1.28.6
# worker-2   NotReady   worker   30d   v1.28.6  <- Problem node

# Check conditions
kubectl describe node worker-2 | grep -A20 "Conditions:"
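
On larger clusters it can help to pull just the affected node names out of the listing. The following is a minimal sketch; the function name not_ready_nodes is hypothetical, and it assumes the default column layout of kubectl get nodes (NAME first, STATUS second).

```shell
# Print the names of nodes whose STATUS column contains "NotReady".
# Reads `kubectl get nodes --no-headers` output on stdin, so it also
# matches combined states such as "NotReady,SchedulingDisabled".
not_ready_nodes() {
  awk '$2 ~ /NotReady/ { print $1 }'
}

# Usage on a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
# To read only the Ready condition's message for one node:
#   kubectl get node worker-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
```

The condition message often names the cause directly, for example a runtime that is not ready or a kubelet that stopped posting status.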

Step 2: Check kubelet on the Node

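# SSH into the node, or start a debug pod on it.
# Debug pods need a working kubelet; if it is down, use SSH or the console.
# OpenShift:
oc debug node/worker-2
# Upstream Kubernetes (the host filesystem is mounted at /host):
kubectl debug node/worker-2 -it --image=busybox

# Inside the debug pod:
chroot /host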
systemctl status kubelet
# If kubelet is down, check why:
journalctl -u kubelet --since "10 minutes ago" --no-pager | tail -50
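
Kubelet logs are noisy, so filtering for error lines usually gets to the cause faster. This sketch assumes the kubelet uses the default klog text format, where each message starts with a severity letter (I, W, E, F) followed by MMDD and a timestamp; the function name kubelet_errors is hypothetical, and the filter will not match JSON-formatted logs.

```shell
# Keep only klog error (E) and fatal (F) lines from kubelet journal output.
# A klog message looks like: E0319 12:00:02.000002    1234 kubelet.go:2] "..."
# In journalctl output it follows the "host kubelet[pid]: " prefix.
kubelet_errors() {
  grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+'
}

# Usage on the node:
#   journalctl -u kubelet --since "10 minutes ago" --no-pager | kubelet_errors | tail -20
```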

Step 3: Common Causes and Fixes

Cause 1: Container runtime down

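systemctl status crio        # OpenShift (CRI-O)
systemctl status containerd  # most other Kubernetes distributions
# Restart whichever runtime the node uses, if needed
systemctl restart crio       # or: systemctl restart containerd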

Cause 2: Disk pressure

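df -h / /var/lib/kubelet
# Default kubelet eviction thresholds: nodefs <10% free, imagefs <15% free.
# If a filesystem is >85% full, remove unused images and trim old journal logs:
crictl rmi --prune
journalctl --vacuum-size=500M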
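
To check several mount points at once, the df output can be filtered against a threshold. This is a sketch only: over_threshold is a hypothetical helper, the 85 default mirrors the figure used above, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above a threshold (default 85).
# Reads `df -P` output on stdin; -P keeps each filesystem on a single line.
over_threshold() {
  awk -v limit="${1:-85}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)            # "91%" -> "91"
      if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage on the node:
#   df -P / /var/lib/kubelet | over_threshold 85
```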

Cause 3: Certificate expired

# Check kubelet certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
# notAfter=Mar 19 00:00:00 2026 GMT  <- Expired!
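
Reading the date by eye is error-prone, so the expiry can be turned into a number of days. The sketch below assumes GNU date (date -d), which is typical on Linux nodes but not on BusyBox or macOS; cert_days_left is a hypothetical helper name.

```shell
# Print whole days until a certificate expires; print "expired" and return 1
# if the notAfter date has passed. Reads the output of
# `openssl x509 -noout -enddate` (or -dates) on stdin.
# The optional argument overrides "now" as epoch seconds (useful for testing).
cert_days_left() {
  now="${1:-$(date +%s)}"
  not_after=$(sed -n 's/^notAfter=//p' | head -n 1)
  [ -n "$not_after" ] || return 2          # no notAfter line found
  end=$(date -d "$not_after" +%s) || return 2
  if [ "$end" -le "$now" ]; then
    echo "expired"
    return 1
  fi
  echo $(( (end - now) / 86400 ))
}

# Usage on the node:
#   openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate | cert_days_left
```

For a plain yes/no answer, openssl x509 -checkend 0 -noout -in <file> exits non-zero when the certificate has already expired.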

Cause 4: Network partition

# Can the node reach the API server?
curl -k https://<api-server>:6443/healthz
# If unreachable: check network, firewall rules, DNS
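
When the request fails, curl's exit code narrows down which of network, firewall, or DNS is the likely culprit. The mapping below is a rough guide rather than a diagnosis; curl_hint is a hypothetical helper and API_SERVER is a placeholder for your endpoint.

```shell
# Map a curl exit code to a likely cause, based on curl's documented
# exit statuses (6, 7, 28, 35, 60).
curl_hint() {
  case "$1" in
    0)     echo "reachable" ;;
    6)     echo "DNS: could not resolve host" ;;
    7)     echo "connect failed: port closed, firewall reject, or API server down" ;;
    28)    echo "timeout: packets dropped, routing or firewall issue" ;;
    35|60) echo "TLS problem: check certificates and node clock" ;;
    *)     echo "other curl error $1" ;;
  esac
}

# Usage on the node:
#   curl -k -sS -o /dev/null --connect-timeout 5 --max-time 10 "https://$API_SERVER:6443/healthz"
#   curl_hint $?
```

An exit code of 0 only means an HTTP response came back; a 401 or 403 still shows that the network path works.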

Cause 5: kubelet OOMKilled

dmesg | grep -i "oom\|killed"
journalctl -k | grep -i oom
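
The grep output shows that something was killed, but not always clearly what. This sketch summarizes the victims per process name. It assumes the kernel's usual "Killed process <pid> (<name>)" wording, which can vary between kernel versions; oom_victims is a hypothetical helper, and process names containing spaces are not handled.

```shell
# Summarize OOM kills by process name as "<name> <count>" lines.
# Reads kernel log lines on stdin (dmesg or journalctl -k output).
oom_victims() {
  sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on the node:
#   journalctl -k --no-pager | oom_victims
```

If kubelet appears in the list, the node itself is short on memory; if only workload processes appear, container limits are the more likely place to look.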

Troubleshooting flow (Mermaid source):

graph TD
    A[Node NotReady] --> B{Can you SSH/debug into node?}
    B -->|No| C[Network issue or node down]
    B -->|Yes| D{kubelet running?}
    D -->|No| E[Check journalctl -u kubelet]
    D -->|Yes| F{Container runtime running?}
    F -->|No| G[Restart CRI-O/containerd]
    F -->|Yes| H{Disk full?}
    H -->|Yes| I[Clean up images/logs]
    H -->|No| J{Certs expired?}
    J -->|Yes| K[Renew certificates]
    J -->|No| L[Check kubelet logs for specific error]
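
The same decision order can be written as a small shell function for use in a runbook script. This is only an encoding of the flowchart above under the assumption that you are already on the node; next_step and its yes/no arguments are hypothetical.

```shell
# next_step <kubelet_running> <runtime_running> <disk_full> <certs_expired>
# Each argument is "yes" or "no"; prints the matching action from the flowchart.
next_step() {
  if [ "$1" != yes ]; then
    echo "Check journalctl -u kubelet"
  elif [ "$2" != yes ]; then
    echo "Restart CRI-O/containerd"
  elif [ "$3" = yes ]; then
    echo "Clean up images/logs"
  elif [ "$4" = yes ]; then
    echo "Renew certificates"
  else
    echo "Check kubelet logs for specific error"
  fi
}

# Usage on the node (first argument shown; the others follow the same pattern):
#   kubelet_up=$(systemctl is-active --quiet kubelet && echo yes || echo no)
#   next_step "$kubelet_up" yes no no
```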

Common Issues

Node Flapping Between Ready and NotReady

This usually indicates intermittent network issues, or a node under such heavy load that the kubelet can't send its heartbeats in time.

# Check node events for flapping
kubectl get events --field-selector involvedObject.name=worker-2 --sort-by='.lastTimestamp'
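
Counting the state changes gives a quick sense of how often the node flaps. The sketch assumes the default kubectl get events columns (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE) and the NodeNotReady and NodeReady event reasons; flap_counts is a hypothetical helper.

```shell
# Count NodeNotReady and NodeReady event rows for a node.
# Reads `kubectl get events --no-headers` output on stdin; REASON is column 3.
flap_counts() {
  awk '
    $3 == "NodeNotReady" { down++ }
    $3 == "NodeReady"    { up++ }
    END { printf "NodeNotReady=%d NodeReady=%d\n", down, up }'
}

# Usage:
#   kubectl get events --field-selector involvedObject.name=worker-2 --no-headers | flap_counts
```

Events are kept for only one hour by default, and repeated events may be aggregated into a single row, so treat the counts as a lower bound.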

All Nodes NotReady Simultaneously

Likely an API server or etcd issue, not individual node problems:

kubectl get pods -n kube-system | grep -E "api|etcd"
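
A quick way to tell a cluster-wide problem from a node-local one is to compare the NotReady count with the total. The sketch below assumes the default kubectl get nodes columns; not_ready_scope and its three labels are hypothetical.

```shell
# Summarize how many nodes are NotReady and hint at the scope.
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_scope() {
  awk '
    { total++ }
    $2 ~ /NotReady/ { bad++ }
    END {
      scope = (total > 0 && bad == total) ? "cluster-wide" : (bad > 0 ? "node-local" : "healthy")
      printf "%d/%d NotReady: %s\n", bad, total, scope
    }'
}

# Usage:
#   kubectl get nodes --no-headers | not_ready_scope
```

A "cluster-wide" result points at the control plane or the network path to it rather than at any single kubelet.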

Best Practices

  • Set up node health monitoring: alert on NotReady within 2 minutes
  • Use node problem detector for proactive issue detection
  • Ensure certificate auto-rotation is enabled in kubelet config
  • Monitor disk usage: set alerts at 80% to prevent pressure
  • Keep container runtime updated: runtime bugs cause kubelet failures
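
The certificate rotation item can be checked directly on a node. This sketch looks for the rotateCertificates field of the KubeletConfiguration file. The default path is the kubeadm location and is an assumption; other distributions keep the file elsewhere. rotation_enabled is a hypothetical helper.

```shell
# Return 0 if the given kubelet config file enables client certificate rotation.
# Path defaults to the kubeadm location; pass another path as the first argument.
rotation_enabled() {
  grep -Eq '^[[:space:]]*rotateCertificates:[[:space:]]*true' "${1:-/var/lib/kubelet/config.yaml}"
}

# Usage on the node:
#   rotation_enabled && echo "rotation on" || echo "rotation off or not set in file"
```

A missing line does not prove rotation is off, since the setting may come from a default or a command-line flag.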

Key Takeaways

  • NotReady = kubelet can't report to API server (the node controller waits for --node-monitor-grace-period, roughly 40 to 50 seconds by default depending on version)
  • Always start with systemctl status kubelet and journalctl -u kubelet on the node
  • Top causes: runtime down, disk full, certs expired, network partition
  • On OpenShift, prefer oc debug node/<name>, since direct SSH to RHCOS nodes is discouraged
  • Fix the root cause instead of just restarting kubelet, or the problem will return