Troubleshooting • Intermediate • ⏱ 15 minutes • K8s 1.28+

Fix Kubelet NotReady and Node Pressure Issues

Debug kubelet NotReady status, node pressure conditions, and eviction issues. Covers disk pressure, memory pressure, PID pressure, and network not ready.

By Luca Berton • 📖 5 min read

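💡 Quick Answer: Node NotReady means the node's Ready condition is not True: either kubelet can't communicate with the API server (Ready=Unknown), or it is running but reports a failure such as a broken container runtime or CNI (Ready=False). Run kubectl describe node <name> to see all conditions, including disk/memory/PID pressure and network availability, then check kubelet logs on the node with journalctl -u kubelet.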

Key insight: NotReady doesn't always mean the node is down. The cause could be a crashed kubelet, an expired certificate, a failed CNI plugin, or resource pressure on the node.

The Problem

$ kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
worker-1   NotReady   worker   30d   v1.28.4
worker-2   Ready      worker   30d   v1.28.4
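To pick the unhealthy nodes out of this listing in a script, a small filter is enough. This is a minimal sketch; the function name list_notready_nodes is hypothetical, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
# list_notready_nodes (hypothetical helper): reads `kubectl get nodes` output
# on stdin and prints the name of every node whose STATUS is not Ready.
# A cordoned but healthy node ("Ready,SchedulingDisabled") is skipped;
# every other status, such as NotReady, is printed.
list_notready_nodes() {
  awk 'NR > 1 && $2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Example:
#   kubectl get nodes | list_notready_nodes
```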

The Solution

Step 1: Check Node Conditions

kubectl describe node worker-1 | grep -A20 Conditions
Condition               True means                                Fix
MemoryPressure          Node low on RAM                           Evict pods, add memory
DiskPressure            Node low on disk                          Clean images: crictl rmi --prune
PIDPressure             Too many processes                        Find PID leaks, increase max PIDs
NetworkUnavailable      CNI not ready                             Restart CNI pods
Ready=False or Unknown  False: unhealthy; Unknown: not reporting  Check kubelet service
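The same conditions can be pulled in machine-readable form with a kubectl JSONPath range expression, which avoids scraping describe output. The sketch below uses a hypothetical helper, failing_conditions, that applies the rule from the table: Ready is a problem when it is not True, and every other condition is a problem when it is True.

```shell
# failing_conditions (hypothetical helper): reads "Type=Status" lines on stdin
# and prints only the unhealthy ones. Ready is healthy when True; the other
# conditions (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable)
# are healthy when False.
failing_conditions() {
  awk -F= '
    $1 == "Ready" { if ($2 != "True") print $1 "=" $2; next }
    $2 == "True"  { print $1 "=" $2 }
  '
}

# Example (one "Type=Status" line per condition):
#   kubectl get node worker-1 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failing_conditions
```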

Step 2: Check Kubelet Service

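# SSH to the node, or open a debug shell on it (OpenShift):
oc debug node/worker-1 -- chroot /host bash
# On other clusters, kubectl debug node/worker-1 -it --image=busybox opens a
# shell with the host filesystem mounted at /host; run chroot /host there.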

# Check kubelet status
systemctl status kubelet
journalctl -u kubelet --no-pager -n 100

# Common kubelet failures:
# "failed to run Kubelet: unable to load bootstrap kubeconfig"
# "certificate has expired"
# "failed to start ContainerManager"
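When the kubelet journal is long, filtering it for the messages listed above narrows things down quickly. A minimal sketch with a hypothetical helper, kubelet_known_errors; the pattern list only covers the three examples given here, so an empty result does not mean the kubelet is healthy.

```shell
# kubelet_known_errors (hypothetical helper): reads kubelet log lines on stdin
# and prints only those containing one of the known failure messages.
# grep exits 1 when nothing matches; "|| true" keeps that from aborting
# scripts that run with "set -e".
kubelet_known_errors() {
  grep -F \
    -e 'unable to load bootstrap kubeconfig' \
    -e 'certificate has expired' \
    -e 'failed to start ContainerManager' || true
}

# Example, on the node:
#   journalctl -u kubelet --no-pager -n 500 | kubelet_known_errors
```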

Step 3: Fix by Condition

Disk Pressure:

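# Clean unused images
crictl rmi --prune
# Remove exited containers (xargs -r skips the call when there are none)
crictl ps -a -q --state exited | xargs -r crictl rm
# Check disk usage (CRI-O stores data under /var/lib/containers,
# containerd under /var/lib/containerd)
df -h /var/lib/kubelet /var/lib/containers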
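To turn the df check into a yes/no answer, compare usage against a threshold. The kubelet's default hard eviction threshold for the node filesystem on Linux is nodefs.available<10%, roughly 90% used. This is a sketch with a hypothetical helper, usage_over; the threshold is an input you choose.

```shell
# usage_over (hypothetical helper): reads `df -P` output on stdin and prints
# "<mount point> <use%>" for every filesystem at or above THRESHOLD percent
# used (default 85). `df -P` prints one line per filesystem, with the
# capacity in column 5 and the mount point in column 6.
usage_over() {
  awk -v t="${1:-85}" 'NR > 1 {
    used = $5
    sub(/%$/, "", used)
    if (used + 0 >= t + 0) print $6, $5
  }'
}

# Example, on the node:
#   df -P /var/lib/kubelet /var/lib/containers | usage_over 90
```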

Memory Pressure:

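# Find memory hogs (-n 1 takes a single snapshot instead of refreshing forever)
top -b -n 1 -o %MEM | head -20
# Check eviction thresholds (evictionHard / evictionSoft; kubelet defaults apply if absent)
grep -A5 eviction /var/lib/kubelet/config.yaml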
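For a quick numeric check against a memory threshold, MemAvailable from /proc/meminfo is a convenient stand-in. This is an assumption worth stating: the kubelet computes memory.available from cgroup working-set data, not from /proc/meminfo, so treat the result as a rough cross-check. Both helper names are hypothetical.

```shell
# mem_available_mib (hypothetical helper): reads /proc/meminfo-format text on
# stdin and prints MemAvailable in MiB (integer, rounded down).
# NOTE: the kubelet's memory.available signal uses cgroup working-set data,
# so this only approximates what the eviction manager sees.
mem_available_mib() {
  awk '$1 == "MemAvailable:" { print int($2 / 1024) }'
}

# mem_below (hypothetical helper): exits 0 when MemAvailable (read from
# stdin) is below THRESHOLD_MIB, 1 otherwise.
mem_below() {
  avail=$(mem_available_mib)
  [ -n "$avail" ] && [ "$avail" -lt "$1" ]
}

# Example, on the node:
#   mem_below 500 < /proc/meminfo && echo "MemAvailable below 500 MiB"
```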

Certificate Expired (OpenShift):

# Check certificate expiry
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
# Approve pending CSRs (xargs -r skips the call when none are pending)
oc get csr | grep Pending | awk '{print $1}' | xargs -r oc adm certificate approve
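Matching the word Pending anywhere on a line works in practice, but checking the CONDITION column explicitly is stricter. A minimal sketch with a hypothetical helper, pending_csrs, assuming the default oc get csr / kubectl get csr table where CONDITION is the last column.

```shell
# pending_csrs (hypothetical helper): reads `oc get csr` / `kubectl get csr`
# table output on stdin and prints the names of CSRs whose CONDITION (the
# last column) is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Example; xargs -r (GNU) skips the approve call when nothing is pending:
#   oc get csr | pending_csrs | xargs -r oc adm certificate approve
```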

Troubleshooting flow:

graph TD
    A[Node NotReady] --> B{describe node conditions}
    B -->|DiskPressure| C[Clean images and containers]
    B -->|MemoryPressure| D[Evict pods or add RAM]
    B -->|PIDPressure| E[Find PID leaks]
    B -->|NetworkUnavailable| F[Restart CNI pods]
    B -->|Ready=False| G{Check kubelet}
    G -->|Service down| H[systemctl restart kubelet]
    G -->|Cert expired| I[Approve CSRs]
    G -->|Config error| J[Fix kubelet config]

Common Issues

Node flaps between Ready and NotReady

This usually indicates intermittent connectivity between the node and the API server. Check network connectivity from the node, firewall rules, and the health of the load balancer in front of the API server.
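To confirm flapping rather than a single outage, count Ready transitions per node from recent events. This sketch assumes the NodeReady and NodeNotReady event reasons and a hypothetical helper, count_flaps; events are only retained for a limited time (one hour by default, set by the API server's --event-ttl), so it sees recent history only.

```shell
# count_flaps (hypothetical helper): reads "NODE REASON" pairs on stdin and
# prints "<node> notready=<n> ready=<m>" for every node with at least one
# NodeNotReady event, sorted by node name.
count_flaps() {
  awk '
    $2 == "NodeNotReady" { down[$1]++ }
    $2 == "NodeReady"    { up[$1]++ }
    END { for (n in down) printf "%s notready=%d ready=%d\n", n, down[n], up[n] + 0 }
  ' | sort
}

# Example:
#   kubectl get events -A --field-selector involvedObject.kind=Node \
#     -o custom-columns=NODE:.involvedObject.name,REASON:.reason --no-headers \
#     | count_flaps
```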

All nodes NotReady after certificate rotation

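Approve all pending CSRs (those that have no status yet): oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs -r oc adm certificate approve. Repeat after a short wait: once a node's client CSR is approved, its kubelet submits a serving CSR that also needs approval.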
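Because follow-up CSRs appear only after earlier ones are approved, approval often has to be repeated. A minimal sketch of a bounded retry loop; the function name, the round limit and the SLEEP_SECONDS variable are hypothetical. It approves every pending CSR without inspecting the requestor, which is only acceptable when you know all pending requests come from your own nodes.

```shell
# approve_pending_csrs (hypothetical helper): approves every pending CSR,
# waits, and repeats until none are left or MAX_ROUNDS (default 5) rounds
# have run. Returns 0 when nothing is pending, 1 if CSRs remain.
# WARNING: approves without checking who requested each CSR.
approve_pending_csrs() {
  local max_rounds="${1:-5}" round=0 pending name
  while :; do
    # CSRs that have no .status yet are pending.
    pending=$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}')
    [ -z "$pending" ] && return 0
    [ "$round" -ge "$max_rounds" ] && return 1
    for name in $pending; do
      oc adm certificate approve "$name"
    done
    round=$((round + 1))
    # Give kubelets time to submit follow-up (serving) CSRs.
    sleep "${SLEEP_SECONDS:-10}"
  done
}

# Example:
#   approve_pending_csrs 5 || echo "CSRs still pending after 5 rounds"
```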

Node Ready but pods not scheduling

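Check for taints: kubectl describe node worker-1 | grep -A3 Taints. The node may also have been cordoned; kubectl get nodes then shows its status as Ready,SchedulingDisabled, and kubectl uncordon worker-1 re-enables scheduling.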
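To find every cordoned node at once rather than describing nodes one by one, filter the kubectl get nodes listing, since kubectl appends SchedulingDisabled to the STATUS of an unschedulable node. The helper name cordoned_nodes is hypothetical.

```shell
# cordoned_nodes (hypothetical helper): reads `kubectl get nodes` output on
# stdin and prints nodes that are Ready but marked unschedulable (cordoned).
cordoned_nodes() {
  awk 'NR > 1 && $2 == "Ready,SchedulingDisabled" { print $1 }'
}

# Example:
#   kubectl get nodes | cordoned_nodes
#   kubectl uncordon worker-1   # once the node is confirmed healthy
```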

Best Practices

  • Monitor node conditions with alerting; don't wait for NotReady
  • Set eviction thresholds appropriately for your workload: --eviction-hard=memory.available<500Mi,nodefs.available<10% (or evictionHard in the kubelet config file)
  • Auto-approve CSRs in clusters with frequent node cycling, using an approver that verifies the requesting node rather than blanket approval
  • Pre-pull critical images to avoid disk pressure from large image pulls
  • Use Node Problem Detector for early warning on hardware/kernel issues
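The --eviction-hard flag is deprecated in favour of the kubelet configuration file (the /var/lib/kubelet/config.yaml inspected in Step 3). A minimal sketch of a hypothetical helper, eviction_flag_to_yaml, that converts the flag value into the equivalent KubeletConfiguration map so it can be pasted into that file.

```shell
# eviction_flag_to_yaml (hypothetical helper): converts an --eviction-hard
# style value such as "memory.available<500Mi,nodefs.available<10%" into a
# KubeletConfiguration map. The optional second argument sets the key name
# (default evictionHard; evictionSoft uses the same value format).
eviction_flag_to_yaml() {
  printf '%s:\n' "${2:-evictionHard}"
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F'<' 'NF == 2 { printf "  %s: \"%s\"\n", $1, $2 }'
}

# Example:
#   eviction_flag_to_yaml 'memory.available<500Mi,nodefs.available<10%'
#   # prints:
#   # evictionHard:
#   #   memory.available: "500Mi"
#   #   nodefs.available: "10%"
```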

Key Takeaways

  • NotReady ≠ node down: the cause could be kubelet, certificates, CNI, or resource pressure
  • kubectl describe node conditions usually point to the exact problem
  • journalctl -u kubelet on the node gives detailed error logs
  • Disk pressure is a common cause in production; monitor disk usage and clean images regularly
  • Certificate expiry is a common cause on OpenShift; approve pending CSRs promptly or automate approval
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
