Fix Stale MachineConfigPool Updates
Debug and resolve stale OpenShift MachineConfigPool updates. Identify blocked nodes, check MachineConfigDaemon logs, and unblock stuck MCP rollouts.
💡 Quick Answer: Run oc get mcp to check pool status. If UPDATING=True and UPDATED=False persist, inspect the pool conditions with oc get mcp worker -o jsonpath='{.status.conditions}' to find the blocked node, then check the MachineConfigDaemon logs on that node to identify the blocker, usually a PDB violation or a pod that cannot be evicted.
The Problem
You applied a MachineConfig change (new registries.conf, kernel parameter, chrony config, etc.) and the MachineConfigPool shows UPDATING=True but never progresses. The MCP is stuck: nodes are not getting the new config, and UPDATEDMACHINECOUNT stays below MACHINECOUNT. Until the rollout finishes, later MachineConfig changes to the pool stay queued behind it, and cluster upgrades can stall.
The Solution
Step 1: Check MCP Status
oc get mcp
# NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
# master rendered-master-4688e2fd8e3040e79ec48fe88f433791 True False False 3 3 3 0 12d
# worker rendered-worker-43cbd983151c9e1eb24ef6d3906effe4 False True False 6 4 4 0 12d
Reading the output:
- MACHINECOUNT=6: total worker nodes in the pool
- UPDATEDMACHINECOUNT=4: 4 nodes have the new config
- READYMACHINECOUNT=4: 4 nodes are on the new config and Ready
- UPDATING=True: MCO is still trying to update the remaining nodes
- 2 nodes remaining (6 - 4 = 2 need updates)
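The subtraction above is easy to script when a cluster has several pools. The following is a minimal sketch, assuming the default column layout of oc get mcp shown in Step 1 (MACHINECOUNT in column 6, UPDATEDMACHINECOUNT in column 8). The function name mcp_pending is made up for illustration and is not part of oc.

```shell
# mcp_pending: hypothetical helper, not part of oc.
# Reads the table printed by "oc get mcp" on stdin and prints
# "<pool> <nodes still waiting>" for every pool that is not fully updated.
# Assumes the default columns: $6 = MACHINECOUNT, $8 = UPDATEDMACHINECOUNT.
mcp_pending() {
  awk 'NR > 1 && ($8 + 0) < ($6 + 0) { printf "%s %d\n", $1, $6 - $8 }'
}
```

Run it as oc get mcp | mcp_pending. For the sample output above it would print worker 2 and skip the fully updated master pool.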
Step 2: Find Which Node Is Blocking
# Check MCP conditions for details
oc get mcp worker -o jsonpath='{.status.conditions}' | jq .
# Look for:
# "type": "Updating" - which node is being processed
# "type": "Degraded" - if there's an error
# "type": "NodeDegraded" - specific node failure
Step 3: Identify the Stuck Node
# Compare desired vs current config on each worker
for node in $(oc get nodes -l node-role.kubernetes.io/worker= -o name); do
  desired=$(oc get "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}')
  current=$(oc get "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')
  state=$(oc get "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}')
  echo "$node: state=$state desired=$desired current=$current match=$([ "$desired" = "$current" ] && echo YES || echo NO)"
done
Nodes where match=NO still need the update. The node with state=Working or state=Degraded is the current target.
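On a large pool, the current target can be filtered out of that loop's output by its state field. This is a minimal sketch, assuming the exact line format echoed by the loop (node name followed by a colon, then state=... as the second field). current_target is a hypothetical helper name.

```shell
# current_target: hypothetical helper, not part of oc.
# Reads the per-node lines produced by the comparison loop on stdin and
# prints "<node> <state>" for nodes whose state is Working or Degraded.
current_target() {
  awk '{
    node = $1;  sub(/:$/, "", node)        # strip the trailing colon
    state = $2; sub(/^state=/, "", state)  # keep only the state value
    if (state == "Working" || state == "Degraded") print node, state
  }'
}
```

Pipe the loop into it by appending | current_target after done. Nodes in state Done are dropped even when they still show match=NO, because the MCO has not started on them yet.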
Step 4: Check MachineConfigDaemon Logs
# Find the MCD pod for the stuck node (exact match on the node name avoids worker-3 also matching worker-30)
NODE_NAME="worker-3" # replace with the full node name from oc get nodes
MCD_POD=$(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-daemon --field-selector spec.nodeName="$NODE_NAME" -o jsonpath='{.items[0].metadata.name}')
# Check recent logs
oc -n openshift-machine-config-operator logs "$MCD_POD" -c machine-config-daemon --since=10m
Common log patterns:
# PDB violation (most common blocker)
Cannot drain node worker-3: eviction blocked by pod default/my-app-xxxxx because of PodDisruptionBudget
# Pod blocking eviction (no PDB, but cannot schedule replacement)
drain: pod openshift-ingress/router-custom-xxxxx cannot be evicted: no nodes available for scheduling replacement
# Node in degraded state
Node worker-3 is reporting: "unexpected on-disk state"
Step 5: Unblock the Drain
Once you identify the blocking pod, see the MCP Drain PDB Workaround recipe for the fix.
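When the MCD log is long, the three patterns from Step 4 can be bucketed before you pick a fix. This is a rough sketch that matches only the substrings quoted in the sample messages. Real MCD messages vary between OpenShift versions, so treat the patterns as assumptions to adjust. classify_mcd_log and its labels are hypothetical names.

```shell
# classify_mcd_log: hypothetical helper, not part of oc.
# Reads MachineConfigDaemon log text on stdin and prints each distinct
# blocker label found. The substrings come from the sample messages in
# Step 4 and may need adjusting for your OpenShift version.
classify_mcd_log() {
  awk '
    /PodDisruptionBudget/      { print "pdb-violation";    next }
    /cannot be evicted/        { print "eviction-blocked"; next }
    /unexpected on-disk state/ { print "on-disk-mismatch"; next }
  ' | LC_ALL=C sort -u
}
```

Feed it the same log stream as in Step 4, for example by appending | classify_mcd_log to the oc logs command. Empty output means none of the three patterns matched and the log needs a manual read.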
graph TD
A[MCP UPDATING=True stuck] -->|Check| B[oc get mcp worker conditions]
B -->|Find node| C[Compare desired vs current config per node]
C -->|Stuck node found| D[Check MCD logs on that node]
D -->|PDB violation| E[Scale down blocking deployment]
D -->|No replacement scheduling| F[Check hostPort or resource conflicts]
D -->|Node degraded| G[Check on-disk state or force reboot]
E --> H[Drain completes]
F --> H
G --> H
H --> I[MCD reboots and applies config]
I --> J[Uncordon node]
J --> K[Repeat for next node]
Common Issues
MCP Shows DEGRADED=True
# Check which node is degraded
oc get nodes -l node-role.kubernetes.io/worker= -o json | \
jq -r '.items[] | select(.metadata.annotations["machineconfiguration.openshift.io/state"]=="Degraded") | .metadata.name'
# Check the MCD logs on that node for the specific error
# Common: failed to apply rendered config, disk full, SELinux denial
Multiple Nodes Stuck Simultaneously
MCO updates nodes sequentially (one at a time by default). If multiple nodes show state=Working, check maxUnavailable on the MCP:
oc get mcp worker -o jsonpath='{.spec.maxUnavailable}'
# Default: 1 (one node at a time); empty output means the field is unset and the default applies
MCP Stuck After Removing a MachineConfig
If you deleted a MachineConfig and the MCP is now stuck with a mismatched rendered config:
# Force MCO to re-render
oc patch mcp worker --type merge -p '{"metadata":{"annotations":{"machineconfiguration.openshift.io/forceReconcile":""}}}'
Best Practices
- Always check MCP status after applying MachineConfig changes; don't assume they applied
- Monitor MCD logs during rollouts: the MCD tells you exactly what's blocking
- Use maxUnavailable: 1 for production; never update all workers simultaneously
- Plan for PDB conflicts: know which workloads have strict PDBs before starting
- Create separate MCPs for GPU/compute nodes to isolate the rollout blast radius
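One way to act on the PDB advice before a rollout is to list the budgets that currently allow zero disruptions, since those can block a drain if their pods run on the node being updated. This is a minimal sketch, assuming oc is logged in to the cluster. It reads the status.disruptionsAllowed field of each PodDisruptionBudget, and strict_pdbs is a hypothetical helper name.

```shell
# strict_pdbs: hypothetical helper, not part of oc.
# Prints "<namespace>/<name>" for every PodDisruptionBudget whose
# status.disruptionsAllowed is currently 0.
strict_pdbs() {
  oc get pdb --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' |
    awk 'NF == 2 && $2 == 0 { print $1 }'  # skip rows with no status value
}
```

Run strict_pdbs before applying the MachineConfig and review each listed budget with its owner. A budget can show 0 only temporarily, for example while one of its pods is restarting, so a second run a few minutes later is worthwhile.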
Key Takeaways
- MCP stuck at UPDATING=True usually means a node drain is blocked
- Compare desiredConfig vs currentConfig annotations to find stuck nodes
- MachineConfigDaemon logs reveal the exact blocker (PDB violation, scheduling failure)
- MCO processes nodes sequentially, so fixing one node lets the rollout continue
- Always check DEGRADEDMACHINECOUNT: degraded nodes need manual intervention
