MCP Drain Blocked by PDB: Workaround
Resolve OpenShift MachineConfigPool drain failures caused by PodDisruptionBudget violations. Scale down and restore after update.
💡 Quick Answer: When an MCP drain hangs on a “Cannot evict pod … violates PodDisruptionBudget” error, scale the blocking deployment to 0 replicas (oc scale deploy/<name> --replicas=0), let the drain complete, wait for the node to reboot and rejoin, then restore the original replica count.
The Problem
The Machine Config Operator (MCO) drains nodes before applying configuration changes. When a pod’s PodDisruptionBudget doesn’t allow eviction (for example, minAvailable equals the current replica count), the drain hangs indefinitely: the machine-config daemon (MCD) retries the eviction every 5 seconds, the MCP stays stuck at UPDATING=True, and no further nodes get updated.
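Before touching anything, it helps to confirm the stuck state. A minimal diagnostic sketch, assuming oc is logged in; the "worker" pool name and the k8s-app=machine-config-daemon label are the OpenShift defaults, so adjust for custom pools:

```shell
# Confirm the stuck drain. Pool name "worker" and the MCD pod label are
# the OpenShift defaults; adjust them for custom pools.
show_drain_status() {
  oc get mcp worker   # a stuck pool shows UPDATED=False, UPDATING=True
  # The machine-config daemons log the failing eviction:
  oc -n openshift-machine-config-operator logs \
    -l k8s-app=machine-config-daemon --tail=50 | grep -i "disruption budget"
}
```

Running show_drain_status prints the pool status and any eviction errors the daemons have recently logged.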
The Solution
Step 1: Identify the Blocking Pod
# Simulate the drain to find blockers (server-side dry run; no pod is actually evicted,
# but the eviction API still evaluates PDBs)
NODE="worker-3"
oc adm drain "$NODE" --ignore-daemonsets --delete-emptydir-data --force --dry-run=server
Output reveals the blocker:
evicting pod my-namespace/my-app-7f8b9c6d4-x2kp9 (dry run)
error: Cannot evict pod as it would violate the pod's disruption budget.
Step 2: Check the PDB
# Find PDBs across all namespaces
oc get pdb -A
# Look for ALLOWED DISRUPTIONS: 0
# NAMESPACE           NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# my-namespace        my-app-pdb   2               N/A               0                     30d
# openshift-ingress   router-pdb   1               N/A               0                     30d
Step 3: Understand Why Eviction Fails
The PDB says minAvailable: 2 but only 2 replicas exist → 0 disruptions allowed. Common scenarios:
- Custom ingress routers with hostNetwork: replacement pods can’t schedule because the host ports are already in use
- Stateful services with minAvailable equal to the replica count
- Single-replica deployments with any PDB
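The failing PDB from Step 2 would look like the following (reconstructed from the table above; the selector is assumed to match the my-app pods):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-namespace
spec:
  minAvailable: 2          # equals the Deployment's replica count -> 0 allowed disruptions
  selector:
    matchLabels:
      app: my-app          # must match the Deployment's pod template labels
```

With exactly 2 matching pods, ALLOWED DISRUPTIONS is 0 and every eviction request is rejected.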
Step 4: Scale Down the Blocker
# Record current replicas
DEPLOY="my-app"
NAMESPACE="my-namespace"
ORIGINAL_REPLICAS=$(oc -n "$NAMESPACE" get deploy "$DEPLOY" -o jsonpath='{.spec.replicas}')
echo "Original replicas: $ORIGINAL_REPLICAS"
# Scale to 0
oc -n "$NAMESPACE" scale deploy/"$DEPLOY" --replicas=0
# Verify pod is gone
oc -n "$NAMESPACE" get pods -l app="$DEPLOY"
Step 5: Drain Completes Automatically
If MCD was already trying to drain, it will now succeed. Otherwise:
# Manual drain if needed
oc adm drain "$NODE" --ignore-daemonsets --delete-emptydir-data --force --timeout=30m
Step 6: Wait for Node Update
# Monitor node status
watch "oc get node $NODE -o wide"
# Node will go through:
# 1. SchedulingDisabled (cordoned)
# 2. NotReady (rebooting)
# 3. Ready (config applied)
# Verify config applied
oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}'
# Should show: Done
Step 7: Uncordon and Restore
# Uncordon the node
oc adm uncordon "$NODE"
# Restore the deployment to original replicas
oc -n "$NAMESPACE" scale deploy/"$DEPLOY" --replicas="$ORIGINAL_REPLICAS"
# Verify pods are running
oc -n "$NAMESPACE" get pods -l app="$DEPLOY" -o wide
Step 8: Repeat for Next Node
MCO will automatically begin draining the next worker. Check if it’s also blocked:
oc get mcp worker
# If UPDATING=True and UPDATEDMACHINECOUNT hasn't increased,
# repeat from Step 1 for the next node
Common Issues
Scaled Down But Drain Still Fails
Multiple pods may be blocking. Run dry-run again after scaling one deployment:
oc adm drain "$NODE" --ignore-daemonsets --delete-emptydir-data --force --dry-run=server
# May reveal a SECOND blocking pod/deployment
Forgot to Restore Replicas
# Find deployments at 0 replicas that shouldn't be
# (field selectors don't support spec.replicas, so filter with jsonpath)
oc get deploy -A -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'
PDB with maxUnavailable Instead of minAvailable
# This PDB allows 1 disruption — usually won't block MCP
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
Best Practices
- Always record original replica count before scaling down
- Scale down only the specific blocking deployment — not all replicas in the namespace
- Restore replicas immediately after the node rejoins — don’t leave at 0
- Consider relaxing PDBs during maintenance windows: maxUnavailable: 1 is safer than minAvailable: N
- Use the automation script for clusters with many blocking workloads
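The automation script mentioned above can be sketched as follows. This is a minimal, hypothetical helper (function and file names are my own, not from an official tool): it records each blocking deployment's replica count in a state file before scaling to 0, so nothing is lost if your session ends, and restores everything afterwards. It assumes oc is logged in with permission to scale the deployments.

```shell
#!/usr/bin/env bash
# Sketch of the automation: record replica counts, scale blockers to 0,
# restore them later. Names are illustrative; requires a logged-in "oc".

STATE_FILE="${STATE_FILE:-/tmp/pdb-workaround.replicas}"

scale_down() {   # usage: scale_down <namespace> <deployment>
  local ns="$1" deploy="$2" replicas
  replicas=$(oc -n "$ns" get deploy "$deploy" -o jsonpath='{.spec.replicas}')
  echo "$ns $deploy $replicas" >> "$STATE_FILE"   # remember for restore
  oc -n "$ns" scale deploy/"$deploy" --replicas=0
}

restore_all() {  # restore every recorded deployment, then clear the state
  while read -r ns deploy replicas; do
    oc -n "$ns" scale deploy/"$deploy" --replicas="$replicas"
  done < "$STATE_FILE"
  rm -f "$STATE_FILE"
}
```

Run scale_down for each blocker the dry run reveals, wait for the drain and reboot to finish, then run restore_all.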
Key Takeaways
- oc adm drain --dry-run=server reveals exactly which pods block eviction
- A PDB with ALLOWED DISRUPTIONS: 0 is the root cause
- Temporarily scale the blocking deployment to 0, drain, then restore
- MCO processes nodes sequentially, so fix one blocker at a time
- Always restore replicas after the node returns to Ready
