Restore Scaled Deployments After Node Drain
Restore deployments scaled down for maintenance. Verify node health, check pod scheduling, and confirm service availability.
💡 Quick Answer: After the drained node returns to Ready, uncordon it (oc adm uncordon <node>), then restore each deployment to its original replica count (oc scale deploy/<name> --replicas=<original>). Verify pods are Running and Services have endpoints.
The Problem
You scaled down deployments to unblock a node drain. The node is back and Ready. Now you need to restore everything to its original state without missing any deployments or creating service disruptions.
The Solution
Step 1: Verify Node Is Ready
# Check node status
oc get node worker-3
# NAME STATUS ROLES AGE VERSION
# worker-3 Ready worker 30d v1.28.6 ✅ Ready, good
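If you are scripting the restore, the Ready check can gate the uncordon instead of being eyeballed. A minimal sketch, where the hard-coded status value stands in for the live query shown in the comment (worker-3 is the example node from above):

```shell
# Sketch: gate the uncordon on the node's Ready condition.
# In practice, query it with:
#   status=$(oc get node worker-3 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
status="True"   # sample value standing in for the oc query above
if [ "$status" = "True" ]; then
  echo "worker-3 is Ready; safe to uncordon"
else
  echo "worker-3 is not Ready yet; do not uncordon" >&2
  exit 1
fi
```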
# Uncordon if still cordoned
oc adm uncordon worker-3
Step 2: Restore from Record
# If you saved to a file:
cat /tmp/drain-restore.txt
# openshift-ingress/router-custom=6
# monitoring/alertmanager=3
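The file format is one namespace/deployment=replicas entry per line. If you need to (re)create such a record before a future drain, a sketch (the heredoc stands in for live query output, and the jsonpath shown is one assumed way to generate it):

```shell
# Sketch: build the record before the drain; one way to capture spec.replicas
# per deployment is to run this per affected namespace:
#   oc get deploy -n "$ns" -o jsonpath="{range .items[*]}$ns/{.metadata.name}={.spec.replicas}{\"\n\"}{end}" >> /tmp/drain-restore.txt
# Sample record standing in for that output:
cat > /tmp/drain-restore.txt <<'EOF'
openshift-ingress/router-custom=6
monitoring/alertmanager=3
EOF
```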
# Restore each
while IFS='=' read -r ns_deploy replicas; do
ns=$(echo "$ns_deploy" | cut -d/ -f1)
deploy=$(echo "$ns_deploy" | cut -d/ -f2)
echo "Restoring $ns/$deploy -> $replicas replicas"
oc scale deploy "$deploy" -n "$ns" --replicas="$replicas"
done < /tmp/drain-restore.txt
Step 3: Verify Pods Are Running
# Wait for all pods to be ready
for ns_deploy in $(cut -d= -f1 /tmp/drain-restore.txt); do
ns=$(echo "$ns_deploy" | cut -d/ -f1)
deploy=$(echo "$ns_deploy" | cut -d/ -f2)
echo "Checking $ns/$deploy..."
oc rollout status deploy "$deploy" -n "$ns" --timeout=120s
done
Step 4: Verify Services Have Endpoints
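oc rollout status blocks until each rollout completes; if you would rather poll, comparing desired against ready replica counts works too. A minimal sketch, where the counts are sample values standing in for a live jsonpath query (router-custom is the example deployment):

```shell
# Sketch: compare desired vs. ready replica counts. In practice, query both with:
#   read -r desired ready <<<"$(oc get deploy router-custom -n openshift-ingress \
#     -o jsonpath='{.spec.replicas} {.status.readyReplicas}')"
desired=6
ready=6   # sample values standing in for the query above
if [ "${ready:-0}" -eq "$desired" ]; then
  echo "router-custom: all $desired replicas ready"
else
  echo "router-custom: only ${ready:-0}/$desired ready" >&2
fi
```

Note that status.readyReplicas is omitted entirely while a deployment sits at zero ready pods, hence the ${ready:-0} default.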
# Check endpoints for critical services
oc get endpoints -n openshift-ingress | grep router
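If you would rather assert than eyeball the grep output, the same check can fail loudly in a script. A sketch, where the address list is a sample standing in for the live jsonpath query shown in the comment:

```shell
# Sketch: fail loudly when a Service has no endpoints. In practice:
#   addrs=$(oc get endpoints router-custom -n openshift-ingress \
#     -o jsonpath='{.subsets[*].addresses[*].ip}')
addrs="10.128.2.15 10.128.3.22"   # sample values standing in for the query above
if [ -z "$addrs" ]; then
  echo "router-custom has NO endpoints; traffic will black-hole" >&2
  exit 1
else
  echo "router-custom endpoints: $addrs"
fi
```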
# router-custom 10.128.2.15:80,10.128.3.22:80,... ✅ Endpoints populated
Step 5: Clean Up
# Remove the restore file after successful restoration
rm /tmp/drain-restore.txt
Common Issues
Pods Pending After Restore
There may not be enough capacity on the remaining nodes, or the restored node may still be cordoned. Run oc describe pod <pending> and look for FailedScheduling events.
Pods Schedule But Fail Readiness
The node may not have all required resources yet (e.g., GPU drivers still initializing). Wait for node-level operators to finish.
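To spot these pods quickly, you can filter on the Ready condition rather than scanning oc get pods output by hand. A sketch, where the sample output stands in for the live query shown in the comment (the monitoring namespace and pod names are examples):

```shell
# Sketch: flag pods that are scheduled but not Ready. Real query:
#   oc get pods -n monitoring -o jsonpath='{range .items[*]}{.metadata.name} {.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
# Sample output standing in for the query above:
pods="alertmanager-0 True
alertmanager-1 False"
echo "$pods" | awk '$2 != "True" {print $1 " is not Ready"}'
```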
Missing Restore File
# Find deployments at 0 replicas that shouldn't be. Field selectors don't
# support spec.replicas for Deployments, so filter with jsonpath instead:
oc get deploy -A -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
# Cross-reference with what should be running
Best Practices
- Automate the restore: use the MCP update automation script
- Keep restore records until all deployments are verified
- Set deployment readiness timeouts: don't wait forever
- Check MCP status after restoring: ensure the next node can proceed
- Notify the team after maintenance is complete
Key Takeaways
- Always uncordon before restoring replicas: pods need scheduling room
- Restore from the recorded file to avoid missing deployments
- Verify with oc rollout status and endpoint checks
- Clean up restore records after successful verification
