Configuration · beginner · ⏱ 15 minutes · K8s 1.28+

Restore Scaled Deployments After Node Drain

Restore deployments scaled down for maintenance. Verify node health, check pod scheduling, and confirm service availability.

By Luca Berton • 📖 5 min read

πŸ’‘ Quick Answer: After the drained node returns to Ready, uncordon it (oc adm uncordon <node>), then restore each deployment to its original replica count (oc scale deploy/<name> --replicas=<original>). Verify pods are Running and Services have endpoints.

The Problem

You scaled down deployments to unblock a node drain. The node is back and Ready. Now you need to restore everything to its original state without missing any deployments or creating service disruptions.

The Solution

Step 1: Verify Node Is Ready

# Check node status
oc get node worker-3
# NAME       STATUS   ROLES    AGE   VERSION
# worker-3   Ready    worker   30d   v1.28.6   ← Ready, good

# Uncordon if still cordoned
oc adm uncordon worker-3

Step 2: Restore from Record

# If you saved to a file:
cat /tmp/drain-restore.txt
# openshift-ingress/router-custom=6
# monitoring/alertmanager=3

# Restore each
while IFS='=' read -r ns_deploy replicas; do
  ns=$(echo "$ns_deploy" | cut -d/ -f1)
  deploy=$(echo "$ns_deploy" | cut -d/ -f2)
  echo "Restoring $ns/$deploy β†’ $replicas replicas"
  oc scale deploy "$deploy" -n "$ns" --replicas="$replicas"
done < /tmp/drain-restore.txt
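
If you did not capture a record during the drain, here is a minimal sketch of how one could be written beforehand. The deployment names below are the examples from the file above; substitute your own:

# Record current replica counts before scaling down (run before the drain)
for ns_deploy in openshift-ingress/router-custom monitoring/alertmanager; do
  ns=${ns_deploy%%/*}
  deploy=${ns_deploy##*/}
  replicas=$(oc get deploy "$deploy" -n "$ns" -o jsonpath='{.spec.replicas}')
  echo "$ns/$deploy=$replicas" >> /tmp/drain-restore.txt
done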

Step 3: Verify Pods Are Running

# Wait for all pods to be ready
for ns_deploy in $(cut -d= -f1 /tmp/drain-restore.txt); do
  ns=$(echo "$ns_deploy" | cut -d/ -f1)
  deploy=$(echo "$ns_deploy" | cut -d/ -f2)
  echo "Checking $ns/$deploy..."
  oc rollout status deploy "$deploy" -n "$ns" --timeout=120s
done

Step 4: Verify Services Have Endpoints

# Check endpoints for critical services
oc get endpoints -n openshift-ingress | grep router
# router-custom   10.128.2.15:80,10.128.3.22:80,...   ← Endpoints populated
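
Service names do not always match deployment names, so as a rough sketch you can list the endpoints in every namespace you restored and scan them manually:

# List endpoints in each restored namespace (namespaces taken from the record)
for ns in $(cut -d= -f1 /tmp/drain-restore.txt | cut -d/ -f1 | sort -u); do
  echo "--- $ns ---"
  oc get endpoints -n "$ns"
done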

Step 5: Clean Up

# Remove the restore file after successful restoration
rm /tmp/drain-restore.txt

Common Issues

Pods Pending After Restore

Not enough resources on remaining nodes. Check oc describe pod <pending> for scheduling failures.
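
A quick sketch for diagnosing the shortage (<pending> and <namespace> are placeholders):

# Scheduling events explain why the pod cannot be placed
oc describe pod <pending> -n <namespace> | grep -A10 Events
# Compare requested vs allocatable resources on the returned node
oc describe node worker-3 | grep -A10 "Allocated resources"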

Pods Schedule But Fail Readiness

The node may not have all required resources yet (e.g., GPU drivers still initializing). Wait for node-level operators to finish.
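
One way to check, as a sketch: list every pod scheduled on the returned node and look for anything not yet Running (CNI agents, device plugins, and similar node-level components show up here):

# Non-Running pods on worker-3 (the header line will also print)
oc get pods -A -o wide --field-selector spec.nodeName=worker-3 | grep -v Running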

Missing Restore File

# Find deployments at 0 replicas that shouldn't be
# (field selectors don't support spec.replicas, so filter with jsonpath)
oc get deploy -A -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
# Cross-reference with what should be running

Best Practices

  • Automate the restore: use the MCP update automation script
  • Keep restore records until all deployments are verified
  • Set deployment readiness timeouts; don't wait forever
  • Check MCP status after restoring (see the sketch after this list) to ensure the next node can proceed
  • Notify the team after maintenance is complete
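
A sketch of the MCP check mentioned above (OpenShift machine config pools; the worker pool name is an assumption):

# The pool is settled when UPDATED=True and UPDATING=False
oc get mcp
# Updated machines vs total machines in the worker pool
oc get mcp worker -o jsonpath='{.status.updatedMachineCount}/{.status.machineCount}{"\n"}'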

Key Takeaways

  • Always uncordon before restoring replicas; pods need scheduling room
  • Restore from the recorded file to avoid missing deployments
  • Verify with oc rollout status and endpoint checks
  • Clean up restore records after successful verification
#scaling #restore #maintenance #deployments #post-drain
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
