Configuration Β· Advanced Β· ⏱ 30 minutes Β· K8s 1.28+

Automate MCP Updates with Drain Script

Bash script to automate OpenShift MachineConfigPool updates when drains are blocked by PDB violations. Auto-detects blockers, scales down, drains, and restores.

By Luca Berton β€’ πŸ“– 6 min read

πŸ’‘ Quick Answer: Use this script to automate MCP rollouts on clusters where PDB violations block node drains. It detects the blocking pod, resolves its owning Deployment, scales it to 0, drains the node, waits for the MCD reboot, restores replicas, and repeats until the MCP reports UPDATED=True.

The Problem

In OpenShift clusters with custom ingress routers, strict PDBs, or hostNetwork workloads, MCP updates stall because the Machine Config Daemon (MCD) can't drain nodes: evictions keep failing because they would violate a PodDisruptionBudget. Manually identifying and scaling down the blocking Deployments on each of 6+ worker nodes is tedious and error-prone.
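
You can confirm you are in this state using the same annotations the script relies on (a quick check outside the script; <node> is a placeholder for a stuck worker):

oc get mcp worker
oc get node <node> -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'
oc get node <node> -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\n"}'
# A desired/current mismatch plus PDB eviction errors in the MCD logs means the drain is blocked.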

The Solution

Automate the whole loop: detect the blocking pods, scale their owning Deployments to zero, drain the node, wait for the reboot, restore the replicas, and repeat for every pending node. The script below does exactly that.

The Automation Script

#!/usr/bin/env bash
set -euo pipefail

# Configuration (override via environment variables)
MCP_NAME="${MCP_NAME:-worker}"
DRAIN_TIMEOUT="${DRAIN_TIMEOUT:-1800}"          # 30 minutes
NODE_READY_TIMEOUT="${NODE_READY_TIMEOUT:-2700}" # 45 minutes
DEFAULT_REPLICAS="${DEFAULT_REPLICAS:-6}"

# Track scaled-down deployments for cleanup
declare -A SCALED_DEPLOYS

log()  { printf "\033[1;34m[%s INFO]\033[0m %s\n" "$(date +%H:%M:%S)" "$*"; }
warn() { printf "\033[1;33m[%s WARN]\033[0m %s\n" "$(date +%H:%M:%S)" "$*"; }
err()  { printf "\033[1;31m[%s ERR ]\033[0m %s\n" "$(date +%H:%M:%S)" "$*" >&2; }

# Cleanup trap: restore any deployments that were scaled down
cleanup() {
  if (( ${#SCALED_DEPLOYS[@]} > 0 )); then
    warn "Restoring scaled-down deployments on exit..."
    for key in "${!SCALED_DEPLOYS[@]}"; do
      IFS='|' read -r ns deploy replicas <<< "${SCALED_DEPLOYS[$key]}"
      log "Restoring $ns/$deploy to replicas=$replicas"
      oc -n "$ns" scale deploy/"$deploy" --replicas="$replicas" 2>/dev/null || true
    done
  fi
}
trap cleanup EXIT

# Check if MCP is fully updated
mcp_updated() {
  oc get mcp "$MCP_NAME" -o json | \
    jq -e '.status.conditions[] | select(.type=="Updated" and .status=="True")' >/dev/null 2>&1
}

# Find nodes that still need the update
find_pending_nodes() {
  for node in $(oc get nodes -l "node-role.kubernetes.io/$MCP_NAME=" -o name); do
    local desired current
    desired=$(oc get "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}')
    current=$(oc get "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')
    if [[ "$desired" != "$current" ]]; then
      echo "${node#node/}"
    fi
  done
}

# Dry-run drain to find blocking pods.
# A client-side dry run never reaches the Eviction API, so PDB conflicts only
# show up with a server-side dry run; the short timeout caps the retry loop,
# and '|| true' keeps set -e / pipefail from aborting when nothing is blocked.
find_blockers() {
  local node="$1"
  oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data \
    --force --dry-run=server --timeout=30s 2>&1 | \
    grep -B1 "Cannot evict pod" | grep "evicting pod" | \
    awk '{print $3}' | sed 's/"//g' | sort -u || true
}

# Resolve pod β†’ Deployment name
resolve_deployment() {
  local ns="$1" pod="$2"
  local rs deploy
  rs=$(oc -n "$ns" get pod "$pod" -o jsonpath='{.metadata.ownerReferences[?(@.kind=="ReplicaSet")].name}' 2>/dev/null)
  if [[ -n "$rs" ]]; then
    deploy=$(oc -n "$ns" get rs "$rs" -o jsonpath='{.metadata.ownerReferences[?(@.kind=="Deployment")].name}' 2>/dev/null)
    echo "$deploy"
  fi
}

# Scale down a blocking deployment and record it
scale_down_blocker() {
  local ns="$1" deploy="$2"
  local replicas
  replicas=$(oc -n "$ns" get deploy "$deploy" -o jsonpath='{.spec.replicas}')
  [[ -z "$replicas" ]] && replicas=$DEFAULT_REPLICAS
  
  log "Scaling $ns/$deploy from $replicas β†’ 0"
  SCALED_DEPLOYS["$ns/$deploy"]="$ns|$deploy|$replicas"
  oc -n "$ns" scale deploy/"$deploy" --replicas=0
}

# Restore all scaled deployments
restore_all() {
  for key in "${!SCALED_DEPLOYS[@]}"; do
    IFS='|' read -r ns deploy replicas <<< "${SCALED_DEPLOYS[$key]}"
    log "Restoring $ns/$deploy to replicas=$replicas"
    oc -n "$ns" scale deploy/"$deploy" --replicas="$replicas"
    unset "SCALED_DEPLOYS[$key]"
  done
}

# Wait for node to be Ready with correct config
wait_node_ready() {
  local node="$1" start elapsed
  start=$(date +%s)
  log "Waiting for $node to become Ready with updated config..."
  
  while true; do
    local ready desired current
    ready=$(oc get node "$node" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    desired=$(oc get node "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}')
    current=$(oc get node "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')
    
    if [[ "$ready" == "True" && "$desired" == "$current" ]]; then
      log "Node $node is Ready and updated"
      return 0
    fi
    
    elapsed=$(( $(date +%s) - start ))
    if (( elapsed > NODE_READY_TIMEOUT )); then
      err "Timeout waiting for $node (${elapsed}s). Check MCD logs."
      return 1
    fi
    sleep 15
  done
}

# Process a single node
process_node() {
  local node="$1"
  log "===== Processing node: $node ====="
  
  # Find and resolve blockers
  local blockers
  blockers=$(find_blockers "$node")
  
  if [[ -n "$blockers" ]]; then
    log "Found blocking pods:"
    echo "$blockers" | while read -r ns_pod; do
      local ns pod deploy
      ns=$(echo "$ns_pod" | cut -d/ -f1)
      pod=$(echo "$ns_pod" | cut -d/ -f2)
      deploy=$(resolve_deployment "$ns" "$pod")
      if [[ -n "$deploy" ]]; then
        echo "  $ns/$pod β†’ deploy/$deploy"
        scale_down_blocker "$ns" "$deploy"
      else
        warn "  $ns/$pod β†’ could not resolve Deployment (manual action needed)"
      fi
    done
    sleep 3
  fi
  
  # Drain
  log "Draining $node..."
  oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data \
    --force --timeout="${DRAIN_TIMEOUT}s"
  
  # Wait for MCD to reboot and apply config
  wait_node_ready "$node"
  
  # Uncordon
  oc adm uncordon "$node" 2>/dev/null || true
  log "Uncordoned $node"
  
  # Restore scaled deployments
  restore_all
  
  log "===== Completed: $node ====="
}

# ========== MAIN ==========
log "Starting MCP update automation for pool '$MCP_NAME'"

while ! mcp_updated; do
  mapfile -t pending < <(find_pending_nodes)
  
  if (( ${#pending[@]} == 0 )); then
    log "No pending nodes found. Waiting for MCP to reconcile..."
    sleep 30
    continue
  fi
  
  log "${#pending[@]} node(s) pending update: ${pending[*]}"
  process_node "${pending[0]}"
done

log "βœ… MCP '$MCP_NAME' is fully updated!"

Usage

chmod +x mcp-update-automator.sh

# Run with defaults (worker MCP, 30min drain timeout)
./mcp-update-automator.sh

# Override settings
MCP_NAME=gpu-worker DRAIN_TIMEOUT=3600 ./mcp-update-automator.sh

What It Does

  1. Finds the next worker node that needs the new MachineConfig
  2. Runs a server-side dry-run drain to discover PDB-blocking pods
  3. Resolves each blocking pod to its owning Deployment
  4. Scales the Deployment to 0 (records original count)
  5. Drains the node for real
  6. Waits for MCD to reboot, apply config, and report Ready
  7. Uncordons the node and restores all scaled Deployments
  8. Repeats until MCP shows UPDATED=True
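
To follow progress while the script runs, a second terminal with a simple watch is enough (only the pool name below is assumed):

# Watch the pool and the per-node scheduling state during the rollout
watch -n 15 'oc get mcp worker; oc get nodes -l node-role.kubernetes.io/worker='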

Common Issues

Script Interrupted Mid-Drain

The trap cleanup EXIT ensures scaled-down deployments are restored even on Ctrl+C or errors.
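
The trap cannot run after a hard kill (SIGKILL) or a lost terminal session, though. In that case, list Deployments left at zero replicas and restore them by hand (a sketch; the jsonpath filter expression is the only assumption):

# Find Deployments currently scaled to 0 across the cluster
oc get deploy -A -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'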

Non-Deployment Pods Blocking

StatefulSets, bare pods, or DaemonSets may also block. The script warns when it can’t resolve the Deployment owner β€” handle those manually.
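
For a StatefulSet blocker, the manual equivalent of what the script does for Deployments looks roughly like this (namespace and name are placeholders):

# Record the current replica count, scale to 0 for the drain, restore afterwards
replicas=$(oc -n <namespace> get statefulset <name> -o jsonpath='{.spec.replicas}')
oc -n <namespace> scale statefulset/<name> --replicas=0
# ...drain the node, wait for the reboot, then:
oc -n <namespace> scale statefulset/<name> --replicas="$replicas"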

Best Practices

  • Run during a maintenance window β€” temporary scaling affects availability
  • Test with --dry-run=client first to see what will happen
  • Monitor the script output β€” don’t leave it fully unattended
  • Review PDB policies after β€” if the same deployments block every time, fix the PDBs

Key Takeaways

  • Automates the detect β†’ scale β†’ drain β†’ wait β†’ restore cycle
  • Handles multiple blocking pods per node
  • Cleanup trap restores replicas on script exit or interruption
  • Works with any MCP name (worker, gpu-worker, infra)
  • Configurable timeouts for different cluster sizes
#openshift #machineconfig #automation #bash #mcp #drain
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
