OpenShift Cluster Update Process Explained
Complete guide to OpenShift Container Platform cluster updates: the CVO workflow, Runlevels, Machine Config Operator node updates, and update channels.
💡 Quick Answer: OpenShift updates are orchestrated by the Cluster Version Operator (CVO), which applies release manifests in ordered Runlevels. The CVO updates the control plane Operators first (60-120 min), then the Machine Config Operator (MCO) rolls out OS and config changes to nodes, one node at a time per pool by default (5+ min per node). Use oc adm upgrade to check available versions and oc adm upgrade --to=<version> to start an update.
The Problem
- Cluster updates are complex: multiple Operators must update in sequence
- Wrong update channel selection can delay access to critical patches
- Node updates drain workloads, so poor planning causes application downtime
- Conditional updates with known risks need informed decision-making
- Estimating update duration is difficult without understanding the phases
The Solution
OpenShift Update Architecture
┌──────────────────────────────────────────────────────────────────┐
│ OpenShift Update Service (OSUS)                                  │
│  • Hosts update graph of all release versions                    │
│  • Publishes known risks for conditional updates                 │
│  • Recommends safe update paths based on channel + version       │
└────────────────────────────┬─────────────────────────────────────┘
                             │ Query: "What can I update to?"
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│ Cluster Version Operator (CVO)                                   │
│  • Manages ClusterVersion resource                               │
│  • Downloads + validates release image                           │
│  • Applies manifests in Runlevel order                           │
│  • Monitors Operator health between Runlevels                    │
└────────────────────────────┬─────────────────────────────────────┘
                             │ After control plane complete
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│ Machine Config Operator (MCO)                                    │
│  • Updates OS + system config on each node                       │
│  • Cordon → Drain → Update → Reboot → Uncordon                   │
│  • Respects maxUnavailable (default: 1)                          │
│  • Control plane + compute pools updated in parallel             │
└──────────────────────────────────────────────────────────────────┘
Check Available Updates
# View recommended updates
oc adm upgrade
# Include updates with known issues (conditional updates)
oc adm upgrade --include-not-recommended
# Check current cluster version
oc get clusterversion
# NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
# version 4.19.12 True False 3d Cluster version is 4.19.12
# View available updates as JSON
oc get clusterversion version -o json | jq '.status.availableUpdates[] | .version'
# Check conditional updates (with known risks)
oc get clusterversion version -o json | jq '.status.conditionalUpdates[] |
(.conditions[] | select(.type == "Recommended")) as $r |
{version: .release.version, recommended: $r.status, reason: $r.reason}'
Initiate an Update
# Update to specific version
oc adm upgrade --to=4.20.3
# Update to latest in channel
oc adm upgrade --to-latest=true
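# Accept a conditional update flagged as not recommended (after reviewing its risk)
oc adm upgrade --to=4.20.3 --allow-not-recommended
# ⚠️ --force is different: it skips release signature verification and precondition checks; use it only when you have evaluated that risk and accept it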
# Switch channel first if needed
oc adm upgrade channel stable-4.20
Understanding Update Channels
Channel          Description                                    When to Use
────────────────────────────────────────────────────────────────────────────────────────
candidate-4.20   Unsupported early access (pre-GA)              Testing only
fast-4.20        GA releases immediately on publish             Need fixes ASAP
stable-4.20      GA releases after promotion delay              Most production clusters
eus-4.y          Extended Update Support (even minor versions)  EUS-to-EUS updates
────────────────────────────────────────────────────────────────────────────────────────
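Channel names follow a <type>-<major>.<minor> pattern, and the table limits EUS channels to even minor versions. The bash sketch below encodes only those two rules; the function name channel_name is hypothetical, and whether a given channel actually exists is decided by the OpenShift Update Service, not by this check.

```shell
# channel_name TYPE VERSION
# Hypothetical helper: prints a channel string such as "stable-4.20".
# Encodes only the <type>-<major>.<minor> pattern and the rule that
# EUS channels exist for even minor versions.
channel_name() {
  local type="$1" version="$2"
  local major="${version%%.*}"   # "4.20.3" -> "4"
  local rest="${version#*.}"     # "4.20.3" -> "20.3"
  local minor="${rest%%.*}"      # "20.3"   -> "20"
  case "$type" in
    candidate|fast|stable) ;;
    eus)
      if [ $(( minor % 2 )) -ne 0 ]; then
        echo "no EUS channel for odd minor version $major.$minor" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown channel type: $type" >&2
      return 1
      ;;
  esac
  echo "$type-$major.$minor"
}
```

For example, oc adm upgrade channel "$(channel_name stable 4.20)" sets the stable-4.20 channel, while channel_name eus 4.19 fails before oc is ever called.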
Promotion flow:
candidate → fast (GA + errata) → stable (after delay)
Delay between fast and stable:
• z-stream updates: ~1-2 weeks
• Initial release of a new minor version: ~45-90 days
Key facts:
• fast and stable are BOTH fully supported
• The only difference is time-to-availability
• If a regression is found on fast, it's handled the same way as on stable
• Newly installed clusters default to the stable channel
# Switch channels
oc adm upgrade channel fast-4.20 # Get patches sooner
oc adm upgrade channel stable-4.20 # Wait for broader validation
oc adm upgrade channel eus-4.20 # For EUS-to-EUS updates
# Empty channel (disconnects from OSUS, e.g. for air-gapped clusters)
oc adm upgrade channel ""
Update Process Workflow (Detailed)
Step 1: Admin sets target version
└──► spec.desiredUpdate.version in ClusterVersion CR
Step 2: CVO resolves version → release image pull spec
└──► Uses OSUS graph data
Step 3: CVO validates release image integrity
└──► Cryptographic signature verification (built-in public keys)
Step 4: CVO creates extraction Job
├──► openshift-cluster-version/version-$version-$hash
└──► Downloads release image, extracts manifests
Step 5: CVO validates extracted manifests + metadata
Step 6: CVO checks preconditions
├──► Operators report Upgradeable=True/False
└──► Blocks if critical precondition fails
Step 7: CVO records in status.desired + status.history
Step 8: CVO applies manifests in Runlevel order
├──► Runlevel 00: CVO's own manifests (applied early; the CVO pod restarts on the new version)
├──► Runlevel 03: CRDs
├──► Runlevel 10: Core Operators
├──► Runlevel 20: kube-apiserver, kube-controller-manager
├──► Runlevel 25: Other Operators
├──► ...
└──► Runlevel 90: MCO manifests (last)
Between Runlevels, the CVO waits for the Operators updated in the current Runlevel to report:
• Available=True
• Degraded=False
• The desired (target) version
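The wait between Runlevels can be read as a gate over each Operator's reported state. A minimal bash sketch, assuming the Available and Degraded statuses and the reported version have already been read from oc get clusteroperators; operator_settled and runlevel_done are hypothetical names, and the real CVO implements this logic in Go with more nuance.

```shell
# operator_settled AVAILABLE DEGRADED REPORTED_VERSION TARGET_VERSION
# Hypothetical helper: succeeds only when all three gate conditions hold.
operator_settled() {
  [ "$1" = "True" ] && [ "$2" = "False" ] && [ "$3" = "$4" ]
}

# runlevel_done TARGET_VERSION
# Hypothetical helper: reads "name available degraded version" lines on stdin,
# prints every Operator that is not settled yet, and succeeds only if none remain.
runlevel_done() {
  local target="$1" pending=0 name available degraded version
  while read -r name available degraded version; do
    [ -n "$name" ] || continue
    if ! operator_settled "$available" "$degraded" "$version" "$target"; then
      echo "waiting on: $name"
      pending=1
    fi
  done
  return "$pending"
}
```

For example, oc get co --no-headers | awk '{print $1, $3, $5, $2}' | runlevel_done 4.20.3 reorders the NAME, AVAILABLE, DEGRADED and VERSION columns into the expected input, assuming every row has a VERSION value.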
Step 9: MCO updates nodes
├──► Cordon → Drain → OS update → Reboot → Uncordon
└──► maxUnavailable=1 (default, recommended)
Step 10: Cluster reports Updated
└──► Control plane done; nodes may still be rolling
Monitor Update Progress
# Overall progress
oc adm upgrade
# or
oc get clusterversion version
# Watch Operator status during update
oc get clusteroperators
# NAME VERSION AVAILABLE PROGRESSING DEGRADED
# kube-apiserver    4.20.3    True    True    False    ← updating
# network           4.19.12   True    True    False    ← updating
# machine-config    4.19.12   True    False   False    ← waiting
# Detailed CVO status
oc get clusterversion version -o json | jq '.status.conditions[] |
{type: .type, status: .status, message: .message}'
# Watch node updates (MCO phase)
oc get mcp
# NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT
# master rendered-master-abc123 True False False 3 3
# worker rendered-worker-def456 False True False 6 4
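When scripting around the MCO phase, the pool table above can be reduced to the pools that are still rolling. This bash sketch (pools_in_progress is a hypothetical name) relies only on the first seven columns in the order shown in the sample; real output may include extra columns after these, such as AGE, which the function ignores.

```shell
# pools_in_progress
# Hypothetical helper: reads `oc get mcp` output on stdin and prints
# "<pool> <ready>/<total> ready" for every pool that is not yet
# UPDATED=True and UPDATING=False. Degraded pools are flagged.
pools_in_progress() {
  awk 'NR > 1 && ($3 != "True" || $4 != "False") {
         printf "%s %s/%s ready%s\n", $1, $7, $6, ($5 == "True" ? " (degraded)" : "")
       }'
}
```

A simple wait loop could then be until [ -z "$(oc get mcp | pools_in_progress)" ]; do sleep 60; done.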
# Watch specific node progress
oc get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[-1].type,READY:.status.conditions[-1].status'
Runlevel Manifest Ordering
# Release image manifests are named:
# 0000_<runlevel>_<component>_<manifest-name>.yaml
# Extract and inspect release contents
oc adm release extract quay.io/openshift-release-dev/ocp-release:4.20.3-x86_64
# View ordering
ls | head -20
# 0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml
# 0000_03_config-operator_01_proxy.crd.yaml
# 0000_10_cluster-openshift-controller-manager_00_namespace.yaml
# 0000_20_kube-apiserver-operator_00_namespace.yaml
# 0000_25_kube-scheduler-operator_00_namespace.yaml
# 0000_50_cluster-ingress-operator_00_namespace.yaml
# 0000_90_machine-config_01_namespace.yaml
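Because every file name starts with 0000_<runlevel>_<component>_, the Runlevel and component are simply the second and third underscore-separated fields. This bash sketch (runlevel_summary is a hypothetical name) turns a file listing like the one above into one line per Runlevel, in ascending order, listing the components in each; it skips files without the 0000_NN_ prefix.

```shell
# runlevel_summary
# Hypothetical helper: reads manifest file names (one per line) on stdin and
# prints "<runlevel> <component> [<component>...]" in ascending Runlevel order.
runlevel_summary() {
  grep -E '^0000_[0-9][0-9]_' |
    awk -F_ '{ print $2, $3 }' |      # field 2 = runlevel, field 3 = component
    LC_ALL=C sort -u |                # byte order, so "03" < "10" < "90"
    awk '$1 != prev { if (NR > 1) print line; line = $1; prev = $1 }
         { line = line " " $2 }
         END { if (NR > 0) print line }'
}
```

Run it in the extraction directory, for example ls | runlevel_summary. Per the rules in this section, each output line groups components that belong to the same Runlevel.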
# Rules:
# 1. Lower Runlevel applied before higher
# 2. Within Runlevel: different components in parallel
# 3. Within component: lexicographic order
# 4. CVO waits for stability before the next Runlevel
Estimate Update Duration
Formula:
Update time = CVO phase + (node iterations × time per node)
CVO phase: 60-120 minutes (control plane Operators)
Node update time per node:
• Cloud instances: 5-10 minutes (fast reboot)
• Bare metal: 15-30 minutes (slow reboot + BIOS POST)
Node iterations = ceil(compute_nodes / maxUnavailable)
(the control plane pool rolls in parallel, so the compute pool usually sets the pace)
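The formula maps directly onto integer shell arithmetic. A bash sketch, assuming the per-node time is a single average figure and counting only compute nodes, as the examples that follow do; estimate_update_minutes is a hypothetical name.

```shell
# estimate_update_minutes CVO_MINUTES COMPUTE_NODES MAX_UNAVAILABLE MINUTES_PER_NODE
# Hypothetical helper:
# total = CVO phase + ceil(compute_nodes / maxUnavailable) * minutes_per_node
estimate_update_minutes() {
  local cvo="$1" nodes="$2" max_unavailable="$3" per_node="$4"
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive a, b
  local iterations=$(( (nodes + max_unavailable - 1) / max_unavailable ))
  echo $(( cvo + iterations * per_node ))
}
```

For example, estimate_update_minutes 90 20 5 20 prints 170, matching the last bare-metal example.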
Examples:
─────────────────────────────────────────────────────────────────
Cluster: 3 control + 6 compute, cloud, maxUnavailable=1
= 60 min + (6 iterations × 5 min) = 90 minutes
Cluster: 3 control + 6 compute, cloud, maxUnavailable=2
= 60 min + (3 iterations × 5 min) = 75 minutes
Cluster: 3 control + 20 compute, bare metal, maxUnavailable=1
= 90 min + (20 iterations × 20 min) = 490 minutes (~8 hours)
Cluster: 3 control + 20 compute, bare metal, maxUnavailable=5
= 90 min + (4 iterations × 20 min) = 170 minutes (~3 hours)
─────────────────────────────────────────────────────────────────
MCO Node Update Sequence
For each MachineConfigPool (master, worker):
While nodes remain to update:
1. Select up to maxUnavailable nodes
2. Cordon selected nodes (no new workloads scheduled)
3. Drain pods (respecting PodDisruptionBudgets)
4. Apply new MachineConfig (OS + systemd + kubelet + CRI-O)
5. Reboot node
6. Node comes back Ready
7. Uncordon node (workloads can schedule again)
8. Repeat with next batch
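The loop above, combined with the selection order described next (zone, then age), can be modelled as a batching function. This bash sketch models the documented behavior and is not the MCO's code: update_batches is a hypothetical name, input lines are assumed to be "zone creationTimestamp nodeName", every node is assumed to have a zone value, and timestamps are assumed to be RFC 3339 in UTC so that they sort correctly as plain strings.

```shell
# update_batches MAX_UNAVAILABLE
# Hypothetical helper: reads "zone creationTimestamp nodeName" lines on stdin,
# orders them by zone and then oldest first, and prints one batch per line.
update_batches() {
  LC_ALL=C sort -k1,1 -k2,2 |
    awk -v max="$1" '
      { batch = batch (count ? " " : "") $3; count++ }   # add node to current batch
      count == max { print batch; batch = ""; count = 0 } # batch full: emit it
      END { if (count) print batch }'                     # emit the last partial batch
}
```

With maxUnavailable=1 every batch holds a single node, which is the default one-by-one behavior.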
Node selection order:
• Alphabetically by the topology.kubernetes.io/zone label
• Within a zone: oldest nodes first
• Nodes without zones (e.g. bare metal): oldest first
# Check MCP status during update
oc get mcp worker -o yaml | yq '.status'
# machineCount: 6
# readyMachineCount: 4
# updatedMachineCount: 4
# unavailableMachineCount: 1
# degradedMachineCount: 0
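The counts in the pool status give a rough time-remaining figure mid-update: the nodes not yet updated, split into batches of maxUnavailable, times a per-node time like those in the Estimate Update Duration section. A bash sketch; remaining_minutes is a hypothetical name, and it treats the batch currently rebooting as a full batch, so it errs on the high side.

```shell
# remaining_minutes MACHINE_COUNT UPDATED_COUNT MAX_UNAVAILABLE MINUTES_PER_NODE
# Hypothetical helper: rough remaining time for one MachineConfigPool.
remaining_minutes() {
  local total="$1" updated="$2" max_unavailable="$3" per_node="$4"
  local left=$(( total - updated ))
  # ceil(left / maxUnavailable) batches, each taking per_node minutes
  echo $(( (left + max_unavailable - 1) / max_unavailable * per_node ))
}
```

With the sample status (6 machines, 4 updated), maxUnavailable=1 and 5 minutes per node, remaining_minutes 6 4 1 5 prints 10. The two counts can be read with oc get mcp worker -o jsonpath='{.status.machineCount}' and '{.status.updatedMachineCount}'.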
# See which node is currently updating
oc get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns='NAME:.metadata.name,READY:.status.conditions[-1].status,UNSCHEDULABLE:.spec.unschedulable'
Conditional Updates (Known Risks)
# View conditional updates with risk details
oc get clusterversion version -o json | jq '.status.conditionalUpdates[] |
(.conditions[] | select(.type == "Recommended")) as $r | {
version: .release.version,
recommended: $r.status,
reason: $r.reason,
message: $r.message
}'
# Example output:
# {
# "version": "4.20.2",
# "recommended": "False",
# "reason": "MultipleReasons",
# "message": "In Azure clusters with user-provisioned registry storage..."
# }
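After reviewing a target's Recommended status and message, the command to start the update depends on that status. This bash sketch only prints the command and does not run it; upgrade_command is a hypothetical name, and it treats both False and Unknown as not recommended. --allow-not-recommended is the oc adm upgrade flag for accepting such a target.

```shell
# upgrade_command VERSION RECOMMENDED_STATUS
# Hypothetical helper: prints (does not run) the oc command for a target.
upgrade_command() {
  local version="$1" recommended="$2"
  if [ "$recommended" = "True" ]; then
    echo "oc adm upgrade --to=$version"
  else
    # False or Unknown: only after deciding the risk does not apply or is acceptable
    echo "oc adm upgrade --to=$version --allow-not-recommended"
  fi
}
```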
# Risk evaluation: the CVO continuously checks whether YOUR cluster matches each risk's criteria
# No risk matches → Recommended=True, and the target is also listed in availableUpdates
# A risk matches → Recommended=False, and the target appears only in conditionalUpdates
# You can still update with --allow-not-recommended: the risk is informational, not blocking (Upgradeable=False is a separate gate that blocks minor updates)
ClusterOperator Condition Types
# Check all operator conditions ("Unknown" means the Operator does not set that condition)
oc get co -o json | jq 'def cond($t): ([.status.conditions[]? | select(.type == $t) | .status] | first // "Unknown");
.items[] | {
name: .metadata.name,
available: cond("Available"),
progressing: cond("Progressing"),
degraded: cond("Degraded"),
upgradeable: cond("Upgradeable")
}'
Condition Types:
─────────────────────────────────────────────────────────────────────────────────
Available=True      Operator is functional (False = admin intervention needed)
Progressing=True    Operator is rolling out changes (normal during an update)
Degraded=True       Persistent issue requiring attention (not transient)
Upgradeable=False   Operator says cluster shouldn't update (blocks minor updates)
─────────────────────────────────────────────────────────────────────────────────
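Because Upgradeable=False blocks only minor-version updates, whether it affects a planned update depends on whether that update changes MAJOR.MINOR; z-stream updates within the same minor still proceed. A bash sketch of that check, with hypothetical names minor_of and blocked_by_upgradeable, assuming dotted version strings such as 4.19.12.

```shell
# minor_of VERSION
# Hypothetical helper: prints MAJOR.MINOR (e.g. 4.20.3 -> 4.20).
minor_of() {
  local major="${1%%.*}" rest="${1#*.}"
  echo "$major.${rest%%.*}"
}

# blocked_by_upgradeable CURRENT TARGET UPGRADEABLE_STATUS
# Hypothetical helper: succeeds (meaning "blocked") only when Upgradeable=False
# and the update changes MAJOR.MINOR.
blocked_by_upgradeable() {
  [ "$3" = "False" ] && [ "$(minor_of "$1")" != "$(minor_of "$2")" ]
}
```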
ClusterVersion Condition Types:
───────────────────────────────────────────────────────────────────
Failing            Cannot reach desired state (unhealthy)
Invalid            Error prevents CVO from taking action
RetrievedUpdates   Successfully fetched update graph from OSUS
ReleaseAccepted    Release payload loaded and verified successfully
───────────────────────────────────────────────────────────────────
EUS-to-EUS Updates (Control Plane Only)
# EUS versions: 4.14, 4.16, 4.18, 4.20 (even minor versions)
# Skip intermediate minor for worker nodes
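The control plane still passes through every minor version between the two EUS releases; only the workers skip the intermediate one. A bash sketch that prints the minor versions the control plane must reach, assuming EUS-to-EUS means two consecutive even minors of the same major, as in the list above; eus_path is a hypothetical name and takes MAJOR.MINOR arguments.

```shell
# eus_path FROM TO   (e.g. eus_path 4.18 4.20 -> "4.19 4.20")
# Hypothetical helper: prints the control plane's minor-version hops.
eus_path() {
  local from_major="${1%%.*}" from_minor="${1#*.}"
  local to_major="${2%%.*}" to_minor="${2#*.}"
  if [ "$from_major" != "$to_major" ] ||
     [ $(( from_minor % 2 )) -ne 0 ] ||
     [ $(( to_minor - from_minor )) -ne 2 ]; then
    echo "not an EUS-to-EUS update: $1 -> $2" >&2
    return 1
  fi
  echo "$from_major.$(( from_minor + 1 )) $to_major.$to_minor"
}
```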
# 1. Pause worker MCP
oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'
# 2. Update control plane through intermediate version
oc adm upgrade channel eus-4.20
oc adm upgrade --to=4.19.<z>   # intermediate: a 4.19 release offered by oc adm upgrade
# Wait for the control plane update to complete (oc get clusterversion)
oc adm upgrade --to=4.20.<z>   # target EUS release
# 3. Resume worker MCP (nodes update directly to 4.20)
oc patch mcp/worker --type merge --patch '{"spec":{"paused":false}}'
# Benefit: Workers reboot only ONCE (not twice)
Common Issues
Update stuck at "Progressing" for >2 hours
- Cause: An Operator can't reach a stable state (often kube-apiserver graceful termination)
- Fix: Check oc get co for Progressing=True operators; inspect their logs
Node stuck in "SchedulingDisabled" after update
- Cause: MCO drain stuck on a pod with a restrictive PDB
- Fix: Check oc get pdb -A for budgets with ALLOWED DISRUPTIONS of 0, and oc get pods -A --field-selector=spec.nodeName=<node> for pods still on the node
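A quick pre-update check for this failure mode is to list PodDisruptionBudgets that currently allow zero disruptions, since drain evictions for their pods will be refused. A bash sketch that filters oc get pdb -A output, assuming the default column layout NAMESPACE, NAME, MIN AVAILABLE, MAX UNAVAILABLE, ALLOWED DISRUPTIONS, AGE, so that ALLOWED DISRUPTIONS is the fifth whitespace-separated field in data rows; blocking_pdbs is a hypothetical name.

```shell
# blocking_pdbs
# Hypothetical helper: reads `oc get pdb -A` output on stdin and prints
# "namespace/name" for each PDB whose ALLOWED DISRUPTIONS (5th field) is 0.
blocking_pdbs() {
  awk 'NR > 1 && $5 == 0 { print $1 "/" $2 }'
}
```

Usage: oc get pdb -A | blocking_pdbs. A zero here is a point-in-time value, so a budget can be blocking now and fine a minute later.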
"Upgradeable=False" blocking update
- Cause: An Operator detected a condition preventing a safe update
- Fix: Run oc get co <operator> -o json | jq '.status.conditions[] | select(.type=="Upgradeable")' to see the message
Update not available in channel
- Cause: Release not yet promoted to stable, or a conditional risk blocks the recommendation
- Fix: Switch to the fast channel, or use oc adm upgrade --include-not-recommended to see all options
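MCO Degraded after node reboot
- Cause: Node failed to apply the new machine config (disk full, kernel panic, etc.)
- Fix: Check the machine-config-daemon logs (oc logs -n openshift-machine-config-operator -c machine-config-daemon <mcd-pod>) and oc describe mcp; if needed, open a shell on the node with oc debug node/<node> (or SSH) and review journalctl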
Best Practices
- Use the stable channel for production; switch to fast only when you need specific fixes immediately
- Never change maxUnavailable for the control plane pool: keep it at 1 (sequential)
- Check Upgradeable conditions before starting: oc adm upgrade shows blockers
- Monitor PDBs before the update: restrictive PDBs cause drain timeouts
- Ensure all nodes are Ready: unavailable nodes delay the entire update
- Use EUS-to-EUS for large clusters: it saves one full reboot cycle for all workers
- Test in non-production first: use the fast channel in staging and stable in production
- Plan maintenance windows: estimate with CVO time + (iterations × node time)
Key Takeaways
- OpenShift updates = CVO phase (Operators in Runlevels) + MCO phase (node OS/config)
- CVO applies manifests in dependency order (Runlevel 03 → 90) and waits for stability between levels
- MCO updates nodes in batches of maxUnavailable (one at a time by default): cordon → drain → update → reboot → uncordon
- Four channel types: candidate (testing), fast (GA immediately), stable (GA after a delay), eus (EUS-to-EUS updates between even minors)
- Conditional updates: OSUS publishes known risks, and the CVO evaluates whether they apply to your cluster
- Duration estimate: 60-120 min CVO + (ceil(compute nodes / maxUnavailable) × per-node time)
- Default maxUnavailable=1 for both pools: increase it for compute only, never for the control plane
- EUS-to-EUS: pause workers, update control plane through intermediate, resume = one reboot
- ClusterOperator conditions (Available/Progressing/Degraded/Upgradeable) drive update flow

