Configuration advanced ⏱ 15 minutes K8s 1.28+

Cluster API: Declarative K8s Management

Manage Kubernetes cluster lifecycle with Cluster API. Provision, upgrade, and scale clusters declaratively using management clusters and infrastructure provi...

By Luca Berton • 📖 5 min read

💡 Quick Answer: Cluster API (CAPI) manages Kubernetes clusters as Kubernetes resources. A management cluster runs CAPI controllers that provision workload clusters on AWS, Azure, GCP, vSphere, bare metal, etc. Define clusters in YAML, apply them to the management cluster, and CAPI handles provisioning. Key resources: Cluster, MachineDeployment, MachinePool. Install: clusterctl init --infrastructure aws.

The Problem

Managing multiple Kubernetes clusters is complex:

  • Manual provisioning is error-prone and slow
  • Each cloud has different tools (eksctl, az aks, gcloud)
  • Upgrades require careful coordination
  • No unified API for multi-cloud cluster management
  • Infrastructure as Code tools (Terraform) don't understand K8s lifecycle

The Solution

Architecture

Management Cluster (runs CAPI controllers)
├── Cluster API core controllers
├── Bootstrap provider (kubeadm)
├── Control plane provider (kubeadm)
├── Infrastructure provider (AWS/Azure/vSphere/...)
│
├── Cluster/production-us
│   ├── KubeadmControlPlane (3 control planes)
│   └── MachineDeployment (10 workers)
│
└── Cluster/staging-eu
    ├── KubeadmControlPlane (1 control plane)
    └── MachineDeployment (3 workers)

Install Cluster API

# Install clusterctl CLI
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.7.0/clusterctl-linux-amd64 \
  -o clusterctl
chmod +x clusterctl
sudo mv clusterctl /usr/local/bin/

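# Initialize management cluster (current kubeconfig)
# AWS example (requires the clusterawsadm CLI from cluster-api-provider-aws):
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<key>
export AWS_SECRET_ACCESS_KEY=<secret>

# Create the IAM resources the AWS provider needs, then encode the credentials for it
clusterawsadm bootstrap iam create-cloudformation-stack
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

clusterctl init --infrastructure aws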

# Other providers:
# clusterctl init --infrastructure azure
# clusterctl init --infrastructure vsphere
# clusterctl init --infrastructure docker  (for testing)

# Verify
kubectl get providers -A
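
Provider controllers need a moment to start after clusterctl init. The sketch below waits for their Deployments to become Available before you create any cluster. The namespaces are the ones a default AWS install creates; the function name and the default timeout are illustrative, not part of clusterctl.

```shell
# Hypothetical helper: wait until every provider controller Deployment is Available.
# Namespaces are the defaults created by `clusterctl init --infrastructure aws`.
wait_for_capi_controllers() {
  local timeout="${1:-300s}" ns
  for ns in capi-system \
            capi-kubeadm-bootstrap-system \
            capi-kubeadm-control-plane-system \
            capa-system; do
    echo "Waiting for controllers in ${ns}..."
    kubectl wait --for=condition=Available deployment --all \
      -n "${ns}" --timeout="${timeout}" || return 1
  done
}

# Usage: wait_for_capi_controllers 600s
```

For another infrastructure provider, swap capa-system for that provider's namespace.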

Create a Workload Cluster

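# Generate cluster manifest
# (the AWS template also reads AWS_SSH_KEY_NAME, AWS_CONTROL_PLANE_MACHINE_TYPE,
# and AWS_NODE_MACHINE_TYPE from the environment)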
clusterctl generate cluster production \
  --kubernetes-version v1.30.0 \
  --control-plane-machine-count 3 \
  --worker-machine-count 5 \
  > production-cluster.yaml

# Apply to management cluster
kubectl apply -f production-cluster.yaml

# Watch provisioning
kubectl get cluster production -w
# NAME         PHASE          AGE
# production   Provisioning   1m
# production   Provisioned    5m

# Get workload cluster kubeconfig
clusterctl get kubeconfig production > production.kubeconfig
kubectl --kubeconfig=production.kubeconfig get nodes
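
In scripts or CI, watching with -w is awkward. A minimal polling sketch that reads the Cluster's .status.phase until it reaches the wanted value; the function name, attempt count, and delay are assumptions chosen for illustration.

```shell
# Hypothetical helper: poll a Cluster's .status.phase until it matches.
# Args: cluster name, wanted phase, max attempts, seconds between attempts.
wait_for_cluster_phase() {
  local name="$1" want="${2:-Provisioned}" attempts="${3:-60}" delay="${4:-30}"
  local i=1 phase
  while [ "$i" -le "$attempts" ]; do
    # An unreachable API server or missing object counts as "unknown"
    phase="$(kubectl get cluster "$name" -o jsonpath='{.status.phase}' 2>/dev/null)" || phase=""
    echo "attempt ${i}: phase=${phase:-unknown}"
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "cluster ${name} did not reach ${want}" >&2
  return 1
}

# Usage: wait_for_cluster_phase production Provisioned 60 30
```

Expect the workload cluster's nodes to report NotReady until a CNI plugin is installed in it; CAPI does not deploy one for you.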

Cluster Resources

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production
spec:
  region: us-east-1
  sshKeyName: my-key

---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-cp
spec:
  replicas: 3
  version: v1.30.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate
      name: production-cp
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-workers
spec:
  clusterName: production
  replicas: 5
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: production
      version: v1.30.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-workers
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: production-workers
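
The MachineDeployment above references an AWSMachineTemplate and a KubeadmConfigTemplate that are not shown. A sketch of what they might look like for the worker pool; the instance type and the file name are assumptions, and the IAM instance profile is the default name created by clusterawsadm. The control plane needs a matching AWSMachineTemplate named production-cp.

```shell
# Write the two worker templates referenced by the MachineDeployment.
# Instance type and file name are illustrative assumptions.
cat > production-worker-templates.yaml <<'EOF'
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: production-workers
  namespace: default
spec:
  template:
    spec:
      instanceType: t3.large   # assumption: size this for your workload
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: my-key
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: production-workers
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          name: '{{ ds.meta_data.local_hostname }}'
          kubeletExtraArgs:
            cloud-provider: external
EOF

# Then apply it to the management cluster:
#   kubectl apply -f production-worker-templates.yaml
```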

Scale and Upgrade

# Scale workers
kubectl patch machinedeployment production-workers --type merge \
  -p '{"spec":{"replicas": 10}}'

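# Upgrade Kubernetes version (rolling update)
# 1. Control plane first
kubectl patch kubeadmcontrolplane production-cp --type merge \
  -p '{"spec":{"version":"v1.31.0"}}'
# 2. Then workers: MachineDeployments are not upgraded automatically
kubectl patch machinedeployment production-workers --type merge \
  -p '{"spec":{"template":{"spec":{"version":"v1.31.0"}}}}'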

# Monitor upgrade
kubectl get machines -w
# NAME                        PHASE     VERSION
# production-cp-abc           Running   v1.31.0  ← upgraded
# production-cp-def           Running   v1.30.0  ← upgrading
# production-workers-ghi      Running   v1.30.0  ← waiting

# Delete cluster
kubectl delete cluster production
# All infrastructure cleaned up automatically
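
Kubernetes does not support skipping minor versions during an upgrade, so it helps to check the target version before patching. A small pre-flight sketch; the function name is hypothetical and it compares only the major and minor numbers, not the patch level.

```shell
# Hypothetical pre-flight check: allow an upgrade only when the target is the
# same minor version or exactly one minor version ahead (same major).
can_upgrade() {
  local cur="${1#v}" tgt="${2#v}"
  local cur_major="${cur%%.*}" tgt_major="${tgt%%.*}"
  local cur_rest="${cur#*.}" tgt_rest="${tgt#*.}"
  local cur_minor="${cur_rest%%.*}" tgt_minor="${tgt_rest%%.*}"
  [ "$cur_major" -eq "$tgt_major" ] || return 1
  local diff=$((tgt_minor - cur_minor))
  [ "$diff" -ge 0 ] && [ "$diff" -le 1 ]
}

# Usage:
#   current="$(kubectl get kubeadmcontrolplane production-cp -o jsonpath='{.spec.version}')"
#   can_upgrade "$current" v1.31.0 || echo "upgrade one minor version at a time"
```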

clusterctl Operations

# List clusters
kubectl get clusters -A

# Cluster status
clusterctl describe cluster production

# Move management to another cluster
clusterctl move --to-kubeconfig new-mgmt.kubeconfig

# Upgrade CAPI components
clusterctl upgrade plan
clusterctl upgrade apply --contract v1beta1

# List available providers
clusterctl config repositories

Common Issues

Cluster stuck in Provisioning

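Check the infrastructure provider's logs, for example on AWS: kubectl logs -n capa-system deployment/capa-controller-manager. The core controller logs (kubectl logs -n capi-system deployment/capi-controller-manager) can help too. The usual cause is invalid cloud credentials or an exhausted quota.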
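
A sketch that gathers the usual evidence in one pass. It assumes the AWS provider's default namespace (capa-system); the function name and the tail lengths are illustrative.

```shell
# Hypothetical helper: collect diagnostics for a cluster stuck in Provisioning.
# Assumes the AWS provider (namespace capa-system); adjust for other providers.
debug_cluster() {
  local name="$1" ns="${2:-default}"
  # Condition tree for the cluster and its machines
  clusterctl describe cluster "$name" -n "$ns" --show-conditions all
  # Most recent events in the cluster's namespace
  kubectl get events -n "$ns" --sort-by=.lastTimestamp | tail -n 20
  # Core and infrastructure controller logs
  kubectl logs -n capi-system deployment/capi-controller-manager --tail=50
  kubectl logs -n capa-system deployment/capa-controller-manager --tail=50
}

# Usage: debug_cluster production default
```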

Machines not joining

Usually a bootstrap failure. Run kubectl get machines, describe the stuck machine, then inspect its bootstrap data and the cloud-init logs on the instance.
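
To find the stuck machines quickly, the following sketch lists every Machine whose phase is not Running. The function name is hypothetical.

```shell
# Hypothetical helper: list Machines whose phase is not Running.
# Prints "namespace/name phase" per line.
stuck_machines() {
  kubectl get machines -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
    | awk '$2 != "Running" { print $1, $2 }'
}

# Usage: stuck_machines
```

From there, kubectl describe machine shows the failing condition, and on the instance itself cloud-init output is normally in /var/log/cloud-init-output.log.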

Management cluster lost

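If the management cluster dies, workload clusters keep running but can no longer be managed (no scaling, upgrades, or remediation). clusterctl move migrates the CAPI objects to another management cluster, but it needs the source cluster to be reachable, so plan it before a failure rather than after.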

Best Practices

  • Dedicated management cluster: don't run workloads on it
  • GitOps for cluster definitions: version-control all cluster YAML
  • Use MachineHealthCheck: auto-replace unhealthy nodes
  • Back up management cluster: etcd snapshots of CAPI state
  • Test upgrades on staging before production
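
As an illustration of the MachineHealthCheck practice, here is a sketch for the production-workers MachineDeployment. The object name, timeouts, and maxUnhealthy threshold are assumptions to tune for your environment.

```shell
# Write a MachineHealthCheck for the worker pool (values are illustrative).
cat > production-mhc.yaml <<'EOF'
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: production-workers-mhc
  namespace: default
spec:
  clusterName: production
  maxUnhealthy: 40%          # pause remediation if too many machines fail at once
  nodeStartupTimeout: 10m    # replace machines whose node never registers
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: production-workers
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
EOF

# Then apply it to the management cluster:
#   kubectl apply -f production-mhc.yaml
```

The maxUnhealthy guard matters: without it, a network partition could trigger replacement of every worker at once.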

Key Takeaways

  • Cluster API manages K8s clusters as Kubernetes resources (CRDs)
  • Management cluster runs controllers; workload clusters run applications
  • Supports AWS, Azure, GCP, vSphere, bare metal, Docker (test)
  • Scale and upgrade clusters by patching resources (declarative)
  • GitOps-friendly: define the entire cluster fleet in version-controlled YAML