Cluster API: Declarative K8s Management
Manage Kubernetes cluster lifecycle with Cluster API. Provision, upgrade, and scale clusters declaratively using management clusters and infrastructure provi...
💡 Quick Answer: Cluster API (CAPI) manages Kubernetes clusters as Kubernetes resources. A management cluster runs CAPI controllers that provision workload clusters on AWS, Azure, GCP, vSphere, bare metal, and other platforms. Define clusters in YAML, apply them to the management cluster, and CAPI handles provisioning. Key resources: Cluster, MachineDeployment, MachinePool. Install: clusterctl init --infrastructure aws.
The Problem
Managing multiple Kubernetes clusters is complex:
- Manual provisioning is error-prone and slow
- Each cloud has different tools (eksctl, az aks, gcloud)
- Upgrades require careful coordination
- No unified API for multi-cloud cluster management
- Infrastructure as Code tools (Terraform) don't understand K8s lifecycle
The Solution
Architecture
Management Cluster (runs CAPI controllers)
├── Cluster API core controllers
├── Bootstrap provider (kubeadm)
├── Control plane provider (kubeadm)
├── Infrastructure provider (AWS/Azure/vSphere/...)
│
├── Cluster/production-us
│   ├── KubeadmControlPlane (3 control planes)
│   └── MachineDeployment (10 workers)
│
└── Cluster/staging-eu
    ├── KubeadmControlPlane (1 control plane)
    └── MachineDeployment (3 workers)
Install Cluster API
# Install clusterctl CLI
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.7.0/clusterctl-linux-amd64 \
-o clusterctl
chmod +x clusterctl
mv clusterctl /usr/local/bin/
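# Initialize management cluster (current kubeconfig)
# AWS example (the AWS provider reads credentials from AWS_B64ENCODED_CREDENTIALS,
# which its clusterawsadm CLI generates):
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<key>
export AWS_SECRET_ACCESS_KEY=<secret>
clusterawsadm bootstrap iam create-cloudformation-stack
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
clusterctl init --infrastructure aws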
# Other providers:
# clusterctl init --infrastructure azure
# clusterctl init --infrastructure vsphere
# clusterctl init --infrastructure docker   # for testing
# Verify
kubectl get providers -A
Create a Workload Cluster
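# Generate cluster manifest
# (provider templates read required variables from the environment; list them with
#  clusterctl generate cluster production --list-variables)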
clusterctl generate cluster production \
--kubernetes-version v1.30.0 \
--control-plane-machine-count 3 \
--worker-machine-count 5 \
> production-cluster.yaml
# Apply to management cluster
kubectl apply -f production-cluster.yaml
# Watch provisioning
kubectl get cluster production -w
# NAME PHASE AGE
# production Provisioning 1m
# production Provisioned 5m
# Get workload cluster kubeconfig
clusterctl get kubeconfig production > production.kubeconfig
kubectl --kubeconfig=production.kubeconfig get nodes
Cluster Resources
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production
spec:
  region: us-east-1
  sshKeyName: my-key
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-cp
spec:
  replicas: 3
  version: v1.30.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate
      name: production-cp
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-workers
spec:
  clusterName: production
  replicas: 5
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: production
      version: v1.30.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-workers
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: production-workers
Scale and Upgrade
# Scale workers
kubectl patch machinedeployment production-workers --type merge \
-p '{"spec":{"replicas": 10}}'
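# Upgrade Kubernetes version (rolling update)
# Upgrade the control plane first
kubectl patch kubeadmcontrolplane production-cp --type merge \
  -p '{"spec":{"version":"v1.31.0"}}'
# Then, once the control plane has finished, upgrade the workers.
# Without ClusterClass, a MachineDeployment does not follow the control plane version on its own.
kubectl patch machinedeployment production-workers --type merge \
  -p '{"spec":{"template":{"spec":{"version":"v1.31.0"}}}}'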
# Monitor upgrade
kubectl get machines -w
# NAME                     PHASE     VERSION
# production-cp-abc        Running   v1.31.0   ← upgraded
# production-cp-def        Running   v1.30.0   ← upgrading
# production-workers-ghi   Running   v1.30.0   ← waiting
# Delete cluster
kubectl delete cluster production
# All infrastructure cleaned up automatically
clusterctl Operations
# List clusters
kubectl get clusters -A
# Cluster status
clusterctl describe cluster production
# Move management to another cluster
clusterctl move --to-kubeconfig new-mgmt.kubeconfig
# Upgrade CAPI components
clusterctl upgrade plan
clusterctl upgrade apply --contract v1beta1
# List available providers
clusterctl config repositories
Common Issues
Cluster stuck in Provisioning
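Check the infrastructure provider's controller logs, for example kubectl logs -n capa-system deployment/capa-controller-manager for AWS. The core controller logs are in capi-system (deployment/capi-controller-manager). The cause is usually a cloud credentials or quota issue.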
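Each infrastructure provider runs its controller in its own namespace, so the log command differs per provider. The following is a minimal sketch, assuming the default namespaces and deployment names that clusterctl init creates (capa for AWS, capz for Azure, capv for vSphere, capd for Docker). The provider_logs helper and the KUBECTL override are hypothetical names introduced here for illustration.

```shell
# Print recent controller logs for one infrastructure provider.
# Assumes the default install layout: <prefix>-system / <prefix>-controller-manager.
# KUBECTL can be overridden (hypothetical knob) to point at a wrapper or another binary.
provider_logs() {
  local provider="$1" prefix
  case "$provider" in
    aws)     prefix=capa ;;
    azure)   prefix=capz ;;
    vsphere) prefix=capv ;;
    docker)  prefix=capd ;;
    *) echo "unknown provider: $provider" >&2; return 1 ;;
  esac
  "${KUBECTL:-kubectl}" logs -n "${prefix}-system" \
    "deployment/${prefix}-controller-manager" --tail=100
}
```

Run it against the management cluster, for example provider_logs aws, and look for credential or quota errors near the end of the output.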
Machines not joining
This is usually a bootstrap failure. Run kubectl get machines, describe the stuck machine, then check its bootstrap data and the cloud-init logs on the instance.
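The checks above can be sketched as one helper. This assumes the kubeadm bootstrap provider in its default namespace (capi-kubeadm-bootstrap-system); the debug_machine function name and the KUBECTL override are hypothetical, and the machine name is whatever kubectl get machines reports as stuck.

```shell
# Gather bootstrap-related state for one Machine from the management cluster.
# $1 = machine name, $2 = namespace (defaults to "default").
debug_machine() {
  local machine="$1" ns="${2:-default}" k="${KUBECTL:-kubectl}"
  # Conditions and events on the Machine usually name the failing step.
  "$k" describe machine "$machine" -n "$ns"
  # KubeadmConfig objects hold the generated bootstrap configuration.
  "$k" get kubeadmconfig -n "$ns"
  # Bootstrap provider controller logs (default install layout assumed).
  "$k" logs -n capi-kubeadm-bootstrap-system \
    deployment/capi-kubeadm-bootstrap-controller-manager --tail=100
}
```

If the management-cluster side looks healthy, the next place to look is the instance itself: on cloud-init based images the bootstrap output is typically in /var/log/cloud-init-output.log.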
Management cluster lost
If the management cluster dies, workload clusters keep running but can't be managed. Use clusterctl move to migrate the CAPI objects to another management cluster while the original is still reachable.
Best Practices
- Dedicated management cluster: don't run workloads on it
- GitOps for cluster definitions: version-control all cluster YAML
- Use MachineHealthCheck: auto-replace unhealthy nodes
- Back up management cluster: etcd snapshots of CAPI state
- Test upgrades on staging before production
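To make the MachineHealthCheck recommendation concrete, here is a minimal sketch that renders a manifest for the worker machines of one cluster. The render_mhc helper is a hypothetical name, the timeouts and the 40% threshold are illustrative values rather than recommendations, and the selector assumes that machines carry the cluster.x-k8s.io/deployment-name label set by their MachineDeployment.

```shell
# Print a MachineHealthCheck manifest to stdout.
# $1 = cluster name, $2 = MachineDeployment name.
render_mhc() {
  local cluster="$1" deployment="$2"
  cat <<EOF
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: ${cluster}-worker-unhealthy
  namespace: default
spec:
  clusterName: ${cluster}
  # Stop remediating if more than 40% of the selected machines are unhealthy.
  maxUnhealthy: 40%
  # Give new machines time to register as nodes before judging them.
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: ${deployment}
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
EOF
}
```

For the example cluster in this article, render_mhc production production-workers | kubectl apply -f - would create the check on the management cluster. Keeping maxUnhealthy well below 100% is a guard against replacing every node during a wider outage.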
Key Takeaways
- Cluster API manages K8s clusters as Kubernetes resources (CRDs)
- Management cluster runs controllers; workload clusters run applications
- Supports AWS, Azure, GCP, vSphere, bare metal, Docker (test)
- Scale and upgrade clusters by patching resources (declarative)
- GitOps-friendly: define entire cluster fleet in version-controlled YAML

