Cluster API on AWS: Provision EKS Clusters
Use Cluster API (CAPI) to provision and manage EKS clusters declaratively. Install clusterctl, configure CAPA provider, and automate cluster lifecycle on AWS.
π‘ Quick Answer: Cluster API (CAPI) lets you manage Kubernetes clusters as Kubernetes resources. Install
clusterctlon a management cluster, configure the AWS provider (CAPA), thenkubectl applyaCluster+AWSManagedControlPlanemanifest to provision EKS clusters. Scale node groups by editingMachineDeploymentreplicas. Delete clusters withkubectl delete cluster.
The Problem
Managing multiple EKS clusters through the AWS console or Terraform becomes unwieldy at scale. Each cluster needs its own Terraform state, IAM roles, and upgrade pipeline. Cluster API treats clusters as Kubernetes-native resources β you manage them with kubectl, GitOps (ArgoCD/Flux), and standard K8s tooling.
flowchart TB
MGMT["Management Cluster<br/>(runs CAPI controllers)"] -->|"reconciles"| C1["Workload Cluster 1<br/>(EKS us-east-1)"]
MGMT -->|"reconciles"| C2["Workload Cluster 2<br/>(EKS eu-west-1)"]
MGMT -->|"reconciles"| C3["Workload Cluster 3<br/>(EKS ap-south-1)"]
subgraph CAPI["Cluster API Resources"]
CLUSTER["Cluster"]
CP["AWSManagedControlPlane"]
MD["MachineDeployment"]
MP["AWSManagedMachinePool"]
endThe Solution
Install Cluster API
# Install clusterctl CLI
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/latest/download/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl && sudo mv clusterctl /usr/local/bin/
# Initialize CAPI with AWS provider on management cluster
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
# Create IAM resources for CAPA
clusterawsadm bootstrap iam create-cloudformation-stack
# Initialize providers
clusterctl init --infrastructure aws
# Verify controllers are running
kubectl get pods -n capa-system
kubectl get pods -n capi-systemProvision an EKS Cluster
# eks-cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: production-east
namespace: default
spec:
clusterNetwork:
pods:
cidrBlocks: ["192.168.0.0/16"]
services:
cidrBlocks: ["10.128.0.0/12"]
controlPlaneRef:
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
name: production-east-control-plane
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedCluster
name: production-east
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
name: production-east-control-plane
spec:
region: us-east-1
version: v1.31.0
eksClusterName: production-east
endpointAccess:
public: true
private: true
logging:
apiServer: true
audit: true
authenticator: true
controllerManager: true
scheduler: true
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedCluster
metadata:
name: production-east
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
name: production-east-pool-0
spec:
clusterName: production-east
replicas: 3
template:
spec:
clusterName: production-east
bootstrap:
dataSecretName: ""
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
name: production-east-pool-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
name: production-east-pool-0
spec:
scaling:
minSize: 3
maxSize: 10
instanceType: m5.xlarge
diskSize: 100# Apply and watch provisioning
kubectl apply -f eks-cluster.yaml
kubectl get cluster -w
# Get kubeconfig for workload cluster
clusterctl get kubeconfig production-east > production-east.kubeconfig
kubectl --kubeconfig=production-east.kubeconfig get nodesCluster Lifecycle Operations
# Scale node pool
kubectl patch machinepool production-east-pool-0 \
--type=merge -p '{"spec":{"replicas":5}}'
# Upgrade Kubernetes version
kubectl patch awsmanagedcontrolplane production-east-control-plane \
--type=merge -p '{"spec":{"version":"v1.32.0"}}'
# Add GPU node pool
cat << EOF | kubectl apply -f -
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
name: production-east-gpu
spec:
scaling:
minSize: 0
maxSize: 4
instanceType: g5.2xlarge
diskSize: 200
labels:
nvidia.com/gpu.present: "true"
taints:
- key: nvidia.com/gpu
value: "true"
effect: NoSchedule
EOF
# Delete a workload cluster
kubectl delete cluster production-east
# CAPI deletes all AWS resources (EKS, node groups, VPC, etc.)GitOps with Cluster API
# ArgoCD Application for fleet management
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: fleet-clusters
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/example/fleet-config
path: clusters/
targetRevision: main
destination:
server: https://kubernetes.default.svc
namespace: default
syncPolicy:
automated:
prune: false # Never auto-delete clusters
selfHeal: trueCommon Issues
| Issue | Cause | Fix |
|---|---|---|
clusterctl init fails | Missing AWS credentials | Run clusterawsadm bootstrap iam first |
| Cluster stuck in Provisioning | IAM role trust policy wrong | Check CAPA controller logs |
| Node pool not scaling | Scaling limits too low | Increase maxSize in MachinePool |
| Upgrade fails | Add-ons incompatible | Verify CoreDNS/kube-proxy versions |
| Delete hangs | Finalizers blocking | Check for stuck LoadBalancers in workload cluster |
Best Practices
- Use a dedicated management cluster β donβt run CAPI on a workload cluster
- Pin Kubernetes versions β explicit versions prevent surprise upgrades
- GitOps for cluster definitions β store manifests in Git, deploy with ArgoCD
- Set
prune: falsein ArgoCD β prevent accidental cluster deletion - Label node pools by purpose β GPU, spot, on-demand, system
- Enable all EKS logging β audit logs are essential for compliance
Key Takeaways
- Cluster API manages clusters as Kubernetes resources with
kubectl - CAPA provider provisions EKS clusters, node groups, and VPC networking
- Scale, upgrade, and delete clusters with standard Kubernetes operations
- Combine with GitOps for declarative multi-cluster fleet management
- Management cluster pattern keeps control plane separate from workloads

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
