etcd Backup and Restore Kubernetes
Back up and restore etcd for Kubernetes disaster recovery. Covers automated snapshots, S3 upload, and point-in-time restore procedures.
💡 Quick Answer: Run
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).dbas a CronJob every 6 hours. Upload to S3 with lifecycle policies. Restore withetcdctl snapshot restoreon a fresh cluster. Always test restores — untested backups aren’t backups.
The Problem
etcd holds your entire cluster state — every Deployment, Secret, ConfigMap, and RBAC rule. If etcd is lost without a backup, you lose everything. Yet many teams skip etcd backups because the control plane is ‘managed’ — until their managed provider has an outage.
The Solution
Automated etcd Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
name: etcd-backup
namespace: kube-system
spec:
schedule: "0 */6 * * *"
jobTemplate:
spec:
template:
spec:
hostNetwork: true
nodeSelector:
node-role.kubernetes.io/control-plane: ""
tolerations:
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
containers:
- name: etcd-backup
image: registry.example.com/etcd:3.5.15
command:
- /bin/sh
- -c
- |
BACKUP_FILE="/backup/etcd-$(date +%Y%m%d-%H%M%S).db"
etcdctl snapshot save $BACKUP_FILE \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
etcdctl snapshot status $BACKUP_FILE --write-table
# Upload to S3
aws s3 cp $BACKUP_FILE s3://cluster-backups/etcd/
# Keep only last 7 days locally
find /backup -name "etcd-*.db" -mtime +7 -delete
volumeMounts:
- name: etcd-certs
mountPath: /etc/kubernetes/pki/etcd
readOnly: true
- name: backup
mountPath: /backup
volumes:
- name: etcd-certs
hostPath:
path: /etc/kubernetes/pki/etcd
- name: backup
hostPath:
path: /var/lib/etcd-backups
restartPolicy: OnFailureRestore from Snapshot
# Stop kube-apiserver (move manifest)
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
# Restore etcd
sudo etcdctl snapshot restore /backup/etcd-20260424.db \
--data-dir=/var/lib/etcd-restored \
--initial-cluster="master-1=https://10.0.0.10:2380" \
--initial-advertise-peer-urls=https://10.0.0.10:2380 \
--name=master-1
# Replace etcd data directory
sudo mv /var/lib/etcd /var/lib/etcd.old
sudo mv /var/lib/etcd-restored /var/lib/etcd
# Restart kube-apiserver
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/graph LR
ETCD[etcd cluster] -->|Every 6h| SNAP[Snapshot .db file]
SNAP --> LOCAL[Local disk<br/>/var/lib/etcd-backups]
SNAP --> S3[S3 bucket<br/>7-day retention]
S3 -->|Disaster| RESTORE[etcdctl snapshot restore]
RESTORE --> NEW[New etcd data dir]
NEW --> CLUSTER[Cluster recovered ✅]Common Issues
Restore fails with “member already bootstrapped”
You must restore to a NEW data directory, not the existing one. Use --data-dir=/var/lib/etcd-restored then swap directories.
Snapshot is 0 bytes
etcdctl can’t reach the endpoint. Verify certs and that etcd is listening on 127.0.0.1:2379.
Best Practices
- Backup every 6 hours minimum — RPO of 6 hours for most clusters
- Upload to remote storage — local backups are useless if the node dies
- Test restores quarterly — untested backups are not backups
- Monitor backup job failures — alert if CronJob hasn’t succeeded in 12 hours
- Encrypt backups at rest — etcd contains Secrets in base64 (not encrypted)
Key Takeaways
- etcd holds all cluster state — losing it means losing everything
etcdctl snapshot savecreates a point-in-time backup- Automate with CronJob on control plane nodes every 6 hours
- Upload to S3/GCS for off-site disaster recovery
- Always test restores — the backup is worthless if restore doesn’t work

Recommended
Kubernetes Recipes — The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book →Learn by Doing
CopyPasteLearn — Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses →🎓 Deepen Your Skills — Hands-on Courses
Courses by CopyPasteLearn.com — Learn IT by Doing
