Kubernetes Backup and Disaster Recovery with Velero
Implement backup and disaster recovery for Kubernetes clusters using Velero to protect workloads, configurations, and persistent data.
Problem
You need a backup and disaster recovery solution for Kubernetes clusters that protects against data loss and cluster failures and supports migration between environments.
Solution
Use Velero (formerly Heptio Ark) to create backups of cluster resources and persistent volumes, enabling disaster recovery, cluster migration, and point-in-time restores.
Architecture
flowchart TB
subgraph Source["Kubernetes Cluster (Source)"]
WR[Workload Resources]
PV1[Persistent Volumes]
VS[Velero Server]
WR --> VS
PV1 --> VS
end
subgraph Storage["Object Storage (S3/MinIO/GCS)"]
BK["Backups<br/>(YAML manifests, metadata)"]
SN["Volume Snapshots<br/>(references)"]
end
subgraph Target["Kubernetes Cluster (Target)"]
RW[Restored Workloads]
RV[Restored Volumes]
end
VS --> Storage
Storage -->|Restore| Target
Step 1: Install Velero CLI
Install the Velero command-line tool:
# macOS
brew install velero
# Linux
wget https://github.com/vmware-tanzu/velero/releases/download/v1.12.0/velero-v1.12.0-linux-amd64.tar.gz
tar -xvf velero-v1.12.0-linux-amd64.tar.gz
sudo mv velero-v1.12.0-linux-amd64/velero /usr/local/bin/
# Verify installation
velero version --client-only
Step 2: Configure AWS S3 Backend
Create S3 bucket and IAM credentials for AWS:
# Create S3 bucket
aws s3 mb s3://kubernetes-backups --region us-east-1
# Create IAM policy
cat > velero-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeVolumes",
"ec2:DescribeSnapshots",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:CreateSnapshot",
"ec2:DeleteSnapshot"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:DeleteObject",
"s3:PutObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": "arn:aws:s3:::kubernetes-backups/*"
},
{
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::kubernetes-backups"
}
]
}
EOF
# Create IAM user
aws iam create-user --user-name velero
aws iam put-user-policy \
--user-name velero \
--policy-name velero \
--policy-document file://velero-policy.json
# Create access key
aws iam create-access-key --user-name velero
# Create credentials file
cat > credentials-velero <<EOF
[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
EOF
Step 3: Install Velero in Cluster
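Before running the installer, it may be worth confirming that the credentials file from Step 2 is complete, because `velero install --secret-file` stores it in a cluster Secret as-is. The sketch below only inspects the file locally; `check_velero_credentials` is a hypothetical helper name, not part of Velero.

```shell
# Hypothetical helper, not part of Velero: sanity-check an AWS-style
# credentials file before passing it to `velero install --secret-file`.
check_velero_credentials() {
  # The file must exist and be readable.
  [ -r "$1" ] || { echo "missing or unreadable: $1" >&2; return 1; }
  # Both keys written in Step 2 must be present.
  grep -q '^aws_access_key_id=' "$1" || { echo "no aws_access_key_id" >&2; return 1; }
  grep -q '^aws_secret_access_key=' "$1" || { echo "no aws_secret_access_key" >&2; return 1; }
  # Reject the placeholder values from the template.
  if grep -Eq 'YOUR_ACCESS_KEY_ID|YOUR_SECRET_ACCESS_KEY' "$1"; then
    echo "placeholder values still present" >&2
    return 1
  fi
  echo "ok"
}
```

A possible use is `check_velero_credentials ./credentials-velero && chmod 600 ./credentials-velero` just before the install command.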
Deploy Velero server using CLI:
# Install with AWS provider
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket kubernetes-backups \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1 \
--secret-file ./credentials-velero \
--use-volume-snapshots=true \
--use-node-agent
# Verify installation
kubectl get pods -n velero
kubectl get crds | grep velero
Step 4: Create Namespace Backup
Back up a specific namespace:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: production-backup
  namespace: velero
spec:
  includedNamespaces:
    - production
  excludedResources:
    - events
    - events.events.k8s.io
  ttl: 720h0m0s # 30 days
  storageLocation: default
  volumeSnapshotLocations:
    - default
  snapshotVolumes: true
  includeClusterResources: false
Create backup using CLI:
# Backup entire namespace
velero backup create production-backup \
--include-namespaces production \
--wait
# Backup with label selector
velero backup create app-backup \
--selector app=my-app \
--wait
# Backup excluding certain resources
velero backup create config-backup \
--include-namespaces production \
--exclude-resources pods,replicasets \
--wait
Step 5: Schedule Automated Backups
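Retention is set per backup as a TTL duration, typically written in hours, such as the 720h (30 days) used in Step 4. As a small worked example of that arithmetic, the helper below converts days to the hour-based form; `days_to_ttl` is an illustrative name, not a Velero command.

```shell
# Hypothetical helper: convert a retention period in days to an
# hour-based duration string suitable for `--ttl`.
days_to_ttl() {
  echo "$(( $1 * 24 ))h"
}

days_to_ttl 30   # prints 720h
days_to_ttl 90   # prints 2160h
days_to_ttl 7    # prints 168h
```

The result can be passed straight to the CLI, for example `--ttl "$(days_to_ttl 30)"`.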
Create backup schedule:
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-production-backup
  namespace: velero
spec:
  schedule: "0 1 * * *" # Daily at 1 AM
  template:
    includedNamespaces:
      - production
      - staging
    ttl: 720h0m0s # 30 days
    storageLocation: default
    snapshotVolumes: true
    includeClusterResources: false
Using CLI:
# Daily backup at 1 AM
velero schedule create daily-backup \
--schedule="0 1 * * *" \
--include-namespaces production,staging \
--ttl 720h
# Weekly full cluster backup
velero schedule create weekly-full-backup \
--schedule="0 0 * * 0" \
--ttl 2160h \
--include-cluster-resources=true
# Hourly backup of critical namespace
velero schedule create hourly-critical \
--schedule="@every 1h" \
--include-namespaces critical-apps \
--ttl 168h
Step 6: Restore from Backup
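A restore is only as good as the backup it reads from, so it can help to confirm that the backup's phase is Completed before restoring. The sketch below reads the phase from the Backup custom resource. It assumes Velero runs in the `velero` namespace as in Step 3, and `backup_phase` and `backup_completed` are illustrative names. Velero may still allow restoring a PartiallyFailed backup, so treat this as a conservative gate.

```shell
# Print the phase of a Backup resource (e.g. Completed, PartiallyFailed, Failed).
# Assumes Velero is installed in the "velero" namespace.
backup_phase() {
  kubectl get backups.velero.io "$1" -n velero -o jsonpath='{.status.phase}'
}

# Return success only if the named backup finished without errors.
backup_completed() {
  phase="$(backup_phase "$1")"
  if [ "$phase" = "Completed" ]; then
    return 0
  fi
  echo "backup $1 is in phase '${phase:-unknown}', not Completed" >&2
  return 1
}
```

For example: `backup_completed production-backup && velero restore create --from-backup production-backup`.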
Restore backup to original namespace:
# List available backups
velero backup get
# Restore entire backup
velero restore create --from-backup production-backup
# Restore to different namespace
velero restore create --from-backup production-backup \
--namespace-mappings production:production-restore
# Restore with label selector
velero restore create --from-backup production-backup \
--selector app=database \
--wait
# Restore excluding certain resources
velero restore create --from-backup production-backup \
--exclude-resources services,ingresses
Using YAML:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: production-restore
  namespace: velero
spec:
  backupName: production-backup
  includedNamespaces:
    - production
  excludedResources:
    - nodes
    - events
  restorePVs: true
  preserveNodePorts: true
  namespaceMapping:
    production: production-dr
Step 7: Cluster Migration
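In a migration, the target cluster learns about existing backups by periodically syncing from the shared bucket, so a freshly created backup may not be visible there right away. A minimal polling sketch, assuming `kubectl` points at the target cluster and Velero runs in the `velero` namespace. `wait_for_backup` and its retry defaults are illustrative, not part of Velero.

```shell
# Poll until the named Backup resource exists in the current cluster, or give up.
# Usage: wait_for_backup NAME [ATTEMPTS] [SLEEP_SECONDS]
wait_for_backup() {
  name="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # `kubectl get` exits non-zero while the resource has not been synced yet.
    if kubectl get backups.velero.io "$name" -n velero >/dev/null 2>&1; then
      echo "backup $name is visible"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  echo "backup $name not found after $attempts attempts" >&2
  return 1
}
```

Running `wait_for_backup migration-backup` on the target cluster before creating the restore avoids a "backup not found" failure caused only by sync delay.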
Migrate workloads between clusters:
# In source cluster: Create backup
velero backup create migration-backup \
--include-cluster-resources=true \
--wait
# Configure target cluster with same S3 backend
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket kubernetes-backups \
--backup-location-config region=us-east-1 \
--secret-file ./credentials-velero
# In target cluster: Restore backup
velero restore create migration-restore \
--from-backup migration-backup \
--wait
# Verify restoration
kubectl get all --all-namespaces
Step 8: Configure Backup Hooks
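Hooks can be declared either as pod annotations or in the Backup resource itself under `spec.hooks`. A minimal sketch of the second form, wrapped in a shell function that prints the manifest. The `app: database` label, the backup name, the hook name, and the `onError` and `timeout` values are assumptions for illustration, and the field layout should be checked against the Backup API reference for the Velero version in use.

```shell
# Print a Backup manifest that declares hooks in the spec instead of pod annotations.
# Assumptions: the target pod is labelled app=database and has a container named "database".
backup_with_hooks_manifest() {
  cat <<'EOF'
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: production-backup-with-hooks
  namespace: velero
spec:
  includedNamespaces:
    - production
  hooks:
    resources:
      - name: postgres-dump
        includedNamespaces:
          - production
        labelSelector:
          matchLabels:
            app: database
        pre:
          - exec:
              container: database
              command: ["/bin/bash", "-c", "pg_dump -U postgres mydb > /tmp/backup.sql"]
              onError: Fail
              timeout: 3m
        post:
          - exec:
              container: database
              command: ["/bin/bash", "-c", "rm /tmp/backup.sql"]
EOF
}
```

The manifest can be applied with `backup_with_hooks_manifest | kubectl apply -f -`. Declaring hooks in the spec keeps them with the backup definition, while annotations keep them with the workload.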
Run pre/post backup commands in pods:
apiVersion: v1
kind: Pod
metadata:
  name: database
  namespace: production
  annotations:
    # Backup hooks. The dump is captured only if its path is on a volume included in the backup.
    pre.hook.backup.velero.io/container: database
    pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "pg_dump -U postgres mydb > /tmp/backup.sql"]'
    post.hook.backup.velero.io/container: database
    post.hook.backup.velero.io/command: '["/bin/bash", "-c", "rm /tmp/backup.sql"]'
spec:
  containers:
    - name: database
      image: postgres:15
Step 9: Monitor Backup Status
# Check backup status
velero backup describe production-backup
# View backup logs
velero backup logs production-backup
# List all backups
velero backup get
# List backups with their labels (the EXPIRES column shows expiration)
velero backup get --show-labels
# Monitor backup progress
watch velero backup get
Verification
Verify backups are created:
# List backups
velero backup get
# Check backup details
velero backup describe production-backup --details
# Verify backup in S3
aws s3 ls s3://kubernetes-backups/backups/
# Check backup status
kubectl get backups -n velero
kubectl describe backup production-backup -n velero
Test restore functionality:
# Create test namespace
kubectl create namespace restore-test
# Restore to test namespace
velero restore create test-restore \
--from-backup production-backup \
--namespace-mappings production:restore-test
# Check restore status
velero restore describe test-restore
# Verify resources
kubectl get all -n restore-test
# Cleanup
kubectl delete namespace restore-test
Monitor Velero components:
# Check Velero pods
kubectl get pods -n velero
# View Velero logs
kubectl logs -n velero deployment/velero
# Check node-agent (for file-level backups)
kubectl logs -n velero daemonset/node-agent
# View backup locations
velero backup-location get
# Check volume snapshot locations
velero snapshot-location get
Advanced Configuration
Configure multiple backup locations:
# Add secondary backup location
velero backup-location create secondary \
--provider aws \
--bucket kubernetes-backups-dr \
--config region=us-west-2 \
--access-mode ReadWrite
# Create backup to specific location
velero backup create dr-backup \
--storage-location secondary \
--include-namespaces production
Enable file-level backup for volumes:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-volumes
  namespace: production
  annotations:
    backup.velero.io/backup-volumes: data,config
spec:
  containers:
    - name: app
      image: myapp:v1.0
      volumeMounts:
        - name: data
          mountPath: /data
        - name: config
          mountPath: /config
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
    - name: config
      configMap:
        name: app-config
Best Practices
- Schedule regular backups for critical namespaces
- Test restore procedures regularly
- Set appropriate TTL for backup retention
- Use backup hooks for application-consistent backups
- Monitor backup completion and failures
- Store backups offsite for disaster recovery
- Document restore procedures for operations team
- Use namespace mappings for blue/green deployments
- Exclude unnecessary resources to reduce backup size
- Implement backup verification with automated tests
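Monitoring completion and failures can start with something as small as listing every backup whose phase is not Completed. The sketch below assumes Velero runs in the `velero` namespace. `unhealthy_backups` is an illustrative name, and backups that are still running are listed too.

```shell
# List backups whose phase is anything other than Completed, one per line,
# in the form "name: phase". Assumes Velero runs in the "velero" namespace.
unhealthy_backups() {
  kubectl get backups.velero.io -n velero \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F '\t' '$2 != "Completed" { print $1 ": " $2 }'
}
```

A scheduled job could run `[ -z "$(unhealthy_backups)" ] || echo "backups need attention"` and forward the output to whatever alerting is already in place.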
Common Issues
Backup stuck in InProgress:
- Check Velero pod logs
- Verify object storage connectivity
- Check for large PVs timing out
Restore failures:
- Verify target cluster has required StorageClasses
- Check RBAC permissions
- Ensure namespace exists if not using namespaceMapping
Volume snapshots not working:
- Verify CSI driver supports snapshots
- Check VolumeSnapshotClass exists
- Ensure proper IAM permissions
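For the VolumeSnapshotClass check, a quick test is whether any class carries the label that Velero's CSI support uses to pick a class. This is a sketch under assumptions: it applies only when snapshots go through the CSI path rather than the AWS plugin's native EBS snapshots configured in Step 3. The label name `velero.io/csi-volumesnapshot-class` should be verified against the documentation for the Velero version in use, and `has_velero_snapshot_class` is an illustrative name.

```shell
# Succeeds if at least one VolumeSnapshotClass carries the label that
# Velero's CSI support is expected to use when selecting a snapshot class.
has_velero_snapshot_class() {
  classes="$(kubectl get volumesnapshotclass \
    -l velero.io/csi-volumesnapshot-class=true -o name 2>/dev/null)"
  if [ -n "$classes" ]; then
    echo "$classes"
    return 0
  fi
  echo "no VolumeSnapshotClass labelled velero.io/csi-volumesnapshot-class=true" >&2
  return 1
}
```

If the function fails, either the snapshot CRDs are not installed or no class has been labelled for Velero. The Velero and CSI driver logs usually show which.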