Kubernetes Cost Optimization Strategies
Comprehensive cost reduction strategies for Kubernetes clusters: right-sizing, spot instances, autoscaling, idle resource detection, namespace budgets, and GPU
π‘ Quick Answer: Kubernetes cost optimization combines right-sizing (VPA), horizontal scaling (HPA), cluster autoscaling (Karpenter/CA), spot instances for fault-tolerant workloads, idle resource detection, and GPU time-sharing β typically achieving 30-60% cost reduction.
The Problem
Kubernetes clusters waste money through:
- Over-provisioned containers (requesting 4x what they use)
- Always-on nodes for variable workloads
- Idle GPUs ($10-30/hour doing nothing)
- No visibility into per-team/per-workload costs
- Paying on-demand prices for fault-tolerant workloads
The Solution
Cost Optimization Framework
Impact Strategy Typical Savings
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
HIGH Right-size containers (VPA) 30-50%
HIGH Spot/preemptible instances 60-90% per node
HIGH GPU time-sharing/MIG 2-7x utilization
MEDIUM Cluster autoscaler (scale to 0) 20-40%
MEDIUM HPA (scale with demand) 20-30%
MEDIUM Namespace resource quotas Prevent sprawl
LOW Pod priority/preemption 5-10%
LOW Storage class tiering 10-20%Right-Sizing with VPA
# Step 1: Deploy VPA in recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
updatePolicy:
updateMode: "Off"
# Step 2: After 1 week, read recommendations
# kubectl describe vpa api-vpa
# Target: CPU 150m, Memory 256Mi
# Current: CPU 1000m, Memory 2Gi
# Savings: 850m CPU, 1.75Gi memory PER PODSpot Instance Node Pools
# Karpenter NodePool for spot instances
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: spot-general
spec:
template:
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["m5.xlarge", "m5a.xlarge", "m6i.xlarge", "m6a.xlarge"]
nodeClassRef:
name: default
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
limits:
cpu: "100"
memory: 400Gi
---
# Workloads opt-in to spot via tolerations
apiVersion: apps/v1
kind: Deployment
metadata:
name: batch-processor
spec:
template:
spec:
tolerations:
- key: "karpenter.sh/capacity-type"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
nodeSelector:
karpenter.sh/capacity-type: spotGPU Cost Management
# MIG slicing: 1 A100 β 7 workloads
# Time-slicing: share GPU across Pods
apiVersion: v1
kind: ConfigMap
metadata:
name: gpu-sharing-config
namespace: gpu-operator
data:
config.yaml: |
version: v1
sharing:
timeSlicing:
resources:
- name: nvidia.com/gpu
replicas: 4 # 4 Pods share each physical GPU
# Result: 8-GPU node serves 32 inference workloads
# Cost: $30/hr shared across 32 β $0.94/hr per workload
# vs. $30/hr for 1 dedicated workloadIdle Resource Detection
# Find Pods using < 10% of their CPU request
kubectl top pods -A --sort-by=cpu | awk '
NR>1 {
split($3, cpu, "m");
if (cpu[1] < 10) print $1, $2, $3, "β IDLE"
}
'
# Find Deployments with 0 traffic (scale to 0 candidates)
# Requires Prometheus
curl -s "http://prometheus:9090/api/v1/query?query=
sum(rate(http_requests_total[24h])) by (deployment) == 0"
# Find PVCs with no Pod attached
kubectl get pvc -A --no-headers | while read ns name _ _ _ _ sc _; do
if ! kubectl get pods -n $ns -o json | grep -q $name; then
echo "ORPHAN PVC: $ns/$name ($sc)"
fi
doneNamespace Cost Budgets
# ResourceQuota as cost guardrail
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-budget
namespace: team-alpha
spec:
hard:
requests.cpu: "16" # ~$200/month at on-demand
requests.memory: "64Gi"
requests.nvidia.com/gpu: "2" # ~$1500/month
persistentvolumeclaims: "20"
services.loadbalancers: "2" # ~$36/month eachScale-to-Zero for Dev/Staging
# KEDA ScaledObject: scale to 0 when no traffic
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: api-scaledobject
spec:
scaleTargetRef:
name: staging-api
minReplicaCount: 0 # Scale to zero!
maxReplicaCount: 5
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring:9090
query: sum(rate(http_requests_total{deployment="staging-api"}[5m]))
threshold: "1"Common Issues
Spot instance interruption causes downtime
- Cause: Only 1 replica on spot; no graceful handling
- Fix: Run 2+ replicas across multiple instance types; handle SIGTERM
VPA evictions during peak
- Cause: Auto mode evicts to apply new resource values
- Fix: Use Initial mode; or schedule VPA updates during maintenance windows
GPU idle but canβt reclaim
- Cause: Pod holds GPU allocation even when not computing
- Fix: Use Run:ai fractional GPU or time-slicing; set idle timeout
Best Practices
- Right-size first β biggest bang for least effort (VPA + Goldilocks)
- Spot for stateless β web servers, batch jobs, CI runners
- On-demand for stateful β databases, message queues, critical services
- GPU time-sharing for inference β most LLM inference is bursty
- Scale to zero in dev/staging β nobody works at 3 AM
- Tag everything β team labels enable cost attribution
- Review monthly β workload patterns change; savings drift
Key Takeaways
- Average K8s cluster wastes 40-60% of resources (over-provisioned)
- VPA + Goldilocks identifies right-size targets across all workloads
- Spot instances save 60-90% for fault-tolerant workloads
- GPU time-slicing/MIG: share expensive GPUs across multiple workloads
- Scale-to-zero (KEDA): eliminate cost for idle dev/staging environments
- ResourceQuotas prevent uncontrolled spending per team/namespace
- Combined strategy typically achieves 30-60% total cluster cost reduction

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
