Kubernetes Right-Sizing and Cost Optimization
Optimize Kubernetes resource allocation with right-sizing, VPA recommendations, bin packing, request-to-limit ratios, and cost reduction best practices.
π‘ Quick Answer: Kubernetes resource optimization means right-sizing requests and limits to match actual usage. Start by deploying VPA in recommendation mode (
kubectl get vpa -o yaml), then adjust requests to P95 usage + 20% buffer. Set CPU limits to 2-5x requests (CPU is compressible) and memory limits to 1.2-1.5x requests (memory is not). Use Goldilocks or Kubecost for continuous recommendations.
The Problem
Most Kubernetes clusters waste 60-80% of provisioned resources:
- Developers request 1 CPU / 1Gi but pods use 50m / 128Mi
- Over-provisioning wastes money on cloud ($$$)
- Under-provisioning causes OOMKill and CPU throttling
- No visibility into actual vs requested resource usage
- Teams donβt update requests after initial deployment
The Solution
Step 1: Measure Actual Usage
# Check current requests vs actual usage
kubectl top pods -n production --sort-by=cpu
kubectl top pods -n production --sort-by=memory
# Compare requests vs usage for all pods in namespace
kubectl get pods -n production -o json | jq -r '
.items[] |
.metadata.name as $name |
.spec.containers[] |
"\($name) | req: \(.resources.requests.cpu // "none") cpu, \(.resources.requests.memory // "none") mem"'
# Prometheus query: CPU request vs actual (ratio)
# container_cpu_usage_seconds_total / kube_pod_container_resource_requests{resource="cpu"}Step 2: Deploy VPA for Recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-app-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
updatePolicy:
updateMode: "Off" # Recommendation only β won't change pods
resourcePolicy:
containerPolicies:
- containerName: '*'
minAllowed:
cpu: 50m
memory: 64Mi
maxAllowed:
cpu: 4
memory: 8Gi# Apply and wait for recommendations (needs 24h+ of data)
kubectl apply -f vpa.yaml
# Check recommendations
kubectl get vpa my-app-vpa -o yaml | grep -A20 recommendation
# recommendation:
# containerRecommendations:
# - containerName: my-app
# lowerBound:
# cpu: 25m
# memory: 131072k
# target: β Use this for requests
# cpu: 100m
# memory: 256Mi
# upperBound: β Use this for limits
# cpu: 500m
# memory: 512MiStep 3: Right-Size Resources
# Before (over-provisioned):
resources:
requests:
cpu: "1"
memory: 1Gi
limits:
cpu: "2"
memory: 2Gi
# After (right-sized based on VPA target + buffer):
resources:
requests:
cpu: 120m # VPA target 100m + 20% buffer
memory: 300Mi # VPA target 256Mi + ~20% buffer
limits:
cpu: 500m # 4x requests (CPU is compressible)
memory: 400Mi # 1.3x requests (memory is NOT compressible)Right-Sizing Rules
| Resource | Requests | Limits | Why |
|---|---|---|---|
| CPU | P95 usage + 20% | 2-5x requests (or no limit) | CPU is compressible β throttled, not killed |
| Memory | P95 usage + 20% | 1.2-1.5x requests | Memory is NOT compressible β OOMKill if exceeded |
| Ephemeral storage | Based on log/tmp volume | 2x requests | Evicted if exceeded |
Step 4: Automated Optimization Tools
# Goldilocks β VPA recommendations for every deployment
kubectl create namespace goldilocks
helm install goldilocks fairwinds-stable/goldilocks -n goldilocks
# Label namespaces to enable
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
# Access dashboard
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
# Kubecost β cost visibility
helm install kubecost kubecost/cost-analyzer -n kubecost \
--set prometheus.server.global.external_labels.cluster_id=my-cluster
# kubectl-view-allocations plugin
kubectl krew install view-allocations
kubectl view-allocations -n productionStep 5: Cluster-Level Optimization
# LimitRange β prevent absurd requests
apiVersion: v1
kind: LimitRange
metadata:
name: resource-constraints
namespace: production
spec:
limits:
- type: Container
default:
cpu: 200m
memory: 256Mi
defaultRequest:
cpu: 50m
memory: 128Mi
max:
cpu: "4"
memory: 8Gi
min:
cpu: 10m
memory: 32Mi
---
# ResourceQuota β cap namespace total
apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-quota
namespace: production
spec:
hard:
requests.cpu: "20"
requests.memory: 40Gi
limits.cpu: "40"
limits.memory: 80GiStep 6: Node Bin Packing
# Scheduler profile for bin packing (pack pods tightly onto fewer nodes)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
plugins:
score:
enabled:
- name: NodeResourcesFit
weight: 1
pluginConfig:
- name: NodeResourcesFit
args:
scoringStrategy:
type: MostAllocated # Bin packing (vs LeastAllocated = spreading)
resources:
- name: cpu
weight: 1
- name: memory
weight: 1Common Issues
VPA and HPA conflict on CPU
VPA adjusts CPU requests, HPA scales replicas on CPU utilization. The metric basis shifts. Use VPA for memory only, HPA for CPU scaling.
Pods OOMKilled after right-sizing memory
Buffer too small. Memory usage has spikes β use P99 (not P95) for bursty workloads, and set limit to 1.5x requests.
CPU throttling after reducing limits
Check container_cpu_cfs_throttled_periods_total. If >5% throttled, increase CPU limit or remove it entirely (burstable QoS is often fine).
Best Practices
- Start with VPA in
Offmode β get recommendations before auto-applying - Right-size requests first, limits second β requests affect scheduling
- Donβt set CPU limits on non-batch workloads β CPU throttling hurts latency
- Always set memory limits β memory is not compressible, leaks cause node pressure
- Review monthly β usage patterns change with traffic and features
- Use namespace quotas β prevent any team from over-provisioning
Key Takeaways
- Most clusters waste 60-80% of resources β right-sizing saves real money
- VPA recommendations give data-driven request/limit targets
- CPU requests = P95 + 20%, memory requests = P95 + 20%, memory limits = 1.2-1.5x requests
- Use Goldilocks or Kubecost for continuous optimization visibility
- Bin packing (MostAllocated scoring) reduces node count and cost

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
