Autoscaling intermediate ⏱ 20 minutes K8s 1.28+

Kubernetes Right-Sizing and Cost Optimization

Optimize Kubernetes resource allocation with right-sizing, VPA recommendations, bin packing, request-to-limit ratios, and cost reduction best practices.

By Luca Berton • 5 min read

💡 Quick Answer: Kubernetes resource optimization means right-sizing requests and limits to match actual usage. Start by deploying VPA in recommendation mode (kubectl get vpa -o yaml), then adjust requests to P95 usage + 20% buffer. Set CPU limits to 2-5x requests (CPU is compressible) and memory limits to 1.2-1.5x requests (memory is not). Use Goldilocks or Kubecost for continuous recommendations.

The Problem

Most Kubernetes clusters waste 60-80% of provisioned resources:

  • Developers request 1 CPU / 1Gi, but pods use 50m / 128Mi
  • Over-provisioning inflates the cloud bill
  • Under-provisioning causes OOMKills and CPU throttling
  • There is no visibility into actual vs requested resource usage
  • Teams don't update requests after the initial deployment

The Solution

Step 1: Measure Actual Usage

# Check current requests vs actual usage
kubectl top pods -n production --sort-by=cpu
kubectl top pods -n production --sort-by=memory

# Compare requests vs usage for all pods in namespace
kubectl get pods -n production -o json | jq -r '
  .items[] | 
  .metadata.name as $name |
  .spec.containers[] |
  "\($name) | req: \(.resources.requests.cpu // "none") cpu, \(.resources.requests.memory // "none") mem"'

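# Prometheus query: CPU usage as a fraction of CPU requests, per pod
# (container_cpu_usage_seconds_total is a counter, so take its rate first)
# sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
#   / sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})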
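
To turn the request and usage figures above into a utilization ratio, the quantities first have to be normalized. The following is a minimal Python sketch, assuming only the common suffixes (m for CPU; Ki/Mi/Gi/Ti and k/M/G/T for memory). The helper names are hypothetical, and the input values are the example from The Problem section (1 CPU / 1Gi requested, 50m / 128Mi used).

```python
from decimal import Decimal

# Binary and decimal suffixes used by Kubernetes memory quantities (subset).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '50m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '131072k' to bytes."""
    # Check two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(Decimal(quantity))


def utilization(used: int, requested: int) -> float:
    """Return usage as a fraction of the request."""
    return used / requested


cpu_ratio = utilization(cpu_to_millicores("50m"), cpu_to_millicores("1"))
mem_ratio = utilization(memory_to_bytes("128Mi"), memory_to_bytes("1Gi"))
print(f"CPU: {cpu_ratio:.1%} of request, memory: {mem_ratio:.1%} of request")
# prints: CPU: 5.0% of request, memory: 12.5% of request
```

A pod using 5% of its CPU request and 12.5% of its memory request is a strong right-sizing candidate.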

Step 2: Deploy VPA for Recommendations

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendation only; won't change pods
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi

# Apply and wait for recommendations (needs 24h+ of data)
kubectl apply -f vpa.yaml

# Check recommendations (the VPA lives in the production namespace)
kubectl get vpa my-app-vpa -n production -o yaml | grep -A20 recommendation
# recommendation:
#   containerRecommendations:
#   - containerName: my-app
#     lowerBound:
#       cpu: 25m
#       memory: 131072k
#     target:        ← Use this for requests
#       cpu: 100m
#       memory: 256Mi
#     upperBound:    ← Use this for limits
#       cpu: 500m
#       memory: 512Mi
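
Grepping YAML works for a single VPA, but it gets awkward across many objects. As a rough sketch, the same recommendation can be read from the JSON form of the object. The SAMPLE document below is hypothetical: it mirrors the values shown above and the status.recommendation.containerRecommendations layout, and the function name is an assumption for illustration.

```python
import json

# Hypothetical sample shaped like `kubectl get vpa my-app-vpa -o json`,
# trimmed to the fields used here and filled with the values shown above.
SAMPLE = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "my-app",
          "lowerBound": {"cpu": "25m", "memory": "131072k"},
          "target": {"cpu": "100m", "memory": "256Mi"},
          "upperBound": {"cpu": "500m", "memory": "512Mi"}
        }
      ]
    }
  }
}
"""


def vpa_targets(vpa: dict) -> dict:
    """Map each container name to its recommended target resources."""
    recommendation = vpa.get("status", {}).get("recommendation", {})
    containers = recommendation.get("containerRecommendations", [])
    return {c["containerName"]: c["target"] for c in containers}


targets = vpa_targets(json.loads(SAMPLE))
print(targets)
# prints: {'my-app': {'cpu': '100m', 'memory': '256Mi'}}
```

A VPA that has not yet collected enough data has no recommendation in its status, in which case the function returns an empty dict.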

Step 3: Right-Size Resources

# Before (over-provisioned):
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 2Gi

# After (right-sized based on VPA target + buffer):
resources:
  requests:
    cpu: 120m        # VPA target 100m + 20% buffer
    memory: 300Mi    # VPA target 256Mi + ~20% buffer
  limits:
    cpu: 500m        # ~4x requests (CPU is compressible)
    memory: 400Mi    # ~1.3x requests (memory is NOT compressible)

Right-Sizing Rules

Resource          | Requests                | Limits                      | Why
CPU               | P95 usage + 20%         | 2-5x requests (or no limit) | CPU is compressible: throttled, not killed
Memory            | P95 usage + 20%         | 1.2-1.5x requests           | Memory is NOT compressible: OOMKill if exceeded
Ephemeral storage | Based on log/tmp volume | 2x requests                 | Evicted if exceeded
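
The P95 + 20% rule can be applied to raw usage samples with a few lines of Python. This is a sketch under stated assumptions: the sample values are made up, the percentile uses the nearest-rank method (Prometheus' quantile_over_time interpolates, so its results can differ slightly), and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def right_size(samples, buffer_pct=20, limit_ratio=1.3):
    """Return (request, limit): P95 plus a buffer, then a multiple of the request."""
    request = math.ceil(percentile(samples, 95) * (100 + buffer_pct) / 100)
    limit = math.ceil(request * limit_ratio)
    return request, limit


# Hypothetical memory usage samples in Mi.
memory_mi = [180, 190, 200, 210, 205, 195, 220, 230, 250, 215,
             185, 198, 202, 240, 225, 208, 212, 218, 235, 245]
# Hypothetical CPU usage samples in millicores.
cpu_m = [40, 55, 60, 70, 80, 85, 90, 95, 100, 140]

print("memory (request, limit) in Mi:", right_size(memory_mi, limit_ratio=1.3))
print("cpu (request, limit) in m:", right_size(cpu_m, limit_ratio=4))
# prints: memory (request, limit) in Mi: (294, 383)
# prints: cpu (request, limit) in m: (168, 672)
```

With only a handful of samples the P95 is effectively the maximum, so collect a representative window (the article suggests at least 24 hours) before trusting the result.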

Step 4: Automated Optimization Tools

# Goldilocks: VPA recommendations for every deployment
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
kubectl create namespace goldilocks
helm install goldilocks fairwinds-stable/goldilocks -n goldilocks
# Label namespaces to enable
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
# Access dashboard
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80

# Kubecost: cost visibility
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer -n kubecost --create-namespace \
  --set prometheus.server.global.external_labels.cluster_id=my-cluster

# kubectl-view-allocations plugin
kubectl krew install view-allocations
kubectl view-allocations -n production

Step 5: Cluster-Level Optimization

# LimitRange: prevent absurd requests
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-constraints
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 200m
      memory: 256Mi
    defaultRequest:
      cpu: 50m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 10m
      memory: 32Mi

---
# ResourceQuota: cap namespace total
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
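
To see how right-sizing interacts with the quota, the sketch below estimates how many replicas of a single-container pod fit under the ResourceQuota above, before and after the Step 3 change. It assumes an otherwise empty namespace and uses hypothetical helper names. Units are millicores and Mi.

```python
# Hard caps from the ResourceQuota above, in millicores and Mi.
QUOTA = {
    "requests.cpu": 20_000,
    "requests.memory": 40 * 1024,
    "limits.cpu": 40_000,
    "limits.memory": 80 * 1024,
}


def max_replicas(pod: dict, quota: dict) -> int:
    """Largest replica count whose summed requests and limits stay within the quota."""
    return min(quota[key] // pod[key] for key in quota)


# Per-pod resources from Step 3: over-provisioned vs right-sized.
before = {"requests.cpu": 1000, "requests.memory": 1024,
          "limits.cpu": 2000, "limits.memory": 2048}
after = {"requests.cpu": 120, "requests.memory": 300,
         "limits.cpu": 500, "limits.memory": 400}

print(max_replicas(before, QUOTA), max_replicas(after, QUOTA))
# prints: 20 80
```

In this example the binding constraint after right-sizing is limits.cpu (40 CPU / 500m = 80), which shows that generous limits consume quota even when requests are small.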

Step 6: Node Bin Packing

# Scheduler profile for bin packing (pack pods tightly onto fewer nodes)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      enabled:
      - name: NodeResourcesFit
        weight: 1
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated    # Bin packing (vs LeastAllocated = spreading)
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
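
The difference between the two scoring strategies is easier to see with numbers. Below is a simplified Python model of the idea, not the scheduler's actual code: each resource is scored on a 0-100 scale by how full (MostAllocated) or how empty (LeastAllocated) the node would be after placing the pod, and the scores are combined as a weighted average. Node names and sizes are hypothetical.

```python
MAX_NODE_SCORE = 100


def node_score(requested: dict, allocatable: dict, weights: dict, strategy: str) -> int:
    """Weighted average of per-resource scores on a 0-100 scale."""
    total = 0
    for resource, weight in weights.items():
        capacity = allocatable[resource]
        used = requested[resource]
        if capacity == 0 or used > capacity:
            score = 0
        elif strategy == "MostAllocated":
            score = used * MAX_NODE_SCORE // capacity
        else:  # LeastAllocated
            score = (capacity - used) * MAX_NODE_SCORE // capacity
        total += score * weight
    return total // sum(weights.values())


def pick_node(nodes: dict, pod: dict, allocatable: dict, weights: dict, strategy: str) -> str:
    """Return the node with the highest score once the pod's requests are added."""
    def score(name: str) -> int:
        after = {r: nodes[name][r] + pod[r] for r in pod}
        return node_score(after, allocatable, weights, strategy)
    return max(nodes, key=score)


# Hypothetical cluster: identical nodes (4 CPU / 16Gi), one busy and one nearly idle.
ALLOCATABLE = {"cpu": 4000, "memory": 16384}  # millicores, Mi
WEIGHTS = {"cpu": 1, "memory": 1}             # same weights as the profile above
NODES = {
    "node-a": {"cpu": 3000, "memory": 12288},
    "node-b": {"cpu": 500, "memory": 2048},
}
POD = {"cpu": 120, "memory": 300}

print(pick_node(NODES, POD, ALLOCATABLE, WEIGHTS, "MostAllocated"))
print(pick_node(NODES, POD, ALLOCATABLE, WEIGHTS, "LeastAllocated"))
# prints: node-a
# prints: node-b
```

MostAllocated keeps filling the busy node, which leaves node-b free to be drained and removed by a cluster autoscaler. LeastAllocated spreads the load instead.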

Common Issues

VPA and HPA conflict on CPU

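VPA adjusts CPU requests, while HPA scales replicas on CPU utilization, which is measured as a percentage of those requests. When VPA changes the request, the utilization HPA sees shifts even though actual usage has not changed. Use VPA for memory only (set controlledResources: ["memory"] in the container policy) and leave CPU scaling to HPA.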
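
The effect is easy to reproduce with the documented HPA formula, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). The sketch below is a simplification: it ignores the HPA tolerance band and the min/max replica bounds, and the usage figures are hypothetical.

```python
def hpa_desired_replicas(current_replicas: int, usage_m: int,
                         request_m: int, target_pct: int) -> int:
    """Desired replicas for a CPU utilization target, using integer ceiling division."""
    # utilization_pct = usage / request * 100; the division is folded in to stay exact.
    numerator = current_replicas * usage_m * 100
    denominator = request_m * target_pct
    return -(-numerator // denominator)


# Same workload, same actual usage (90m per pod), 4 replicas, 60% target.
before = hpa_desired_replicas(4, usage_m=90, request_m=1000, target_pct=60)
after = hpa_desired_replicas(4, usage_m=90, request_m=120, target_pct=60)
print(before, after)
# prints: 1 5
```

Nothing about the traffic changed, yet lowering the request from 1000m to 120m moves utilization from 9% to 75% and flips the HPA from wanting to scale in to wanting to scale out.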

Pods OOMKilled after right-sizing memory

The buffer is too small. Memory usage spikes: for bursty workloads, size requests from P99 (not P95) and set the limit to 1.5x requests.
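
A short worked example shows why the percentile matters for bursty workloads. The usage series below is hypothetical (96 steady samples and 4 spikes), the percentile uses the nearest-rank method, and the request and limit follow the P + 20% and 1.5x rules from this recipe.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def memory_plan(samples_mi, pct):
    """Return (request, limit) in Mi: percentile + 20%, limit at 1.5x the request."""
    request = math.ceil(percentile(samples_mi, pct) * 120 / 100)
    limit = math.ceil(request * 3 / 2)
    return request, limit


# Hypothetical bursty workload: steady at 200Mi with occasional 380Mi spikes.
usage_mi = [200] * 96 + [380] * 4
peak = max(usage_mi)

for pct in (95, 99):
    request, limit = memory_plan(usage_mi, pct)
    verdict = "OOMKill risk" if peak > limit else "peak fits"
    print(f"P{pct}: request={request}Mi limit={limit}Mi peak={peak}Mi -> {verdict}")
# prints: P95: request=240Mi limit=360Mi peak=380Mi -> OOMKill risk
# prints: P99: request=456Mi limit=684Mi peak=380Mi -> peak fits
```

Because the spikes make up only 4% of samples, P95 does not see them at all, and even a 1.5x limit on the P95-based request sits below the real peak.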

CPU throttling after reducing limits

Check container_cpu_cfs_throttled_periods_total against container_cpu_cfs_periods_total. If more than 5% of periods are throttled, increase the CPU limit or remove it entirely (Burstable QoS is often fine).
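
The 5% check is the ratio of throttled CFS periods to total CFS periods over the same time window. A minimal sketch, assuming you have already taken the increase of both counters over that window (for example from Prometheus); the container names and numbers are hypothetical.

```python
THROTTLE_THRESHOLD = 0.05  # the 5% rule of thumb from this recipe


def throttle_ratio(throttled_periods: int, total_periods: int) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    if total_periods == 0:
        return 0.0  # no CPU limit set, or the container was idle
    return throttled_periods / total_periods


# Hypothetical counter increases over a 5-minute window:
# (throttled periods, total periods)
window = {
    "api": (120, 3000),
    "worker": (450, 3000),
    "idle": (0, 0),
}

for name, (throttled, total) in window.items():
    ratio = throttle_ratio(throttled, total)
    status = "investigate" if ratio > THROTTLE_THRESHOLD else "ok"
    print(f"{name}: {ratio:.1%} throttled -> {status}")
# prints: api: 4.0% throttled -> ok
# prints: worker: 15.0% throttled -> investigate
# prints: idle: 0.0% throttled -> ok
```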

Best Practices

  • Start with VPA in Off mode: get recommendations before auto-applying
  • Right-size requests first, limits second: requests drive scheduling
  • Avoid CPU limits on latency-sensitive (non-batch) workloads: CPU throttling hurts latency
  • Always set memory limits: memory is not compressible, and leaks cause node pressure
  • Review monthly: usage patterns change with traffic and features
  • Use namespace quotas: prevent any team from over-provisioning

Key Takeaways

  • Most clusters waste 60-80% of resources, so right-sizing saves real money
  • VPA recommendations give data-driven request/limit targets
  • CPU requests = P95 + 20%, memory requests = P95 + 20%, memory limits = 1.2-1.5x requests
  • Use Goldilocks or Kubecost for continuous optimization visibility
  • Bin packing (MostAllocated scoring) reduces node count and cost