Configuration advanced ⏱ 15 minutes K8s 1.28+

CPU and Memory Limits Deep Dive

Deep dive into Kubernetes CPU and memory management. CFS bandwidth throttling, OOMKill scoring, cgroup v2 behavior, memory.high vs memory.

By Luca Berton • 📖 5 min read

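💡 Quick Answer: On cgroup v2, memory.high is a soft limit that slows allocations and triggers reclaim before OOMKill; Kubernetes sets it only when the MemoryQoS feature gate (alpha as of Kubernetes 1.28) is enabled. Remove CPU limits on latency-sensitive services: CFS quota causes 5-10ms throttling bursts even at 30% average utilization. The kubelet writes the container's memory limit to memory.max, which is the hard OOMKill boundary.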

The Problem

A service using only 30% average CPU gets throttled by CFS quota, causing p99 latency to spike from 5ms to 50ms. Understanding how Linux cgroups enforce Kubernetes resource limits is essential for production tuning.

The Solution

CFS Bandwidth Control (CPU Limits)

CPU limit: 500m = 50ms of CPU time per 100ms period

If your service handles a burst of requests that needs 80ms of CPU within one 100ms period:
  → CFS throttles the container once the 50ms quota is spent (80ms needed - 50ms quota = 30ms deferred)
  → Requests in flight wait until the quota refills at the next period boundary
  → The deferred 30ms of work runs in the next period, so p99 latency spikes by roughly the time spent throttled

With no CPU limit:
  → Service uses 80ms freely
  → No throttling, no latency spike
  → CPU requests still drive scheduling and the container's share of CPU under contention
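The arithmetic above can be sketched in a few lines of Python. This is a simplified model rather than kernel code: the helper names `cfs_quota_ms`, `deferred_work_ms`, and `cpu_max_line` are hypothetical, and the sketch assumes the default 100ms CFS period and the cgroup v2 `cpu.max` format of `<quota> <period>` in microseconds.

```python
PERIOD_US = 100_000  # default CFS period (100ms); assumed unchanged on the node


def cfs_quota_ms(limit_millicores, period_ms=100.0):
    """CPU time in ms the container may consume per period for a given limit."""
    return limit_millicores / 1000.0 * period_ms


def deferred_work_ms(demand_ms, limit_millicores, period_ms=100.0):
    """CPU work in ms that does not fit into one period and has to wait."""
    return max(0.0, demand_ms - cfs_quota_ms(limit_millicores, period_ms))


def cpu_max_line(limit_millicores=None, period_us=PERIOD_US):
    """String form of cgroup v2 cpu.max: '<quota_us> <period_us>' or 'max <period_us>'."""
    if limit_millicores is None:
        return f"max {period_us}"
    return f"{limit_millicores * period_us // 1000} {period_us}"


print(cfs_quota_ms(500))           # quota for a 500m limit
print(deferred_work_ms(80, 500))   # work pushed into the next period
print(cpu_max_line(500))           # what a 500m limit looks like in cpu.max
print(cpu_max_line())              # no CPU limit
```

Comparing the output of `cpu_max_line` with the contents of `/sys/fs/cgroup/cpu.max` inside a container is a quick way to confirm which limit is actually in effect.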

Memory: cgroup v2 Behavior

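memory.max = Kubernetes memory limit (hard limit)
  → Usage cannot exceed it; if the kernel cannot reclaim enough memory, the container is OOMKilled

memory.high = soft limit (cgroup v2; the kubelet sets it only with the MemoryQoS feature gate)
  → Derived from the memory request and limit, so it sits between the two (the request itself maps to memory.min)
  → Exceeding triggers kernel memory reclaim (slow allocation)
  → Acts as early warning before OOMKill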
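As a rough illustration of where memory.high lands relative to the request and limit, the sketch below follows the formula described for the alpha MemoryQoS feature: the request plus a throttling factor times the gap between limit and request, rounded down to a page boundary. The function name `memory_high_bytes`, the 0.9 default factor, and the 4096-byte page size are assumptions for illustration; the feature is alpha and its behavior may differ between Kubernetes versions.

```python
Mi = 1024 * 1024


def memory_high_bytes(request_bytes, limit_bytes, throttling_factor=0.9, page_size=4096):
    """Approximate memory.high under MemoryQoS (alpha), rounded down to a page.

    throttling_factor and page_size are assumed defaults, not values read
    from a live kubelet configuration.
    """
    raw = request_bytes + throttling_factor * (limit_bytes - request_bytes)
    return int(raw // page_size) * page_size


high = memory_high_bytes(256 * Mi, 512 * Mi)
print(high, round(high / Mi, 1))  # bytes and MiB
```

With a 256Mi request and a 512Mi limit the result sits close to the limit, which means reclaim pressure starts shortly before the hard boundary rather than at the request.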

OOMKill scoring:
  oom_score ≈ (memory_usage / memory_limit) * 1000 + oom_score_adj (approximate)
  Guaranteed QoS: oom_score_adj = -997 (almost never killed)
  BestEffort QoS: oom_score_adj = 1000 (first to die)
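The two values above cover Guaranteed and BestEffort pods. For Burstable pods the adjustment depends on how much of the node's memory the container requests. The sketch below follows the formula given in the Kubernetes node-pressure eviction documentation; the function name `oom_score_adj` is hypothetical, and the exact clamping bounds in the kubelet source may differ slightly from the documented ones.

```python
Mi = 1024 * 1024
Gi = 1024 * Mi


def oom_score_adj(qos_class, memory_request_bytes=0, node_capacity_bytes=1):
    """oom_score_adj per QoS class, using the documented Burstable formula."""
    if qos_class == "Guaranteed":
        return -997
    if qos_class == "BestEffort":
        return 1000
    # Burstable: larger requests relative to node memory give a lower score,
    # so the container is less likely to be picked by the OOM killer.
    adj = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    return min(max(2, adj), 999)


print(oom_score_adj("Guaranteed"))
print(oom_score_adj("Burstable", 256 * Mi, 8 * Gi))
print(oom_score_adj("BestEffort"))
```

The effective value for a running process can be checked in `/proc/<pid>/oom_score_adj` on the node.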
# Latency-sensitive service: NO CPU limit
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 512Mi
    # NO cpu limit: prevents throttling

# Batch/background service: CPU limit OK
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 2Gi
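Both resource blocks above place their pods in the Burstable QoS class, because requests and limits differ or a limit is missing. The sketch below applies the QoS rules to those two examples. It is a simplification: `qos_class` is a hypothetical helper, it compares quantity strings literally (so "1" and "1000m" would count as different), and it ignores init containers and pod-level settings.

```python
def qos_class(containers):
    """Classify a pod from its containers' requests/limits for cpu and memory."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


latency_sensitive = [{"requests": {"cpu": "250m", "memory": "256Mi"},
                      "limits": {"memory": "512Mi"}}]
batch = [{"requests": {"cpu": "1", "memory": "1Gi"},
          "limits": {"cpu": "2", "memory": "2Gi"}}]

print(qos_class(latency_sensitive))
print(qos_class(batch))
```

The design consequence is that dropping the CPU limit rules out Guaranteed QoS for that pod, so a service tuned this way relies on its memory request size, not on the -997 adjustment, for OOM protection.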

Monitoring Throttling

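# Check CFS throttling metrics (cgroup v2, from inside the container)
cat /sys/fs/cgroup/cpu.stat
# nr_periods 1000
# nr_throttled 150        ← 15% of periods were throttled!
# throttled_usec 450000   ← 450ms total throttled time
# On cgroup v1 the file is /sys/fs/cgroup/cpu/cpu.stat and throttled_time is in nanoseconds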

# Prometheus query for throttling
rate(container_cpu_cfs_throttled_periods_total[5m])
/ rate(container_cpu_cfs_periods_total[5m]) * 100
# Returns: % of periods throttled
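The same percentage can be computed directly from the contents of cpu.stat when Prometheus is not available. The sketch below is a minimal parser; `throttled_percent` is a hypothetical helper name, and it only assumes the `key value` line format with the `nr_periods` and `nr_throttled` counters.

```python
def throttled_percent(cpu_stat_text):
    """Percentage of CFS periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods elapsed yet
    return 100.0 * stats.get("nr_throttled", 0) / periods


sample = """nr_periods 1000
nr_throttled 150
throttled_usec 450000
"""
print(throttled_percent(sample))
```

In practice the text would come from reading the cpu.stat file inside the container. The counters are cumulative since the cgroup was created, so comparing two readings taken some time apart gives a rate closer to what the PromQL query reports.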
graph TD
    subgraph CPU - Compressible
        REQ_CPU[CPU Request: 250m] -->|Scheduling guarantee| SCHED[Node placement]
        LIM_CPU[CPU Limit: 500m] -->|CFS Quota| THROTTLE[⚠️ Throttled<br/>NOT killed]
        NO_LIM[No CPU Limit] -->|Best for latency| BURST[Can burst freely<br/>up to node capacity]
    end
    subgraph Memory - Incompressible
        REQ_MEM[Memory Request: 256Mi] -->|memory.high via MemoryQoS| RECLAIM[Kernel reclaim<br/>slow allocation]
        LIM_MEM[Memory Limit: 512Mi] -->|memory.max| OOMKILL[❌ OOMKilled<br/>Container restarted]
    end

Common Issues

Service throttled at 30% average CPU

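CFS enforces the quota per 100ms period, not on average. With a 500m limit (50ms quota), a burst that needs 80ms of CPU in one period is throttled even if the next 9 periods are idle, because unused quota does not carry over. Remove CPU limits for latency-sensitive services.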
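A small simulation makes the gap between average usage and throttling concrete. This is a deliberately simplified model, not the kernel's scheduler: `simulate_cfs` is a hypothetical helper, demand is expressed as CPU milliseconds arriving per 100ms period, unfinished work carries into the next period, and a period counts as throttled whenever work is left over.

```python
def simulate_cfs(demands_ms, quota_ms, period_ms=100.0):
    """Return (throttled_periods, average_utilization, leftover_backlog_ms)."""
    if not demands_ms:
        return 0, 0.0, 0.0
    backlog = 0.0
    throttled_periods = 0
    used_ms = 0.0
    for demand in demands_ms:
        need = backlog + demand       # new work plus work deferred earlier
        ran = min(need, quota_ms)     # the quota caps CPU time in this period
        backlog = need - ran
        used_ms += ran
        if backlog > 0:
            throttled_periods += 1    # ran out of quota with work pending
    average_utilization = used_ms / (len(demands_ms) * period_ms)
    return throttled_periods, average_utilization, backlog


# Bursty load under a 500m limit: three 80ms bursts, otherwise light traffic.
demands = [80, 10, 10, 80, 10, 10, 80, 10, 10, 0]
throttled, avg_util, backlog = simulate_cfs(demands, quota_ms=50)
print(f"throttled periods: {throttled}/{len(demands)}, average CPU: {avg_util:.0%} of one core")
```

In this run the container averages 30% of one core, well under its 500m limit, yet 3 of 10 periods are throttled because each burst exceeds the 50ms quota on its own.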

OOMKilled but kubectl top shows below limit

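kubectl top reports the working set (usage minus inactive file cache) sampled at intervals, so it can miss short spikes and reads lower than what the cgroup charges. The cgroup also accounts for page cache, kernel memory, and tmpfs. Check cgroup stats: /sys/fs/cgroup/memory.current and /sys/fs/cgroup/memory.events on cgroup v2 (/sys/fs/cgroup/memory/memory.usage_in_bytes on cgroup v1).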
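To compare what the cgroup charges with the smaller figure that metrics pipelines usually report, the sketch below derives a working-set value as total usage minus inactive file cache, which is how cAdvisor-style metrics are commonly described. `working_set_bytes` is a hypothetical helper; the inputs mirror the cgroup v2 files `memory.current` and `memory.stat`, and the sample numbers are made up for illustration.

```python
Mi = 1024 * 1024


def working_set_bytes(memory_current, memory_stat_text):
    """Working set = total charged usage minus inactive file cache (never negative)."""
    inactive_file = 0
    for line in memory_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "inactive_file" and value.strip().isdigit():
            inactive_file = int(value)
    return max(0, memory_current - inactive_file)


# Illustrative values only, not taken from a real container.
sample_stat = f"anon {300 * Mi}\nfile {200 * Mi}\ninactive_file {150 * Mi}\nactive_file {50 * Mi}\n"
current = 500 * Mi
print(working_set_bytes(current, sample_stat) // Mi, "MiB working set of", current // Mi, "MiB charged")
```

A large gap between the two numbers points at file cache or tmpfs rather than heap growth, which changes where to look when a container is killed near its limit.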

Best Practices

  • Remove CPU limits for latency-sensitive services: throttling causes unpredictable latency spikes
  • Always set memory limits: memory is incompressible, and an OOMKill is better than node-wide pressure
  • Monitor container_cpu_cfs_throttled_periods_total: a throttling rate above 5% needs investigation
  • Prefer cgroup v2: with MemoryQoS enabled, memory.high provides graceful degradation before OOMKill
  • Use Guaranteed QoS for critical services (requests equal to limits for both CPU and memory): oom_score_adj of -997 means they are almost never OOMKilled under node memory pressure

Key Takeaways

  • CFS bandwidth control enforces CPU limits in 100ms periods, which causes latency spikes even at low average usage
  • Remove CPU limits for latency-sensitive services: use requests for scheduling guarantees
  • Memory limits trigger OOMKill (hard, memory.max); memory.high triggers reclaim first (soft, cgroup v2 with MemoryQoS)
  • OOM scoring: BestEffort dies first (1000), Guaranteed dies last (-997)
  • Monitor CFS throttling: more than 5% throttled periods suggests CPU limits are too tight
  • cgroup v2 memory.high (soft limit) provides gradual degradation before hard OOMKill
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.