Troubleshooting · Intermediate · ⏱ 15 minutes · K8s 1.28+

How to Debug OOMKilled Pods

Troubleshoot Kubernetes pods killed due to Out of Memory (OOM). Learn to identify memory leaks, set proper limits, and prevent OOMKilled errors.

By Luca Berton

OOMKilled occurs when a container exceeds its memory limit or the node runs out of memory. Learn to diagnose, fix, and prevent OOM issues in Kubernetes.

Identifying OOMKilled Pods

# Find OOMKilled pods
kubectl get pods --all-namespaces -o json | jq -r '
  .items[] |
  select(.status.containerStatuses[]?.lastState.terminated.reason == "OOMKilled") |
  [.metadata.namespace, .metadata.name] | @tsv'

# Check specific pod status
kubectl describe pod myapp-pod

# Look for this in output:
#   Last State:     Terminated
#     Reason:       OOMKilled
#     Exit Code:    137
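
If you already know which pod is affected, a jsonpath query surfaces the same fields without jq (a sketch; the pod name is illustrative):

# Per-container last termination reason and exit code
kubectl get pod myapp-pod -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'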

Check Current Memory Usage

# Pod memory usage
kubectl top pod myapp-pod

# Container-level usage
kubectl top pod myapp-pod --containers

# Node memory usage
kubectl top nodes

# Detailed node memory
kubectl describe node <node-name> | grep -A5 "Allocated resources"
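
The pod's QoS class also matters here: under node memory pressure, BestEffort pods are killed first and Guaranteed pods last. A quick check (pod name is illustrative):

# Show the QoS class derived from the requests/limits configuration
kubectl get pod myapp-pod -o jsonpath='{.status.qosClass}{"\n"}'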

Analyze Memory with Debug Container

# Attach debug container (Kubernetes 1.25+)
kubectl debug myapp-pod -it --image=busybox --target=myapp

# Inside debug container, check memory
cat /proc/meminfo   # note: /proc/meminfo shows node-wide memory, not the container's
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # cgroups v1
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# For cgroups v2
cat /sys/fs/cgroup/memory.current
cat /sys/fs/cgroup/memory.max
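
Because --target shares the application container's process namespace (where the runtime supports it), per-process memory is also visible through /proc. A sketch, assuming the app's main process shows up as PID 1 (verify with ps first):

# List the target's processes, then check resident memory of the main one
ps
grep VmRSS /proc/1/status   # adjust the PID to match the ps output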

Check Container Memory Limits

# deployment.yaml - Proper memory configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          image: myapp:v1
          resources:
            requests:
              memory: "256Mi"  # Scheduling guarantee
            limits:
              memory: "512Mi"  # Hard limit (OOMKilled if exceeded)

Memory Debugging for Java Applications

# java-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: java-app:v1
          resources:
            requests:
              memory: "1Gi"
            limits:
              memory: "2Gi"
          env:
            # Let JVM respect container limits; note JAVA_OPTS is only applied
            # if the image's start script passes it to the JVM
            - name: JAVA_OPTS
              value: >-
                -XX:+UseContainerSupport
                -XX:MaxRAMPercentage=75.0
                -XX:InitialRAMPercentage=50.0
                -XX:+HeapDumpOnOutOfMemoryError
                -XX:HeapDumpPath=/tmp/heapdump.hprof
            # Or explicit sizing; JAVA_TOOL_OPTIONS is read by the JVM automatically
            # (keep -Xmx comfortably below the container limit)
            - name: JAVA_TOOL_OPTIONS
              value: "-Xmx1536m -Xms512m"
          volumeMounts:
            - name: heap-dumps
              mountPath: /tmp
      volumes:
        - name: heap-dumps
          emptyDir:
            sizeLimit: 2Gi
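
When the JVM writes a heap dump on OOM, copy it off the pod for analysis in a heap analyzer such as Eclipse MAT. The emptyDir volume survives container restarts (though not pod deletion), so the dump is still there after the OOMKill. A sketch, with a placeholder pod name:

# Copy the heap dump locally for offline analysis
kubectl cp <java-app-pod>:/tmp/heapdump.hprof ./heapdump.hprof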

Memory Debugging for Node.js

# nodejs-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: nodejs-app:v1
          resources:
            limits:
              memory: "512Mi"
          env:
            # Set the V8 old-space heap limit (in MB) below the container limit
            - name: NODE_OPTIONS
              value: "--max-old-space-size=384"
          # Alternatively, pass the flags on the command line
          # (the NODE_OPTIONS entry above is then redundant)
          command:
            - node
            - --expose-gc
            - --max-old-space-size=384
            - app.js
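
To confirm the flag actually took effect, you can ask V8 for its configured heap limit inside the running container. A sketch, assuming the Deployment above:

# Print V8's effective heap limit in MB (should be roughly 384 here)
kubectl exec deploy/nodejs-app -- node -p "require('v8').getHeapStatistics().heap_size_limit / 1024 / 1024"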

Memory Debugging for Python

# python-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: python-app:v1
          resources:
            limits:
              memory: "512Mi"
          env:
            # Enable memory profiling
            - name: PYTHONTRACEMALLOC
              value: "1"
            # Reduce memory fragmentation
            - name: MALLOC_TRIM_THRESHOLD_
              value: "65536"

Set Up Memory Monitoring

# memory-monitoring.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-alerts
spec:
  groups:
    - name: memory
      rules:
        # Alert before OOM; the extra clause skips containers that report no
        # memory limit (their container_spec_memory_limit_bytes is 0)
        - alert: ContainerMemoryHigh
          expr: |
            (container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.9
              and container_spec_memory_limit_bytes > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} using >90% memory"
            description: "{{ $labels.pod }} is at {{ $value | humanizePercentage }} of limit"
        
        # Track OOMKilled events
        - alert: ContainerOOMKilled
          expr: |
            kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          labels:
            severity: critical
          annotations:
            summary: "Container {{ $labels.container }} was OOMKilled"

Vertical Pod Autoscaler for Right-Sizing

# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"  # Just recommend, don't auto-update
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          memory: "128Mi"
        maxAllowed:
          memory: "4Gi"

Check recommendations:

kubectl describe vpa myapp-vpa
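
Once the recommender has enough data, the raw numbers are also available on the VPA status (a sketch, assuming the VPA above):

# Recommended (target) resources per container
kubectl get vpa myapp-vpa -o jsonpath='{.status.recommendation.containerRecommendations[*].target}{"\n"}'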

Investigate with a Dedicated Debug Pod

# debug-pod.yaml - Full debugging capabilities
apiVersion: v1
kind: Pod
metadata:
  name: memory-debug
spec:
  containers:
    - name: debug
      image: alpine:latest
      command: ["sleep", "infinity"]
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]
      resources:
        limits:
          memory: "256Mi"

# Run memory profiling tools
kubectl exec -it memory-debug -- sh

# Install tools
apk add --no-cache procps htop

# Monitor memory
watch -n 1 'cat /proc/meminfo | grep -E "MemTotal|MemFree|Buffers|Cached"'

Node-Level OOM Investigation

# Check node events for OOM
kubectl get events --field-selector reason=OOMKilling -A

# Check kubelet logs
journalctl -u kubelet | grep -i oom

# Check kernel OOM killer logs
dmesg | grep -i "out of memory"
dmesg | grep -i "killed process"

# Check node memory pressure
kubectl describe node <node> | grep -A5 Conditions
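
If you don't have SSH access to the node, kubectl debug can start a pod on it with the node's root filesystem mounted at /host. A sketch (reading the kernel log may require elevated privileges, depending on the node's configuration):

# Start a debugging pod on the node
kubectl debug node/<node-name> -it --image=busybox

# Inside the pod, use chroot to run the host's own tools (systemd-based nodes)
chroot /host journalctl -u kubelet | grep -i oom
chroot /host dmesg | grep -i "killed process"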

Memory Limit Best Practices

# Recommended configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          resources:
            requests:
              # Set to expected average usage
              memory: "256Mi"
            limits:
              # Set 1.5-2x requests for burst headroom
              # But not too high to prevent node-level OOM
              memory: "512Mi"

Quick Fixes

# Increase memory limit temporarily
kubectl set resources deployment myapp \
  --limits=memory=1Gi --requests=memory=512Mi

# Scale down to reduce node pressure
kubectl scale deployment myapp --replicas=2

# Restart pod to clear memory
kubectl rollout restart deployment myapp

Common Causes and Solutions

Cause                      Solution
Memory leak                Profile app, fix leak, implement GC
Limit too low              Increase limit based on profiling
JVM heap misconfigured     Use -XX:MaxRAMPercentage
Large file processing      Stream instead of loading fully
Unbounded caches           Add size limits, use LRU eviction
Node memory exhaustion     Add nodes, use resource quotas (see the sketch below)
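
For the node-exhaustion case, a namespace ResourceQuota caps how much memory all pods in the namespace can claim in total. A minimal sketch; the values are illustrative:

# resourcequota.yaml - cap total memory per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota
spec:
  hard:
    requests.memory: 8Gi
    limits.memory: 16Gi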

Summary

OOMKilled errors indicate that a container exceeded its memory limit or that the node itself ran out of memory. Debug by checking container and node memory usage, analyzing application memory patterns, setting appropriate limits with headroom, and using VPA for recommendations. For production, implement memory alerting to catch issues before they cause OOMKilled events.


📘 Go Further with Kubernetes Recipes

Love this recipe? There’s so much more! This is just one of 100+ hands-on recipes in our comprehensive Kubernetes Recipes book.

Inside the book, you’ll master:

  • ✅ Production-ready deployment strategies
  • ✅ Advanced networking and security patterns
  • ✅ Observability, monitoring, and troubleshooting
  • ✅ Real-world best practices from industry experts

“The practical, recipe-based approach made complex Kubernetes concepts finally click for me.”

👉 Get Your Copy Now — Start building production-grade Kubernetes skills today!

#oom #memory #troubleshooting #debugging #resources
