Fix OOMKilled in Kubernetes Pods
Debug and fix OOMKilled errors in Kubernetes. Find memory leaks, set correct limits, use VPA for right-sizing, and prevent container OOM kills.
The Problem
Your container exits with code 137 and the pod's lastState shows OOMKilled: the kernel killed the process for exceeding its memory limit. Blindly raising limits either wastes capacity or just postpones the next kill, so the fix depends on whether you are dealing with a memory leak, a load spike, or steady under-provisioning.
The Solution
Diagnose first, then match the fix to the cause: raise limits for legitimate bursts, fix the code for leaks, or right-size with VPA recommendations. Production-ready examples for each path below.
Diagnose OOMKilled
# Check pod status
kubectl get pod <name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
# {"exitCode":137,"reason":"OOMKilled",...}
# Check events
kubectl describe pod <name> | grep -i oom
# Check current memory usage vs limits
kubectl top pod <name>
kubectl get pod <name> -o jsonpath='{.spec.containers[0].resources.limits.memory}'
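# Sweep the whole cluster for recently OOMKilled containers
# (a sketch: assumes jq is installed)
kubectl get pods -A -o json | jq -r '
  .items[] | . as $p
  | .status.containerStatuses[]?
  | select(.lastState.terminated.reason == "OOMKilled")
  | "\($p.metadata.namespace)/\($p.metadata.name) container=\(.name)"'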
# Check node-level OOM events
kubectl get events --field-selector reason=OOMKilling -A

Fix: Increase Memory Limits
# Before (too low)
resources:
  limits:
    memory: 128Mi

# After (right-sized)
resources:
  requests:
    memory: 256Mi   # Based on actual usage
  limits:
    memory: 512Mi   # 2x request for burst headroom
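To roll the new values out without hand-editing manifests, kubectl can patch the workload in place (shown here for a Deployment named my-app, matching the VPA example below):

kubectl set resources deployment my-app \
  --requests=memory=256Mi --limits=memory=512Mi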
Fix: Find Memory Leaks

# Monitor memory over time
kubectl top pod <name> --containers
# Run every 30s and watch if memory grows continuously
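# Quick sampling loop (a sketch: assumes metrics-server is installed)
while true; do
  echo "$(date +%T) $(kubectl top pod <name> --containers --no-headers)" >> mem.log
  sleep 30
done
# A memory column that only ever climbs points to a leak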
# Profile inside the container
kubectl exec -it <name> -- /bin/sh
# Java: jmap -heap <pid> (JDK 8) or jhsdb jmap --heap --pid <pid> (JDK 9+)
# Python: tracemalloc
# Node.js: --inspect flag + Chrome DevTools
# Go: pprof endpoint
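For Go services that expose net/http/pprof, you can pull a heap profile through a port-forward; a sketch assuming the app listens on port 8080:

kubectl port-forward pod/<name> 8080:8080 &
go tool pprof http://localhost:8080/debug/pprof/heap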
Fix: Use VPA for Right-Sizing

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Just get recommendations first
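Note that VPA is an add-on, not part of a stock cluster; verify its CRD exists before applying the manifest:

kubectl get crd verticalpodautoscalers.autoscaling.k8s.io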
kubectl describe vpa my-app-vpa
# Target: memory 340Mi → set your request to this

Decision flow:

graph TD
A[OOMKilled exit code 137] --> B{Is it a memory leak?}
B -->|Memory grows over time| C[Fix the leak in code]
B -->|Spikes under load| D[Increase memory limit]
B -->|Constant high usage| E[Right-size with VPA]
D --> F[Set limit = 2x average usage]
E --> G[VPA recommends optimal values]

Frequently Asked Questions
Why exit code 137?
137 = 128 + 9 (SIGKILL). The Linux OOM killer sends SIGKILL (signal 9) when a process exceeds its cgroup memory limit.
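You can reproduce the arithmetic in any shell; a process killed by signal 9 exits with 128 + 9:

sh -c 'kill -9 $$'; echo $?
# 137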
OOMKilled vs node-level OOM?
Container OOMKilled = container exceeded its own memory limit. Node OOM = node ran out of memory entirely and the kernel OOM killer chose a victim. Set memory requests to guarantee capacity.
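To shield a critical workload from node-pressure kills, set requests equal to limits so the pod gets the Guaranteed QoS class; a minimal sketch (pod name and image are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx               # placeholder image
    resources:
      requests:
        memory: 256Mi
        cpu: 250m
      limits:                  # equal to requests => Guaranteed QoS
        memory: 256Mi
        cpu: 250m
EOF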
Common Issues
Check kubectl describe and kubectl get events first; most issues have clear error messages pointing to the root cause.
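Sorting events by timestamp usually surfaces the kill fastest:

kubectl get events -n <namespace> --sort-by=.lastTimestamp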
Best Practices
- Set memory requests from observed usage and keep limits with headroom for bursts
- Test new limits in staging before rolling them to production
- Monitor memory usage and alert before pods reach their limits (see the sketch below)
- Document your OOM runbooks for the team
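A rough way to catch pods running hot before the OOM killer does (a sketch: assumes metrics-server is running and memory limits are expressed in Mi):

kubectl top pod --no-headers | while read -r pod cpu mem; do
  limit=$(kubectl get pod "$pod" \
    -o jsonpath='{.spec.containers[0].resources.limits.memory}')
  [ -z "$limit" ] && continue                 # skip pods without a limit
  used=${mem%Mi}; cap=${limit%Mi}
  if [ "$used" -ge $((cap * 80 / 100)) ]; then
    echo "WARN $pod at ${mem} of ${limit}"
  fi
done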
Key Takeaways
- Exit code 137 = SIGKILL from the OOM killer; confirm via lastState.terminated
- Memory that grows over time is a leak: fix the code, don't just raise limits
- Spikes under load call for higher limits; constant high usage calls for right-sizing
- VPA in "Off" mode gives recommendations without restarting pods
