Kubernetes Graceful Shutdown and Pod Termination
Implement graceful shutdown for Kubernetes pods. Configure terminationGracePeriodSeconds, preStop hooks, SIGTERM handling, connection
Quick Answer: When Kubernetes terminates a pod, it runs any preStop hook, sends SIGTERM to PID 1 of each container, waits until terminationGracePeriodSeconds (default 30s) has elapsed, then sends SIGKILL. For zero downtime: 1) Handle SIGTERM in your app (stop accepting, drain connections), 2) Add a preStop hook with a short sleep (5-10s) so endpoint removal can propagate, 3) Set the grace period longer than preStop plus drain time.
The Problem
- Pods receive in-flight requests during shutdown → 502/504 errors for clients
- SIGTERM not handled → app killed abruptly, losing work in progress
- Endpoint removal races with pod termination → traffic sent to dying pods
- Rolling updates cause brief connection resets
- Long-running requests (WebSocket, streaming) cut off prematurely
The Solution
Pod Termination Sequence
Time │ Event
─────┼────────────────────────────────────────────────────────
 0s  │ Pod marked for termination
     │ ├── Pod removed from Service endpoints (async)
     │ └── preStop hook executed (blocking; SIGTERM waits for it)
     │
 5s  │ preStop hook completes (e.g., sleep 5)
     │ SIGTERM sent to PID 1
     │ App starts graceful shutdown:
     │ ├── Stop accepting new connections
     │ ├── Drain in-flight requests
     │ └── Close database connections, flush buffers
     │
25s  │ App finishes graceful shutdown, exits 0
     │
30s  │ terminationGracePeriodSeconds expires
     │ SIGKILL sent (force kill if still running)
─────┴────────────────────────────────────────────────────────
Key insight: Endpoint removal is ASYNC, so traffic may still
arrive for a few seconds after termination begins. The preStop sleep
gives kube-proxy/ingress time to update routing tables.

Recommended Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60 # Total time before SIGKILL
      containers:
        - name: app
          image: registry.example.com/api:v2
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
                # Sleep gives time for endpoint removal to propagate
                # Then SIGTERM triggers app's graceful shutdown
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
            # Failing readiness removes pod from endpoints faster

Application SIGTERM Handling
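# Python (standard-library server shown as a stand-in; Flask/FastAPI apps
# usually rely on gunicorn/uvicorn, which handle SIGTERM themselves)
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()

server = ThreadingHTTPServer(("", 8080), Handler)
server.daemon_threads = False  # Let server_close() wait for in-flight requests

def graceful_shutdown(signum, frame):
    print("SIGTERM received, shutting down gracefully...")
    # shutdown() blocks until serve_forever() returns, so run it off the main thread
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, graceful_shutdown)
server.serve_forever()  # Returns once shutdown() stops the accept loop
server.server_close()   # Waits for in-flight request threads, then closes the socket

// Go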
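package main

import (
	"context"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()
	server := &http.Server{Addr: ":8080"}
	go server.ListenAndServe() // Returns http.ErrServerClosed once Shutdown is called
	<-ctx.Done()
	log.Println("Shutdown signal received, draining connections...")
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
	defer cancel()
	if err := server.Shutdown(shutdownCtx); err != nil { // Waits up to 20s for in-flight requests
		log.Printf("drain did not finish in time: %v", err)
	}
}

// Node.js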
process.on('SIGTERM', () => {
  console.log('SIGTERM received, graceful shutdown...');
  // `server` is the app's existing http.Server instance
  server.close(() => {
    console.log('All connections drained');
    process.exit(0);
  });
  // Force exit after 20s if connections don't drain
  setTimeout(() => process.exit(1), 20000).unref();
});

Zero-Downtime Rolling Update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0 # Never reduce below desired replicas
      maxSurge: 1 # Add 1 extra pod during update
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 3 # Always keep at least 3 pods running
  selector:
    matchLabels:
      app: api-server

Long-Running Connections (WebSocket/gRPC Streaming)
spec:
  terminationGracePeriodSeconds: 300 # 5 minutes for long connections
  containers:
    - name: websocket-server
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - |
                # Signal app to stop accepting new connections
                curl -X POST localhost:8080/admin/drain
                # Wait for existing connections to close naturally
                sleep 30

Common Issues
502 errors during rolling update
- Cause: Traffic sent to pod after SIGTERM but before endpoint removal propagates
- Fix: Add preStop: sleep 5-10 to delay shutdown; set maxUnavailable: 0
Pod killed before finishing graceful shutdown
- Cause: terminationGracePeriodSeconds too short for drain time
- Fix: Increase grace period; grace period must be > preStop + drain time
SIGTERM not received by application
- Cause: PID 1 is a shell script that doesn't forward signals; or using CMD with shell form
- Fix: Use exec form in Dockerfile (CMD ["./app"] not CMD ./app); or use exec in entrypoint
App exits immediately on SIGTERM without draining
- Cause: Application doesn't handle SIGTERM (default behavior = exit)
- Fix: Add signal handler to drain connections before exiting
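The grace-period rule from the second issue above (grace period > preStop + drain time) is plain arithmetic, and checking it before a rollout can catch a pod that would be SIGKILLed mid-drain. The following is a minimal sketch, not part of any Kubernetes tooling: the function name grace_period_margin is hypothetical, and the drain time is whatever you have measured or configured for your own application.

```python
def grace_period_margin(grace_seconds, prestop_seconds, drain_seconds):
    """Seconds left before SIGKILL once preStop and the drain have finished.

    The preStop hook runs inside the grace period, so both durations
    are subtracted from it. A negative result means the pod would be
    killed before the drain completes.
    """
    return grace_seconds - (prestop_seconds + drain_seconds)


# Values from the recommended configuration: 60s grace, sleep 10 preStop,
# and the 20s drain timeout used in the handler examples.
print(grace_period_margin(60, 10, 20))  # 30

# Default 30s grace period with the same preStop and a 25s drain.
print(grace_period_margin(30, 10, 25))  # -5
```

A positive margin leaves headroom for slow requests; a zero or negative margin means the grace period should be raised or the drain timeout lowered.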
Best Practices
- Always add preStop: sleep 5-10 (allows endpoint removal to propagate)
- Handle SIGTERM in your application (drain connections, flush buffers)
- Set terminationGracePeriodSeconds > preStop + drain time (prevents SIGKILL)
- Use maxUnavailable: 0 (never remove capacity during updates)
- Fail readiness probe during shutdown (accelerates endpoint removal)
- Use exec form in Dockerfile CMD (ensures PID 1 receives signals)
- Add a PodDisruptionBudget (protects against voluntary disruptions)
Key Takeaways
- Pod termination: mark terminating → remove endpoints (async) → preStop → SIGTERM → wait → SIGKILL
- preStop: sleep 5-10 delays SIGTERM until endpoint removal has had time to propagate
- Application must handle SIGTERM: stop accepting, drain in-flight, exit cleanly
- terminationGracePeriodSeconds (default 30s) is the hard deadline before SIGKILL
- Zero-downtime: preStop hook + SIGTERM handler + maxUnavailable: 0 + readiness probe
- Shell form Dockerfile CMD (CMD ./app) doesn't forward signals; use exec form
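The "stop accepting, drain in-flight, exit cleanly" step can be sketched independently of any web framework as a counter of active requests guarded by a condition variable. This is an illustration only, assuming each request handler can call a begin/end pair; the class InFlightTracker and its method names are hypothetical, not an API from any library.

```python
import threading


class InFlightTracker:
    """Counts active requests so shutdown can wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0
        self._accepting = True

    def try_begin(self):
        """Register a new request; returns False once draining has started."""
        with self._cond:
            if not self._accepting:
                return False
            self._active += 1
            return True

    def end(self):
        """Mark one request as finished and wake any waiting drain() call."""
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def drain(self, timeout):
        """Stop accepting, then wait up to `timeout` seconds for active requests.

        Returns True if everything finished in time, False otherwise.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)
```

A SIGTERM handler would call drain() with a timeout that fits inside the grace period and then exit with 0 on True, or a non-zero code on False so the incomplete drain shows up in the container's exit status.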
