Configuration intermediate ⏱ 10 minutes K8s 1.21+

Kubernetes Liveness Probe Best Practices

Configure Kubernetes liveness probes correctly. Best practices for httpGet, exec, and tcpSocket probes, and how to avoid database checks and thundering-herd restarts.

By Luca Berton • 5 min read

💡 Quick Answer: Liveness probes should check ONLY whether the process is alive and responsive, never external dependencies (databases, APIs). Use /healthz with minimal logic. If the liveness probe fails, the kubelet kills the container, so probes that check shared dependencies turn one dependency outage into cascading restarts.

The Problem

Bad liveness probes cause:

  • Thundering herd: All pods restart when a shared database hiccups
  • Cascading failures: Pods kill themselves when they should just stop serving traffic
  • CrashLoopBackOff: Aggressive probe settings kill slow-starting apps
  • False restarts: External dependency flakiness triggers unnecessary kills

The Solution

Correct: Simple Process Health Check

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
            failureThreshold: 3
            # Approximate time before kill: 15 + (20 × 3) = 75 seconds

// /healthz endpoint: checks ONLY process health
func healthz(w http.ResponseWriter, r *http.Request) {
    // ✓ Check if the process can handle requests
    // ✓ Check if critical goroutines are alive
    // ✓ Check memory isn't corrupted
    
    // ✗ DON'T check database connectivity
    // ✗ DON'T check downstream services
    // ✗ DON'T check disk space
    // ✗ DON'T check cache connectivity
    
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("ok"))
}

❌ Anti-Pattern: Checking Database

# DON'T DO THIS: if the database is slow, ALL pods restart simultaneously
livenessProbe:
  httpGet:
    path: /health  # Endpoint that queries database
    port: 8080
  timeoutSeconds: 3
  failureThreshold: 2
  # Database latency spike → probe timeout → all pods killed → thundering herd on recovery

Liveness vs Readiness vs Startup

spec:
  containers:
    - name: app
      # Startup: "Are you finished starting up?"
      # Only checked during startup, prevents premature liveness kills
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 30  # 30 × 5s = 150s to start
        # Liveness/readiness don't run until startup succeeds

      # Liveness: "Are you deadlocked or crashed?"
      # Failure → container RESTART (kill + recreate)
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 20
        failureThreshold: 3

      # Readiness: "Can you handle traffic right now?"
      # Failure → removed from Service endpoints (no traffic, no restart)
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 2

Decision Matrix

graph TD
    A[Health Check Needed] --> B{What to check?}
    
    B -->|Process alive?| C[Liveness Probe]
    C --> D[Simple /healthz - no dependencies]
    
    B -->|Can serve traffic?| E[Readiness Probe]
    E --> F["/ready - check DB, cache, connections"]
    
    B -->|Startup complete?| G[Startup Probe]
    G --> H[Same as liveness, generous timeout]
    
    style D fill:#e8f5e9
    style F fill:#e3f2fd
    style H fill:#fff3e0

Probe Types

# HTTP GET (most common for web services)
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: X-Health-Check
        value: "liveness"

# TCP Socket (for non-HTTP services like databases, Redis)
livenessProbe:
  tcpSocket:
    port: 5432

# Exec (for custom checks, sidecar processes)
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - "pidof myprocess"

# gRPC (stable in K8s 1.27; beta and enabled by default since 1.24)
livenessProbe:
  grpc:
    port: 50051
    service: ""  # Empty = overall health
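
| Workload | initialDelaySeconds | periodSeconds | timeoutSeconds | failureThreshold |
|---|---|---|---|---|
| Fast web app | 5s | 10s | 3s | 3 |
| Java/Spring | 30s | 20s | 5s | 3 |
| ML inference | 60s | 30s | 10s | 3 |
| Database | 30s | 20s | 5s | 5 |
| Worker/queue | 10s | 30s | 5s | 3 |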

Use startupProbe for Slow-Starting Apps

# Instead of increasing initialDelaySeconds (which is a guess):
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 60  # Up to 5 minutes to start
  # After startup succeeds, liveness probe takes over with normal settings

Common Issues

| Issue | Cause | Fix |
|---|---|---|
| Constant restarts on startup | No startupProbe, initialDelaySeconds too short | Add startupProbe with generous timeout |
| All pods restart at once | Liveness checks database | Remove dependency checks from liveness |
| Probe timeout but app is fine | timeoutSeconds too low for GC pauses | Increase to 5-10s |
| CrashLoopBackOff | failureThreshold=1, tight timing | Use failureThreshold ≥ 3 |
| Unnecessary restarts under load | Probe timeout during CPU pressure | Increase timeout, raise CPU limits |

Best Practices

  1. Liveness = process health ONLY: never check external dependencies
  2. Readiness = dependency health: check DB, cache, downstream APIs here
  3. Use startupProbe for slow starts: better than guessing initialDelaySeconds
  4. Set failureThreshold ≥ 3: tolerates transient issues
  5. timeoutSeconds ≥ 5s: GC pauses can exceed 1-2s
  6. periodSeconds ≥ 10s for liveness: you don't need to check every second
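
The numeric items in this checklist (4 to 6) are easy to check mechanically, for example in a CI step that reviews manifests. The sketch below is one possible shape for such a check. `ProbeSpec` and `Lint` are hypothetical names, and the thresholds are copied from the list above rather than from any Kubernetes API.

```go
package main

import "fmt"

// ProbeSpec holds the liveness settings the checklist refers to, in seconds.
// Hypothetical type for illustration only.
type ProbeSpec struct {
	PeriodSeconds    int
	TimeoutSeconds   int
	FailureThreshold int
}

// Lint returns one warning for each numeric checklist item the spec violates.
func Lint(p ProbeSpec) []string {
	var warnings []string
	if p.FailureThreshold < 3 {
		warnings = append(warnings, "failureThreshold below 3: transient failures may restart the container")
	}
	if p.TimeoutSeconds < 5 {
		warnings = append(warnings, "timeoutSeconds below 5: GC pauses may cause probe timeouts")
	}
	if p.PeriodSeconds < 10 {
		warnings = append(warnings, "periodSeconds below 10: liveness is probed more often than needed")
	}
	return warnings
}

func main() {
	// Settings from the database anti-pattern, with periodSeconds left at
	// the Kubernetes default of 10.
	antiPattern := ProbeSpec{PeriodSeconds: 10, TimeoutSeconds: 3, FailureThreshold: 2}
	for _, w := range Lint(antiPattern) {
		fmt.Println("warning:", w)
	}
}
```

A check like this cannot see what the probed endpoint does, so items 1 and 2 (no external dependencies in liveness) still need a code review.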

Key Takeaways

  • Liveness probe failure = container kill → use sparingly and check only process health
  • Readiness probe failure = remove from traffic → safe to check dependencies
  • Never check databases in liveness probes: doing so causes thundering herd cascades
  • startupProbe decouples slow startup from normal operation monitoring
  • Conservative settings (period=20s, failure=3, timeout=5s) prevent false positives