Configuration beginner ⏱ 10 minutes K8s 1.28+

Kubernetes Startup Probes for Slow Containers

Configure Kubernetes startup probes for containers with long initialization. Separate startup checks from liveness checks and tune `failureThreshold` to fit your startup time.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Startup probes protect slow-starting containers from being killed by liveness probes. Liveness and readiness probes are disabled until the startup probe succeeds. Set `failureThreshold × periodSeconds` to cover your worst-case startup time (e.g., `failureThreshold: 30, periodSeconds: 10` gives a 5-minute startup window).

The Problem

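Java applications, ML model loading, or database containers can take 60-300+ seconds to start. If your liveness probe has a 30-second `initialDelaySeconds`, it starts failing while the app is still initializing, so containers get killed during startup and the pod enters CrashLoopBackOff. Increasing `initialDelaySeconds` to cover the worst case is a blunt fix: no liveness checks run during that window, so an app that starts quickly and then hangs is not detected until the delay has elapsed. Startup probes solve this by separating startup detection from runtime health checking.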

flowchart TB
    subgraph WITHOUT["Without Startup Probe"]
        START1["Container starts<br/>(takes 120s)"] -->|"30s"| LIVE1["Liveness probe fires"]
        LIVE1 -->|"App not ready yet"| KILL["❌ Killed!<br/>CrashLoopBackOff"]
    end
    subgraph WITH["With Startup Probe"]
        START2["Container starts<br/>(takes 120s)"] -->|"Startup probe<br/>checks every 10s"| WAIT["Waiting..."]
        WAIT -->|"120s: App ready"| PASS["✅ Startup probe passes"]
        PASS --> LIVE2["Liveness + readiness<br/>probes activate"]
    end
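
For illustration, the "without startup probe" case could look like the sketch below. The image and endpoint are borrowed from the examples in this recipe; the pod name and the assumption that the app needs about 120 seconds before `/healthz` responds are hypothetical.

```yaml
# Sketch of the problematic setup: liveness probe only, no startup probe.
# Assumes (hypothetically) the app needs ~120s before /healthz responds.
apiVersion: v1
kind: Pod
metadata:
  name: java-app-no-startup-probe   # hypothetical name
spec:
  containers:
    - name: app
      image: spring-boot-app:v1.0
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 30   # first check at ~30s, long before the app is up
        periodSeconds: 10
        failureThreshold: 3       # three consecutive failures: restart roughly 50s in
```

With these values the container would be restarted well before the 120-second mark on every attempt, which is the loop the diagram describes.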

The Solution

Basic Startup Probe

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
    - name: app
      image: spring-boot-app:v1.0
      ports:
        - containerPort: 8080

      # Startup probe: protects during slow startup
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30       # 30 failures × 10s = 300s max startup
        periodSeconds: 10

      # Liveness probe: only runs AFTER startup probe succeeds
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3

      # Readiness probe: only runs AFTER startup probe succeeds
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 3

How the Timing Works

Timeline:
t=0:      Container starts
t=0-300s: Startup probe checks every 10s (up to 30 failures allowed)
t=120s:   App responds to /healthz → startup probe SUCCEEDS
t=120s+:  Liveness probe starts (every 10s)
t=120s+:  Readiness probe starts (every 5s)

If app never starts within 300s → container killed (startup probe failed), then restarted per restartPolicy
If app hangs after startup      → liveness probe catches it within about 30s (3 × 10s)

Java Spring Boot Example

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  failureThreshold: 30
  periodSeconds: 10           # 30 × 10 = 300s for Spring Boot startup

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3         # Kill after 30s of failures

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3

ML Model Loading Example

# NIM or HuggingFace model that takes 2-10 minutes to load
startupProbe:
  httpGet:
    path: /v1/health/ready
    port: 8000
  failureThreshold: 60
  periodSeconds: 10           # 60 × 10 = 600s (10 min) for model loading

livenessProbe:
  httpGet:
    path: /v1/health/live
    port: 8000
  periodSeconds: 15
  failureThreshold: 3

TCP Startup Probe

For services that accept connections before HTTP endpoints are ready:

startupProbe:
  tcpSocket:
    port: 5432                 # PostgreSQL port
  failureThreshold: 30
  periodSeconds: 5             # 30 × 5 = 150s for database startup

Exec Startup Probe

For custom startup checks that need a command rather than an HTTP or TCP check:

startupProbe:
  exec:
    command:
      - sh
      - -c
      - "pg_isready -U postgres -d mydb"
  failureThreshold: 20
  periodSeconds: 5
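
Exec probes run the command inside the container on every check, and the default `timeoutSeconds` is 1 second. If the check command might be slow while the database is still initializing, it may be worth setting the timeout explicitly. The sketch below is the same probe with an illustrative `timeoutSeconds: 5`; that value is an assumption, not a recommendation from this recipe.

```yaml
# Sketch: the exec startup probe with an explicit command timeout.
# timeoutSeconds: 5 is an illustrative assumption; tune it to your check.
startupProbe:
  exec:
    command:
      - sh
      - -c
      - "pg_isready -U postgres -d mydb"
  timeoutSeconds: 5        # default is 1s; a check that times out counts as a failure
  failureThreshold: 20
  periodSeconds: 5         # 20 × 5 = 100s startup budget
```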

Probe Comparison

| Probe | When It Runs | On Failure | Purpose |
|---|---|---|---|
| Startup | Until first success | Kill container (after `failureThreshold`), then restart per `restartPolicy` | Protect slow-starting containers |
| Liveness | After startup succeeds | Kill container, then restart | Detect deadlocked/stuck apps |
| Readiness | After startup succeeds | Remove pod from Service endpoints | Control traffic routing |

Calculate Your Startup Budget

Max startup time = failureThreshold × periodSeconds

Examples:
  30 × 10s = 300s (5 min): Java apps, Spring Boot
  60 × 10s = 600s (10 min): ML model loading
  12 × 5s  = 60s (1 min): Standard web apps
  90 × 10s = 900s (15 min): Large NIM models on slow storage

Common Issues

| Issue | Cause | Fix |
|---|---|---|
| Pod killed during startup | Startup budget too short | Increase `failureThreshold` or `periodSeconds` |
| Startup probe never succeeds | App crash, wrong port/path | Check pod logs, verify health endpoint |
| Liveness probe killing healthy pod | No startup probe, `initialDelaySeconds` too short | Add startup probe instead |
| Readiness never runs | Startup probe failing | Fix startup probe first |
| False positive startup | Probe endpoint returns 200 before app is truly ready | Use a dedicated startup endpoint that checks all dependencies |
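
As a rough illustration of the "false positive startup" fix, each probe can point at its own endpoint. In the sketch below, `/startup` is a hypothetical path that does not appear elsewhere in this recipe; the application would have to implement it so that it returns 200 only once its dependencies are reachable.

```yaml
# Sketch only: /startup is a hypothetical endpoint the app must implement.
startupProbe:
  httpGet:
    path: /startup            # hypothetical: 200 only once dependencies are up
    port: 8080
  failureThreshold: 30
  periodSeconds: 10           # 30 × 10 = 300s startup budget

livenessProbe:
  httpGet:
    path: /healthz            # process-level check only, no dependency checks
    port: 8080
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
```

Keeping dependency checks in the startup endpoint and out of the liveness endpoint means a dependency outage after startup does not trigger container restarts.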

Best Practices

  • Always use startup probes for slow containers: cleaner than large `initialDelaySeconds`
  • Set generous startup budgets: 2× your worst observed startup time
  • Use different endpoints: `/healthz` for liveness, `/ready` for readiness; the startup probe can share the liveness endpoint
  • Don't check dependencies in liveness: liveness should only check whether the process itself is alive
  • Monitor startup durations: track P99 startup time to right-size `failureThreshold`
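
To make the 2× rule concrete, suppose the slowest startup you have observed is 140 seconds (a hypothetical figure). Doubling it gives a 280-second budget, which at `periodSeconds: 10` works out to `failureThreshold: 28`. A minimal sketch, reusing the `/healthz` endpoint from the basic example:

```yaml
# Sketch: sizing the budget from a hypothetical worst observed startup of 140s.
# Target budget: 2 × 140s = 280s; with periodSeconds: 10 that is 28 attempts.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 28        # 28 × 10s = 280s
```

A fast start is not penalized by a generous budget: the startup probe stops as soon as it succeeds once.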

Key Takeaways

  • Startup probes hold off liveness and readiness probes until the startup probe succeeds
  • Max startup window = `failureThreshold × periodSeconds`
  • Essential for Java, ML models, databases, and any container with >30s startup
  • Replaces the anti-pattern of large `initialDelaySeconds` on liveness probes
  • GA since Kubernetes 1.20
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
