Deployments intermediate ⏱ 10 minutes K8s 1.28+

Kubernetes Job Completions and Parallelism

Configure Kubernetes Job completions, parallelism, backoffLimit, and indexed jobs. Parallel batch processing, work queue patterns, and job failure handling.

By Luca Berton • 📖 5 min read

💡 Quick Answer: `completions` = total successful pod runs needed. `parallelism` = maximum pods running simultaneously. A Job with `completions: 10, parallelism: 3` runs up to 3 pods at a time until 10 have completed successfully. Indexed Jobs (`completionMode: Indexed`) give each pod a unique `JOB_COMPLETION_INDEX` for partitioned work.

The Problem

You need to process a batch of work: 100 images to resize, 50 database shards to migrate, or 1000 reports to generate. Running one pod at a time is too slow. Running all at once overwhelms your cluster. Jobs let you control exactly how many pods run in parallel and how many must complete.

flowchart TB
    JOB["Job<br/>completions: 10<br/>parallelism: 3"] --> P1["Pod 0 ✅"]
    JOB --> P2["Pod 1 ✅"]
    JOB --> P3["Pod 2 🔄 Running"]
    JOB --> P4["Pod 3 🔄 Running"]
    JOB --> P5["Pod 4 🔄 Running"]
    JOB -.->|"Waiting"| P6["Pod 5-9"]
    P1 & P2 -.->|"Complete → start next"| P6
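The arithmetic behind the diagram can be sketched in a few lines of shell. This is only a rough estimate that assumes every pod takes about the same time; in practice the Job controller starts a replacement pod as soon as any pod finishes, so there are no strict rounds. The helper name `min_rounds` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: lower bound on the number of "rounds" a fixed-count
# Job needs, assuming all pods take roughly the same time (a simplification).
min_rounds() {
  completions=$1
  parallelism=$2
  # Integer ceiling division: ceil(completions / parallelism)
  echo $(( (completions + parallelism - 1) / parallelism ))
}

min_rounds 10 3   # prints 4
```

Multiplying the result by the typical pod duration gives a first guess at total runtime, which is a reasonable starting point when choosing `activeDeadlineSeconds`.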

The Solution

Basic Parallel Job

apiVersion: batch/v1
kind: Job
metadata:
  name: image-resize
spec:
  completions: 10          # Need 10 successful completions
  parallelism: 3           # Run 3 pods at a time
  backoffLimit: 5          # Max 5 retries for failures
  template:
    spec:
      containers:
        - name: resize
          image: imagetools:v1
          command: ["./resize.sh"]
      restartPolicy: Never
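To run the manifest above and block until it finishes, something like the following could work. It is a minimal sketch: the function name `run_job_and_wait` and the file name in the usage comment are assumptions, and it relies only on `kubectl apply` and `kubectl wait`.

```shell
#!/bin/sh
# Hypothetical helper: apply a Job manifest, then block until the Job
# reports the Complete condition or the timeout expires.
run_job_and_wait() {
  manifest=$1          # path to the Job YAML (assumed file name)
  job_name=$2          # metadata.name from the manifest
  timeout=${3:-600s}   # kubectl duration string, defaults to 10 minutes

  kubectl apply -f "$manifest" || return 1
  kubectl wait --for=condition=complete "job/$job_name" --timeout="$timeout"
}

# Usage (requires access to a cluster):
#   run_job_and_wait image-resize.yaml image-resize 15m
```

Note that a Job which fails never reaches the `Complete` condition, so in that case the wait only returns when the timeout expires.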

Job Patterns

| Pattern | completions | parallelism | Use Case |
| --- | --- | --- | --- |
| Single pod | 1 (default) | 1 (default) | One-off task |
| Fixed count | N | M | Process N items, M at a time |
| Work queue | unset | M | Process until queue empty |
| Indexed | N | M | Each pod gets unique index |

Indexed Jobs (Partitioned Work)

Each pod gets a unique index via `JOB_COMPLETION_INDEX`:

apiVersion: batch/v1
kind: Job
metadata:
  name: shard-migration
spec:
  completions: 50
  parallelism: 10
  completionMode: Indexed     # ← Each pod gets unique index
  template:
    spec:
      containers:
        - name: migrate
          image: db-tools:v1
          command: ["./migrate-shard.sh"]
          env:
            - name: SHARD_ID            # copy of the completion index: 0, 1, 2, ... 49
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
      restartPolicy: Never
Each pod can also read its index directly at runtime:

# Inside pod with index 7:
echo $JOB_COMPLETION_INDEX
# 7

# Use index to partition work
# Shard 7 of 50 → process items 7000 to 7999 (1000 items per shard)
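The partitioning arithmetic can be written as a small helper. This is a sketch under the assumption that every index owns a contiguous block of 1000 items; `ITEMS_PER_SHARD` and `shard_range` are hypothetical names, not something the Job API provides.

```shell
#!/bin/sh
# Hypothetical partitioning helper for an Indexed Job.
# Assumes each index owns a contiguous block of ITEMS_PER_SHARD items.
ITEMS_PER_SHARD=${ITEMS_PER_SHARD:-1000}

shard_range() {
  index=$1
  start=$(( index * ITEMS_PER_SHARD ))
  end=$(( start + ITEMS_PER_SHARD - 1 ))
  echo "$start $end"   # first and last item handled by this index
}

# JOB_COMPLETION_INDEX is set by Kubernetes in Indexed Jobs; default to 0
# so the script can also be tried outside a cluster.
shard_range "${JOB_COMPLETION_INDEX:-0}"
```

Keeping the mapping from index to item range in one function makes it easy to change the shard size later without touching the rest of the script.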

Work Queue Pattern

Pods pull items from a queue and exit when it is empty, so there is no fixed completion count:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  # completions: not set → work queue mode
  parallelism: 5
  template:
    spec:
      containers:
        - name: worker
          image: worker:v1
          command: ["./process-queue.sh"]
          env:
            - name: REDIS_URL
              value: "redis://queue-svc:6379"
      restartPolicy: Never
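# The Job is complete once at least one pod has exited with 0 and all
# remaining pods have terminated; after the first success no new pods start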
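What `process-queue.sh` might look like is sketched below. Everything here is an assumption for illustration: the list name `work-items`, the `handle_item` placeholder, and the use of `redis-cli -u "$REDIS_URL" LPOP`, which is expected to print an empty line when the list is empty and its output is not a terminal.

```shell
#!/bin/sh
# Sketch of a queue worker loop. QUEUE_NAME and handle_item are hypothetical.
QUEUE_NAME=${QUEUE_NAME:-work-items}

# Pop one item from the Redis list; prints nothing when the list is empty.
pop_item() {
  redis-cli -u "$REDIS_URL" LPOP "$QUEUE_NAME"
}

# Placeholder for the real work on a single item.
handle_item() {
  echo "processing $1"
}

process_queue() {
  while :; do
    item=$(pop_item) || return 1     # queue unreachable: fail so the pod is retried
    [ -n "$item" ] || return 0       # empty queue: exit successfully
    handle_item "$item" || return 1  # failed item: non-zero exit marks the pod failed
  done
}

# Entry point inside the container:
#   process_queue
```

Exiting with 0 only when the queue is really empty matters here, because the first successful pod tells the Job controller that the work is done.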

Failure Handling

apiVersion: batch/v1
kind: Job
metadata:
  name: reliable-job
spec:
  completions: 10
  parallelism: 3
  backoffLimit: 6                    # Total failures before job fails
  activeDeadlineSeconds: 3600        # Kill entire job after 1 hour
  ttlSecondsAfterFinished: 300       # Clean up 5 min after completion
  template:
    spec:
      containers:
        - name: task
          image: task:v1
      restartPolicy: Never           # Never = create a new pod on failure
                                     # OnFailure = restart the container in the same pod
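When a Job like this one does fail, the reason recorded on its `Failed` condition shows which limit was hit. The helper below is a minimal sketch; the function name `job_failure_reason` is hypothetical, and it only wraps a `kubectl get` with a JSONPath filter.

```shell
#!/bin/sh
# Hypothetical helper: print why a Job failed, if it has a Failed condition.
# Typical reasons are BackoffLimitExceeded and DeadlineExceeded.
job_failure_reason() {
  kubectl get job "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'
}

# Usage (requires access to a cluster):
#   job_failure_reason reliable-job
```

The output is empty while the Job is still running or after it has completed successfully.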

Pod Failure Policy (K8s 1.26+)

Handle specific exit codes differently:

spec:
  podFailurePolicy:
    rules:
      - action: FailJob              # Fail entire job
        onExitCodes:
          containerName: task
          operator: In
          values: [42]               # Exit code 42 = unrecoverable
      - action: Ignore               # Don't count as failure
        onPodConditions:
          - type: DisruptionTarget   # Pod was preempted, retry
      - action: Count                # Count toward backoffLimit
        onExitCodes:
          containerName: task
          operator: NotIn
          values: [0]                # Any other non-zero exit
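For the policy above to be useful, the container has to exit with the agreed codes. The following entrypoint is a sketch of one possible convention: 42 mirrors the `FailJob` rule, while the missing-input check and the function name `run_task` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of an entrypoint that maps error classes to exit codes.
# Exit code 42 matches the FailJob rule; the input check is hypothetical.
EXIT_UNRECOVERABLE=42
EXIT_RETRYABLE=1

run_task() {
  input=$1
  if [ ! -f "$input" ]; then
    echo "input $input does not exist, retrying will not help" >&2
    return "$EXIT_UNRECOVERABLE"
  fi
  # Placeholder for the real work; any failure here is treated as retryable.
  wc -l < "$input" > /dev/null || return "$EXIT_RETRYABLE"
  return 0
}

# Entry point inside the container:
#   run_task /data/input.csv; exit $?
```

With this split, a bad input fails the whole Job immediately instead of burning through `backoffLimit` on retries that cannot succeed.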

Monitor Jobs

# Job status
kubectl get jobs
# NAME           COMPLETIONS   DURATION   AGE
# image-resize   7/10          3m         3m

# Watch progress
kubectl get jobs -w

# Pod status per job
kubectl get pods -l job-name=image-resize
# NAME                 READY   STATUS      RESTARTS   AGE
# image-resize-abc12   0/1     Completed   0          3m
# image-resize-def34   0/1     Completed   0          2m
# image-resize-ghi56   1/1     Running     0          30s

# Check failed pods
kubectl get pods -l job-name=image-resize --field-selector=status.phase=Failed
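The last command pairs naturally with `kubectl logs`. As a small sketch, the helper below prints the tail of the logs for every failed pod of a Job; the name `failed_pod_logs` and the 20-line tail are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: show the last log lines of every failed pod of a Job.
failed_pod_logs() {
  job=$1
  # -o name prints one "pod/<name>" per line, which kubectl logs accepts.
  for pod in $(kubectl get pods -l "job-name=$job" \
      --field-selector=status.phase=Failed -o name); do
    echo "== $pod"
    kubectl logs "$pod" --tail=20
  done
}

# Usage (requires access to a cluster):
#   failed_pod_logs image-resize
```

This only works while the failed pods still exist, which is one reason `restartPolicy: Never` is easier to debug than `OnFailure`.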

Common Issues

| Issue | Cause | Fix |
| --- | --- | --- |
| Job stuck at N-1 completions | Last pod keeps failing | Check pod logs, increase `backoffLimit` |
| Pods run one at a time | `parallelism` not set (defaults to 1) | Explicitly set `parallelism` |
| Job pods not cleaned up | No `ttlSecondsAfterFinished` | Add TTL or manually delete |
| Index out of range | Pod logic doesn't handle `JOB_COMPLETION_INDEX` correctly | Validate index bounds in code |
| Job takes forever | Parallelism too low for completion count | Increase `parallelism` |
| Zombie jobs | `activeDeadlineSeconds` not set | Add deadline to prevent infinite running |
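For the index bounds check mentioned in the table, a guard along these lines could run at the top of the pod's script. It is a sketch only: `valid_index` is a hypothetical name, and the expected total has to be passed in by the caller because the pod does not learn `completions` automatically.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the index is an integer in [0, total).
valid_index() {
  index=$1
  total=$2
  case $index in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit character
  esac
  [ "$index" -lt "$total" ]
}

# Example: refuse to run outside an Indexed Job with 50 completions.
#   valid_index "$JOB_COMPLETION_INDEX" 50 || exit 1
```

Rejecting an empty value also catches the case where the script is started in a Job that is not using `completionMode: Indexed`.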

Best Practices

  • Use Indexed Jobs for partitioned data: cleaner than work queues for fixed datasets
  • Set `activeDeadlineSeconds`: prevents jobs from running indefinitely
  • Set `ttlSecondsAfterFinished`: automatic cleanup of completed jobs
  • Use `restartPolicy: Never` over `OnFailure`: easier to debug (pod logs preserved)
  • Add `podFailurePolicy` to distinguish retryable from fatal errors
  • Monitor with `kubectl get jobs -w`: watch completion progress

Key Takeaways

  • `completions` = how many pods must succeed; `parallelism` = how many run concurrently
  • Indexed Jobs give each pod a unique `JOB_COMPLETION_INDEX` (0 to N-1)
  • Work queue pattern (no completions set) = pods process until queue is empty
  • `backoffLimit` controls total retries; `activeDeadlineSeconds` caps total runtime
  • `podFailurePolicy` (K8s 1.26+) enables exit-code-based retry decisions
  • Always set `ttlSecondsAfterFinished` so finished Jobs and their pods do not pile up in the cluster
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
