Job Completion Patterns Kubernetes
Configure Kubernetes Jobs with indexed completions, work queues, parallel processing, backoff limits, and TTL cleanup for batch workloads.
Quick Answer: Use completions: N with parallelism: M for fixed-count parallel jobs, completionMode: Indexed for worker-index-aware processing, and ttlSecondsAfterFinished: 3600 for automatic cleanup.
The Problem
Batch workloads (data processing, ML training, ETL pipelines) need different execution patterns than long-running services. You need parallel processing, indexed workers, automatic retry, and cleanup of completed jobs.
The Solution
Fixed Completion Count
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  completions: 10
  parallelism: 3
  backoffLimit: 4
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: processor
        image: registry.example.com/processor:1.0
        command: ["python", "process.py"]

10 completions, 3 running at a time: each pod processes one unit of work.
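To see how completions and parallelism interact, here is a minimal Python sketch of a simplified scheduling model. It assumes every pod succeeds on the first try and ignores scheduling latency; the function name `simulate_job` and the per-pod durations are illustrative, not part of any Kubernetes API.

```python
import heapq


def simulate_job(durations, parallelism):
    """Return (start, end) times for each work item.

    Simplified model: at most `parallelism` pods run at once, and a new
    pod starts as soon as the earliest running pod finishes.
    """
    running = []   # min-heap of end times for pods currently running
    schedule = []
    now = 0.0
    for duration in durations:
        if len(running) >= parallelism:
            # All slots are busy: wait for the earliest pod to finish.
            now = heapq.heappop(running)
        end = now + duration
        heapq.heappush(running, end)
        schedule.append((now, end))
    return schedule


if __name__ == "__main__":
    # completions: 10, parallelism: 3, each pod assumed to take 5 time units
    schedule = simulate_job([5] * 10, parallelism=3)
    for pod, (start, end) in enumerate(schedule, start=1):
        print(f"pod {pod}: start={start} end={end}")
    print("job finishes at", max(end for _, end in schedule))
```

Under these assumptions the tenth pod cannot start until three rounds of pods have finished, so raising parallelism shortens the wall-clock time while completions stays the same.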
Indexed Completions
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job
spec:
  completions: 5
  parallelism: 5
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/worker:1.0
        # Kubernetes sets JOB_COMPLETION_INDEX automatically in Indexed mode,
        # so no env entry is needed for it.

Each pod gets JOB_COMPLETION_INDEX (0, 1, 2, 3, 4). Use it to partition work (e.g., process shard i of N).
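Inside the worker, the index can drive a simple partitioning scheme. The following is a minimal sketch, assuming round-robin sharding over a known list of inputs; the `TOTAL_SHARDS` variable, the file names, and the `shard_for_index` helper are hypothetical and would need to match your own Job spec and data.

```python
import os


def shard_for_index(items, index, total):
    """Return the items assigned to worker `index` out of `total` workers."""
    if not 0 <= index < total:
        raise ValueError(f"index {index} is outside 0..{total - 1}")
    # Round-robin: worker i takes items i, i + total, i + 2 * total, ...
    return items[index::total]


def main():
    # JOB_COMPLETION_INDEX is provided by Kubernetes in Indexed mode.
    index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
    # TOTAL_SHARDS is a hypothetical variable; set it to match `completions`.
    total = int(os.environ.get("TOTAL_SHARDS", "5"))
    # Hypothetical input list standing in for real files or partitions.
    items = [f"file-{n:03d}.csv" for n in range(12)]
    for item in shard_for_index(items, index, total):
        print(f"worker {index} processing {item}")


if __name__ == "__main__":
    main()
```

Because every worker derives its slice from the same list and the same total, the shards do not overlap and together cover every item.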
Work Queue Pattern
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  parallelism: 5
  # completions is unset: the Job is complete once any pod succeeds
  # and all pods have terminated (workers exit when the queue is empty)
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/queue-worker:1.0
        env:
        - name: QUEUE_URL
          value: "redis://redis:6379/0"

graph TD
    subgraph Fixed Completion
        J1[completions: 10<br/>parallelism: 3] --> P1[Pod 1] & P2[Pod 2] & P3[Pod 3]
        P1 -->|Complete| P4[Pod 4]
        P2 -->|Complete| P5[Pod 5]
    end
    subgraph Indexed
        J2[completionMode: Indexed] --> I0[Index 0] & I1[Index 1] & I2[Index 2]
    end
    subgraph Work Queue
        J3[parallelism: 5<br/>no completions] --> W1[Worker] & W2[Worker] & W3[Worker]
        Q[Redis Queue] --> W1 & W2 & W3
    end

Common Issues
Job pods keep restarting: backoffLimit exceeded
Check pod logs: kubectl logs job/data-processor. The backoffLimit (default 6) caps the number of retries before the Job is marked Failed. Increase it for flaky workloads.
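When choosing a backoffLimit, it helps to estimate how long the retries could take. The Kubernetes documentation describes failed pods as being recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. The sketch below models that schedule; the helper name `retry_delays` is illustrative, and actual timings in a cluster may differ.

```python
def retry_delays(backoff_limit, base_seconds=10, cap_seconds=360):
    """Approximate delay before each retry: doubles each time, up to a cap."""
    return [
        min(base_seconds * 2 ** attempt, cap_seconds)
        for attempt in range(backoff_limit)
    ]


if __name__ == "__main__":
    for limit in (4, 6):
        delays = retry_delays(limit)
        print(f"backoffLimit={limit}: delays={delays}, total wait={sum(delays)}s")
```

Under this model, the delays alone add up to roughly 150 seconds for a limit of 4 and about ten minutes for the default of 6, not counting the run time of each failed pod.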
Completed Jobs cluttering the namespace
Set ttlSecondsAfterFinished: 3600. Kubernetes auto-deletes the Job and its pods 1 hour after completion.
Best Practices
- Set ttlSecondsAfterFinished: prevent Job accumulation
- restartPolicy: Never for Jobs: let Kubernetes create new pods on failure (not restart in place)
- backoffLimit: 4-6: enough retries for transient failures, not infinite loops
- Indexed mode for sharded data: each worker processes a specific partition
- Work queue for dynamic workloads: workers pull items until queue is empty
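The work queue pattern relies on each worker exiting successfully once there is nothing left to pull. A minimal sketch of that loop follows, using Python's in-memory `queue.Queue` as a stand-in for the Redis queue at QUEUE_URL; the `drain` function and the `handle` callback are hypothetical names, and a real worker would replace the queue calls with its own client library.

```python
import queue


def drain(work_queue, handle):
    """Pull items until the queue is empty, then return the count processed."""
    processed = 0
    while True:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            # Nothing left: returning lets the process exit with status 0,
            # which Kubernetes counts as a successful pod.
            return processed
        handle(item)
        processed += 1


if __name__ == "__main__":
    # In-memory stand-in for the external queue (an assumption for this demo).
    q = queue.Queue()
    for task in ("resize-1", "resize-2", "resize-3"):
        q.put(task)
    count = drain(q, lambda item: print("processing", item))
    print("processed", count, "items")
```

If `handle` raises, the process exits non-zero and the pod counts as failed, so a real worker should make sure an unfinished item is returned to the queue or otherwise retried.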
Key Takeaways
- completions + parallelism control how many pods must succeed and how many run at once
- completionMode: Indexed gives each pod a unique index for data partitioning
- Work queue pattern: no completions set, workers pull from a queue until done
- ttlSecondsAfterFinished auto-cleans completed Jobs, which is essential for CronJob hygiene
- backoffLimit prevents infinite retries; set it based on the expected failure rate

