Kueue for Batch Jobs and GPU Queues
Use Kueue to manage batch job queues on Kubernetes: GPU quota, fair sharing, priority queues, ML training workloads, and multi-tenant cluster scheduling.
💡 Quick Answer: Kueue is a Kubernetes-native job queueing system that manages when and where batch Jobs run based on available quotas. Install Kueue, then define ResourceFlavors (CPU/GPU types), ClusterQueues (resource pools), and LocalQueues (per-namespace). Jobs wait in the queue until quota is available, so GPUs are not overcommitted and lower-priority teams are not starved.
The Problem
Kubernetes Jobs start as soon as resources exist; there is no native concept of "wait in line." When multiple teams submit GPU training jobs at the same time, you get either resource contention (OOMKilled pods, scheduling failures) or massive overprovisioning. Kueue adds enterprise batch scheduling: fair sharing between teams, GPU quota management, priority-based preemption, and multi-cluster job distribution.
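To make the "wait in line" idea concrete, here is a minimal Python sketch of quota-based admission. It is a toy model, not Kueue's implementation: the `Job` class, the `admit_in_order` helper, and the fourth job are hypothetical, and only GPUs are counted. The first three requests (8, 4, and 2 GPUs against a 16-GPU pool) match the scenario used in this section.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int  # total GPUs the job requests across all pods


def admit_in_order(pending, quota):
    """Walk the queue in submission order and admit every job that still fits.

    Jobs that do not fit stay in the waiting list instead of being started
    on an overcommitted cluster. Returns (admitted_names, waiting_names).
    """
    free = quota
    admitted, waiting = [], []
    for job in pending:
        if job.gpus <= free:
            free -= job.gpus
            admitted.append(job.name)
        else:
            waiting.append(job.name)
    return admitted, waiting


jobs = [
    Job("team-a-training", 8),
    Job("team-b-pipeline", 4),
    Job("team-c-finetune", 2),
    Job("team-a-second-run", 8),  # hypothetical extra job, added for illustration
]
admitted, waiting = admit_in_order(jobs, quota=16)
print(admitted)  # ['team-a-training', 'team-b-pipeline', 'team-c-finetune']
print(waiting)   # ['team-a-second-run']
```

The first three jobs use 14 of 16 GPUs, so the second 8-GPU run waits until quota frees up rather than failing at scheduling time.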
flowchart TB
    subgraph SUBMIT["Job Submission"]
        TEAM_A["Team A<br/>ML Training<br/>(8× A100)"]
        TEAM_B["Team B<br/>Data Pipeline<br/>(4× A100)"]
        TEAM_C["Team C<br/>Fine-tuning<br/>(2× A100)"]
    end
    subgraph KUEUE["Kueue Controller"]
        LQ_A["LocalQueue<br/>team-a"]
        LQ_B["LocalQueue<br/>team-b"]
        LQ_C["LocalQueue<br/>team-c"]
        CQ["ClusterQueue<br/>gpu-pool<br/>(16× A100 total)"]
    end
    subgraph CLUSTER["Cluster Resources"]
        GPU["GPU Nodes<br/>16× A100"]
    end
    TEAM_A --> LQ_A --> CQ
    TEAM_B --> LQ_B --> CQ
    TEAM_C --> LQ_C --> CQ
    CQ -->|"Admit when<br/>quota available"| GPU

The Solution
Install Kueue
kubectl apply --server-side -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.0/manifests.yaml
# Verify
kubectl get pods -n kueue-system

Define Resource Flavors
# ResourceFlavors describe types of resources (GPU models, CPU tiers)
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-a100-80gb
spec:
  nodeLabels:
    nvidia.com/gpu.product: "NVIDIA-A100-SXM4-80GB"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-h100
spec:
  nodeLabels:
    nvidia.com/gpu.product: "NVIDIA-H100-80GB-HBM3"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-cpu
spec: {}  # No special labels: any CPU node

Create Cluster Queue
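A ClusterQueue can list several flavors in one resource group, and Kueue tries them in the order they appear. The sketch below is a simplified Python model of that ordering, not Kueue's actual code. It looks only at GPUs (Kueue has to fit every covered resource of the group into the same flavor), and the `pick_flavor` helper and the usage numbers are hypothetical. The quotas are the ones used in the ClusterQueue in this section.

```python
def pick_flavor(flavors, usage, gpus_needed):
    """Return the first flavor, in listed order, with enough unused GPU quota.

    flavors: list of (flavor_name, nominal_gpu_quota) in ClusterQueue order.
    usage:   dict of flavor_name -> GPUs currently in use.
    """
    for name, nominal in flavors:
        if usage.get(name, 0) + gpus_needed <= nominal:
            return name
    return None  # nothing fits, so the workload stays pending


# Quotas from the gpu-pool ClusterQueue, in the order the flavors are listed.
flavors = [("nvidia-a100-80gb", 16), ("nvidia-h100", 8)]
usage = {"nvidia-a100-80gb": 12, "nvidia-h100": 0}  # hypothetical current usage

print(pick_flavor(flavors, usage, 4))   # nvidia-a100-80gb
print(pick_flavor(flavors, usage, 8))   # nvidia-h100
print(pick_flavor(flavors, usage, 16))  # None
```

Listing the A100 flavor first means H100 quota is only consumed once the A100 quota cannot hold the request, so flavor order is effectively a preference order.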
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-pool
spec:
  # Assumed cohort name. borrowingLimit, reclaimWithinCohort, and fair sharing
  # only take effect between ClusterQueues that share a cohort.
  cohort: gpu-cohort
  namespaceSelector: {}  # All namespaces can use this
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: nvidia-a100-80gb
      resources:
      - name: "cpu"
        nominalQuota: 128
      - name: "memory"
        nominalQuota: 1Ti
      - name: "nvidia.com/gpu"
        nominalQuota: 16   # 16 A100 GPUs total
        borrowingLimit: 0  # No borrowing from other queues
    - name: nvidia-h100
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 512Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8    # 8 H100 GPUs
  fairSharing:
    weight: 1  # Equal weight with other ClusterQueues

Per-Team Local Queues
# LocalQueues only route a namespace's workloads to a ClusterQueue.
# The quota itself is enforced by gpu-pool, not per LocalQueue.
# Team A: ML training (8 GPUs in the example scenario)
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-training
  namespace: team-a
spec:
  clusterQueue: gpu-pool
---
# Team B: Data pipeline (4 GPUs in the example scenario)
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: data-pipeline
  namespace: team-b
spec:
  clusterQueue: gpu-pool
---
# Team C: Fine-tuning (2 GPUs in the example scenario)
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: fine-tuning
  namespace: team-c
spec:
  clusterQueue: gpu-pool

Submit a Queued Job
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-training-run-42
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: ml-training  # Assign to LocalQueue
spec:
  parallelism: 4
  completions: 4
  template:
    spec:
      containers:
      - name: trainer
        image: myorg/llm-trainer:v3.0
        command: ["torchrun", "--nproc_per_node=2"]
        args: ["train.py", "--model=llama-7b", "--epochs=3"]
        resources:
          requests:
            cpu: "16"
            memory: "64Gi"
            nvidia.com/gpu: 2  # 2 GPUs per pod × 4 pods = 8 GPUs
          limits:
            nvidia.com/gpu: 2
      restartPolicy: Never

Priority Classes for Preemption
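Before the manifests, it may help to see what priority-based preemption within one queue amounts to. The Python sketch below is a rough model under stated assumptions, not Kueue's algorithm: the `choose_victims` helper and the running workloads are hypothetical, only GPUs are counted, and Kueue's real victim selection applies more rules than "lowest priority first". The priority values 1000 and 100 mirror the two classes defined in this section.

```python
def choose_victims(admitted, incoming_priority, gpus_needed, free_gpus):
    """Pick lower-priority workloads to evict until the incoming one fits.

    admitted: list of (name, priority, gpus) for running workloads.
    Returns the victim names (possibly empty), or None if evicting every
    lower-priority workload would still not free enough GPUs.
    """
    # Only strictly lower-priority workloads are candidates, lowest first.
    candidates = sorted(
        (w for w in admitted if w[1] < incoming_priority),
        key=lambda w: w[1],
    )
    victims = []
    for name, _priority, gpus in candidates:
        if free_gpus >= gpus_needed:
            break
        victims.append(name)
        free_gpus += gpus
    return victims if free_gpus >= gpus_needed else None


# Hypothetical state: all 16 GPUs are in use.
running = [
    ("research-sweep", 100, 4),
    ("research-eval", 100, 4),
    ("production-nightly", 1000, 8),
]
print(choose_victims(running, 1000, 8, 0))  # ['research-sweep', 'research-eval']
print(choose_victims(running, 100, 4, 0))   # None
```

A new production job can displace the two research workloads, while a new research job has nothing below it to preempt and simply waits.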
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: production
value: 1000
description: "Production training jobs, preempt research"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: research
value: 100
description: "Research experiments, can be preempted"
---
# Job with priority
apiVersion: batch/v1
kind: Job
metadata:
  name: production-training
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: ml-training
    kueue.x-k8s.io/priority-class: production
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: myorg/llm-trainer:v3.0
        resources:
          limits:
            nvidia.com/gpu: 8
      restartPolicy: Never

Monitor Queue Status
# Check queue status
kubectl get clusterqueues
kubectl get localqueues -A
kubectl get workloads -A
# Detailed queue usage
kubectl describe clusterqueue gpu-pool
# Output shows:
#   Flavors Usage:
#     nvidia-a100-80gb:
#       nvidia.com/gpu: 12/16 (used/nominal)
#   Pending Workloads: 3
#   Admitted Workloads: 5

MultiKueue: Cross-Cluster Jobs
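# Distribute jobs across multiple clusters.
# There is no single "MultiKueue" kind: the setup combines an AdmissionCheck,
# a MultiKueueConfig, and one MultiKueueCluster per worker cluster.
# (v1beta1 in recent Kueue releases; older releases served these kinds as v1alpha1.)
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: global-gpu-pool
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: global-gpu-pool
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: global-gpu-pool
spec:
  clusters:
  - cluster-us-east
  - cluster-eu-west
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: cluster-us-east
spec:
  kubeConfig:
    locationType: Secret
    location: cluster-us-east-kubeconfig  # Secret holding the worker cluster kubeconfig
# Define cluster-eu-west the same way, with location: cluster-eu-west-kubeconfig

Common Issues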
| Issue | Cause | Fix |
|---|---|---|
| Job stuck in "Inadmissible" | Not enough quota in ClusterQueue | Increase nominalQuota or wait for running jobs to complete |
| Job is never queued (runs unmanaged, or stays suspended if manageJobsWithoutQueueName is enabled) | Missing kueue.x-k8s.io/queue-name label | Add the label to Job metadata |
| Wrong GPU flavor selected | Node labels don't match ResourceFlavor | Verify nvidia.com/gpu.product labels on nodes |
| Preemption not working | WorkloadPriorityClass not set | Add priority label to Job |
| Fair sharing unbalanced | Queue weights misconfigured | Adjust fairSharing.weight in ClusterQueue |
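Two rows in the table come down to labels missing from the Job metadata, which is easy to check before submitting. The following Python sketch is a hypothetical pre-submit check, not part of Kueue: the `lint_job` helper is invented for illustration and expects the manifest already parsed into a dict. The two label keys are the ones used in the Job examples in this article.

```python
QUEUE_LABEL = "kueue.x-k8s.io/queue-name"
PRIORITY_LABEL = "kueue.x-k8s.io/priority-class"


def lint_job(manifest):
    """Return a list of warnings for a Job manifest given as a dict."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    warnings = []
    if QUEUE_LABEL not in labels:
        warnings.append(f"missing label {QUEUE_LABEL}: Job will not be queued")
    if PRIORITY_LABEL not in labels:
        warnings.append(f"missing label {PRIORITY_LABEL}: no workload priority set")
    return warnings


good = {
    "kind": "Job",
    "metadata": {
        "name": "production-training",
        "labels": {QUEUE_LABEL: "ml-training", PRIORITY_LABEL: "production"},
    },
}
bad = {"kind": "Job", "metadata": {"name": "unlabeled-job"}}

print(lint_job(good))      # []
print(len(lint_job(bad)))  # 2
```

A check like this could run in CI or a submit script so that unlabeled Jobs are caught before they reach the cluster.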
Best Practices
- One ClusterQueue per GPU pool: separate A100 and H100 pools if pricing/access differs
- LocalQueue per team/project: maps to namespaces for multi-tenancy
- Use WorkloadPriorityClass: production training > research experiments > nightly batch
- Enable preemption: LowerPriority preemption prevents low-priority jobs from blocking production
- Monitor admission latency: long queue times signal a need for more GPUs or better scheduling
- Combine with Cluster Autoscaler: scale nodes when queue depth exceeds a threshold
Key Takeaways
- Kueue adds enterprise batch scheduling to Kubernetes: quotas, queuing, fair sharing
- ResourceFlavors describe GPU/CPU types; ClusterQueues define resource pools
- Jobs wait in queue until quota is available, so GPUs are not overcommitted
- Priority-based preemption ensures production jobs get resources first
- MultiKueue distributes jobs across clusters for capacity management
- Essential for teams running ML training workloads on shared GPU infrastructure
