Kueue Job Queuing Fair Sharing Kubernetes
Implement fair-share GPU job queuing with Kueue on Kubernetes. ClusterQueues, LocalQueues, ResourceFlavors, and cohort-based borrowing for multi-team AI cl.
Quick Answer: Install Kueue and create ClusterQueues with GPU quotas per team. Teams submit jobs to LocalQueues; Kueue admits jobs when resources are available and queues the rest. Cohort-based borrowing lets idle GPUs be used by other teams automatically.
The Problem
Multiple teams compete for a shared GPU cluster. Without queuing, teams over-provision to guarantee GPU access, or jobs fail with "insufficient resources." ResourceQuotas alone don't queue; they reject. Kueue provides fair-share queuing that maximizes GPU utilization while respecting team quotas.
The Solution
Install Kueue
kubectl apply --server-side -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/v0.9.0/manifests.yaml

Define Resource Flavors
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100-80gb
spec:
  nodeLabels:
    nvidia.com/gpu.product: A100-SXM4-80GB
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: h100-80gb
spec:
  nodeLabels:
    nvidia.com/gpu.product: H100-SXM5-80GB

ClusterQueues with Fair Sharing
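apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-alpha
spec:
  cohort: gpu-cluster
  # An unset namespaceSelector matches no namespaces; {} allows all of them.
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ["nvidia.com/gpu", "cpu", "memory"]
      flavors:
        - name: a100-80gb
          resources:
            - name: nvidia.com/gpu
              nominalQuota: 16
              borrowingLimit: 8
            - name: cpu
              nominalQuota: 64
            - name: memory
              nominalQuota: 256Gi
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-beta
spec:
  cohort: gpu-cluster
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ["nvidia.com/gpu", "cpu", "memory"]
      flavors:
        - name: a100-80gb
          resources:
            - name: nvidia.com/gpu
              nominalQuota: 16
              borrowingLimit: 8
            # Every covered resource needs an entry; values assumed to mirror team-alpha.
            - name: cpu
              nominalQuota: 64
            - name: memory
              nominalQuota: 256Gi
  # Assumed to mirror team-alpha so team-beta can also reclaim GPUs it has lent out.
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority

LocalQueue (Per Namespace)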
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: training-queue
  namespace: team-alpha
spec:
  clusterQueue: team-alpha

Submit a Training Job
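apiVersion: batch/v1
kind: Job
metadata:
  name: llm-finetune
  namespace: team-alpha
  labels:
    kueue.x-k8s.io/queue-name: training-queue
spec:
  suspend: true  # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      containers:
        - name: training
          image: registry.example.com/training:1.0
          resources:
            requests:
              nvidia.com/gpu: 8
            limits:
              nvidia.com/gpu: 8  # extended resources must be set in limits; requests must match
      restartPolicy: Never

How Borrowing Works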
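Team Alpha quota: 16 GPUs
Team Beta quota: 16 GPUs
Total cluster: 32 GPUs

Scenario: Team Alpha needs 24 GPUs, Team Beta is using 8
→ Team Alpha uses 16 (own) + 8 (borrowed from Beta's idle quota, the most its borrowingLimit of 8 allows)
→ Total: 32 GPUs active, 0 idle = 100% utilization (without borrowing: 24 active, 8 idle = 75%)

When Team Beta's demand grows to 12 GPUs (4 more than it is running):
→ Kueue reclaims 4 borrowed GPUs from Team Alpha
→ Team Alpha: 20 GPUs, Team Beta: 12 GPUs

Preemption evicts whole workloads, so this exact split assumes Team Alpha has a borrowing workload of 4 GPUs that can be evicted.

graph TD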
    subgraph Cohort: gpu-cluster
        CQ_A[ClusterQueue: team-alpha<br/>Quota: 16 GPU<br/>Borrow limit: 8]
        CQ_B[ClusterQueue: team-beta<br/>Quota: 16 GPU<br/>Borrow limit: 8]
    end
    LQ_A[LocalQueue<br/>ns: team-alpha] --> CQ_A
    LQ_B[LocalQueue<br/>ns: team-beta] --> CQ_B
    CQ_A <-->|Borrow idle GPUs| CQ_B
    JOB1[Job: 8 GPU<br/>✅ Admitted] --> LQ_A
    JOB2[Job: 24 GPU<br/>⏳ Queued<br/>waiting for GPUs] --> LQ_A

Common Issues
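Jobs not being admitted (stuck in queue)
Check ClusterQueue status: kubectl get clusterqueue -o wide. Ensure ResourceFlavor nodeLabels match actual node labels (kubectl get nodes -L nvidia.com/gpu.product lists them). Also confirm that the ClusterQueue's namespaceSelector matches the Job's namespace; an unset selector matches no namespaces.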
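One cause worth ruling out is the ClusterQueue's namespaceSelector, which decides which namespaces may submit to the queue. The fragment below is a minimal sketch of restricting the team-alpha queue to its own namespace. It assumes the namespace is literally named team-alpha and relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace.

```yaml
# Fragment of the team-alpha ClusterQueue spec (quota and preemption fields omitted).
spec:
  cohort: gpu-cluster
  namespaceSelector:
    matchLabels:
      # Label applied automatically by Kubernetes; its value is the namespace name.
      kubernetes.io/metadata.name: team-alpha
```

Setting namespaceSelector: {} instead accepts workloads from any namespace that has a LocalQueue pointing at this ClusterQueue.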
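Borrowed GPUs not being reclaimed
reclaimWithinCohort: Any must be set on the ClusterQueue that wants its quota back (the lender, not the borrower). The default is Never, in which case borrowed resources are only returned when the borrowing job completes.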
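As a sketch, the preemption stanza below would sit on the queue that lends GPUs and needs them back, for instance team-beta in the borrowing scenario. It assumes team-beta should follow the same policy as team-alpha. The comments list the other values the v1beta1 API accepts.

```yaml
# Fragment of the team-beta ClusterQueue spec (quota fields omitted).
spec:
  cohort: gpu-cluster
  preemption:
    # Never (default): wait for borrowing workloads to finish.
    # LowerPriority: preempt only borrowers with lower priority than the pending workload.
    # Any: preempt borrowers regardless of priority.
    reclaimWithinCohort: Any
    # Never (default), LowerPriority, or LowerOrNewerEqualPriority.
    withinClusterQueue: LowerPriority
```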
Best Practices
- Cohort-based sharing maximizes utilization: idle GPUs are automatically available to other teams
- borrowingLimit prevents one team from taking everything: for example, cap it at 50% of the other queue's nominal quota (8 of 16 GPUs in this setup)
- Preemption for fairness: reclaimWithinCohort: Any reclaims borrowed resources
- Priority classes for job importance: inference > training > experiments
- Monitor queue depth: Kueue exposes Prometheus metrics for pending workloads and admission latency
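The priority ordering can be expressed with Kueue's WorkloadPriorityClass objects. The sketch below is one possible encoding of the inference > training > experiments tiers. The class names and numeric values are hypothetical; only their relative order matters.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: inference        # hypothetical name
value: 3000              # hypothetical value; higher means more important
description: "Inference jobs"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: training         # hypothetical name
value: 2000
description: "Training jobs"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: experiments      # hypothetical name
value: 1000
description: "Exploratory experiments"
```

A Job opts into a class by carrying the label kueue.x-k8s.io/priority-class (for example, with the value training) next to its kueue.x-k8s.io/queue-name label. With withinClusterQueue: LowerPriority, a pending higher-priority workload can then preempt lower-priority ones in the same ClusterQueue.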
Key Takeaways
- Kueue provides fair-share GPU queuing that maximizes cluster utilization
- ClusterQueues define per-team GPU quotas; LocalQueues are the submission interface
- Cohort-based borrowing lets idle GPUs be used by other teams automatically
- Preemption reclaims borrowed resources when the owning team needs them
- Unlike ResourceQuotas, Kueue queues jobs instead of rejecting them

