ai advanced ⏱ 25 minutes K8s 1.28+

Kueue for Batch Jobs and GPU Queues

Use Kueue to manage batch job queues on Kubernetes. GPU quota, fair sharing, priority queues, ML training workloads, and multi-tenant cluster scheduling.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Kueue is a Kubernetes-native job queueing system that decides when batch Jobs may start, and on which resource flavors, based on available quota. Install Kueue, then define ResourceFlavors (CPU/GPU types), ClusterQueues (resource pools), and LocalQueues (per-namespace entry points to a ClusterQueue). Jobs wait in the queue until quota is available, so GPUs are no longer overcommitted and priorities decide whose work runs first.

The Problem

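Kubernetes Jobs create their pods as soon as they are submitted; there’s no native concept of “wait in line.” When multiple teams submit GPU training jobs simultaneously, you get one of two outcomes. The first is resource contention: pods stuck Pending, or distributed jobs that start only some of their pods and hold GPUs they cannot use. The second is massive overprovisioning. Kueue adds enterprise batch scheduling on top of the default scheduler. It keeps each Job suspended until the quota for all of its pods is available, then unsuspends it so kube-scheduler can place the pods. Around that core it provides fair sharing between teams, GPU quota management, priority-based preemption, and multi-cluster job distribution.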

flowchart TB
    subgraph SUBMIT["Job Submission"]
        TEAM_A["Team A<br/>ML Training<br/>(8× A100)"]
        TEAM_B["Team B<br/>Data Pipeline<br/>(4× A100)"]
        TEAM_C["Team C<br/>Fine-tuning<br/>(2× A100)"]
    end
    
    subgraph KUEUE["Kueue Controller"]
        LQ_A["LocalQueue<br/>team-a"]
        LQ_B["LocalQueue<br/>team-b"]
        LQ_C["LocalQueue<br/>team-c"]
        CQ["ClusterQueue<br/>gpu-pool<br/>(16× A100 total)"]
    end
    
    subgraph CLUSTER["Cluster Resources"]
        GPU["GPU Nodes<br/>16× A100"]
    end
    
    TEAM_A --> LQ_A --> CQ
    TEAM_B --> LQ_B --> CQ
    TEAM_C --> LQ_C --> CQ
    CQ -->|"Admit when<br/>quota available"| GPU
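The diagram's numbers make the difference concrete. Below is a toy Python model, not Kueue code. It adds a hypothetical second 8-GPU job from Team A and compares two policies. In the first, pods grab GPUs one at a time; round-robin across jobs is an assumption of the model. In the second, admission is Kueue-style: a Job is unsuspended only when every one of its pods fits in the 16-GPU quota.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pods: int          # parallelism
    gpus_per_pod: int

    @property
    def gpus(self) -> int:
        return self.pods * self.gpus_per_pod


def schedule_pods_individually(jobs: list[Job], capacity: int) -> dict[str, int]:
    """No queueing: pods from every job race for GPUs, one pod per job per pass."""
    started = {job.name: 0 for job in jobs}
    free = capacity
    progress = True
    while progress:
        progress = False
        for job in jobs:
            if started[job.name] < job.pods and free >= job.gpus_per_pod:
                started[job.name] += 1
                free -= job.gpus_per_pod
                progress = True
    return started


def admit_whole_jobs(jobs: list[Job], quota: int) -> tuple[list[str], list[str]]:
    """Quota-style admission: a job starts only if ALL of its pods fit."""
    admitted, pending, used = [], [], 0
    for job in jobs:  # submission order
        if used + job.gpus <= quota:
            admitted.append(job.name)
            used += job.gpus
        else:
            pending.append(job.name)
    return admitted, pending


# Hypothetical jobs matching the diagram, plus one extra Team A run
jobs = [
    Job("team-a-train", pods=4, gpus_per_pod=2),     # 8 GPUs
    Job("team-b-pipeline", pods=2, gpus_per_pod=2),  # 4 GPUs
    Job("team-c-finetune", pods=1, gpus_per_pod=2),  # 2 GPUs
    Job("team-a-train-2", pods=4, gpus_per_pod=2),   # 8 more GPUs
]

print(schedule_pods_individually(jobs, capacity=16))
print(admit_whole_jobs(jobs, quota=16))
```

In this model, pod-by-pod scheduling leaves both Team A jobs partially started (3 of 4 and 2 of 4 pods), holding 10 GPUs between them while neither can train. Whole-job admission runs the first three jobs completely and keeps team-a-train-2 waiting.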

The Solution

Install Kueue

kubectl apply --server-side -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.0/manifests.yaml

# Verify
kubectl get pods -n kueue-system

Define Resource Flavors

# ResourceFlavors describe types of resources (GPU models, CPU tiers)
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-a100-80gb
spec:
  nodeLabels:
    nvidia.com/gpu.product: "NVIDIA-A100-SXM4-80GB"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-h100
spec:
  nodeLabels:
    nvidia.com/gpu.product: "NVIDIA-H100-80GB-HBM3"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-cpu
spec: {}  # No node labels: matches any node
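When Kueue admits a workload with a flavor, it injects that flavor's nodeLabels into the pods' nodeSelector. kube-scheduler then only considers nodes carrying those labels. The Python sketch below models that selector match for the three flavors above; the node names and their labels are hypothetical.

```python
# nodeLabels of the three ResourceFlavors above
FLAVOR_LABELS = {
    "nvidia-a100-80gb": {"nvidia.com/gpu.product": "NVIDIA-A100-SXM4-80GB"},
    "nvidia-h100": {"nvidia.com/gpu.product": "NVIDIA-H100-80GB-HBM3"},
    "default-cpu": {},  # empty selector
}

# Hypothetical nodes and their labels
NODES = {
    "gpu-a100-1": {"nvidia.com/gpu.product": "NVIDIA-A100-SXM4-80GB"},
    "gpu-h100-1": {"nvidia.com/gpu.product": "NVIDIA-H100-80GB-HBM3"},
    "cpu-1": {"kubernetes.io/os": "linux"},
}


def selector_matches(selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """nodeSelector semantics: every key must exist on the node with the same value."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def nodes_for_flavor(flavor: str) -> list[str]:
    """Nodes an admitted pod could land on once the flavor's labels are injected."""
    selector = FLAVOR_LABELS[flavor]
    return [name for name, labels in NODES.items() if selector_matches(selector, labels)]


for flavor in FLAVOR_LABELS:
    print(flavor, "->", nodes_for_flavor(flavor))
```

Note that the empty default-cpu selector also matches GPU nodes, so keeping CPU-only work off expensive nodes needs taints or an explicit label. A single-character mismatch in nvidia.com/gpu.product leaves admitted pods unschedulable. Compare the values against `kubectl get nodes -L nvidia.com/gpu.product`.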

Create Cluster Queue

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-pool
spec:
  namespaceSelector: {}             # All namespaces can use this
  preemption:
    reclaimWithinCohort: Any          # Takes effect only if this ClusterQueue joins a cohort (spec.cohort)
    withinClusterQueue: LowerPriority
  resourceGroups:
    - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
      flavors:
        - name: nvidia-a100-80gb
          resources:
            - name: "cpu"
              nominalQuota: 128
            - name: "memory"
              nominalQuota: 1Ti
            - name: "nvidia.com/gpu"
              nominalQuota: 16          # 16 A100 GPUs total
              borrowingLimit: 0         # Never borrow idle quota from other ClusterQueues in the cohort
        - name: nvidia-h100
          resources:
            - name: "cpu"
              nominalQuota: 64
            - name: "memory"
              nominalQuota: 512Gi
            - name: "nvidia.com/gpu"
              nominalQuota: 8           # 8 H100 GPUs
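  # Weight relative to other ClusterQueues in the same cohort. Takes effect only when
  # fair sharing is enabled in the Kueue manager Configuration (fairSharing.enable: true).
  fairSharing:
    weight: 1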
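Kueue tries the flavors in a resource group in the order they are listed. With this ClusterQueue, a GPU Job lands on A100s while they have room and spills over to H100s otherwise. The sketch below is a simplified model of that first-fit order under default settings. It ignores borrowing, preemption, and fair sharing; usage figures are made up, and memory is in GiB.

```python
from typing import Optional

# Nominal quotas from gpu-pool, in the same order as the manifest (memory in GiB)
QUOTA = {
    "nvidia-a100-80gb": {"cpu": 128, "memory": 1024, "nvidia.com/gpu": 16},
    "nvidia-h100": {"cpu": 64, "memory": 512, "nvidia.com/gpu": 8},
}


def assign_flavor(request: dict[str, int], usage: dict[str, dict[str, int]]) -> Optional[str]:
    """Return the first flavor whose unused nominal quota covers every requested resource."""
    for flavor, limits in QUOTA.items():  # dicts preserve insertion (listed) order
        used = usage.get(flavor, {})
        if all(used.get(res, 0) + amount <= limits.get(res, 0) for res, amount in request.items()):
            return flavor
    return None  # no flavor fits: the workload keeps waiting


job = {"cpu": 64, "memory": 256, "nvidia.com/gpu": 8}
print(assign_flavor(job, usage={}))
print(assign_flavor(job, usage={"nvidia-a100-80gb": {"nvidia.com/gpu": 12}}))
print(assign_flavor(job, usage={"nvidia-a100-80gb": {"nvidia.com/gpu": 16},
                               "nvidia-h100": {"nvidia.com/gpu": 8}}))
```

The listed order therefore acts as a preference. Putting the scarcer or pricier flavor second keeps it free until the first flavor is exhausted.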

Per-Team Local Queues

# LocalQueues hold no quota of their own: all three teams draw from gpu-pool's
# 16 A100s and 8 H100s. To cap a team (e.g. Team A at 8 GPUs), give it its own
# ClusterQueue in a shared cohort.
# Team A: ML training
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-training
  namespace: team-a
spec:
  clusterQueue: gpu-pool
---
# Team B: Data pipeline
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: data-pipeline
  namespace: team-b
spec:
  clusterQueue: gpu-pool
---
# Team C: Fine-tuning
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: fine-tuning
  namespace: team-c
spec:
  clusterQueue: gpu-pool
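Kueue looks up the queue-name label in the Job's own namespace, and each LocalQueue then forwards to its ClusterQueue. Below is a minimal Python model of that routing for the three queues above. The function name is hypothetical. The unlabeled case reflects Kueue's default `manageJobsWithoutQueueName: false`.

```python
from typing import Optional

QUEUE_LABEL = "kueue.x-k8s.io/queue-name"

# (namespace, LocalQueue name) -> ClusterQueue, mirroring the manifests above
LOCAL_QUEUES = {
    ("team-a", "ml-training"): "gpu-pool",
    ("team-b", "data-pipeline"): "gpu-pool",
    ("team-c", "fine-tuning"): "gpu-pool",
}


def resolve_cluster_queue(namespace: str, labels: dict[str, str]) -> Optional[str]:
    """Return the ClusterQueue charged for a Job, or None if Kueue ignores the Job."""
    queue = labels.get(QUEUE_LABEL)
    if queue is None:
        return None  # unlabeled: not managed by Kueue, runs without queueing
    try:
        return LOCAL_QUEUES[(namespace, queue)]
    except KeyError:
        # Kueue would keep such a Job suspended; the model surfaces it as an error
        raise LookupError(f"no LocalQueue {queue!r} in namespace {namespace!r}") from None


print(resolve_cluster_queue("team-a", {QUEUE_LABEL: "ml-training"}))
print(resolve_cluster_queue("team-a", {}))
```

Every route ends at gpu-pool, which is why the three teams compete for one shared quota. Queue names are also namespace-scoped, so team-b cannot submit to team-a's ml-training queue.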

Submit a Queued Job

apiVersion: batch/v1
kind: Job
metadata:
  name: llm-training-run-42
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: ml-training    # Assign to LocalQueue
spec:
  parallelism: 4
  completions: 4
  template:
    spec:
      containers:
        - name: trainer
          image: myorg/llm-trainer:v3.0
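          # Each pod runs its own torchrun with 2 worker processes (one per GPU). To make
          # the 4 pods one multi-node run, also pass rendezvous flags (--nnodes, --rdzv_endpoint).
          command: ["torchrun", "--nproc_per_node=2"]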
          args: ["train.py", "--model=llama-7b", "--epochs=3"]
          resources:
            requests:
              cpu: "16"
              memory: "64Gi"
              nvidia.com/gpu: 2          # 2 GPUs per pod × 4 pods = 8 GPUs
            limits:
              nvidia.com/gpu: 2
      restartPolicy: Never
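Kueue charges a Job's quota as its per-pod requests multiplied by the pod count, and reserves all of it at admission. Here is a small Python sketch of that arithmetic for llm-training-run-42. The parser handles only the quantity formats used in this recipe (plain numbers, `m`, and `Ki` through `Ti`); it is not a full Kubernetes quantity parser.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(value) -> float:
    """Parse a subset of Kubernetes quantities: '16', '500m', '64Gi', '1Ti', or 2."""
    text = str(value)
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    if text.endswith("m"):  # milli-units, e.g. 500m CPU
        return float(text[:-1]) / 1000
    return float(text)


def workload_totals(pods: int, requests: dict) -> dict[str, float]:
    """Total quota a Job needs: per-pod requests times the number of pods."""
    return {name: pods * parse_quantity(q) for name, q in requests.items()}


def fits(totals: dict[str, float], quota: dict) -> bool:
    """True if every total is within the flavor's nominal quota."""
    return all(amount <= parse_quantity(quota.get(name, 0)) for name, amount in totals.items())


requests = {"cpu": "16", "memory": "64Gi", "nvidia.com/gpu": 2}        # per pod
a100_quota = {"cpu": "128", "memory": "1Ti", "nvidia.com/gpu": "16"}   # nvidia-a100-80gb

totals = workload_totals(4, requests)  # parallelism: 4
print(totals["cpu"], totals["memory"] / 2**30, totals["nvidia.com/gpu"])
print(fits(totals, a100_quota))
```

The four pods need 64 CPUs, 256 GiB, and 8 GPUs in total. That fits within the A100 flavor's quota of 128 CPUs, 1Ti, and 16 GPUs. At 12 pods, the same Job would exceed it and wait.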

Priority Classes for Preemption

apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: production
value: 1000
description: "Production training jobs; can preempt research"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: research
value: 100
description: "Research experiments; can be preempted"
---
# Job with priority
apiVersion: batch/v1
kind: Job
metadata:
  name: production-training
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: ml-training
    kueue.x-k8s.io/priority-class: production
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: myorg/llm-trainer:v3.0
          resources:
            limits:
              nvidia.com/gpu: 8           # Requests default to limits, so Kueue counts 8 GPUs
      restartPolicy: Never
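With `withinClusterQueue: LowerPriority`, an incoming workload may evict only admitted workloads of strictly lower priority in the same ClusterQueue. The Python sketch below is a simplified model of how such victims could be chosen. It takes the lowest priority first and, among equals, the most recently admitted first. It then drops any victim that turns out to be unnecessary. This approximates Kueue's heuristics rather than reproducing its code, and the workload names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Running:
    name: str
    priority: int
    gpus: int
    admitted_at: int  # larger = admitted more recently


def pick_victims(need: int, priority: int, running: list[Running], quota: int) -> Optional[list[str]]:
    """Names to preempt so `need` GPUs fit; [] if nothing is needed, None if impossible."""
    free = quota - sum(w.gpus for w in running)
    if free >= need:
        return []
    candidates = sorted(
        (w for w in running if w.priority < priority),  # strictly lower priority only
        key=lambda w: (w.priority, -w.admitted_at),     # lowest priority, newest first
    )
    victims: list[Running] = []
    for w in candidates:
        if free >= need:
            break
        victims.append(w)
        free += w.gpus
    if free < need:
        return None  # even evicting every candidate is not enough
    for w in reversed(list(victims)):  # undo evictions that turned out unnecessary
        if free - w.gpus >= need:
            victims.remove(w)
            free -= w.gpus
    return [w.name for w in victims]


running = [
    Running("research-big", priority=100, gpus=8, admitted_at=2),
    Running("research-mid", priority=100, gpus=6, admitted_at=1),
    Running("research-small", priority=100, gpus=2, admitted_at=3),
]
print(pick_victims(8, priority=1000, running=running, quota=16))
print(pick_victims(8, priority=100, running=running, quota=16))
```

Here an 8-GPU production request evicts only research-big. The greedy pass first took research-small as well, but the final pass keeps it running because 8 freed GPUs already suffice. A request at the same priority (100) cannot preempt anything.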

Monitor Queue Status

# Check queue status
kubectl get clusterqueues
kubectl get localqueues -A
kubectl get workloads -A

# Detailed queue usage
kubectl describe clusterqueue gpu-pool

# Output shows (abridged and illustrative; the exact layout varies by Kueue version):
# Flavors Usage:
#   nvidia-a100-80gb:
#     nvidia.com/gpu: 12/16 (used/nominal)
#   Pending Workloads: 3
#   Admitted Workloads: 5
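For scripting, `kubectl get clusterqueue gpu-pool -o json` exposes the same data: nominal quotas under spec.resourceGroups and current usage under status.flavorsUsage. Below is a minimal Python sketch that pairs them per flavor. The inline JSON is an abridged, made-up sample, not real cluster output, and the sketch handles only whole-number GPU counts.

```python
import json

# Abridged, made-up ClusterQueue object (real output has many more fields)
SAMPLE = """
{
  "spec": {"resourceGroups": [{"flavors": [
    {"name": "nvidia-a100-80gb", "resources": [{"name": "nvidia.com/gpu", "nominalQuota": "16"}]},
    {"name": "nvidia-h100", "resources": [{"name": "nvidia.com/gpu", "nominalQuota": "8"}]}
  ]}]},
  "status": {
    "pendingWorkloads": 3,
    "admittedWorkloads": 5,
    "flavorsUsage": [
      {"name": "nvidia-a100-80gb", "resources": [{"name": "nvidia.com/gpu", "total": "12"}]},
      {"name": "nvidia-h100", "resources": [{"name": "nvidia.com/gpu", "total": "0"}]}
    ]
  }
}
"""


def gpu_usage(cq: dict, resource: str = "nvidia.com/gpu") -> dict[str, tuple[int, int]]:
    """Map flavor -> (used, nominal) for one resource with whole-number quantities."""
    nominal = {
        flavor["name"]: int(res["nominalQuota"])
        for group in cq["spec"]["resourceGroups"]
        for flavor in group["flavors"]
        for res in flavor["resources"]
        if res["name"] == resource
    }
    used = {
        flavor["name"]: int(res["total"])
        for flavor in cq.get("status", {}).get("flavorsUsage", [])
        for res in flavor["resources"]
        if res["name"] == resource
    }
    return {name: (used.get(name, 0), quota) for name, quota in nominal.items()}


cq = json.loads(SAMPLE)
for flavor, (used, quota) in gpu_usage(cq).items():
    print(f"{flavor}: {used}/{quota} GPUs")
print("pending:", cq["status"]["pendingWorkloads"])
```

Against a live cluster, save the object with `kubectl get clusterqueue gpu-pool -o json > cq.json` and load it with `json.load` instead of the sample string.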

MultiKueue: Cross-Cluster Jobs

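MultiKueue is not a single object. On a manager cluster you define one MultiKueueCluster per worker, group them in a MultiKueueConfig, and expose that through an AdmissionCheck listed by the ClusterQueue. Each worker cluster also runs Kueue with matching queues.

# Run on the manager cluster. Kubeconfig Secrets live in kueue-system under the key "kubeconfig".
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: cluster-us-east
spec:
  kubeConfig:
    locationType: Secret
    location: cluster-us-east-kubeconfig
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: cluster-eu-west
spec:
  kubeConfig:
    locationType: Secret
    location: cluster-eu-west-kubeconfig
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: global-gpu-pool
spec:
  clusters: ["cluster-us-east", "cluster-eu-west"]
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: global-gpu-pool
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters: {apiGroup: kueue.x-k8s.io, kind: MultiKueueConfig, name: global-gpu-pool}
# Then reference it in the manager's ClusterQueue: spec.admissionChecks: ["global-gpu-pool"]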
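Conceptually, the MultiKueue admission check copies a pending workload to every worker in the MultiKueueConfig. The first worker whose local ClusterQueue reserves quota wins, and the copies on the other workers are removed. The toy Python model below captures that rule. The worker response order and free-GPU figures are assumptions, not anything Kueue exposes.

```python
from typing import Optional


def dispatch(need: int, workers: list[tuple[str, int]]) -> tuple[Optional[str], list[str]]:
    """workers: (cluster, free GPUs) in the order each worker evaluates its copy.

    Returns the winning cluster and the clusters whose copies are removed.
    """
    for cluster, free in workers:
        if free >= need:
            losers = [name for name, _ in workers if name != cluster]
            return cluster, losers
    return None, []  # no worker can admit it yet: the job stays queued on the manager


workers = [("cluster-us-east", 4), ("cluster-eu-west", 12)]
print(dispatch(8, workers))
print(dispatch(16, workers))
```

The manager never splits one Job across clusters; in this model a workload either runs entirely on one worker or keeps waiting.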

Common Issues

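Issue | Cause | Fix
--- | --- | ---
Job stays suspended (workload pending or “Inadmissible”) | Not enough unused quota in the ClusterQueue, or the referenced LocalQueue does not exist | Increase nominalQuota or wait for running jobs to finish; kubectl describe workload shows the reason
Job starts immediately, bypassing the queue | Missing kueue.x-k8s.io/queue-name label (Kueue ignores unlabeled Jobs by default) | Add the label to the Job metadata
Wrong GPU flavor selected | Node labels don’t match the ResourceFlavor, or flavors are listed in the wrong order | Verify nvidia.com/gpu.product labels on nodes and the flavor order in the ClusterQueue
Preemption not working | No WorkloadPriorityClass on the Job | Add the kueue.x-k8s.io/priority-class label to the Job
Fair sharing unbalanced | Fair sharing disabled in the Kueue Configuration, or weights misconfigured | Enable fair sharing and adjust fairSharing.weight in each ClusterQueue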

Best Practices

  • One ClusterQueue per GPU pool: separate A100 and H100 pools if pricing or access differs
  • LocalQueue per team/project: LocalQueues are namespaced, so they map cleanly onto tenant namespaces
  • Use WorkloadPriorityClass: production training > research experiments > nightly batch
  • Enable preemption: LowerPriority preemption keeps low-priority jobs from blocking production
  • Monitor admission latency: long queue times signal a need for more GPUs or rebalanced quotas
  • Combine with Cluster Autoscaler: Kueue's ProvisioningRequest admission check lets the autoscaler add nodes before a queued workload is admitted
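The priority ordering above plays out inside each ClusterQueue. Pending workloads are ordered by priority, then by creation time. Under the default BestEffortFIFO strategy, a workload that does not fit is set aside instead of blocking the ones behind it. The following simplified Python model of that ordering ignores preemption and borrowing; the names, priorities, and GPU counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    name: str
    priority: int
    created: int  # smaller = submitted earlier
    gpus: int


def admit_in_order(pending: list[Pending], free_gpus: int) -> list[str]:
    """Higher priority first, then older first; skip (do not block on) workloads that don't fit."""
    admitted = []
    for w in sorted(pending, key=lambda item: (-item.priority, item.created)):
        if w.gpus <= free_gpus:
            admitted.append(w.name)
            free_gpus -= w.gpus
    return admitted


pending = [
    Pending("nightly-batch", priority=10, created=1, gpus=2),
    Pending("research", priority=100, created=2, gpus=6),
    Pending("production", priority=1000, created=3, gpus=8),
]
print(admit_in_order(pending, free_gpus=12))
```

With 12 free GPUs, production goes first even though it was submitted last. Research does not fit in the remaining 4 GPUs. nightly-batch still starts because BestEffortFIFO does not let research block the queue.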

Key Takeaways

  • Kueue adds enterprise batch scheduling to Kubernetes: quotas, queuing, fair sharing
  • ResourceFlavors describe GPU/CPU types; ClusterQueues define resource pools
  • Jobs wait in queue until quota is available instead of overcommitting GPUs
  • Priority-based preemption ensures production jobs get resources first
  • MultiKueue distributes jobs across clusters for capacity management
  • Essential for teams running ML training workloads on shared GPU infrastructure
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
