Configuration Β· Intermediate Β· ⏱ 15 minutes Β· K8s 1.28+

PriorityClasses for GPU Workloads

Configure Kubernetes PriorityClasses for GPU workloads with training, serving, batch, and interactive tiers and preemption policies.

By Luca Berton β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Create four PriorityClasses: P0 Training (1000), P1 Serving (800), P2 Batch (400), P3 Interactive (200). Higher-priority jobs preempt lower-priority ones. Use preemptionPolicy: PreemptLowerPriority for training, serving, and batch, and Never for interactive to prevent cascade evictions.

The Problem

Without explicit priorities, GPU scheduling is first-come-first-served. A low-priority notebook session can block a critical training job for hours. When GPU resources are scarce, you need deterministic rules for who can evict whom.

The Solution

# P0 β€” Training (highest priority)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-training
value: 1000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "GPU training jobs β€” can preempt batch and interactive"
---
# P1 β€” Serving
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-serving
value: 800
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "GPU inference serving β€” can preempt batch and interactive"
---
# P2 β€” Batch
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-batch
value: 400
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "GPU batch jobs β€” can preempt interactive only"
---
# P3 β€” Interactive (lowest)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-interactive
value: 200
globalDefault: false
preemptionPolicy: Never
description: "Interactive notebooks β€” cannot preempt anything"
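Assuming the four manifests above are saved to a single file (the filename here is illustrative), they can be applied and verified in one pass:

```shell
# Apply all four PriorityClasses (filename is an assumption)
kubectl apply -f gpu-priorityclasses.yaml

# List the GPU tiers; VALUE should read 1000, 800, 400, 200
kubectl get priorityclasses | grep gpu-
```

PriorityClasses are cluster-scoped, so this only needs to be done once, not per namespace.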

Using PriorityClasses

# Training job with high priority
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llm-finetune
  namespace: tenant-alpha
spec:
  pytorchReplicaSpecs:
    Worker:
      template:
        spec:
          priorityClassName: gpu-training
          containers:
            - name: trainer
              resources:
                limits:
                  nvidia.com/gpu: 8
---
# Interactive notebook with low priority
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-notebook
  namespace: tenant-alpha
spec:
  selector:
    matchLabels:
      app: jupyter-notebook
  template:
    metadata:
      labels:
        app: jupyter-notebook
    spec:
      priorityClassName: gpu-interactive
      containers:
        - name: jupyter
          resources:
            limits:
              nvidia.com/gpu: 1
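Once pods are running, you can confirm that the scheduler resolved each priorityClassName to its numeric priority (the admission controller copies the class value into spec.priority):

```shell
# Show each pod's priority class and the resolved numeric value
kubectl get pods -n tenant-alpha \
  -o custom-columns=NAME:.metadata.name,CLASS:.spec.priorityClassName,PRIORITY:.spec.priority
```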

Scoped Quotas per Priority

apiVersion: v1
kind: ResourceQuota
metadata:
  name: training-gpu-quota
  namespace: tenant-alpha
spec:
  hard:
    requests.nvidia.com/gpu: "6"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values: ["gpu-training"]
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: interactive-gpu-quota
  namespace: tenant-alpha
spec:
  hard:
    requests.nvidia.com/gpu: "2"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values: ["gpu-interactive"]
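With the scoped quotas in place, per-tier GPU consumption can be checked against the limits at any time:

```shell
# Compare Used vs Hard for each priority tier's GPU quota
kubectl describe resourcequota training-gpu-quota interactive-gpu-quota \
  -n tenant-alpha
```

A pod requesting GPUs under gpu-training beyond the "6" hard limit is rejected at admission, before it ever reaches the scheduler.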
Preemption Hierarchy

graph TD
    A[GPU Priority Hierarchy] --> B[P0 Training value 1000]
    A --> C[P1 Serving value 800]
    A --> D[P2 Batch value 400]
    A --> E[P3 Interactive value 200]
    
    B -->|Can preempt| C
    B -->|Can preempt| D
    B -->|Can preempt| E
    C -->|Can preempt| D
    C -->|Can preempt| E
    D -->|Can preempt| E
    E -->|Cannot preempt| F[Nothing]
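The hierarchy above can be sanity-checked with a minimal Python sketch. The names and classes here are illustrative, not a Kubernetes API; it only models the two rules the scheduler applies: a pod may preempt another only if its class allows preemption and its priority value is strictly higher.

```python
# Illustrative model of the four-tier preemption matrix (not a Kubernetes API).
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    value: int
    preemption_policy: str  # "PreemptLowerPriority" or "Never"

training    = Tier("gpu-training", 1000, "PreemptLowerPriority")
serving     = Tier("gpu-serving", 800, "PreemptLowerPriority")
batch       = Tier("gpu-batch", 400, "PreemptLowerPriority")
interactive = Tier("gpu-interactive", 200, "Never")

def can_preempt(attacker: Tier, victim: Tier) -> bool:
    """True only if the attacker's class allows preemption
    and its priority value is strictly higher than the victim's."""
    return (attacker.preemption_policy == "PreemptLowerPriority"
            and attacker.value > victim.value)

# Matches the diagram: training evicts everything below it...
assert can_preempt(training, serving) and can_preempt(training, interactive)
assert can_preempt(serving, batch) and can_preempt(batch, interactive)
# ...while interactive evicts nothing, regardless of victim.
assert not can_preempt(interactive, batch)
# Preemption never flows upward.
assert not can_preempt(batch, serving)
```

Note that preemptionPolicy: Never only stops a class from evicting others; it does not protect that class from being evicted itself.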

Common Issues

  • Notebooks keep getting evicted β€” expected behavior; use preemptionPolicy: Never on interactive class so notebooks don’t evict others, but accept they’ll be evicted by training
  • Training job stuck pending despite lower-priority pods running β€” preemption takes time; scheduler needs to find pods whose removal frees enough GPUs
  • All pods use default priority β€” pods without priorityClassName get cluster default; set no globalDefault to force explicit priority selection

Best Practices

  • Four tiers is sufficient for most GPU clusters: training > serving > batch > interactive
  • Use preemptionPolicy: Never for interactive to prevent cascade evictions
  • Combine with scoped ResourceQuotas to limit GPUs per priority class per tenant
  • Document the preemption posture: who can evict whom β€” make it explicit in tenant onboarding
  • Training checkpoints are critical β€” preempted training jobs must be able to resume from checkpoint

Key Takeaways

  • PriorityClasses make GPU contention deterministic β€” no more random wins
  • Higher priority = can preempt lower priority pods to free GPUs
  • preemptionPolicy: Never prevents a class from evicting others
  • Scoped quotas limit GPU allocation per priority tier per tenant
  • Training jobs should always save checkpoints for preemption recovery
#priorityclass #gpu #scheduling #preemption #multi-tenant
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
