Installing NVIDIA KAI Scheduler for AI Workloads
Deploy KAI Scheduler for optimized GPU resource allocation in Kubernetes AI/ML clusters with hierarchical queues and batch scheduling
The Problem
The default Kubernetes scheduler isn't optimized for AI/ML workloads. It lacks gang scheduling, GPU sharing, hierarchical resource quotas, and topology-aware placement, all of which are essential for running training and inference workloads efficiently at scale.
The Solution
Deploy NVIDIA KAI Scheduler, a robust scheduler designed for large-scale GPU clusters. It provides batch scheduling, hierarchical queues, GPU sharing, elastic workloads, and topology-aware scheduling optimized for AI workloads.
KAI Scheduler Architecture
flowchart TB
subgraph cluster["KUBERNETES CLUSTER"]
subgraph kai["KAI SCHEDULER"]
SC["Scheduler<br/>Controller"]
PG["PodGrouper"]
AD["Admission<br/>Controller"]
RR["Resource<br/>Reservation"]
end
subgraph queues["HIERARCHICAL QUEUES"]
RQ["Root Queue<br/>(Cluster Resources)"]
TQ["Training Queue<br/>quota: 60%"]
IQ["Inference Queue<br/>quota: 30%"]
DQ["Dev Queue<br/>quota: 10%"]
end
subgraph nodes["GPU NODES"]
N1["Node 1<br/>8x A100"]
N2["Node 2<br/>8x A100"]
N3["Node 3<br/>8x H100"]
end
subgraph workloads["WORKLOADS"]
W1["Training Job<br/>PodGroup"]
W2["Inference<br/>Deployment"]
W3["Interactive<br/>Notebook"]
end
end
workloads --> AD
AD --> SC
SC --> PG
PG --> queues
queues --> nodes
Step 1: Verify Prerequisites
# Check Kubernetes version (the --short flag was removed in recent kubectl releases)
kubectl version
# Verify GPU Operator is installed
kubectl get pods -n gpu-operator
# Check GPU nodes are available
kubectl get nodes -l nvidia.com/gpu.present=true
# Verify GPUs are discoverable
kubectl describe node <gpu-node> | grep nvidia.com/gpu
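Optionally, list the allocatable GPU count per node in one line. This uses standard kubectl custom-columns; the nvidia.com/gpu resource name is what the GPU Operator's device plugin advertises:
# List allocatable GPUs per node
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'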
Step 2: Install KAI Scheduler
# Create namespace
kubectl create namespace kai-scheduler
# Install from NVIDIA's container registry (recommended)
# Replace <VERSION> with latest from https://github.com/NVIDIA/KAI-Scheduler/releases
helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler \
--version v0.12.10
# Verify installation
kubectl get pods -n kai-scheduler
Expected output:
NAME                                     READY   STATUS    RESTARTS   AGE
kai-scheduler-0                          1/1     Running   0          60s
kai-scheduler-admission-xxx              1/1     Running   0          60s
kai-scheduler-podgrouper-xxx             1/1     Running   0          60s
kai-scheduler-resource-reservation-xxx   1/1     Running   0          60s
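You can also confirm the Helm release and the chart version that was installed:
# Confirm the Helm release
helm list -n kai-scheduler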
Step 3: Configure Default Queues
# queues.yaml
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: root
spec:
  displayName: "Root Queue"
  resources:
    gpu:
      quota: -1   # Unlimited (cluster total)
      limit: -1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: training
spec:
  displayName: "Training Queue"
  parentQueue: root
  resources:
    gpu:
      quota: 16            # Guaranteed GPUs
      limit: 24            # Max GPUs (over-quota)
      overQuotaWeight: 2   # Priority for over-quota
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: inference
spec:
  displayName: "Inference Queue"
  parentQueue: root
  resources:
    gpu:
      quota: 8
      limit: 16
      overQuotaWeight: 1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: development
spec:
  displayName: "Development Queue"
  parentQueue: root
  resources:
    gpu:
      quota: 4
      limit: 8
      overQuotaWeight: 0.5
kubectl apply -f queues.yaml
kubectl get queues
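Quotas can usually be adjusted in place rather than re-applying the whole file. For example, to raise the training queue's guaranteed GPUs (the field path is taken from the manifest above; confirm the controller picks up live changes in your release):
# Raise the training queue's guaranteed GPU quota from 16 to 20
kubectl patch queue training --type merge -p '{"spec":{"resources":{"gpu":{"quota":20}}}}'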
Step 4: Create a Workload Namespace
# workload-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ai-workloads
  labels:
    # Associate namespace with a queue
    runai/queue: training
kubectl apply -f workload-namespace.yaml
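To confirm the namespace carries the queue association:
# Verify the queue label on the namespace
kubectl get namespace ai-workloads --show-labels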
Step 5: Submit a Simple GPU Workload
# simple-gpu-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-test
  namespace: ai-workloads
  labels:
    runai/queue: training
spec:
  template:
    metadata:
      labels:
        runai/queue: training
    spec:
      schedulerName: kai-scheduler
      restartPolicy: Never
      containers:
      - name: cuda-test
        image: nvcr.io/nvidia/cuda:12.0-base
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
kubectl apply -f simple-gpu-job.yaml
kubectl get pods -n ai-workloads -w
kubectl logs job/gpu-test -n ai-workloads
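KAI can also share a single GPU between workloads. The manifest below is a sketch that assumes the fractional-GPU pod annotation (gpu-fraction) described in the KAI Scheduler documentation, and that GPU sharing was enabled at install time (there is a gpuSharing option in the chart values); verify both against the release you installed:
# gpu-sharing-pod.yaml (sketch; gpu-fraction annotation and install-time flag are assumptions)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-half
  namespace: ai-workloads
  labels:
    runai/queue: development
  annotations:
    gpu-fraction: "0.5"   # request half a GPU instead of a whole device
spec:
  schedulerName: kai-scheduler
  restartPolicy: Never
  containers:
  - name: cuda-test
    image: nvcr.io/nvidia/cuda:12.0-base
    command: ["nvidia-smi"]
    # note: no nvidia.com/gpu resource limit here; the fraction is expressed via the annotation
kubectl apply -f gpu-sharing-pod.yaml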
Step 6: Configure Priority Classes
# priority-classes.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: kai-high-priority
value: 100
globalDefault: false
description: "High priority for production inference"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: kai-normal-priority
value: 50
globalDefault: true
description: "Normal priority for training jobs"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: kai-low-priority
value: 10
globalDefault: false
description: "Low priority for development/interactive"
preemptionPolicy: Never   # Won't preempt others
kubectl apply -f priority-classes.yaml
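To attach a priority class to a workload, reference it from the pod spec. This is the standard Kubernetes priorityClassName field, shown here as a fragment extending the training job from Step 5:
# fragment of the pod template spec from simple-gpu-job.yaml
    spec:
      schedulerName: kai-scheduler
      priorityClassName: kai-normal-priority
      restartPolicy: Never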
Step 7: OpenShift Installation (Optional)
# For OpenShift with GPU Operator < v25.10.0
helm upgrade -i kai-scheduler \
oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
-n kai-scheduler \
--create-namespace \
--version v0.12.10 \
--set admission.gpuPodRuntimeClassName=null
Step 8: Verify Scheduler is Working
# Check scheduler logs
kubectl logs -n kai-scheduler -l app=kai-scheduler --tail=100
# View queue status
kubectl get queues -o wide
# Check scheduled pods
kubectl get pods -A -o wide --field-selector spec.schedulerName=kai-scheduler
# View scheduling events
kubectl get events -n ai-workloads --field-selector reason=Scheduled
KAI Scheduler Key Features
| Feature | Description |
|---|---|
| Batch Scheduling | Gang scheduling: all pods of a workload are scheduled together, or none are |
| Hierarchical Queues | Two-level queue hierarchy for organizational control |
| GPU Sharing | Multiple workloads share GPUs efficiently |
| Bin Packing | Minimize fragmentation by packing pods |
| Spread Scheduling | Distribute for resilience and load balancing |
| Elastic Workloads | Dynamic scaling within min/max bounds |
| DRA Support | Kubernetes ResourceClaims for vendor GPUs |
| Topology-Aware | Optimal placement for NVLink/NVSwitch |
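For example, a multi-pod training Job submitted to KAI is gang-scheduled, so workers only start once the whole group fits in the queue's quota. The manifest below is a sketch: it assumes the PodGrouper derives the gang size from the Job's parallelism, so confirm the grouping rules for plain Jobs in the documentation of the version you installed.
# distributed-training-job.yaml (sketch; gang size derived by the PodGrouper is an assumption)
apiVersion: batch/v1
kind: Job
metadata:
  name: ddp-training
  namespace: ai-workloads
  labels:
    runai/queue: training
spec:
  parallelism: 4    # four workers, intended to be scheduled as one gang
  completions: 4
  template:
    metadata:
      labels:
        runai/queue: training
    spec:
      schedulerName: kai-scheduler
      restartPolicy: Never
      containers:
      - name: worker
        image: nvcr.io/nvidia/cuda:12.0-base
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1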
Troubleshooting
Pods stuck in Pending
# Check pod events
kubectl describe pod <pod-name> -n <namespace>
# Check queue capacity
kubectl get queue training -o yaml
# View scheduler logs for the pod
kubectl logs -n kai-scheduler -l app=kai-scheduler | grep <pod-name>
Queue not found
# Ensure queue label is correct
kubectl get pod <pod-name> -o jsonpath='{.metadata.labels.runai/queue}'
# List available queues
kubectl get queues
# Check namespace queue association
kubectl get namespace <ns> -o jsonpath='{.metadata.labels.runai/queue}'
Best Practices
| Practice | Description |
|---|---|
| Dedicated workload namespace | Never use kai-scheduler namespace for workloads |
| Queue per team | Create separate queues for different teams/projects |
| Set quotas | Define guaranteed and limit resources per queue |
| Use priority classes | Differentiate production vs development workloads |
| Monitor queue utilization | Track quota usage for capacity planning |
Summary
KAI Scheduler provides enterprise-grade GPU scheduling for Kubernetes AI workloads. With hierarchical queues, batch scheduling, and GPU sharing, it optimizes resource utilization while maintaining fairness across teams and workloads.
Go Further with Kubernetes Recipes
Love this recipe? There's so much more! This is just one of 100+ hands-on recipes in our comprehensive Kubernetes Recipes book.
Inside the book, you'll master:
- Production-ready deployment strategies
- Advanced networking and security patterns
- Observability, monitoring, and troubleshooting
- Real-world best practices from industry experts
"The practical, recipe-based approach made complex Kubernetes concepts finally click for me."
Get Your Copy Now and start building production-grade Kubernetes skills today!