AI Resource Allocation Optimization
Optimize GPU and memory allocation for AI workloads on Kubernetes. Right-size GPU requests, bin-packing strategies, gang scheduling.
Quick Answer: Use gang scheduling (Volcano/Coscheduling) for distributed training, so that all workers start together or none do. Enable topology-aware scheduling to co-locate GPU pods on the same switch for NCCL performance. Implement priority-based preemption: inference > training > notebooks.
The Problem
AI/ML workloads have unique scheduling requirements: distributed training needs all workers to start simultaneously (gang scheduling), GPU communication requires network proximity (topology awareness), and mixed workloads (training + inference + notebooks) compete for limited GPU resources.
The Solution
Gang Scheduling with Volcano
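```yaml
# Standalone PodGroup: only needed to gang-schedule pods that are not created by a
# Volcano Job (such pods join it via the scheduling.k8s.io/group-name annotation).
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: training-job
spec:
  minMember: 4
  queue: gpu-queue
  priorityClassName: training-standard   # PriorityClass defined below
---
# Volcano Job: the Volcano controller creates this Job's PodGroup itself,
# using minAvailable as the gang size.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distributed-training
spec:
  schedulerName: volcano
  minAvailable: 4
  queue: gpu-queue
  priorityClassName: training-standard
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 4
      name: worker
      template:
        spec:
          schedulerName: volcano
          containers:
            - name: pytorch
              image: registry.example.com/training:1.0
              resources:
                limits:
                  nvidia.com/gpu: 8
```

All 4 workers (32 GPUs) must be schedulable simultaneously. If only 24 GPUs are available, the job waits rather than partially starting. Both objects reference a Volcano Queue named gpu-queue, which must already exist.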
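To make the all-or-nothing rule concrete, here is a minimal Python sketch of gang admission. It is not Volcano's implementation: it assumes a flat map of free GPUs per node and simple first-fit placement, and `try_gang_admit` is a hypothetical helper. The property that matters is that nothing is committed until every worker has found a slot.

```python
from typing import Optional


def try_gang_admit(
    free_gpus: dict[str, int], replicas: int, gpus_per_worker: int
) -> Optional[dict[str, int]]:
    """Place all `replicas` workers or none (hypothetical helper, first-fit).

    Returns {node: worker_count} on success and deducts the GPUs from
    `free_gpus`; returns None and leaves `free_gpus` untouched on failure.
    """
    remaining = dict(free_gpus)  # tentative copy: a failed attempt changes nothing
    placement: dict[str, int] = {}
    for _ in range(replicas):
        # First-fit: the first node that can still host one whole worker.
        node = next((n for n, free in remaining.items() if free >= gpus_per_worker), None)
        if node is None:
            return None  # all-or-nothing: abandon the whole gang
        remaining[node] -= gpus_per_worker
        placement[node] = placement.get(node, 0) + 1
    free_gpus.update(remaining)  # commit only after every worker found a slot
    return placement


# 3 nodes x 8 GPUs = 24 GPUs: a 4 x 8-GPU job waits instead of half-starting.
cluster = {"n1": 8, "n2": 8, "n3": 8}
print(try_gang_admit(cluster, replicas=4, gpus_per_worker=8))  # None
print(cluster)  # {'n1': 8, 'n2': 8, 'n3': 8}

# 6 nodes x 8 GPUs = 48 GPUs, two 32-GPU jobs: the first runs whole, the second waits.
cluster = {f"n{i}": 8 for i in range(1, 7)}
print(try_gang_admit(cluster, 4, 8))  # {'n1': 1, 'n2': 1, 'n3': 1, 'n4': 1}
print(try_gang_admit(cluster, 4, 8))  # None
```

The second scenario is the deadlock case: a scheduler without gang semantics could start three workers of each job (48 GPUs in use), leaving both jobs stuck waiting for a fourth worker that can never be placed.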
GPU Bin-Packing
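```yaml
# Scheduler configuration for GPU bin-packing
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: nvidia.com/gpu
                weight: 10
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

MostAllocated makes the scheduler prefer nodes that are already heavily requested, which packs GPU workloads onto fewer nodes and frees up entire nodes for large multi-GPU jobs. The GPU weight of 10 makes GPU occupancy dominate the node score. This file is passed to kube-scheduler through its --config flag rather than applied with kubectl, and it only affects pods scheduled by default-scheduler, not pods that set schedulerName: volcano.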
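The scoring behind MostAllocated can be sketched in a few lines of Python. This is a simplified model of the formula described in the Kubernetes resource bin-packing documentation: per-resource utilization after placing the pod, scaled to 0-100, then a weighted average using the weights from the configuration above. It ignores other score plugins, score normalization, and the default requests the scheduler assumes for pods without CPU/memory requests; the node sizes and usage figures are made up for illustration.

```python
from dataclasses import dataclass

MAX_NODE_SCORE = 100  # kube-scheduler scores nodes on a 0..100 scale


@dataclass
class Node:
    name: str
    allocatable: dict[str, int]
    requested: dict[str, int]  # sum of requests of pods already on the node


def most_allocated(requested: int, capacity: int) -> int:
    """Per-resource score: share of capacity in use after placing the pod."""
    if capacity == 0:
        return 0
    return min(requested, capacity) * MAX_NODE_SCORE // capacity


def node_score(node: Node, pod: dict[str, int], weights: dict[str, int]) -> int:
    """Weighted average of per-resource scores (higher = fuller node = preferred)."""
    total = weight_sum = 0
    for resource, weight in weights.items():
        after = node.requested.get(resource, 0) + pod.get(resource, 0)
        total += most_allocated(after, node.allocatable.get(resource, 0)) * weight
        weight_sum += weight
    return total // weight_sum if weight_sum else 0


# Weights from the KubeSchedulerConfiguration above.
WEIGHTS = {"nvidia.com/gpu": 10, "cpu": 1, "memory": 1}
CAPACITY = {"nvidia.com/gpu": 8, "cpu": 64_000, "memory": 512}  # cpu in millicores, memory in GiB

busy = Node("gpu-a", CAPACITY, {"nvidia.com/gpu": 6, "cpu": 48_000, "memory": 384})
empty = Node("gpu-b", CAPACITY, {})
pod = {"nvidia.com/gpu": 2, "cpu": 8_000, "memory": 64}

scores = {n.name: node_score(n, pod, WEIGHTS) for n in (busy, empty)}
print(scores)  # {'gpu-a': 97, 'gpu-b': 22}
```

The nearly full node wins by a wide margin, so the empty node stays available for a full 8-GPU worker. Lowering the GPU weight to match CPU and memory would shrink that gap.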
Priority Hierarchy for AI Workloads
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: inference-critical
value: 100000
description: "Production inference: preempts everything"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-standard
value: 10000
preemptionPolicy: Never
description: "Training: queues without preempting"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: notebook-dev
value: 1000
preemptionPolicy: Never
description: "Interactive notebooks: lowest priority"
```

```mermaid
graph TD
  subgraph gpu_cluster["GPU Cluster - 4 Nodes × 8 GPUs"]
    N1["Node 1<br/>8/8 GPU used<br/>Training Job A"]
    N2["Node 2<br/>8/8 GPU used<br/>Training Job A"]
    N3["Node 3<br/>4/8 GPU used<br/>Inference pods"]
    N4["Node 4<br/>2/8 GPU used<br/>Notebooks"]
  end
  GANG["Gang Scheduler<br/>All-or-nothing"] -->|16 GPUs| N1 & N2
  BINPACK["Bin-Packing<br/>MostAllocated"] -->|Pack inference| N3
  PREEMPT["Priority<br/>Inference > Training"] -->|Can preempt| N4
```

Common Issues
Gang-scheduled job stuck in Pending
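Not enough GPUs are free at the same time to place every worker. Inspect the PodGroup's status and events with kubectl describe podgroup training-job (a Volcano Job gets a controller-generated PodGroup; find its name with kubectl get podgroups). Then consider preempting lower-priority workloads or adding nodes.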
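How the PriorityClasses from the Priority Hierarchy section interact during preemption can be sketched in simplified, single-node form: a pod can only evict strictly lower-priority pods, and a pod whose preemptionPolicy is Never waits instead. This is an illustration, not kube-scheduler's algorithm, which also compares many candidate nodes, respects PodDisruptionBudgets, and tries to minimize victims; `Pod` and `pick_victims` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    name: str
    priority: int  # PriorityClass value
    gpus: int
    preemption_policy: str = "PreemptLowerPriority"  # or "Never"


def pick_victims(pending: Pod, running: list[Pod], free_gpus: int) -> Optional[list[Pod]]:
    """Return pods to evict so `pending` fits ([] if it already fits), or None if it must wait."""
    if pending.gpus <= free_gpus:
        return []
    if pending.preemption_policy == "Never":
        return None  # e.g. training-standard: queue instead of evicting
    # Only strictly lower-priority pods are eligible; evict the lowest first.
    candidates = sorted((p for p in running if p.priority < pending.priority), key=lambda p: p.priority)
    victims: list[Pod] = []
    for pod in candidates:
        victims.append(pod)
        free_gpus += pod.gpus
        if free_gpus >= pending.gpus:
            return victims
    return None  # even evicting every eligible pod would not free enough GPUs


# One fully used 8-GPU node, priority values taken from the PriorityClasses.
running = [Pod("notebook", 1_000, 2, "Never"), Pod("train-worker", 10_000, 6, "Never")]

inference = Pod("inference", 100_000, 2)
training = Pod("train-new", 10_000, 2, "Never")
print([p.name for p in pick_victims(inference, running, free_gpus=0)])  # ['notebook']
print(pick_victims(training, running, free_gpus=0))  # None
```

Evicting the lowest priority first means notebooks absorb the disruption before running training jobs do.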
Training pods scattered across racks: slow NCCL
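Enable topology-aware scheduling. Label each node with its rack (for example a custom label such as topology.kubernetes.io/rack; unlike topology.kubernetes.io/zone, it is not a built-in well-known label) and add a podAffinity rule that uses that label as its topologyKey, so training pods land in the same rack. Topology spread constraints do the opposite: they spread pods across topology domains.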
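The placement goal can be sketched as choosing one rack that can hold every worker. This minimal Python sketch assumes a node-to-rack map (as you would read it from a rack label) and workers that each need all their GPUs on a single node; `pick_rack` is a hypothetical helper, not a Kubernetes or Volcano API. It prefers the tightest rack that still fits, keeping larger racks free for bigger jobs, and returns None when the job would have to span racks.

```python
from typing import Optional


def pick_rack(
    free_by_node: dict[str, int],
    rack_of: dict[str, str],
    replicas: int,
    gpus_per_worker: int,
) -> Optional[str]:
    """Return a rack that can host all workers, preferring the tightest fit."""
    slots: dict[str, int] = {}
    for node, free in free_by_node.items():
        # A worker needs all its GPUs on one node, so count whole-worker slots per node.
        rack = rack_of[node]
        slots[rack] = slots.get(rack, 0) + free // gpus_per_worker
    fitting = sorted((count, rack) for rack, count in slots.items() if count >= replicas)
    return fitting[0][1] if fitting else None  # None: job would have to span racks


rack_of = {"n1": "rack-a", "n2": "rack-a", "n3": "rack-a",
           "n4": "rack-b", "n5": "rack-b", "n6": "rack-b", "n7": "rack-b",
           "n8": "rack-c", "n9": "rack-c"}
free = {"n1": 8, "n2": 8, "n3": 0, "n4": 8, "n5": 8, "n6": 8, "n7": 8, "n8": 4, "n9": 4}

print(pick_rack(free, rack_of, replicas=2, gpus_per_worker=8))  # rack-a
print(pick_rack(free, rack_of, replicas=4, gpus_per_worker=8))  # rack-b
print(pick_rack(free, rack_of, replicas=5, gpus_per_worker=8))  # None
```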
Best Practices
- Gang scheduling for distributed training: partial starts waste GPU time
- Bin-pack GPUs with MostAllocated scoring: frees entire nodes for large jobs
- Priority: inference > training > notebooks, so the production SLA always wins
- Topology-aware placement: co-locate training pods on the same switch for NCCL performance
- Set preemptionPolicy: Never for training, so jobs queue instead of disrupting other work
Key Takeaways
- Gang scheduling ensures all workers start together, preventing partial-allocation deadlocks and wasted GPUs
- GPU bin-packing consolidates workloads onto fewer nodes
- Priority-based preemption: inference can preempt lower-priority pods to get GPUs, while training queues
- Topology-aware scheduling can reduce NCCL communication latency substantially (by 2-5x in some setups, depending on the network fabric and job size)
- Combine gang scheduling + topology awareness + priority for optimal GPU cluster utilization

