Time-Slicing vs MIG vs Full GPU Allocation
Compare GPU sharing strategies: time-slicing for notebooks, MIG for isolated inference, and full GPU for training workloads.
💡 Quick Answer: Use full GPU for training (max performance, no sharing overhead). Use MIG for inference with isolation (separate memory/compute, no noisy neighbor). Use time-slicing for interactive notebooks and development (cheapest sharing, but no memory isolation).
The Problem
GPUs are expensive. Giving every notebook user a full H200 (141GB) is wasteful when they use 5% of its compute. But sharing GPUs without isolation causes noisy-neighbor problems, where one user's OOM kills another's process. You need different sharing strategies for different workload types.
The Solution
Comparison Matrix
```yaml
full_gpu:
  isolation: "Complete (dedicated GPU)"
  memory: "Full VRAM (141GB on H200)"
  compute: "100% SM access"
  overhead: "None"
  use_case: "Training, large model inference"
  fault_isolation: "Complete"
  supported_gpus: "All NVIDIA GPUs"
mig:
  isolation: "Hardware (separate memory and compute partitions)"
  memory: "Partitioned (e.g., 7x 18GB on H200)"
  compute: "Partitioned SMs per instance"
  overhead: "Minimal (~2-3%)"
  use_case: "Inference serving, CI/CD GPU tests"
  fault_isolation: "Complete (OOM in one partition doesn't affect others)"
  supported_gpus: "A100, A30, H100, H200 (Ampere/Hopper data-center GPUs)"
time_slicing:
  isolation: "None (shared memory space)"
  memory: "Shared (all users see full VRAM)"
  compute: "Time-multiplexed (round-robin)"
  overhead: "Context switching (~10-15%)"
  use_case: "Notebooks, development, light workloads"
  fault_isolation: "None (OOM affects all users)"
  supported_gpus: "All NVIDIA GPUs"
```

Full GPU (Training)
```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llm-training
  namespace: tenant-alpha
spec:
  pytorchReplicaSpecs:
    Worker:
      replicas: 1
      template:
        spec:
          containers:
            - name: trainer
              image: nvcr.io/nvidia/pytorch:24.03-py3
              resources:
                limits:
                  nvidia.com/gpu: 8  # 8 full GPUs, no sharing
```

MIG Configuration (Inference)
```yaml
# ClusterPolicy MIG configuration
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  mig:
    strategy: mixed  # Some nodes MIG, some full GPU
  devicePlugin:
    config:
      name: mig-config
      default: all-balanced
---
# MIG config profiles
apiVersion: v1
kind: ConfigMap
metadata:
  name: mig-config
  namespace: gpu-operator
data:
  # 7 equal partitions for inference
  all-balanced: |
    version: v1
    mig-configs:
      all-balanced:
        - device-filter: ["0x2339"]  # H200
          devices: all
          mig-enabled: true
          mig-devices:
            "1g.18gb": 7
  # Mixed: 1 large + 3 small
  mixed-workload: |
    version: v1
    mig-configs:
      mixed-workload:
        - device-filter: ["0x2339"]
          devices: all
          mig-enabled: true
          mig-devices:
            "4g.71gb": 1  # Large inference model
            "1g.18gb": 3  # Small models or preprocessing
```

MIG Pod Request
```yaml
# Request a specific MIG partition
apiVersion: v1
kind: Pod
metadata:
  name: inference-small
  namespace: tenant-beta
spec:
  containers:
    - name: model
      image: vllm/vllm-openai:v0.6.6
      resources:
        limits:
          nvidia.com/mig-1g.18gb: 1  # Request 1x 18GB MIG slice
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-large
spec:
  containers:
    - name: model
      image: vllm/vllm-openai:v0.6.6
      resources:
        limits:
          nvidia.com/mig-4g.71gb: 1  # Request 1x 71GB MIG slice
```

Time-Slicing Configuration (Notebooks)
```yaml
# ClusterPolicy time-slicing
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  devicePlugin:
    config:
      name: time-slicing-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4  # Each GPU appears as 4 virtual GPUs
```

Time-Sliced Pod
```yaml
# Request a time-sliced GPU share
apiVersion: v1
kind: Pod
metadata:
  name: jupyter-notebook
  namespace: tenant-alpha
spec:
  containers:
    - name: jupyter
      image: jupyter/pytorch-notebook:latest
      resources:
        limits:
          nvidia.com/gpu: 1  # Gets 1/4 of a GPU (time-sliced)
```

Node Labeling for Mixed Strategy
```shell
# Label nodes by GPU sharing strategy
oc label node gpu-worker-1 gpu-worker-2 nvidia.com/gpu-sharing=full
oc label node gpu-worker-3 nvidia.com/gpu-sharing=mig
oc label node gpu-worker-4 nvidia.com/gpu-sharing=time-slicing

# Use nodeSelector in pods:
#   Training  -> gpu-sharing=full
#   Inference -> gpu-sharing=mig
#   Notebooks -> gpu-sharing=time-slicing
```

Decision Matrix
```mermaid
graph TD
    A[GPU Workload] --> B{Workload Type?}
    B -->|Training| C[Full GPU]
    B -->|Inference| D{Isolation needed?}
    B -->|Interactive or Dev| E[Time-Slicing]
    D -->|Yes: SLA critical| F[MIG Partitions]
    D -->|No: best-effort| E
    C --> G[8x Full GPUs per node]
    F --> H[7x 18GB or mixed partitions]
    E --> I[4x virtual per GPU]
    G --> J[Max performance, no sharing]
    H --> K[Hardware isolation, fixed memory]
    I --> L[Most users per GPU, no isolation]
```

Common Issues
- MIG not supported: only data-center GPUs such as A100, A30, H100, and H200 support MIG; older GPUs (V100, T4) must use time-slicing
- Time-sliced GPU OOM: all users share memory, so one large allocation can OOM the others; use a LimitRange to cap per-container memory
- MIG reconfiguration requires a drain: changing MIG profiles needs an idle GPU; drain the node, reconfigure, then uncordon
- Mixed strategy complexity: label nodes clearly and use nodeSelector to route workloads to the correct sharing type
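The drain-reconfigure-uncordon cycle above can be sketched with the GPU Operator's MIG manager, which watches the `nvidia.com/mig.config` node label and reports progress via `nvidia.com/mig.config.state`. The node name and profile label value here are illustrative:

```shell
# Evacuate GPU workloads so the GPU is idle for reprofiling
oc adm drain gpu-worker-3 --ignore-daemonsets --delete-emptydir-data

# Point the node at a different profile defined in the mig-config ConfigMap
oc label node gpu-worker-3 nvidia.com/mig.config=mixed-workload --overwrite

# Check that the MIG manager finished; this should eventually print "success"
oc get node gpu-worker-3 \
  -o jsonpath="{.metadata.labels['nvidia\.com/mig\.config\.state']}"

# Return the node to service
oc adm uncordon gpu-worker-3
```

These commands are cluster-dependent; in practice you would poll or `oc wait` on the state label before uncordoning.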
Best Practices
- Full GPU for training: sharing overhead defeats the purpose of distributed training
- MIG for production inference: hardware isolation ensures SLA compliance
- Time-slicing for notebooks and dev: cheapest sharing, acceptable for interactive use
- Label nodes by sharing strategy: don't mix strategies on the same GPU
- Use MIG mixed profiles when serving both large and small models on the same node
- Document the sharing strategy in tenant onboarding: teams need to know what they're getting
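The labeling practice above is consumed on the workload side with a plain `nodeSelector`. A minimal sketch, assuming the `nvidia.com/gpu-sharing` labels from the node-labeling section (the pod name and image are illustrative):

```yaml
# Route a training pod to full-GPU nodes via the gpu-sharing label
apiVersion: v1
kind: Pod
metadata:
  name: trainer-routed
spec:
  nodeSelector:
    nvidia.com/gpu-sharing: full  # Matches nodes labeled for full-GPU allocation
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.03-py3
      resources:
        limits:
          nvidia.com/gpu: 1
```

The same pattern applies for `mig` and `time-slicing` nodes; only the label value and the requested resource name change.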
Key Takeaways
- Three GPU sharing strategies: full (training), MIG (inference), time-slicing (dev)
- MIG provides hardware isolation with partitioned memory and compute, so there is no noisy neighbor
- Time-slicing provides maximum density but no memory isolation, so it is only for interactive use
- Label nodes by strategy and use nodeSelector for deterministic routing
- H200 MIG: 7x 18GB partitions or flexible mixed profiles
- Strategy choice directly impacts performance, isolation, and cost per user
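To confirm which strategy a node actually advertises after configuration, you can inspect its allocatable extended resources; node names here are illustrative:

```shell
# Full-GPU and time-sliced nodes advertise nvidia.com/gpu (time-slicing
# multiplies the count); MIG nodes advertise nvidia.com/mig-* resources.
oc get node gpu-worker-1 -o jsonpath='{.status.allocatable}' | tr ',' '\n' | grep nvidia

# List all NVIDIA extended resources across the cluster
oc get nodes -o json | jq '.items[].status.allocatable
  | with_entries(select(.key | startswith("nvidia.com/")))'
```

A time-sliced node with `replicas: 4` and one physical GPU should report `nvidia.com/gpu: "4"`, while a MIG node with the all-balanced profile should report `nvidia.com/mig-1g.18gb: "7"`.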

