ai intermediate ⏱ 20 minutes K8s 1.28+

GPU Sharing with MIG and Time-Slicing on Kubernetes

Share GPUs across multiple pods with NVIDIA MIG and time-slicing on Kubernetes. Covers MIG profiles for A100/H100 and time-slicing configuration.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Use MIG (Multi-Instance GPU) on A100/H100 for hardware-isolated GPU partitions with guaranteed memory and compute. Use time-slicing on any NVIDIA GPU for soft sharing where pods take turns. Choose MIG for production inference with SLA guarantees and time-slicing for dev/notebook workloads.

The Problem

GPUs are expensive ($2-5/hour for A100). Running one model per GPU wastes resources when inference workloads use only 20-30% of GPU compute. You need to share GPUs across multiple pods, but safely and with predictable performance.

The Solution

MIG (Multi-Instance GPU)

MIG is available on A100 (up to 7 instances), H100 (up to 7 instances), and A30 (up to 4 instances).

# GPU Operator ClusterPolicy β€” enable MIG
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  mig:
    strategy: mixed
  migManager:
    enabled: true
    config:
      name: mig-config
---
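# MIG configuration: named layouts that the MIG manager can apply to a node.
# The 3g.40gb and 1g.10gb profiles exist on 80GB cards (A100 80GB, H100 80GB);
# a 40GB A100 exposes 1g.5gb and 3g.20gb instead.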
apiVersion: v1
kind: ConfigMap
metadata:
  name: mig-config
  namespace: gpu-operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-3g.40gb:
        - devices: all
          mig-enabled: true
          mig-devices:
            3g.40gb: 2
      all-1g.10gb:
        - devices: all
          mig-enabled: true
          mig-devices:
            1g.10gb: 7
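
Defining layouts in the ConfigMap does not by itself apply one. With the GPU Operator's MIG manager, a layout is typically selected per node through the nvidia.com/mig.config label. The fragment below is a sketch of what the relevant labels on a node might look like. The node name gpu-node-1 is hypothetical, and the layout name is taken from the ConfigMap above.

```yaml
# Sketch only: excerpt of a Node object, for example from
# `kubectl get node gpu-node-1 -o yaml`. gpu-node-1 is a hypothetical name.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    # Selects a layout by its key under mig-configs in the mig-config ConfigMap.
    # Usually set with:
    #   kubectl label node gpu-node-1 nvidia.com/mig.config=all-1g.10gb --overwrite
    nvidia.com/mig.config: all-1g.10gb
    # Written by the MIG manager to report progress; "success" is the value to look for.
    nvidia.com/mig.config.state: success
```

If the state label stays at a value other than success, the MIG manager logs are the first place to look.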

Request MIG Slices in Pods

apiVersion: v1
kind: Pod
metadata:
  name: inference-small
spec:
  containers:
    - name: model
      image: registry.example.com/inference:1.0
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-large
spec:
  containers:
    - name: model
      image: registry.example.com/inference:1.0
      resources:
        limits:
          nvidia.com/mig-3g.40gb: 1

Time-Slicing Configuration

# GPU Operator device plugin config
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4

With replicas: 4, each physical GPU is advertised as 4 nvidia.com/gpu resources, so up to 4 pods can share one GPU.
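
The ConfigMap only takes effect once the device plugin is told to use it. A minimal sketch of that wiring, followed by a pod that lands on a time-sliced GPU, might look like the following. It assumes the ClusterPolicy name used earlier in this recipe; the pod name and image are hypothetical.

```yaml
# Sketch only: the devicePlugin part of the ClusterPolicy, other fields omitted.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  devicePlugin:
    config:
      name: time-slicing-config   # the ConfigMap defined above
      default: any                # key under that ConfigMap's data
---
# A pod asks for a time-sliced GPU exactly like a whole GPU,
# because renameByDefault: false keeps the resource name nvidia.com/gpu.
apiVersion: v1
kind: Pod
metadata:
  name: notebook-shared           # hypothetical name
spec:
  containers:
    - name: notebook
      image: registry.example.com/notebook:1.0   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1       # one replica, not one physical GPU
```

Requesting more than one replica does not buy a larger share of the GPU. Each replica is only a scheduling slot on the same device.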

MIG vs Time-Slicing

| Feature | MIG | Time-Slicing |
|---|---|---|
| Isolation | Hardware (memory + compute) | None (shared context) |
| GPUs | A100, H100, A30 only | Any NVIDIA GPU |
| Memory | Dedicated per instance | Shared (OOM possible) |
| Performance | Guaranteed, predictable | Variable, contention |
| Use case | Production inference | Dev, notebooks, batch |
| Max instances | 7 per GPU | Configurable (4-10 typical) |

graph TD
    subgraph MIG - Hardware Isolation
        A100[A100 80GB] --> M1[MIG 1g.10gb<br/>Pod A]
        A100 --> M2[MIG 1g.10gb<br/>Pod B]
        A100 --> M3[MIG 1g.10gb<br/>Pod C]
        A100 --> M4[MIG 3g.40gb<br/>Pod D]
    end
    
    subgraph Time-Slicing - Soft Sharing
        T4[T4 16GB] --> TS1[Pod E<br/>Time slice 1]
        T4 --> TS2[Pod F<br/>Time slice 2]
        T4 --> TS3[Pod G<br/>Time slice 3]
        T4 --> TS4[Pod H<br/>Time slice 4]
    end
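
The MIG half of the diagram mixes three 1g.10gb slices with one 3g.40gb slice on the same GPU, which neither layout in the earlier ConfigMap provides. A minimal sketch of such a layout follows, assuming an 80GB card. The name custom-mixed is hypothetical, and the content would replace or extend config.yaml in the mig-config ConfigMap.

```yaml
# Sketch only: config.yaml for the mig-config ConfigMap with one mixed layout.
# custom-mixed is a hypothetical layout name.
version: v1
mig-configs:
  custom-mixed:
    - devices: all
      mig-enabled: true
      mig-devices:
        1g.10gb: 3   # three small slices (Pods A, B, C in the diagram)
        3g.40gb: 1   # one large slice (Pod D in the diagram)
```

This layout uses 6 of the 7 compute slices and 70GB of the 80GB of memory. Whether the driver accepts a given combination depends on its placement rules, so it is worth checking the MIG manager logs after applying it.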

Common Issues

MIG slices not appearing as resources

Check the MIG manager logs: kubectl logs -n gpu-operator ds/nvidia-mig-manager. MIG mode must be enabled on the GPU before any slices can be created; depending on the GPU and platform, that takes a GPU reset or a node reboot.

Time-sliced pods run out of GPU memory

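Time-slicing doesn't partition GPU memory. If 4 pods each try to use 12GB on a 16GB GPU, they ask for 48GB in total and some will fail with CUDA out-of-memory errors. CUDA_MPS_PINNED_DEVICE_MEM_LIMIT does not help here: it is read by CUDA MPS, not by time-slicing. Cap memory inside the application instead (for example, through the framework's own GPU memory limit), or move the workload to MIG.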

Best Practices

  • MIG for production inference: guaranteed memory and compute isolation
  • Time-slicing for dev/notebooks: simple, works on any GPU, but no isolation
  • Don't time-slice more than 4-8x: returns diminish and latency increases
  • Monitor GPU utilization: nvidia-smi dmon shows per-GPU stats and nvidia-smi pmon shows per-process stats; per-MIG-instance utilization generally needs DCGM (for example, dcgm-exporter)
  • MIG reconfiguration requires draining GPU workloads from the node: plan changes during maintenance windows

Key Takeaways

  • MIG provides hardware-isolated GPU partitions on A100/H100/A30
  • Time-slicing provides soft GPU sharing on any NVIDIA GPU
  • MIG guarantees memory and compute; time-slicing shares everything
  • MIG for production inference with SLA; time-slicing for dev and notebooks
  • Up to 7 MIG instances per A100/H100; time-slicing configurable (4-10 typical)