ai intermediate ⏱ 15 minutes K8s 1.28+

NIM LLM Support Matrix and GPU Compatibility

Complete NVIDIA NIM support matrix for Kubernetes. Supported models, profiles, precision formats, GPU compatibility, and hardware requirements per model.

By Luca Berton • 📖 8 min read

💡 Quick Answer: NIM LLM 2.x supports model-specific containers (Llama 3.1/3.3, GPT-OSS, Nemotron, StarCoder2) and model-free NIM for any vLLM-supported model. Profiles range from BF16 to NVFP4 quantization with TP1–TP8. Verified GPUs include A100, H100, H200, L40S, B200, B300, GH200, GB200, and Blackwell RTX PRO.

The Problem

Before deploying NIM on Kubernetes, you need to know which models are supported, which precision profiles are available, and whether your GPU hardware is verified. Mismatches between model, profile, and GPU cause deployment failures or degraded performance. This reference provides the complete NIM LLM 2.x support matrix.

flowchart LR
    MODEL["Model Selection"] --> PROFILE["Profile Selection<br/>(precision × TP)"]
    PROFILE --> GPU["GPU Compatibility<br/>Check"]
    GPU -->|Compatible| DEPLOY["Deploy on K8s"]
    GPU -->|Incompatible| ALT["Use different TP,<br/>precision, or GPU"]

The Solution

NIM LLM 2.x Supported Models

| Model | Container Image | NIM Version | Sizes | Key Feature |
|---|---|---|---|---|
| GPT-OSS 120B | nim/openai/gpt-oss-120b | 2.0.1 | 120B | OpenAI open-source, MXFP4 only |
| GPT-OSS 20B | nim/openai/gpt-oss-20b | 2.0.1 | 20B | OpenAI open-source, MXFP4 only |
| Llama 3.1 8B Instruct | nim/meta/llama-3.1-8b-instruct | 2.0.1 | 8B | BF16/FP8/NVFP4 + LoRA |
| Llama 3.1 70B Instruct | nim/meta/llama-3.1-70b-instruct | 2.0.1 | 70B | BF16/FP8/NVFP4 + LoRA |
| Llama 3.3 70B Instruct | nim/meta/llama-3.3-70b-instruct | 2.0.1 | 70B | BF16/FP8/NVFP4 + LoRA |
| Nemotron Super 49B | nim/nvidia/llama-3.3-nemotron-super-49b-v1.5 | 2.0.1 | 49B | BF16/FP8/NVFP4 + LoRA |
| Nemotron 3 Nano | nim/nvidia/nemotron-3-nano | 2.0.1 | Small | BF16/FP8/NVFP4 + LoRA |
| Nemotron 3 Super 120B | nim/nvidia/nemotron-3-super-120b-a12b | 2.0.2 | 120B MoE | GPU-dependent profiles |
| StarCoder2 7B | nim/bigcode/starcoder2-7b | 2.0.1 | 7B | Code generation, BF16 only |
| Model-Free NIM | nim/nvidia/model-free-nim | 2.0.1 | Any | Any vLLM-supported model |

Profile Matrix: Llama 3.1/3.3 70B

The most common enterprise deployment, with full profile coverage:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |
| NVFP4 + LoRA | vllm-nvfp4-tp1-pp1-lora | vllm-nvfp4-tp2-pp1-lora | vllm-nvfp4-tp4-pp1-lora | vllm-nvfp4-tp8-pp1-lora |
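
The profile names in this matrix follow a regular pattern: vllm-<precision>-tp<N>-pp1, with an optional -lora suffix. The sketch below composes and parses such names so a deployment script can avoid typos. Both profile_name and profile_tp are hypothetical helpers written for illustration, not part of any NIM tooling, and the pattern is inferred only from the names listed above.

```shell
# Build a profile name from its parts: vllm-<precision>-tp<N>-pp1[-lora].
# Hypothetical helper; the pattern is taken from the matrix in this article.
profile_name() {
  local precision="$1" tp="$2" lora="${3:-}"
  local name="vllm-${precision}-tp${tp}-pp1"
  if [ "$lora" = "lora" ]; then
    name="${name}-lora"
  fi
  printf '%s\n' "$name"
}

# Extract the tensor-parallel degree (the digits after "-tp") from a profile name.
profile_tp() {
  local name="$1"
  local rest="${name#*-tp}"    # drop everything up to and including "-tp"
  printf '%s\n' "${rest%%-*}"  # keep what precedes the next "-"
}

profile_name fp8 2 lora             # prints vllm-fp8-tp2-pp1-lora
profile_tp vllm-nvfp4-tp4-pp1-lora  # prints 4
```

Because every listed profile uses pp1, the tensor-parallel degree returned by profile_tp is also the number of GPUs the profile occupies.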

Profile Matrix: Llama 3.1 8B

Single-GPU model, TP1 only:

| Precision | TP1 |
|---|---|
| BF16 | vllm-bf16-tp1-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 |
| NVFP4 + LoRA | vllm-nvfp4-tp1-pp1-lora |

Profile Matrix: GPT-OSS (20B / 120B)

MXFP4 quantization only (new precision format):

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| MXFP4 | vllm-mxfp4-tp1-pp1 | vllm-mxfp4-tp2-pp1 | vllm-mxfp4-tp4-pp1 | vllm-mxfp4-tp8-pp1 |
| MXFP4 + LoRA | vllm-mxfp4-tp1-pp1-lora | vllm-mxfp4-tp2-pp1-lora | vllm-mxfp4-tp4-pp1-lora | vllm-mxfp4-tp8-pp1-lora |

Precision Format Comparison

| Format | VRAM Savings vs BF16 | Quality Impact | Best For |
|---|---|---|---|
| BF16 | Baseline | None | Maximum accuracy, LoRA fine-tuning |
| FP8 | ~50% | Minimal | Production inference on H100/H200 |
| NVFP4 | ~75% | Small | Cost-optimized, smaller GPUs |
| MXFP4 | ~75% | Small | GPT-OSS models specifically |
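
The savings percentages follow from the nominal bits stored per weight: 16 for BF16, 8 for FP8, and 4 for NVFP4/MXFP4. As a rough back-of-the-envelope sketch, the weight-only footprint is parameters × bits / 8. The weight_gb helper below is hypothetical, and it deliberately ignores quantization scale factors, KV cache, activations, and runtime overhead, so real VRAM needs are higher than these figures.

```shell
# Rough weight-only footprint in GB: parameters (in billions) × bits per weight / 8.
# Hypothetical helper; bit widths are nominal and exclude scale-factor overhead.
weight_gb() {
  local params_b="$1" format="$2" bits
  case "$format" in
    bf16)        bits=16 ;;
    fp8)         bits=8 ;;
    nvfp4|mxfp4) bits=4 ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
  # Integer arithmetic, so fractional results are rounded down.
  echo $(( params_b * bits / 8 ))
}

for fmt in bf16 fp8 nvfp4; do
  echo "70B ${fmt}: $(weight_gb 70 "$fmt") GB"
done
# 70B bf16: 140 GB
# 70B fp8: 70 GB
# 70B nvfp4: 35 GB
```

Dividing the result by the VRAM of a single GPU gives a lower bound on the TP degree to consider; leave headroom for the KV cache when choosing a profile.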

GPU Compatibility Matrix

Which GPUs work with which models:

| GPU | Llama 3.1 8B | Llama 3.1/3.3 70B | Nemotron 49B | Nemotron 120B | GPT-OSS 20B | GPT-OSS 120B | Model-Free |
|---|---|---|---|---|---|---|---|
| A10G | - | ✅ | - | - | ✅ | - | - |
| A100 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| A100 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| L40S | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - |
| H100 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| H100 NVL | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ |
| H200 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| H200 NVL | ✅ | ✅ | ✅ | ✅ | - | - | ✅ |
| GH200 144GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - |
| GH200 480GB | ✅ | ✅ | ✅ | - | ✅ | - | ✅ |
| B200 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - |
| B300 SXM6 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GB200 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - |
| GB10 | ✅ | - | ✅ | - | ✅ | - | - |
| RTX PRO 4500 Blackwell | ✅ | - | ✅ | ✅ | - | - | ✅ |
| RTX PRO 6000 Blackwell | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - |

NIM LLM 1.x Models (Legacy)

Models still on NIM 1.x (1.15.0), not yet migrated to 2.x:

| Model | Container | Notes |
|---|---|---|
| DeepSeek-V3.1-Terminus | deepseek-ai/deepseek-v3.1-terminus | Large MoE model |
| DeepSeek-V3.2-Exp | deepseek-ai/deepseek-v32-exp-nim | Experimental |
| GLM-5 | zai-org/glm-5 | Chinese/English bilingual |
| MiniMax-M2.5 | minimax-ai/minimax-m25 | Large MoE |
| Qwen3-32B | qwen/qwen3-32b | Also DGX Spark variant |
| Qwen3-Coder-Next | qwen/qwen3-coder-next | Code generation |
| Qwen3-Next-80B-A3B | qwen/qwen3-next-80b-a3b-instruct | MoE, also thinking variant |
| Nemotron Nano 9B v2 DGX Spark | nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark | Edge deployment |
| Riva Translate 4B | nvidia/riva-translate-4b-instruct-v1.1 | Translation NIM |
| Healthcare Text2SQL (8B/49B) | nvidia/llama-3.1-nemotron-nano-8b-healthcare-text2sql-v1.0 | Domain-specific |

For 1.x model details, see NIM LLM 1.15.0 Supported Models.

Model-Free NIM Validated Models

Officially tested with model-free NIM (nim/nvidia/model-free-nim):

  • gpt-oss-20b
  • apriel-nemotron
  • codestral

Any vLLM-supported architecture works with model-free NIM; these are just the officially validated ones.

Quick reference for the most common deployment:

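| GPU | VRAM | Recommended Profile | Min GPUs |
|---|---|---|---|
| A100 40GB | 40GB | vllm-fp8-tp4-pp1 | 4 |
| A100 80GB | 80GB | vllm-fp8-tp2-pp1 | 2 |
| L40S | 48GB | vllm-fp8-tp4-pp1 | 4 |
| H100 80GB | 80GB | vllm-fp8-tp1-pp1 | 1 |
| H200 | 141GB | vllm-bf16-tp1-pp1 | 1 |
| GH200 480GB | 480GB | vllm-bf16-tp1-pp1 | 1 |
| B200 | 192GB | vllm-bf16-tp1-pp1 | 1 |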

How to Verify on Your Cluster

# Check GPU type on your nodes
kubectl get nodes -o json | jq -r '.items[] | .metadata.name + ": " + .status.capacity["nvidia.com/gpu"] + " × " + (.metadata.labels["nvidia.com/gpu.product"] // "unknown")'

# List profiles for your specific hardware.
# The 1.7.3 tag below is a 1.x image; substitute the tag of the NIM version you plan to deploy.
kubectl run nim-profiles --rm -it --restart=Never \
  --image=nvcr.io/nim/meta/llama-3.1-70b-instruct:1.7.3 \
  --overrides='{"spec":{"containers":[{"name":"nim-profiles","image":"nvcr.io/nim/meta/llama-3.1-70b-instruct:1.7.3","command":["list-model-profiles"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}' \
  -- list-model-profiles

Common Issues

| Issue | Cause | Fix |
|---|---|---|
| No compatible profiles | GPU not verified or VRAM too small | Check GPU compatibility matrix above; use higher TP or FP8/NVFP4 |
| FP8 not available on A100 40GB | FP8 needs VRAM ≥ model weight size | Use multi-GPU TP or NVFP4 for A100 40GB |
| NVFP4 not available on some models | Not all models have NVFP4 quantization | Fall back to FP8 or BF16 |
| Model stuck on 1.x NIM | Not yet migrated to 2.x | Use 1.x container tags; check 1.15.0 docs |
| Blackwell GPUs not listed | NIM version too old | Update to NIM 2.0.1+ for Blackwell support |

Best Practices

  • Start with FP8: best VRAM/quality tradeoff on H100/H200/B200
  • Use NVFP4 for cost optimization: ~75% VRAM savings enables smaller GPU counts
  • Check list-model-profiles: always verify on your actual hardware before deploying
  • Match NIM version to model: some models require 2.0.2 (e.g., Nemotron 120B)
  • Model-free NIM for unsupported models: any vLLM architecture works even if not in the matrix
  • Pin to verified GPU types: use nodeSelector with the nvidia.com/gpu.product label
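
The last practice, pinning to a verified GPU type, can be sketched as a JSON merge patch that sets a nodeSelector on the nvidia.com/gpu.product label. The gpu_selector_patch helper, the nim-llm deployment name, and the NVIDIA-H100-80GB-HBM3 label value are all illustrative assumptions; read the real label value from your own nodes before using it.

```shell
# Print a JSON merge patch that pins a workload's pods to one GPU product.
# Hypothetical helper; pass the exact nvidia.com/gpu.product value from your nodes.
gpu_selector_patch() {
  local product="$1"
  printf '{"spec":{"template":{"spec":{"nodeSelector":{"nvidia.com/gpu.product":"%s"}}}}}\n' "$product"
}

# Example usage (deployment name and label value are placeholders):
# kubectl patch deployment nim-llm --type merge \
#   -p "$(gpu_selector_patch NVIDIA-H100-80GB-HBM3)"
```

A merge patch leaves the rest of the pod template untouched, so it can be applied to an existing Deployment without restating the container spec.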

Key Takeaways

  • NIM 2.x supports 9 model-specific containers plus model-free NIM for any vLLM model
  • Profiles combine precision (BF16/FP8/NVFP4/MXFP4) × tensor parallelism (TP1-TP8) × LoRA
  • GPU verification spans A100 through Blackwell (B200/B300/GB200) and RTX PRO
  • FP8 is the recommended default for H100/H200: ~50% VRAM savings with minimal quality loss
  • Always run list-model-profiles to confirm compatibility on your specific hardware