🤖 AI & ML Recipes

Run AI and ML workloads on Kubernetes with GPU scheduling, NVIDIA KAI Scheduler, model serving frameworks, and distributed training patterns.

71 recipes available

Beginner

Deploy Granite 4.0 Speech on Kubernetes

Deploy the IBM Granite 4.0 1B Speech model on Kubernetes for automatic speech recognition. The lightweight model runs on a CPU or a small GPU for speech-to-text (STT) workloads.

⏱ 15 minutes · K8s 1.28+

Test LLM Inference Endpoints with curl

Validate Kubernetes-hosted LLM inference services using curl against OpenAI-compatible /v1/models, /v1/completions, and /v1/chat/completions endpoints.

⏱ 10 minutes · K8s Any

Intermediate

AI Model Storage: hostPath vs PVC for Inference

Deploy AI models on Kubernetes using hostPath and PersistentVolumeClaim storage. Compare performance, security trade-offs, and production patterns for model serving.

AIPerf Benchmark LLMs on Kubernetes

Deploy Fish Audio TTS on Kubernetes

Deploy Llama 3.1 8B Instruct on K8s

Deploy Microsoft Phi-4 on Kubernetes

Deploy Phi-4 Reasoning Vision on K8s

Deploy Qwen3 TTS on Kubernetes

Deploy Qwen3.5 35B MoE on Kubernetes

Deploy Qwen3.5 9B Multimodal on K8s

Deploy Whisper Speech-to-Text on K8s

GenAI-Perf Benchmark LLM Serving

GenAI-Perf Benchmark Triton on K8s

Kubeflow Training Operator on Kubernetes

Shared Model Caching Across Pods on Kubernetes

Time-Slicing vs MIG vs Full GPU Allocation

TensorRT-LLM vs vLLM on Triton

Deploying Vector Databases on Kubernetes

Compare NCCL Intra-Node vs Inter-Node Performance

Run NCCL AllGather Benchmarks for Model Parallel Validation

Benchmark NCCL AllReduce Performance on Kubernetes

Run NCCL Tests on Kubernetes for GPU Network Validation

Deploy Mistral 7B with vLLM on Kubernetes

Quantize LLMs for Efficient GPU Inference on Kubernetes

Kubernetes LLM Serving Frameworks Compared

Install NVIDIA GPU Operator on Kubernetes

Installing NVIDIA KAI Scheduler for AI Workloads

Hierarchical Queues and Resource Fairness with KAI Scheduler

Advanced

AIPerf Concurrency Sweep on K8s

AIPerf Multi-Model Benchmark on K8s

AIPerf Goodput and SLO Benchmarks

Batch AI Workloads with Volcano Scheduler on Kubernetes

AIPerf Trace Replay Benchmarks on K8s

Dell PowerEdge XE7740 GPU Node Setup

Deploy GLM-5 754B on Kubernetes

Deploy Llama 2 70B on Kubernetes

Deploy Kimi K2.5 1.1T MoE on Kubernetes

Deploy LTX Video Generation on K8s

Deploy MiniMax M2.5 229B on Kubernetes

Deploy NVIDIA Nemotron 120B MoE on K8s

Deploy Qwen3 235B MoE on Kubernetes

Deploy Qwen3 Coder 80B on Kubernetes

Deploy Qwen3.5 397B MoE on Kubernetes

RetinaNet Object Detection on K8s

Deploy Sarvam 105B on Kubernetes

Stable Diffusion XL on Kubernetes

Distributed Inference on Kubernetes

Distributed Training with Kubeflow Training Operator

LeaderWorkerSet Operator for AI Workloads

Llama Stack on Kubernetes with NVIDIA NIM

MLPerf Benchmarking on Kubernetes

MPI Operator for Distributed Training

Deploy NVIDIA Clara on Kubernetes

NVIDIA H200 GPU Workloads on Kubernetes

NVIDIA H300 GPU Workloads on Kubernetes

NVIDIA NeMo Training on Kubernetes

NVIDIA Pyxis and Enroot for SLURM

Run:AI GPU Quotas on OpenShift

SLURM and Kubernetes Integration

Triton Autoscaling with GPU Metrics

Triton Multi-Model Serving on Kubernetes

Triton TensorRT-LLM on Kubernetes

Triton with vLLM Backend on Kubernetes

Deploy Mistral 7B with NVIDIA NIM on Kubernetes

Autoscale LLM Inference on Kubernetes

Multi-GPU and Tensor Parallel LLM Inference on Kubernetes

Build a RAG Pipeline on Kubernetes

GPU Sharing and Bin Packing with KAI Scheduler

Batch Scheduling with PodGroups in KAI Scheduler

Topology-Aware Scheduling with KAI Scheduler
