GitOps for AI Workloads on Kubernetes
Deploy AI models with GitOps on Kubernetes. Version ML models in Git, ArgoCD for model rollouts, Flux for GPU cluster sync.
π‘ Quick Answer: GitOps for AI workloads means model versions, serving configs, and GPU resource specs are all declared in Git and reconciled by ArgoCD or Flux. Model updates become Git commits: PR the new model version β review β merge β ArgoCD rolls out new inference pods with zero-downtime canary. This gives you versioned, auditable, rollback-capable AI deployments.
The Problem
AI deployments without GitOps look like this: someone runs kubectl apply with a new model image tag, nobody knows what version is running, rollbacks mean finding the old manifest, and multi-cluster deployments are manual. CNCFβs 2026 survey shows GitOps as the maturity signal for Kubernetes operations β and AI workloads need it more than web apps because model changes are riskier (wrong model = wrong answers).
flowchart LR
subgraph GIT["Git Repository"]
MODEL["model-config.yaml<br/>image: nim:1.7.3<br/>model: llama-3-70b<br/>tp: 4"]
INFRA["infra/<br/>GPU quotas, namespaces,<br/>PVCs, secrets"]
end
subgraph GITOPS["GitOps Controller"]
ARGO["ArgoCD / Flux<br/>reconcile every 5m"]
end
subgraph CLUSTER["GPU Cluster"]
INF_PODS["Inference Pods<br/>(4Γ A100)"]
MONITOR["Prometheus<br/>tokens/s, latency"]
end
GIT --> ARGO --> CLUSTER
MONITOR -->|"Alert if regression"| ARGOThe Solution
Repository Structure for AI GitOps
ai-platform/
βββ base/
β βββ inference/
β β βββ deployment.yaml
β β βββ service.yaml
β β βββ hpa.yaml
β β βββ kustomization.yaml
β βββ training/
β βββ job-template.yaml
β βββ kustomization.yaml
βββ models/
β βββ llama-3-70b/
β β βββ kustomization.yaml
β β βββ values.yaml # Model-specific config
β βββ mistral-7b/
β β βββ kustomization.yaml
β β βββ values.yaml
β βββ embedding-model/
β βββ values.yaml
βββ environments/
β βββ staging/
β β βββ kustomization.yaml # staging overrides
β βββ production/
β βββ kustomization.yaml # production overrides
βββ clusters/
βββ gpu-cluster-us/
β βββ kustomization.yaml
βββ gpu-cluster-eu/
βββ kustomization.yamlModel Config as Code
# models/llama-3-70b/values.yaml
# Every model change is a Git commit
model:
name: meta-llama/Meta-Llama-3-70B-Instruct
version: "3.1"
image: vllm/vllm-openai:v0.6.3
serving:
tensorParallelSize: 4
maxModelLen: 8192
gpuMemoryUtilization: 0.90
dtype: auto
resources:
gpu: 4
gpuType: nvidia-a100-80gb
memory: 320Gi
cpu: 16
scaling:
minReplicas: 2
maxReplicas: 8
targetTokensPerSecond: 500
# Change log (in commit message):
# v3.1: Upgraded to Llama 3.1, increased context to 8192
# v3.0: Initial deployment, tp=4, 2 replicasArgoCD Application for Model Serving
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: llama-3-70b-production
namespace: argocd
spec:
project: ai-inference
source:
repoURL: https://github.com/myorg/ai-platform.git
path: models/llama-3-70b
targetRevision: main
kustomize:
patches:
- target:
kind: Deployment
name: inference
patch: |
- op: replace
path: /spec/template/spec/containers/0/image
value: vllm/vllm-openai:v0.6.3
destination:
server: https://gpu-cluster-us.internal
namespace: ai-inference
syncPolicy:
automated:
selfHeal: true
prune: true
syncOptions:
- ServerSideApply=true
- RespectIgnoreDifferences=true
# Canary strategy for model updates
ignoreDifferences:
- group: apps
kind: Deployment
jsonPointers:
- /spec/replicas # Let HPA manage replicasCanary Model Rollout with Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: inference-llama-70b
namespace: ai-inference
spec:
replicas: 4
strategy:
canary:
steps:
- setWeight: 10 # 10% traffic to new model version
- pause: { duration: 10m } # Monitor for 10 minutes
- analysis:
templates:
- templateName: model-quality-check
- setWeight: 50
- pause: { duration: 10m }
- setWeight: 100
canaryService: inference-canary
stableService: inference-stable
template:
spec:
containers:
- name: vllm
image: vllm/vllm-openai:v0.6.3
resources:
limits:
nvidia.com/gpu: 4
---
# Analysis: check model quality metrics before promoting
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
name: model-quality-check
spec:
metrics:
- name: latency-p99
provider:
prometheus:
address: http://prometheus:9090
query: |
histogram_quantile(0.99,
rate(vllm_request_duration_seconds_bucket{rollouts_pod_template_hash="{{args.canary-hash}}"}[5m]))
successCondition: result[0] < 5.0 # P99 latency < 5s
- name: error-rate
provider:
prometheus:
address: http://prometheus:9090
query: |
rate(vllm_request_failure_total{rollouts_pod_template_hash="{{args.canary-hash}}"}[5m])
/ rate(vllm_request_total{rollouts_pod_template_hash="{{args.canary-hash}}"}[5m])
successCondition: result[0] < 0.01 # Error rate < 1%Flux for Multi-Cluster GPU Sync
# Sync model configs to multiple GPU clusters
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: ai-models
namespace: flux-system
spec:
interval: 5m
sourceRef:
kind: GitRepository
name: ai-platform
path: ./models
prune: true
# Health checks: ensure model pods are serving
healthChecks:
- apiVersion: apps/v1
kind: Deployment
name: inference-llama-70b
namespace: ai-inference
timeout: 30m # GPU pods take longer to startModel Version Promotion Workflow
# Developer/ML Engineer workflow:
# 1. Update model config in Git
git checkout -b model/llama-3.1-upgrade
vim models/llama-3-70b/values.yaml # Change model version
# 2. Commit and push
git add -A
git commit -m "Upgrade Llama 3 to 3.1: 8K context, improved reasoning
- tensor_parallel_size unchanged (4)
- max_model_len: 4096 β 8192
- Tested in staging: P99 latency 3.2s, 0% error rate
- Benchmark: +12% on MMLU vs 3.0"
git push origin model/llama-3.1-upgrade
# 3. PR β review β merge
# ArgoCD detects change β canary rollout begins
# Argo Rollouts checks latency + error rate β promotes if healthy
# 4. Rollback (if needed)
git revert HEAD
git push origin main
# ArgoCD auto-reverts to previous model versionTraining Job via GitOps
# Training jobs also managed via Git
apiVersion: batch/v1
kind: Job
metadata:
name: finetune-llama-v42
namespace: ai-training
annotations:
argocd.argoproj.io/hook: PreSync # Run before inference update
spec:
template:
spec:
containers:
- name: trainer
image: myorg/llm-trainer:v3.0
args:
- "--base-model=meta-llama/Meta-Llama-3-70B"
- "--dataset=s3://datasets/custom-v42"
- "--output=s3://models/llama-3-finetuned-v42"
- "--lora-rank=16"
resources:
limits:
nvidia.com/gpu: 8
restartPolicy: NeverCommon Issues
| Issue | Cause | Fix |
|---|---|---|
| ArgoCD timeout on model deploy | GPU pod takes 10+ min to load model | Increase sync timeout to 30m |
| Canary gets no traffic | Service selector mismatch | Verify canary service matches rollout labels |
| Model rollback slow | Large model image pull | Use pre-cached images on GPU nodes |
| Git repo too large | Model weights in Git | Store weights in S3/GCS; Git tracks config only |
| Flux health check fails | Model not ready in time | Increase Kustomization timeout |
Best Practices
- Git tracks config, not weights β model binaries in object storage, Git has pointers
- Canary for every model update β wrong model = wrong answers; validate before promoting
- Separate inference and training repos β different cadence, different reviewers
- Use Argo Rollouts analysis β automated quality gates on latency, error rate, throughput
- Tag model versions semantically β
v3.1-8k-fp16tells you what changed - Monitor tokens/second after rollout β the real metric for inference quality
Key Takeaways
- GitOps makes AI deployments versioned, auditable, and rollback-capable
- Model configs, GPU specs, and scaling params are declared in Git
- ArgoCD + Argo Rollouts enables canary model rollouts with quality gates
- Flux syncs model configs across multi-cluster GPU infrastructure
- Git commits replace
kubectl applyβ every model change has a PR and review - 2026 trend: GitOps is the maturity signal for AI infrastructure on Kubernetes

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
