Domain-Specific Language Models on Kubernetes
Deploy and fine-tune domain-specific LLMs on Kubernetes. Legal, healthcare, finance, and code models with LoRA fine-tuning, NIM serving, and RAG pipelines.
π‘ Quick Answer: Domain-Specific Language Models (DSLMs) are smaller, cheaper, and more accurate than general-purpose LLMs for specialized tasks. Deploy them on Kubernetes using NIM or vLLM for serving, fine-tune with LoRA adapters on a single GPU, and enhance with RAG pipelines for domain knowledge. A 7B domain model often outperforms a 70B general model in its specialty.
The Problem
General-purpose LLMs (GPT-4, Llama 70B) are expensive, slow, and often inaccurate for specialized domains. A legal firm doesnβt need a model that can write poetry β they need one that understands contract law precisely. In 2026, DSLMs are rising because theyβre 10-100Γ cheaper to run, faster to respond, and more accurate in their domain.
flowchart TB
subgraph GENERAL["General LLM (70B)"]
G1["$$$$ expensive"]
G2["Good at everything"]
G3["Great at nothing specific"]
end
subgraph DOMAIN["Domain-Specific LLM (7B)"]
D1["$ cheap"]
D2["Expert in ONE domain"]
D3["Outperforms 70B in specialty"]
end
BASE["Base Model<br/>(Llama 3.1 8B)"] -->|"LoRA Fine-Tune<br/>+ Domain Data"| DOMAIN
DOMAIN --> LEGAL["Legal DSLM"]
DOMAIN --> HEALTH["Healthcare DSLM"]
DOMAIN --> FINANCE["Finance DSLM"]
DOMAIN --> CODE["Code DSLM"]The Solution
LoRA Fine-Tuning Job on Kubernetes
apiVersion: batch/v1
kind: Job
metadata:
name: finetune-legal-model
spec:
template:
spec:
containers:
- name: trainer
image: nvcr.io/nvidia/pytorch:24.04-py3
command: ["python", "finetune.py"]
args:
- "--base-model=meta-llama/Meta-Llama-3.1-8B"
- "--dataset=/data/legal-corpus"
- "--output=/models/legal-llama-8b"
- "--lora-r=16"
- "--lora-alpha=32"
- "--epochs=3"
- "--batch-size=4"
- "--learning-rate=2e-4"
env:
- name: HF_TOKEN
valueFrom:
secretKeyRef:
name: hf-token
key: token
resources:
limits:
nvidia.com/gpu: 1 # Single A100 for LoRA
memory: "40Gi"
volumeMounts:
- name: data
mountPath: /data
- name: models
mountPath: /models
- name: shm
mountPath: /dev/shm
volumes:
- name: data
persistentVolumeClaim:
claimName: training-data
- name: models
persistentVolumeClaim:
claimName: model-storage
- name: shm
emptyDir:
medium: Memory
sizeLimit: 16Gi
restartPolicy: NeverServe Domain Model with NIM
# Model-free NIM with custom LoRA model
apiVersion: apps/v1
kind: Deployment
metadata:
name: legal-llm
spec:
template:
spec:
containers:
- name: nim
image: nvcr.io/nim/nim-llm:2.0.2
env:
- name: NIM_MODEL_PATH
value: "/models/legal-llama-8b"
- name: NIM_MAX_MODEL_LEN
value: "8192"
ports:
- containerPort: 8000
resources:
limits:
nvidia.com/gpu: 1
volumeMounts:
- name: models
mountPath: /models
volumes:
- name: models
persistentVolumeClaim:
claimName: model-storage
---
apiVersion: v1
kind: Service
metadata:
name: legal-llm
spec:
selector:
app: legal-llm
ports:
- port: 8000RAG Pipeline for Domain Knowledge
# Vector store for domain documents
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: domain-vectordb
spec:
template:
spec:
containers:
- name: qdrant
image: qdrant/qdrant:v1.12.0
ports:
- containerPort: 6333
volumeMounts:
- name: data
mountPath: /qdrant/storage
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 100Gi
---
# RAG service that combines retrieval + domain LLM
apiVersion: apps/v1
kind: Deployment
metadata:
name: legal-rag-service
spec:
template:
spec:
containers:
- name: rag
image: myorg/rag-service:v1.0
env:
- name: LLM_URL
value: "http://legal-llm:8000/v1"
- name: VECTOR_DB_URL
value: "http://domain-vectordb:6333"
- name: EMBEDDING_MODEL
value: "BAAI/bge-large-en-v1.5"
ports:
- containerPort: 8080Multi-Domain Router
Serve multiple domain models and route based on query:
apiVersion: apps/v1
kind: Deployment
metadata:
name: model-router
spec:
template:
spec:
containers:
- name: router
image: myorg/model-router:v1.0
env:
- name: LEGAL_MODEL_URL
value: "http://legal-llm:8000/v1"
- name: FINANCE_MODEL_URL
value: "http://finance-llm:8000/v1"
- name: HEALTH_MODEL_URL
value: "http://health-llm:8000/v1"
- name: GENERAL_MODEL_URL
value: "http://general-llm:8000/v1"
- name: CLASSIFIER_MODEL
value: "domain-classifier-v1"
ports:
- containerPort: 8080Domain Model Examples
| Domain | Base Model | Fine-Tune Data | GPU Needs | Use Case |
|---|---|---|---|---|
| Legal | Llama 3.1 8B | Contract corpus, case law | 1Γ A100 | Contract review, clause extraction |
| Healthcare | Llama 3.1 8B | Clinical notes, PubMed | 1Γ A100 | Medical coding, diagnosis assist |
| Finance | Llama 3.1 8B | SEC filings, earnings calls | 1Γ A100 | Risk analysis, compliance |
| Code | CodeLlama 13B | Internal codebase | 1Γ A100 | Code completion, review |
| Customer Support | Llama 3.1 8B | Support tickets, KB articles | 1Γ A100 | Auto-response, ticket routing |
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| Fine-tuned model worse than base | Bad training data or overfitting | Clean data, reduce epochs, increase `lora-r` |
| Model hallucinating domain facts | No RAG, relying on memorization | Add RAG pipeline with verified documents |
| High GPU cost for multiple domains | Each domain needs dedicated GPU | Use LoRA adapter switching (single base model) |
| Slow LoRA training | Large dataset on slow storage | Use SSD/NVMe PVCs, reduce dataset size |
| Model doesnβt follow format | Fine-tune data lacks instruction format | Use instruction-tuned base model + formatted data |
Best Practices
- Start with RAG before fine-tuning β often sufficient and much cheaper
- Use LoRA, not full fine-tuning β trains in hours on 1 GPU vs days on many
- Validate with domain experts β automated metrics miss domain-specific errors
- Version your models β tag images and PVCs with model version + training date
- A/B test against general models β prove the domain model is actually better
- Use 7B-8B base models β sweet spot for cost vs capability for most domains
Key Takeaways
- DSLMs outperform general LLMs in their specialty while being 10-100Γ cheaper to run
- LoRA fine-tuning trains domain models on a single GPU in hours
- Model-free NIM (`nim-llm:2.0.2`) serves any custom model without rebuilding containers
- RAG + domain model > general model for knowledge-intensive tasks
- Multi-domain routing lets one cluster serve legal, finance, health, and code models
- 2026 trend: enterprises building domain model portfolios, not relying on one general LLM

Recommended
Kubernetes Recipes β The Complete Book100+ production-ready patterns with detailed explanations, best practices, and copy-paste YAML. Everything in one place.
Get the Book βLearn by Doing
CopyPasteLearn β Hands-on Cloud & DevOps CoursesMaster Kubernetes, Ansible, Terraform, and MLOps with interactive, copy-paste-run lessons. Start free.
Browse Courses βπ Deepen Your Skills β Hands-on Courses
Courses by CopyPasteLearn.com β Learn IT by Doing
