πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
ai advanced ⏱ 25 minutes K8s 1.28+

Domain-Specific Language Models on Kubernetes

Deploy and fine-tune domain-specific LLMs on Kubernetes. Legal, healthcare, finance, and code models with LoRA fine-tuning, NIM serving, and RAG pipelines.

By Luca Berton β€’ β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Domain-Specific Language Models (DSLMs) are smaller, cheaper, and more accurate than general-purpose LLMs for specialized tasks. Deploy them on Kubernetes using NIM or vLLM for serving, fine-tune with LoRA adapters on a single GPU, and enhance with RAG pipelines for domain knowledge. A 7B domain model often outperforms a 70B general model in its specialty.

The Problem

General-purpose LLMs (GPT-4, Llama 70B) are expensive, slow, and often inaccurate for specialized domains. A legal firm doesn’t need a model that can write poetry β€” they need one that understands contract law precisely. In 2026, DSLMs are rising because they’re 10-100Γ— cheaper to run, faster to respond, and more accurate in their domain.

flowchart TB
    subgraph GENERAL["General LLM (70B)"]
        G1["$$$$ expensive"]
        G2["Good at everything"]
        G3["Great at nothing specific"]
    end
    subgraph DOMAIN["Domain-Specific LLM (7B)"]
        D1["$ cheap"]
        D2["Expert in ONE domain"]
        D3["Outperforms 70B in specialty"]
    end
    
    BASE["Base Model<br/>(Llama 3.1 8B)"] -->|"LoRA Fine-Tune<br/>+ Domain Data"| DOMAIN
    DOMAIN --> LEGAL["Legal DSLM"]
    DOMAIN --> HEALTH["Healthcare DSLM"]
    DOMAIN --> FINANCE["Finance DSLM"]
    DOMAIN --> CODE["Code DSLM"]

The Solution

LoRA Fine-Tuning Job on Kubernetes

apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-legal-model
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: nvcr.io/nvidia/pytorch:24.04-py3
          command: ["python", "finetune.py"]
          args:
            - "--base-model=meta-llama/Meta-Llama-3.1-8B"
            - "--dataset=/data/legal-corpus"
            - "--output=/models/legal-llama-8b"
            - "--lora-r=16"
            - "--lora-alpha=32"
            - "--epochs=3"
            - "--batch-size=4"
            - "--learning-rate=2e-4"
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
          resources:
            limits:
              nvidia.com/gpu: 1       # Single A100 for LoRA
              memory: "40Gi"
          volumeMounts:
            - name: data
              mountPath: /data
            - name: models
              mountPath: /models
            - name: shm
              mountPath: /dev/shm
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: training-data
        - name: models
          persistentVolumeClaim:
            claimName: model-storage
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 16Gi
      restartPolicy: Never

Serve Domain Model with NIM

# Model-free NIM with custom LoRA model
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legal-llm
spec:
  template:
    spec:
      containers:
        - name: nim
          image: nvcr.io/nim/nim-llm:2.0.2
          env:
            - name: NIM_MODEL_PATH
              value: "/models/legal-llama-8b"
            - name: NIM_MAX_MODEL_LEN
              value: "8192"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: model-storage
---
apiVersion: v1
kind: Service
metadata:
  name: legal-llm
spec:
  selector:
    app: legal-llm
  ports:
    - port: 8000

RAG Pipeline for Domain Knowledge

# Vector store for domain documents
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: domain-vectordb
spec:
  template:
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.12.0
          ports:
            - containerPort: 6333
          volumeMounts:
            - name: data
              mountPath: /qdrant/storage
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        resources:
          requests:
            storage: 100Gi
---
# RAG service that combines retrieval + domain LLM
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legal-rag-service
spec:
  template:
    spec:
      containers:
        - name: rag
          image: myorg/rag-service:v1.0
          env:
            - name: LLM_URL
              value: "http://legal-llm:8000/v1"
            - name: VECTOR_DB_URL
              value: "http://domain-vectordb:6333"
            - name: EMBEDDING_MODEL
              value: "BAAI/bge-large-en-v1.5"
          ports:
            - containerPort: 8080

Multi-Domain Router

Serve multiple domain models and route based on query:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-router
spec:
  template:
    spec:
      containers:
        - name: router
          image: myorg/model-router:v1.0
          env:
            - name: LEGAL_MODEL_URL
              value: "http://legal-llm:8000/v1"
            - name: FINANCE_MODEL_URL
              value: "http://finance-llm:8000/v1"
            - name: HEALTH_MODEL_URL
              value: "http://health-llm:8000/v1"
            - name: GENERAL_MODEL_URL
              value: "http://general-llm:8000/v1"
            - name: CLASSIFIER_MODEL
              value: "domain-classifier-v1"
          ports:
            - containerPort: 8080

Domain Model Examples

DomainBase ModelFine-Tune DataGPU NeedsUse Case
LegalLlama 3.1 8BContract corpus, case law1Γ— A100Contract review, clause extraction
HealthcareLlama 3.1 8BClinical notes, PubMed1Γ— A100Medical coding, diagnosis assist
FinanceLlama 3.1 8BSEC filings, earnings calls1Γ— A100Risk analysis, compliance
CodeCodeLlama 13BInternal codebase1Γ— A100Code completion, review
Customer SupportLlama 3.1 8BSupport tickets, KB articles1Γ— A100Auto-response, ticket routing

Common Issues

IssueCauseFix
Fine-tuned model worse than baseBad training data or overfittingClean data, reduce epochs, increase `lora-r`
Model hallucinating domain factsNo RAG, relying on memorizationAdd RAG pipeline with verified documents
High GPU cost for multiple domainsEach domain needs dedicated GPUUse LoRA adapter switching (single base model)
Slow LoRA trainingLarge dataset on slow storageUse SSD/NVMe PVCs, reduce dataset size
Model doesn’t follow formatFine-tune data lacks instruction formatUse instruction-tuned base model + formatted data

Best Practices

  • Start with RAG before fine-tuning β€” often sufficient and much cheaper
  • Use LoRA, not full fine-tuning β€” trains in hours on 1 GPU vs days on many
  • Validate with domain experts β€” automated metrics miss domain-specific errors
  • Version your models β€” tag images and PVCs with model version + training date
  • A/B test against general models β€” prove the domain model is actually better
  • Use 7B-8B base models β€” sweet spot for cost vs capability for most domains

Key Takeaways

  • DSLMs outperform general LLMs in their specialty while being 10-100Γ— cheaper to run
  • LoRA fine-tuning trains domain models on a single GPU in hours
  • Model-free NIM (`nim-llm:2.0.2`) serves any custom model without rebuilding containers
  • RAG + domain model > general model for knowledge-intensive tasks
  • Multi-domain routing lets one cluster serve legal, finance, health, and code models
  • 2026 trend: enterprises building domain model portfolios, not relying on one general LLM
#domain-specific-llm #fine-tuning #lora #rag #nim
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens