What is the quick answer for Deploying Vector Databases on Kubernetes?

Deploy and operate vector databases (Milvus, Weaviate, Qdrant) on Kubernetes for RAG pipelines, semantic search, and AI applications with persistent.

Deploying Vector Databases on Kubernetes

💡 Quick Answer: Deploy Qdrant (simplest): helm install qdrant qdrant/qdrant. Deploy Milvus (most scalable): helm install milvus milvus/milvus --set cluster.enabled=true. Deploy Weaviate (most features): helm install weaviate weaviate/weaviate. All three support persistent storage via PVCs, horizontal scaling, and provide REST/gRPC APIs for embedding storage and similarity search.
Key concept: Vector databases store high-dimensional embeddings and perform approximate nearest neighbor (ANN) search, enabling RAG, semantic search, and recommendation systems.
Gotcha: Vector databases are memory-intensive—each vector (1536 dims, float32) uses ~6KB. 1M vectors ≈ 6GB RAM. Plan capacity accordingly.

The Problem

AI applications need to store and search through millions of embedding vectors:

RAG pipelines require fast retrieval of relevant document chunks
Semantic search needs sub-100ms similarity queries across millions of vectors
Traditional databases can’t efficiently perform approximate nearest neighbor search
Scaling vector search while maintaining low latency is complex

The Solution

Deploy purpose-built vector databases on Kubernetes that provide persistent storage, horizontal scaling, and high-performance ANN search.

Architecture Overview

flowchart TB
    subgraph cluster["☸️ KUBERNETES CLUSTER"]
        subgraph app["Application Layer"]
            RAG["RAG Pipeline"]
            SEARCH["Semantic Search"]
            REC["Recommendations"]
        end
        subgraph vectordb["Vector Database"]
            Q1["Qdrant Node 1"]
            Q2["Qdrant Node 2"]
            Q3["Qdrant Node 3"]
        end
        subgraph storage["Storage"]
            PVC1["PVC 1<br/>Vectors"]
            PVC2["PVC 2<br/>Vectors"]
            PVC3["PVC 3<br/>Vectors"]
        end
        subgraph embed["Embedding Service"]
            EMB["Text Embedding<br/>Model"]
        end
    end
    
    RAG & SEARCH & REC --> vectordb
    Q1 --> PVC1
    Q2 --> PVC2
    Q3 --> PVC3
    RAG --> EMB

Option 1: Qdrant (Simplest, Rust-based)

helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo update

helm install qdrant qdrant/qdrant \
  -n vector-db \
  --create-namespace \
  --set replicaCount=3 \
  --set persistence.size=50Gi \
  --set persistence.storageClassName=fast-ssd \
  --set resources.requests.memory=4Gi \
  --set resources.limits.memory=8Gi

# qdrant-custom-values.yaml
replicaCount: 3
persistence:
  size: 100Gi
  storageClassName: fast-ssd
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
config:
  cluster:
    enabled: true
    p2p:
      port: 6335
  storage:
    optimizers:
      default_segment_number: 4
      indexing_threshold: 20000
    performance:
      max_search_threads: 0  # Auto
  service:
    grpc_port: 6334
    http_port: 6333

Option 2: Milvus (Most Scalable)

helm repo add milvus https://zilliztech.github.io/milvus-helm
helm repo update

helm install milvus milvus/milvus \
  -n vector-db \
  --create-namespace \
  -f milvus-values.yaml

# milvus-values.yaml
cluster:
  enabled: true
etcd:
  replicaCount: 3
  persistence:
    size: 10Gi
minio:
  mode: distributed
  replicas: 4
  persistence:
    size: 100Gi
pulsar:
  enabled: true
  broker:
    replicaCount: 2
queryNode:
  replicas: 3
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 16Gi
indexNode:
  replicas: 2
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
dataNode:
  replicas: 2
proxy:
  replicas: 2

Option 3: Weaviate (Most Features)

helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update

helm install weaviate weaviate/weaviate \
  -n vector-db \
  --create-namespace \
  -f weaviate-values.yaml

# weaviate-values.yaml
replicas: 3
storage:
  size: 100Gi
  storageClassName: fast-ssd
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
modules:
  text2vec-transformers:
    enabled: true
  generative-openai:
    enabled: true
env:
  QUERY_DEFAULTS_LIMIT: 25
  AUTHENTICATION_APIKEY_ENABLED: true
  AUTHENTICATION_APIKEY_ALLOWED_KEYS: "your-api-key"

Testing: Insert and Query Vectors

# test_qdrant.py - Example with Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(host="qdrant.vector-db.svc.cluster.local", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Insert vectors
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(id=1, vector=[0.1]*1536, payload={"text": "Kubernetes networking"}),
        PointStruct(id=2, vector=[0.2]*1536, payload={"text": "Pod security"}),
    ]
)

# Search
results = client.search(
    collection_name="documents",
    query_vector=[0.15]*1536,
    limit=5,
)

Comparison

Feature	Qdrant	Milvus	Weaviate
Language	Rust	Go	Go
Protocol	REST/gRPC	gRPC	REST/GraphQL
Filtering	Rich	Rich	Rich
Scalability	Good	Excellent	Good
Complexity	Low	High	Medium
GPU Support	❌	✅ (indexing)	❌
Built-in Embeddings	❌	❌	✅

Common Issues

Issue 1: OOM on large collections

# Increase memory limits and enable memory-mapped storage
# Qdrant: enable mmap
config:
  storage:
    storage_type: mmap   # Keeps vectors on disk, not RAM

# Milvus: use disk index
# Set index type to DISKANN for large collections

Issue 2: Slow query performance

# Ensure proper index is built
# Qdrant: indexes build automatically after threshold
# Milvus: manually create index
# pymilvus: collection.create_index(field_name="vector", index_params={"index_type": "IVF_SQ8", "nlist": 1024})

# Check if collection is loaded in memory
# Milvus: collection.load()

Best Practices

Size memory for your dataset — Calculate: vectors × dimensions × 4 bytes × 1.5 (overhead)
Use SSD-backed PVCs — Vector search is I/O intensive with memory-mapped indexes
Enable replication — Run 3+ replicas for high availability
Choose the right index — HNSW for <10M vectors, IVF/DiskANN for larger datasets
Batch inserts — Insert in batches of 1000+ vectors for best throughput
Monitor memory usage — Vector DBs are memory-hungry; set alerts at 80% usage

Key Takeaways

Qdrant is the simplest to deploy and operate, ideal for small-medium scale
Milvus offers the best scalability for billions of vectors with distributed architecture
Weaviate provides the most features including built-in embedding modules
All three support persistent storage, replication, and horizontal scaling on Kubernetes
Memory planning is critical—underprovisioning leads to OOM or degraded performance
PVCs with SSD storage are essential for production vector database deployments