πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
ai intermediate ⏱ 15 minutes K8s 1.28+

Kubernetes for AI Search and Discoverability

Deploy AI-searchable services on Kubernetes: llms.txt implementation, RAG-optimized APIs, structured data for AI chatbots, and infrastructure patterns

By Luca Berton β€’ β€’ πŸ“– 6 min read

πŸ’‘ Quick Answer: With 24% of all shoppers (48% Gen Z) using AI chatbots for product research in 2025, Kubernetes services need AI discoverability: implement llms.txt, serve structured RAG-friendly content, deploy embedding APIs, and expose machine-readable endpoints that AI agents can consume.

The Problem

AI is replacing traditional search:

  • 24% of global shoppers use AI chatbots to research before purchasing (2025)
  • 48% of Gen Z use AI chatbots instead of Google for product research
  • Traditional SEO alone is insufficient β€” AI agents parse differently than crawlers
  • Services need to be discoverable by LLMs, not just search engines
  • RAG pipelines need structured, chunked content from your APIs

The Solution

Implement llms.txt for AI Discoverability

# ConfigMap with llms.txt content
apiVersion: v1
kind: ConfigMap
metadata:
  name: llms-txt
  namespace: production
data:
  llms.txt: |
    # My Service
    > Brief description of what this service does

    ## Docs
    - [API Reference](https://api.example.com/docs): Full API documentation
    - [Getting Started](https://example.com/quickstart): Quick start guide
    - [Pricing](https://example.com/pricing): Plans and pricing

    ## Features
    - Real-time data processing
    - REST and GraphQL APIs
    - Kubernetes-native deployment

  llms-full.txt: |
    # My Service - Complete Documentation
    
    ## Overview
    Full service documentation optimized for LLM consumption...
    (Complete, detailed content here)
---
# Serve via Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llms-txt-ingress
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      location = /llms.txt {
        alias /etc/llms/llms.txt;
        default_type text/plain;
      }
      location = /llms-full.txt {
        alias /etc/llms/llms-full.txt;
        default_type text/plain;
      }

RAG-Optimized API Endpoints

# Deploy a content API designed for AI consumption
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-content-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-content-api
  template:
    spec:
      containers:
        - name: api
          image: registry.example.com/ai-content-api:2.0
          env:
            - name: CHUNK_SIZE
              value: "512"
            - name: OVERLAP
              value: "50"
            - name: INCLUDE_METADATA
              value: "true"
            - name: EMBEDDING_MODEL
              value: "bge-small-en-v1.5"
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
---
# Service exposing RAG-friendly endpoints
apiVersion: v1
kind: Service
metadata:
  name: ai-content-api
  namespace: production
spec:
  selector:
    app: ai-content-api
  ports:
    - port: 80
      targetPort: 8080
# RAG-friendly API design patterns:

GET /api/v1/content?format=chunks
β†’ Returns pre-chunked content with metadata (title, section, url)

GET /api/v1/content/search?q=kubernetes+scaling&format=context
β†’ Returns relevant passages optimized for LLM context windows

GET /api/v1/content/embeddings
β†’ Returns pre-computed embeddings for all content

GET /.well-known/ai-plugin.json
β†’ OpenAI plugin manifest for ChatGPT/agent discovery

GET /llms.txt
β†’ Concise service description for LLM crawlers

GET /sitemap-ai.xml
β†’ AI-specific sitemap with content priorities
apiVersion: apps/v1
kind: Deployment
metadata:
  name: embedding-service
  namespace: ai-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: embedding-service
  template:
    spec:
      containers:
        - name: embeddings
          image: registry.example.com/embedding-server:1.0
          args:
            - "--model=BAAI/bge-large-en-v1.5"
            - "--port=8080"
            - "--max-batch-size=64"
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 2000m
              memory: 4Gi
              nvidia.com/gpu: "1"
            limits:
              nvidia.com/gpu: "1"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
---
# Vector database for semantic retrieval
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vector-db
  namespace: ai-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vector-db
  template:
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.12
          ports:
            - containerPort: 6333
            - containerPort: 6334
          volumeMounts:
            - name: data
              mountPath: /qdrant/storage
          resources:
            requests:
              cpu: 1000m
              memory: 4Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi

AI Agent Gateway

# Gateway for AI agents to discover and interact with services
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent-gateway
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-agent-gateway
  template:
    spec:
      containers:
        - name: gateway
          image: registry.example.com/ai-gateway:1.5
          env:
            - name: RATE_LIMIT_PER_AGENT
              value: "100"    # req/min per AI agent
            - name: REQUIRE_AGENT_ID
              value: "true"
            - name: STRUCTURED_OUTPUT
              value: "true"
            - name: ALLOWED_AGENTS
              value: "chatgpt,perplexity,claude,gemini"
          ports:
            - containerPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-agent-gateway
  annotations:
    nginx.ingress.kubernetes.io/rate-limit-rpm: "1000"
spec:
  rules:
    - host: ai.example.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: ai-agent-gateway
                port:
                  number: 80

Structured Data for AI Consumption

# CronJob to generate and update AI-optimized structured data
apiVersion: batch/v1
kind: CronJob
metadata:
  name: generate-ai-content
  namespace: production
spec:
  schedule: "0 */6 * * *"    # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: generator
              image: registry.example.com/ai-content-gen:1.0
              command:
                - /bin/sh
                - -c
                - |
                  # Generate llms.txt from current content
                  python3 generate_llms_txt.py \
                    --source=/content \
                    --output=/static/llms.txt
                  
                  # Generate chunked content index
                  python3 generate_chunks.py \
                    --chunk-size=512 \
                    --overlap=50 \
                    --output=/static/content-chunks.jsonl
                  
                  # Update vector embeddings
                  python3 update_embeddings.py \
                    --input=/static/content-chunks.jsonl \
                    --qdrant-url=http://vector-db:6333
                  
                  # Generate AI sitemap
                  python3 generate_ai_sitemap.py \
                    --output=/static/sitemap-ai.xml
              volumeMounts:
                - name: content
                  mountPath: /content
                - name: static
                  mountPath: /static
          volumes:
            - name: content
              persistentVolumeClaim:
                claimName: content-pvc
            - name: static
              persistentVolumeClaim:
                claimName: static-pvc
          restartPolicy: OnFailure

robots.txt for AI Crawlers

# robots.txt - Allow AI crawlers
User-agent: GPTBot
Allow: /
Crawl-delay: 1

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-ai.xml

Common Issues

AI agents can’t parse dynamic JavaScript content

  • Cause: SPAs render client-side; AI crawlers need static HTML
  • Fix: Server-side rendering (SSR) or static site generation; serve plain text at /llms.txt

Rate limiting blocks legitimate AI agents

  • Cause: AI agents make many rapid requests during indexing
  • Fix: Higher rate limits for known AI user-agents; separate endpoint for bulk access

Embedding drift after content updates

  • Cause: Content changed but vector store still has old embeddings
  • Fix: CronJob re-indexes; webhook triggers on content change; TTL on vectors

Best Practices

  1. Implement llms.txt β€” standard for AI discoverability (llmstxt.org)
  2. Serve structured content β€” JSON-LD, chunked APIs, metadata-rich responses
  3. Allow AI crawlers in robots.txt β€” don’t block GPTBot, ClaudeBot, etc.
  4. Pre-chunk content β€” 512 tokens with 50-token overlap for RAG
  5. Maintain embeddings β€” keep vector store synchronized with content
  6. Rate limit per agent β€” protect infrastructure while enabling access
  7. Monitor AI referrals β€” track which AI agents drive traffic/conversions

Key Takeaways

  • 24% of shoppers (48% Gen Z) use AI chatbots for product research β€” trend accelerating
  • llms.txt is the robots.txt equivalent for AI discoverability
  • RAG-friendly APIs serve pre-chunked, metadata-rich content
  • Vector databases (Qdrant, Weaviate) enable semantic search over your content
  • AI agent gateways manage access, rate limiting, and structured responses
  • CronJobs keep embeddings and AI content indexes synchronized
  • Allow AI crawlers in robots.txt β€” blocking them reduces your AI visibility
  • Dual strategy: traditional SEO + AI discoverability (llms.txt, structured data, embeddings)
#ai-search #llms-txt #rag #api-design #seo
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens