Agentic AI and Multi-Agent Systems
Deploy autonomous AI agents and multi-agent orchestration on Kubernetes. LangGraph, CrewAI, AutoGen, tool-calling agents, agent-to-agent communication.
Quick Answer: Agentic AI systems run autonomous task-executing agents that call tools, make decisions, and coordinate with other agents. On Kubernetes, deploy agents as microservices with inference backends (NIM, vLLM), use NATS or Redis for agent-to-agent messaging, and scale with HPA based on queue depth. Frameworks: LangGraph, CrewAI, AutoGen.
The Problem
2026's biggest AI shift is from chat-based AI to autonomous agents that execute multi-step workflows: researching, writing code, calling APIs, and delegating subtasks to other agents. Running these on Kubernetes requires inference backends for LLM reasoning, tool execution environments, inter-agent communication, persistent memory, and orchestration for complex multi-agent workflows.
flowchart TB
USER["User Task:<br/>'Analyze competitor pricing'"] --> ORCH["Orchestrator Agent"]
ORCH --> RESEARCH["Research Agent<br/>(web search tools)"]
ORCH --> ANALYST["Analyst Agent<br/>(data processing)"]
ORCH --> WRITER["Writer Agent<br/>(report generation)"]
RESEARCH -->|"findings"| ANALYST
ANALYST -->|"analysis"| WRITER
WRITER -->|"report"| USER
subgraph INFRA["Kubernetes Infrastructure"]
LLM["LLM Backend<br/>(NIM / vLLM)"]
TOOLS["Tool Server<br/>(MCP / API)"]
MEM["Vector Store<br/>(Qdrant / Weaviate)"]
MSG["Message Bus<br/>(NATS / Redis)"]
end
ORCH & RESEARCH & ANALYST & WRITER --> LLM & TOOLS & MEM & MSG
The Solution
Architecture Overview
Each agent runs as a Kubernetes Deployment with access to shared infrastructure:
# Agent Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: research-agent
  labels:
    app: research-agent
    agent-role: researcher
spec:
  replicas: 2
  selector:
    matchLabels:
      app: research-agent
  template:
    metadata:
      labels:
        app: research-agent
    spec:
      containers:
        - name: agent
          image: myorg/research-agent:v1.0
          env:
            - name: LLM_BASE_URL
              value: "http://nim-llm:8000/v1"
            - name: NATS_URL
              value: "nats://nats:4222"
            - name: VECTOR_STORE_URL
              value: "http://qdrant:6333"
            - name: AGENT_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: research-agent
spec:
  selector:
    app: research-agent
  ports:
    - port: 8080
LangGraph Multi-Agent on Kubernetes
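Whichever framework runs inside the container, the agent process starts by reading the endpoints that the Deployment injects as environment variables (`AGENT_ID` resolves to the pod name via the downward API); `agent_app.py` below, for instance, reads `LLM_BASE_URL` directly. A minimal sketch of failing fast when one is missing, assuming a Python agent; `AgentConfig` and `from_env` are hypothetical names, not part of LangGraph or any other framework.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AgentConfig:
    """Endpoints injected by the Deployment's env section (hypothetical helper)."""
    llm_base_url: str
    nats_url: str
    vector_store_url: str
    agent_id: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AgentConfig":
        required = ["LLM_BASE_URL", "NATS_URL", "VECTOR_STORE_URL", "AGENT_ID"]
        missing = [name for name in required if not env.get(name)]
        if missing:
            # Fail at startup so the pod crash-loops visibly instead of
            # failing later on the first task it picks up.
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            llm_base_url=env["LLM_BASE_URL"].rstrip("/"),  # normalize trailing slash
            nats_url=env["NATS_URL"],
            vector_store_url=env["VECTOR_STORE_URL"],
            agent_id=env["AGENT_ID"],
        )


if __name__ == "__main__":
    cfg = AgentConfig.from_env({
        "LLM_BASE_URL": "http://nim-llm:8000/v1/",
        "NATS_URL": "nats://nats:4222",
        "VECTOR_STORE_URL": "http://qdrant:6333",
        "AGENT_ID": "research-agent-5c8d7b9f6-x2kqp",
    })
    print(cfg.llm_base_url)  # http://nim-llm:8000/v1
```

In the container itself you would call `AgentConfig.from_env()` with no argument so it reads the real process environment.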
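# agent_app.py: deployed as a container
import os

from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, MessagesState, StateGraph

# Hypothetical local module: web_search, url_fetch, pandas_query and chart_gen
# are assumed to be LangChain tools (e.g. @tool-decorated functions).
from tools import web_search, url_fetch, pandas_query, chart_gen

llm = ChatOpenAI(
    base_url=os.environ["LLM_BASE_URL"],
    api_key="not-needed",  # Local NIM: placeholder key
    model="meta/llama-3.1-70b-instruct",
)

# Define agent nodes
def researcher(state: MessagesState):
    """Research agent with web search tools"""
    response = llm.bind_tools([web_search, url_fetch]).invoke(state["messages"])
    return {"messages": [response]}

def analyst(state: MessagesState):
    """Analysis agent with data tools"""
    response = llm.bind_tools([pandas_query, chart_gen]).invoke(state["messages"])
    return {"messages": [response]}

def writer(state: MessagesState):
    """Report writing agent"""
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# Build the graph: START -> researcher -> analyst -> writer -> END
graph = StateGraph(MessagesState)
graph.add_node("researcher", researcher)
graph.add_node("analyst", analyst)
graph.add_node("writer", writer)
graph.add_edge(START, "researcher")  # without an entry edge, compile() fails
graph.add_edge("researcher", "analyst")
graph.add_edge("analyst", "writer")
graph.add_edge("writer", END)
# Note: bind_tools only lets the model *request* tool calls; executing them
# requires a tool node (e.g. langgraph.prebuilt.ToolNode) in the graph.
app = graph.compile()
CrewAI Deployment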
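A crew run is a batch task, so the manifest below uses a Job. With `restartPolicy: Never`, every failed pod counts against `backoffLimit` (the number of retries before the Job is marked failed), so the crew's entrypoint should exit non-zero when its task fails. The Kubernetes docs describe the delay between retries as an exponential back-off (10s, 20s, 40s, ...) capped at six minutes. The sketch below only computes that nominal schedule; `job_retry_delays` is a hypothetical helper, not a Kubernetes API.

```python
from typing import List


def job_retry_delays(backoff_limit: int, base_seconds: int = 10,
                     cap_seconds: int = 360) -> List[int]:
    """Nominal delay (seconds) before each retry of a failed Job pod.

    Retry n (0-based) waits base_seconds * 2**n, capped at six minutes.
    Actual timings in a cluster are approximate.
    """
    if backoff_limit < 0:
        raise ValueError("backoff_limit must be >= 0")
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(backoff_limit)]


if __name__ == "__main__":
    limit = 2  # backoffLimit from the Job manifest
    delays = job_retry_delays(limit)
    print(f"up to {limit + 1} attempts, retry delays {delays}")
    # up to 3 attempts, retry delays [10, 20]
```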
# CrewAI agent crew as a Kubernetes Job
apiVersion: batch/v1
kind: Job
metadata:
  name: competitor-analysis-crew
spec:
  template:
    spec:
      containers:
        - name: crew
          image: myorg/crewai-competitor-analysis:v1.0
          env:
            - name: OPENAI_API_BASE
              value: "http://nim-llm:8000/v1"
            - name: OPENAI_API_KEY
              value: "local"
            - name: SERPER_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: serper
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
      restartPolicy: Never
  backoffLimit: 2
Inter-Agent Communication with NATS
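Agents find each other by subject rather than by hostname. NATS subjects are dot-separated tokens; in a subscription, `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below is a local, in-memory stand-in for that routing, useful for unit-testing agent wiring without a broker; `InMemoryBus` and `subject_matches` are hypothetical and are not the nats-py client API. A real NATS core subscription fans out to every subscriber, so replicas of one agent would normally join a queue group to share tasks instead of each receiving a copy.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, bytes], None]


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' = exactly one token, '>' = one or more trailing tokens."""
    p_tokens, s_tokens = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # '>' needs at least one remaining token
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)


class InMemoryBus:
    """Synchronous local stand-in for a message bus (not a real NATS client)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subs[pattern].append(handler)

    def publish(self, subject: str, payload: bytes) -> None:
        # Fan out to every matching subscription, like core NATS without queue groups.
        for pattern, handlers in list(self._subs.items()):
            if subject_matches(pattern, subject):
                for handler in list(handlers):
                    handler(subject, payload)


if __name__ == "__main__":
    bus = InMemoryBus()
    audit: List[str] = []

    def analyst_agent(subject: str, data: bytes) -> None:
        # analyst-agent: consume research results, publish an analysis
        bus.publish("agents.analysis.results", b"analysis of " + data)

    bus.subscribe("agents.research.results", analyst_agent)
    # Observer on every agent's results subject (e.g. an audit logger)
    bus.subscribe("agents.*.results", lambda subject, data: audit.append(f"{subject}: {data.decode()}"))

    bus.publish("agents.research.results", b"competitor price list")  # research-agent
    for line in audit:
        print(line)
```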
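# NATS message bus for agent-to-agent communication
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nats
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.10
          ports:
            - containerPort: 4222  # client connections
            - containerPort: 6222  # cluster routes
          args: ["--cluster", "nats://0.0.0.0:6222", "--routes", "nats://nats:6222"]
# Assumes a Service named "nats" exposing 4222 and 6222. For production,
# the official NATS Helm chart runs NATS as a StatefulSet instead.
---
# Agents publish/subscribe to task channels
# research-agent publishes to: agents.research.results
# analyst-agent subscribes to: agents.research.results
# analyst-agent publishes to: agents.analysis.results
Scaling Agents with HPA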
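For an `AverageValue` target on an External metric, the HPA divides the metric by the current replica count and compares the result to the target: desiredReplicas = ceil(currentReplicas × currentAverage / targetAverage), skipped when that ratio is within the controller's tolerance (0.1 by default) and clamped to `minReplicas`/`maxReplicas`. A simplified sketch of that arithmetic, ignoring pod readiness and the scale-down stabilization window (300 s by default); `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(total_pending: float, current_replicas: int, target_average: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA math for an External metric with an AverageValue target."""
    if current_replicas < 1 or target_average <= 0:
        raise ValueError("need current_replicas >= 1 and target_average > 0")
    # AverageValue: the metric total is spread across the current replicas.
    ratio = (total_pending / current_replicas) / target_average
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # minReplicas=1, maxReplicas=10, averageValue "5", as in the manifest
    print(desired_replicas(42, current_replicas=2, target_average=5))   # 9
    print(desired_replicas(500, current_replicas=3, target_average=5))  # 10 (capped)
    print(desired_replicas(0, current_replicas=4, target_average=5))    # 1 (floor)
```

Because the target is per replica, a backlog of 42 tasks on 2 replicas asks for 9 replicas in one step rather than growing one pod at a time.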
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: research-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: research-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          # Requires an external metrics adapter (e.g. Prometheus Adapter);
          # the metric name depends on your NATS exporter.
          name: nats_pending_messages
          selector:
            matchLabels:
              subject: agents.research.tasks
        target:
          type: AverageValue
          averageValue: "5"  # Scale up when >5 pending tasks per replica
Agent Memory with Vector Store
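Under the hood, a vector store answers one question: which stored memories are closest to this query embedding? Qdrant does this with an approximate HNSW index and a distance metric configured per collection; the sketch below does the same lookup by exact brute-force cosine similarity so the idea is visible. `InMemoryVectorMemory` is a hypothetical toy, not the Qdrant client API, and the three-dimensional vectors stand in for real embeddings from an embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorMemory:
    """Toy stand-in for a vector store collection (not the Qdrant client API)."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, vector: Sequence[float], text: str) -> None:
        self._items.append((list(vector), text))

    def search(self, query: Sequence[float], top_k: int = 3) -> List[Tuple[float, str]]:
        # Exact scan over every item; Qdrant uses an approximate HNSW index instead.
        scored = [(cosine_similarity(query, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    memory = InMemoryVectorMemory()
    # Hand-made 3-d vectors standing in for real embeddings.
    memory.add([1.0, 0.0, 0.0], "competitor A pricing page")
    memory.add([0.0, 1.0, 0.0], "competitor A hiring news")
    memory.add([0.9, 0.1, 0.0], "competitor B discount tiers")
    for score, text in memory.search([1.0, 0.0, 0.0], top_k=2):
        print(f"{score:.3f} {text}")
    # 1.000 competitor A pricing page
    # 0.994 competitor B discount tiers
```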
# Qdrant for agent long-term memory
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.12.0
          ports:
            - containerPort: 6333  # REST API
          volumeMounts:
            - name: data
              mountPath: /qdrant/storage
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
Tool Execution Sandbox
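Process-level limits are one layer of a tool sandbox. Inside the executor, each agent-generated snippet can run in a separate interpreter with CPU, memory, and wall-clock caps, so a runaway loop or allocation fails that one call instead of the whole pod. A minimal sketch, assuming a Linux container with Python; `run_untrusted` and `ExecResult` are hypothetical names, and rlimits complement rather than replace container-level controls such as gVisor and a restrictive securityContext, because they do nothing to isolate the filesystem or network.

```python
import resource
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecResult:
    returncode: Optional[int]  # None if the wall-clock timeout fired
    stdout: str
    stderr: str
    timed_out: bool


def _apply_limits(cpu_seconds: int, memory_bytes: int) -> None:
    """Runs in the child between fork and exec (POSIX only)."""
    # CPU time: the kernel terminates the process once the limit is exceeded.
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    # Address space: oversized allocations fail (MemoryError) inside the child.
    resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))


def run_untrusted(code: str, cpu_seconds: int = 2, memory_bytes: int = 1 << 30,
                  wall_timeout: float = 5.0) -> ExecResult:
    """Run agent-generated Python in a separate, resource-limited interpreter."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env vars, user site)
            capture_output=True,
            text=True,
            timeout=wall_timeout,
            preexec_fn=lambda: _apply_limits(cpu_seconds, memory_bytes),
        )
    except subprocess.TimeoutExpired:
        # run() kills the child before re-raising; partial output is discarded here.
        return ExecResult(returncode=None, stdout="", stderr="", timed_out=True)
    return ExecResult(proc.returncode, proc.stdout, proc.stderr, timed_out=False)


if __name__ == "__main__":
    print(run_untrusted("print(sum(range(10)))").stdout.strip())  # 45
    spin = run_untrusted("while True: pass", cpu_seconds=1)
    # Killed by the CPU limit: expect a negative (signal) returncode.
    print(spin.returncode, spin.timed_out)
```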
Agents need sandboxed environments to execute code and tools safely:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tool-executor
spec:
  selector:
    matchLabels:
      app: tool-executor
  template:
    metadata:
      labels:
        app: tool-executor
    spec:
      # Use gVisor for additional isolation (requires a RuntimeClass named "gvisor")
      runtimeClassName: gvisor
      containers:
        - name: executor
          image: myorg/tool-executor:v1.0
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            limits:
              cpu: "2"
              memory: "4Gi"
Complete Multi-Agent Stack
# Deploy the full stack: paste each component's manifest into the heredoc
# (separated by ---), in this order:
kubectl apply -f - <<EOF
# 1. LLM Backend (NIM)
# 2. NATS (inter-agent messaging)
# 3. Qdrant (agent memory)
# 4. Tool Server (MCP-compatible)
# 5. Orchestrator Agent
# 6. Worker Agents (research, analyst, writer, coder)
# 7. HPA for each agent role
# 8. NetworkPolicy (agents → LLM, agents → NATS, agents → tools)
EOF
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| Agent loops forever | No max iteration limit | Cap iterations in the agent config (e.g. `max_iter` on a CrewAI agent, `recursion_limit` in the LangGraph run config) |
| LLM timeout on complex tasks | Long chain-of-thought | Increase `request_timeout`, use streaming |
| Agent memory growing unbounded | Conversation history too long | Use summarization or sliding window |
| Inter-agent deadlock | Circular dependencies | Design DAG workflows, add timeouts |
| Tool execution OOM | Agent-generated code is resource-hungry | Use gVisor + resource limits on tool executor |
| Cold start latency | Agent pods scaling from 0 | Use KEDA with `minReplicaCount: 1` for critical agents |
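The runaway-loop and deadlock rows come down to the same guard: bound every agent loop by iteration count and wall-clock time, and fail loudly when a bound is hit. A framework-agnostic sketch; `step` is a hypothetical callable standing in for one reason/act cycle (model call plus tool execution), and where a framework offers its own cap you would use that instead.

```python
import time
from typing import Callable, List, Optional


class AgentLimitExceeded(RuntimeError):
    """Raised when an agent loop hits its iteration or time budget."""


def run_agent_loop(step: Callable[[List[str]], Optional[str]],
                   max_iterations: int = 10,
                   deadline_seconds: float = 120.0) -> List[str]:
    """Call step() until it returns None, bounded by iterations and wall-clock time.

    step (hypothetical) receives the transcript so far, performs one
    reason/act cycle, and returns the new observation, or None when done.
    The deadline is checked between steps; a single hung step still needs
    its own timeout (e.g. on the LLM client).
    """
    transcript: List[str] = []
    started = time.monotonic()
    for _ in range(max_iterations):
        if time.monotonic() - started > deadline_seconds:
            raise AgentLimitExceeded(f"deadline of {deadline_seconds}s exceeded")
        observation = step(transcript)
        if observation is None:
            return transcript
        transcript.append(observation)
    raise AgentLimitExceeded(f"no final answer after {max_iterations} iterations")


if __name__ == "__main__":
    def finishes_after_three(transcript: List[str]) -> Optional[str]:
        return None if len(transcript) >= 3 else f"tool result {len(transcript)}"

    print(run_agent_loop(finishes_after_three))
    try:
        run_agent_loop(lambda transcript: "still thinking", max_iterations=5)
    except AgentLimitExceeded as exc:
        print("stopped:", exc)
```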
Best Practices
- Separate agents from LLM backends: agents are stateless, LLMs need GPUs
- Use NATS/Redis for async communication; avoid coupling agents through synchronous HTTP calls
- Sandbox tool execution: agents can generate arbitrary code, so use gVisor or Kata Containers
- Set iteration limits to prevent runaway agent loops from burning tokens
- Persist agent memory: vector stores for long-term, Redis for short-term
- Monitor token usage: agentic workflows can consume 10-100× more tokens than chat
- Use structured outputs: JSON mode for agent-to-agent data exchange
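Even with a model's JSON mode enabled, a message can arrive with missing or mistyped fields, so it pays to validate on receipt as well as on send. A minimal sketch of a typed envelope using only the standard library; `AgentMessage` and its fields are hypothetical, and a schema library such as Pydantic would do the same job more thoroughly.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class AgentMessage:
    """Hypothetical envelope for agent-to-agent payloads on the message bus."""
    sender: str
    task_id: str
    kind: str  # e.g. "findings", "analysis", "report"
    payload: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_json(cls, raw: bytes) -> "AgentMessage":
        data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError (a ValueError)
        if not isinstance(data, dict):
            raise ValueError("message must be a JSON object")
        missing = [key for key in ("sender", "task_id", "kind") if key not in data]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        payload = data.get("payload", {})
        if not isinstance(payload, dict):
            raise ValueError("payload must be a JSON object")
        return cls(sender=str(data["sender"]), task_id=str(data["task_id"]),
                   kind=str(data["kind"]), payload=payload)


if __name__ == "__main__":
    msg = AgentMessage("research-agent", "task-42", "findings", {"competitors": 3})
    wire = msg.to_json()
    print(wire.decode())
    # {"kind": "findings", "payload": {"competitors": 3}, "sender": "research-agent", "task_id": "task-42"}
    assert AgentMessage.from_json(wire) == msg
```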
Key Takeaways
- Agentic AI runs autonomous agents that plan, use tools, and coordinate
- Deploy agents as microservices on Kubernetes with shared LLM backends
- NATS or Redis enables async agent-to-agent communication
- Scale agents independently with HPA based on queue depth
- Sandbox tool execution with gVisor or Kata Containers
- 2026's top trend: AI moving from chat to autonomous workflow execution

