Hermes Agent Self-Hosted AI on Kubernetes
Deploy Hermes Agent (Nous Research) on Kubernetes as a persistent self-hosted AI agent with memory, automated skill creation, multi-platform
💡 Quick Answer: Hermes Agent (by Nous Research, MIT license) is a persistent self-hosted AI agent that learns over time, creates its own skills, connects to 5+ chat platforms (Telegram, Discord, Slack, WhatsApp, Signal), runs scheduled automations, and spawns parallel sub-agents. Deploy on Kubernetes with local vLLM or OpenRouter for model inference, persistent storage for memory, and container hardening for security.
The Problem
- Chatbots forget everything between sessions: no persistent context
- AI assistants locked to one platform (IDE only, or chat only)
- Need autonomous background tasks (reports, audits, monitoring) not just Q&A
- Commercial solutions send your data to external servers
- Want an agent that gets smarter over time as it learns your projects and preferences
The Solution
Hermes Agent Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                      │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                   Hermes Agent Pod                    │  │
│  │                                                       │  │
│  │  ┌──────────────┐  ┌───────────────┐  ┌────────────┐  │  │
│  │  │ Gateway      │  │ Memory Store  │  │ Skill      │  │  │
│  │  │ (messaging)  │  │ (~/.hermes/)  │  │ Engine     │  │  │
│  │  │              │  │               │  │ (40+ built │  │  │
│  │  │ • Telegram   │  │ • Preferences │  │ in + auto  │  │  │
│  │  │ • Discord    │  │ • Projects    │  │ created)   │  │  │
│  │  │ • Slack      │  │ • Context     │  │            │  │  │
│  │  │ • WhatsApp   │  │ • History     │  │            │  │  │
│  │  │ • Signal     │  │               │  │            │  │  │
│  │  │ • CLI        │  │               │  │            │  │  │
│  │  └──────────────┘  └───────────────┘  └────────────┘  │  │
│  │                                                       │  │
│  │  ┌──────────────┐  ┌───────────────┐  ┌────────────┐  │  │
│  │  │ Cron         │  │ Sub-Agents    │  │ Browser    │  │  │
│  │  │ Scheduler    │  │ (parallel)    │  │ Automation │  │  │
│  │  └──────────────┘  └───────────────┘  └────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │ Model Backend (choose one)                            │  │
│  │   • Local vLLM (fully on-prem)                        │  │
│  │   • OpenRouter (200+ models)                          │  │
│  │   • Any OpenAI-compatible API                         │  │
│  │   • Nous Portal (native OAuth)                        │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Deploy on Kubernetes
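The Deployment in this section reads its chat tokens from a Secret named hermes-secrets, which the manifests here do not define. A minimal sketch of that Secret might look like the following. It assumes only the three key names the Deployment references; the values are placeholders, and in practice you would probably create the Secret from a secret manager or with kubectl rather than committing tokens to Git.

```yaml
# Sketch only: placeholder values, key names taken from the Deployment's secretKeyRef entries
apiVersion: v1
kind: Secret
metadata:
  name: hermes-secrets
  namespace: hermes
type: Opaque
stringData:                      # stringData accepts plain text; the API server stores it base64-encoded
  telegram-token: "REPLACE_ME"
  discord-token: "REPLACE_ME"
  slack-token: "REPLACE_ME"
```

The Secret has to exist before the pod starts, otherwise the container stays in CreateContainerConfigError until the referenced keys are available.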
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes-agent
  namespace: hermes
  labels:
    app: hermes-agent
spec:
  replicas: 1 # Single instance (stateful agent)
  strategy:
    type: Recreate # Don't run two instances
  selector:
    matchLabels:
      app: hermes-agent
  template:
    metadata:
      labels:
        app: hermes-agent
    spec:
      containers:
        - name: hermes
          image: ghcr.io/nous-research/hermes-agent:latest
          ports:
            - containerPort: 8080
              name: api
          env:
            # Model backend
            - name: HERMES_MODEL_PROVIDER
              value: "vllm" # or: openrouter, openai-compatible
            - name: HERMES_MODEL_ENDPOINT
              # Points at the "vllm-server" Service in the "hermes" namespace
              value: "http://vllm-server.hermes.svc:8000/v1"
            - name: HERMES_MODEL
              value: "NousResearch/Hermes-3-Llama-3.1-70B"
            # Messaging platforms
            - name: HERMES_TELEGRAM_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hermes-secrets
                  key: telegram-token
            - name: HERMES_DISCORD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hermes-secrets
                  key: discord-token
            - name: HERMES_SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hermes-secrets
                  key: slack-token
            # Security
            - name: HERMES_TELEMETRY
              value: "disabled"
            - name: HERMES_EXECUTION_MODE
              value: "docker" # Sandboxed execution
          volumeMounts:
            - name: hermes-data
              mountPath: /home/hermes/.hermes
            - name: skills
              mountPath: /home/hermes/.hermes/skills
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            seccompProfile:
              type: RuntimeDefault # Required by the "restricted" Pod Security profile
      volumes:
        - name: hermes-data
          persistentVolumeClaim:
            claimName: hermes-memory-pvc
        - name: skills
          persistentVolumeClaim:
            claimName: hermes-skills-pvc
---
# Persistent storage for agent memory
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hermes-memory-pvc
  namespace: hermes
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hermes-skills-pvc
  namespace: hermes
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 5Gi

Model Backend: Local vLLM
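The vLLM Deployment in this section mounts a claim named model-cache-pvc for the Hugging Face cache, but no such claim is defined in the recipe. One possible definition is sketched here. The storage class is reused from the Hermes PVCs, and the 200Gi size is an assumption based on 70B parameters at 16-bit precision taking roughly 140 GB on disk; adjust both for your cluster and model.

```yaml
# Sketch only: size and storage class are assumptions, not values from the recipe
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache-pvc
  namespace: hermes
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd     # same class as the Hermes PVCs
  resources:
    requests:
      storage: 200Gi             # ~140 GB of weights plus headroom
```

Keeping the cache on a PVC means a restarted vLLM pod does not have to download the weights again.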
# Self-hosted inference with vLLM (fully on-prem, zero data leakage)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-hermes
  namespace: hermes
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-hermes
  template:
    metadata:
      labels:
        app: vllm-hermes
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - --model=NousResearch/Hermes-3-Llama-3.1-70B
            - --tensor-parallel-size=4
            - --max-model-len=32768
            - --enable-auto-tool-choice
            - --tool-call-parser=hermes
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "4"
              memory: "320Gi"
          volumeMounts:
            - name: model-cache
              mountPath: /root/.cache/huggingface
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
  namespace: hermes
spec:
  selector:
    app: vllm-hermes
  ports:
    - port: 8000
      targetPort: 8000

Key Features Configuration
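The ConfigMap in this section only takes effect once it is mounted into the agent pod, and the Deployment shown earlier does not reference it. The fragment below sketches the fields one might merge into that Deployment. The mount path is an assumption: the source does not say where Hermes looks for its configuration file, so check the project's documentation before relying on it.

```yaml
# Sketch only: fields to merge into the hermes-agent Deployment
spec:
  template:
    spec:
      containers:
        - name: hermes
          volumeMounts:
            - name: hermes-config
              mountPath: /home/hermes/.hermes/config.yaml   # assumed location, not confirmed by the source
              subPath: config.yaml                          # mount the single key as a file
              readOnly: true
      volumes:
        - name: hermes-config
          configMap:
            name: hermes-config
```

Because the file is mounted with subPath, later edits to the ConfigMap are not propagated to a running pod; restart the Deployment after changing it.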
apiVersion: v1
kind: ConfigMap
metadata:
  name: hermes-config
  namespace: hermes
data:
  config.yaml: |
    # Persistent Memory
    memory:
      enabled: true
      path: /home/hermes/.hermes/memory
      # Agent remembers preferences, projects, environment
      # Gets smarter the longer it runs

    # Automated Skill Creation
    skills:
      auto_create: true
      # When Hermes solves a hard problem, it writes a reusable
      # SKILL.md so it never forgets how
      format: "agentskills.io" # Open standard
      community_hub: "agentskills.io" # Browse and install skills

    # Multi-Platform Messaging
    gateway:
      platforms:
        - telegram
        - discord
        - slack
        - whatsapp
        - signal
        - cli
      # Voice memo transcription supported
      # Cross-platform conversation continuation

    # Scheduled Automations
    cron:
      enabled: true
      jobs:
        - name: daily-report
          schedule: "0 9 * * *"
          task: "Generate daily project status report"
          deliver_to: telegram
        - name: weekly-audit
          schedule: "0 10 * * 1"
          task: "Audit infrastructure and report findings"
          deliver_to: slack
        - name: morning-briefing
          schedule: "0 7 * * 1-5"
          task: "Check email, calendar, news. Brief me."
          deliver_to: telegram

    # Parallel Sub-Agents
    subagents:
      enabled: true
      max_concurrent: 5
      # Each gets own conversation and terminal
      # Zero-context-cost turns via RPC

    # Execution Backends
    execution:
      local_terminal: true
      docker:
        enabled: true
        security:
          read_only_root: true
          drop_capabilities: all
          pid_limit: 100
      ssh:
        enabled: false # Enable for remote servers
      modal:
        enabled: false # Enable for cloud/HPC

    # Browser Automation
    browser:
      enabled: true
      # Web search, page extraction, screenshots
      # Full navigation, click, type automation

Security Hardening
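The egress policy in this section limits what the agent can reach. A complementary ingress policy on the vLLM pods, sketched here as an addition that is not part of the original recipe, would keep other workloads from calling the inference API. It assumes the vLLM pods run in the hermes namespace with the app: vllm-hermes label, as in the manifests above, and that the cluster's CNI plugin enforces NetworkPolicy. The policy name is hypothetical.

```yaml
# Sketch only: allow inbound traffic to vLLM from the Hermes agent and nothing else
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vllm-ingress             # hypothetical name
  namespace: hermes
spec:
  podSelector:
    matchLabels:
      app: vllm-hermes
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:           # same-namespace pods labelled app: hermes-agent
            matchLabels:
              app: hermes-agent
      ports:
        - protocol: TCP
          port: 8000
```

Once any Ingress policy selects the vLLM pods, all inbound traffic that is not explicitly allowed is dropped, so add further `from` entries if monitoring or other clients need access.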
# NetworkPolicy: limit Hermes egress to DNS, the model backend, and outbound HTTPS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: hermes-egress
  namespace: hermes
spec:
  podSelector:
    matchLabels:
      app: hermes-agent
  policyTypes:
    - Egress
  egress:
    # DNS (UDP, with TCP for large responses)
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Local vLLM model
    - to:
        - podSelector:
            matchLabels:
              app: vllm-hermes
      ports:
        - protocol: TCP
          port: 8000
    # Messaging platform APIs (Telegram, Discord, Slack)
    # Note: this rule allows HTTPS to any destination, not only those APIs
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
---
# Pod Security: hardened container
# Enforcement applies to every pod in the namespace; pods that do not meet
# the restricted profile (for example, containers running as root) are rejected
apiVersion: v1
kind: Namespace
metadata:
  name: hermes
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted

Comparison: Hermes vs Other Agent Runtimes
Feature               Hermes Agent    OpenShell       Poolside
──────────────────────────────────────────────────────────────
License               MIT             Apache-2.0      Commercial
Persistent Memory     ✅              ❌              ❌
Multi-Platform Chat   ✅ (5+)         ❌              ❌
Auto Skill Creation   ✅              ❌              ❌
Scheduled Tasks       ✅ (cron)       ❌              ❌
Sandboxed Exec        ✅ (Docker)     ✅ (proxy)      ✅ (pods)
Browser Control       ✅              ❌              ❌
Sub-Agents            ✅              ❌              ❌
GPU Inference         ✅ (vLLM)       ❌              ❌
Network Policies      Basic           Fine-grained    Fine-grained
Enterprise Focus      Personal/Team   Developer       Enterprise
Zero Telemetry        ✅              ✅              Configurable

Common Issues
Memory grows unbounded over months
- Cause: Agent accumulates context without pruning
- Fix: Configure memory retention policies, or periodically review ~/.hermes/memory/
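Sub-agent spawning fails
- Cause: Docker socket not available or PID limits hit
- Fix: Mount the Docker socket or use Kubernetes Job-based sub-agents; increase pid_limit. Note that a hostPath mount of the Docker socket is rejected in a namespace that enforces the restricted Pod Security profile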
Voice memo transcription not working
- Cause: Whisper model not loaded or audio codec unsupported
- Fix: Ensure the Whisper dependency is available; check the audio format (opus/ogg supported)
Agent loses context after pod restart
- Cause: Memory not persisted (emptyDir instead of PVC)
- Fix: Use a PersistentVolumeClaim for the ~/.hermes/ directory
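For the PID-limit case, the change would presumably be made in the execution section of config.yaml. The fragment below reuses the keys from the Key Features Configuration ConfigMap; the value 256 is purely illustrative, and the right limit depends on how many processes your sub-agents actually spawn.

```yaml
# Fragment of config.yaml, using the keys from the hermes-config ConfigMap
execution:
  docker:
    enabled: true
    security:
      read_only_root: true
      drop_capabilities: all
      pid_limit: 256             # illustrative value; the recipe's ConfigMap uses 100
```

Raise the limit in small steps: it exists to contain runaway or fork-bombing processes inside the sandbox.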
Best Practices
- Use PVCs for memory: the agent's value comes from persistent learning
- Local vLLM for privacy: zero data leaves your cluster
- Start with one platform: add more as you trust the agent's behavior
- Review auto-created skills: the agent writes skills it can reuse; audit them
- Cron for recurring tasks: don't re-ask; schedule reports and audits
- Container hardening: read-only root, drop all capabilities, non-root user
- Back up the memory PVC: this is the agent's brain; losing it resets learning
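One way to back up the memory PVC is a CSI volume snapshot. The sketch below assumes the cluster has the snapshot CRDs and controller installed and that the fast-ssd storage class is backed by a CSI driver with snapshot support; the snapshot name and the VolumeSnapshotClass name are hypothetical.

```yaml
# Sketch only: point-in-time snapshot of the agent's memory volume
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: hermes-memory-snapshot           # hypothetical name
  namespace: hermes
spec:
  volumeSnapshotClassName: csi-snapclass # hypothetical; list real ones with `kubectl get volumesnapshotclass`
  source:
    persistentVolumeClaimName: hermes-memory-pvc
```

A snapshot usually lives in the same storage system as the volume, so for disaster recovery it is worth pairing it with an off-cluster copy.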
Key Takeaways
- Hermes Agent: open-source (MIT), self-hosted persistent AI agent by Nous Research
- Gets smarter over time: remembers preferences, projects, and environment across sessions
- Auto-creates reusable skills (agentskills.io open standard) when solving hard problems
- Multi-platform: Telegram, Discord, Slack, WhatsApp, Signal, and CLI through a single gateway
- Built-in cron scheduler for autonomous background tasks
- Parallel sub-agents for concurrent workstreams
- Model-agnostic: local vLLM, OpenRouter (200+ models), any OpenAI-compatible API
- Zero telemetry, all data stored locally, MIT license, fully auditable
- Deploy on K8s with PVC for memory persistence + container security hardening

