Poolside AI Foundation Models on Kubernetes
Deploy Poolside AI foundation models for enterprise software agents on Kubernetes. On-prem and VPC deployment, multi-agent orchestration, sandboxed
💡 Quick Answer: Poolside AI builds foundation models optimized for long-horizon software engineering tasks: code generation, multi-agent orchestration, and tool use in sandboxed environments. Deploy on-prem or in your VPC on Kubernetes with strict data boundaries, RBAC for both humans and agents, and air-gap support for defense/regulated use cases.
The Problem
- General-purpose LLMs lack deep software engineering capabilities (planning, tool use, multi-step execution)
- Enterprise code never leaves the security boundary; cloud-only AI is a non-starter
- Multi-agent systems need orchestration, policy governance, and end-to-end tracing
- Air-gapped and multi-cloud environments require flexible deployment without cloud dependencies
- Teams need measurable outcomes (code quality, velocity), not just token generation
The Solution
Poolside Platform Architecture on Kubernetes
┌──────────────────────────────────────────────────────────────────┐
│          Enterprise Kubernetes Cluster (On-Prem / VPC)           │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   Poolside Control Plane                   │  │
│  │  ┌──────────────┐  ┌───────────────┐  ┌─────────────────┐  │  │
│  │  │ Agent        │  │ Policy Engine │  │ Trace Collector │  │  │
│  │  │ Orchestrator │  │ (governance)  │  │ (audit trail)   │  │  │
│  │  └──────┬───────┘  └───────┬───────┘  └─────────────────┘  │  │
│  └─────────┼──────────────────┼───────────────────────────────┘  │
│            │                  │                                  │
│  ┌─────────┼──────────────────┼───────────────────────────────┐  │
│  │            Foundation Model Serving (GPU Nodes)            │  │
│  │ ┌─────────────┐  ┌─────────────┐  ┌──────────────────────┐ │  │
│  │ │ Poolside FM │  │ Code Agent  │  │ Sandboxed Execution  │ │  │
│  │ │ (inference) │  │ (planning)  │  │ (tool use, terminal) │ │  │
│  │ │ H100 × 8    │  │             │  │                      │ │  │
│  │ └─────────────┘  └─────────────┘  └──────────────────────┘ │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   Data & Knowledge Layer                   │  │
│  │  • Git repositories   • Databases       • Private corpora  │  │
│  │  • Data warehouses    • Documentation   • API specs        │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

Deploy Poolside Foundation Model
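The Deployment in this section assumes that the `poolside-ai` namespace and a claim named `poolside-model-pvc` already exist; neither is defined in this recipe. A minimal sketch of both might look like the following. The storage class name `local-nvme` and the `500Gi` size are placeholders, not values from Poolside, so adjust them to your cluster and to the real size of the weights.

```yaml
# Prerequisites for the inference Deployment (sketch, not vendor-provided)
apiVersion: v1
kind: Namespace
metadata:
  name: poolside-ai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: poolside-model-pvc        # name referenced by the Deployment's volume
  namespace: poolside-ai
spec:
  accessModes:
    - ReadWriteOnce               # single inference pod on a single node
  storageClassName: local-nvme    # hypothetical storage class
  resources:
    requests:
      storage: 500Gi              # placeholder; size to the actual weights
```

With a single replica on one GPU node, `ReadWriteOnce` is usually sufficient; a multi-node rollout would need a class that supports shared read access.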
# Poolside FM inference deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: poolside-fm
  namespace: poolside-ai
  labels:
    app: poolside-fm
    component: inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: poolside-fm
  template:
    metadata:
      labels:
        app: poolside-fm
    spec:
      # Schedule only onto H100 nodes and tolerate the GPU taint
      nodeSelector:
        nvidia.com/gpu.product: "NVIDIA-H100-80GB-HBM3"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: poolside-inference
          image: registry.example.com/poolside/foundation-model:latest
          ports:
            - containerPort: 8080
              name: api
            - containerPort: 9090
              name: metrics
          env:
            - name: MODEL_PATH
              value: "/models/poolside-fm-enterprise"
            - name: TENSOR_PARALLEL_SIZE
              value: "8"          # matches the 8 GPUs requested below
            - name: MAX_CONTEXT_LENGTH
              value: "131072"
            - name: MAX_BATCH_SIZE
              value: "64"
            - name: ENABLE_TOOL_USE
              value: "true"
            - name: SANDBOX_MODE
              value: "strict"
          volumeMounts:
            - name: model-weights
              mountPath: /models
              readOnly: true
            - name: shm
              mountPath: /dev/shm
          resources:
            limits:
              nvidia.com/gpu: "8"
              memory: "640Gi"
            requests:
              cpu: "32"
              memory: "640Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 180
            periodSeconds: 30
      volumes:
        - name: model-weights
          persistentVolumeClaim:
            claimName: poolside-model-pvc
        # Memory-backed /dev/shm for inter-GPU communication
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 64Gi
---
apiVersion: v1
kind: Service
metadata:
  name: poolside-fm
  namespace: poolside-ai
  labels:
    app: poolside-fm   # required so the ServiceMonitor selector can match this Service
spec:
  selector:
    app: poolside-fm
  ports:
    - name: api
      port: 8080
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090

Multi-Agent Orchestration
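Because the orchestrator in this section sets `SANDBOX_RUNTIME` to `kubernetes`, it presumably creates sandbox pods through the Kubernetes API, which requires a ServiceAccount with matching RBAC. The recipe does not show this, so the following is only a sketch: the names `poolside-orchestrator` (ServiceAccount) and `sandbox-manager` (Role) are hypothetical, and the exact verbs the product needs are an assumption.

```yaml
# Sketch: let the orchestrator manage pods in the sandbox namespace only
apiVersion: v1
kind: ServiceAccount
metadata:
  name: poolside-orchestrator     # hypothetical name
  namespace: poolside-ai
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sandbox-manager           # hypothetical name
  namespace: poolside-sandboxes
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sandbox-manager
  namespace: poolside-sandboxes
subjects:
  - kind: ServiceAccount
    name: poolside-orchestrator
    namespace: poolside-ai
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: sandbox-manager
```

To take effect, the orchestrator pod template would also need `serviceAccountName: poolside-orchestrator`. Scoping the Role to `poolside-sandboxes` keeps the orchestrator from touching workloads in any other namespace.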
# Agent orchestrator: manages multi-agent workflows
apiVersion: apps/v1
kind: Deployment
metadata:
  name: poolside-orchestrator
  namespace: poolside-ai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: poolside-orchestrator
  template:
    metadata:
      labels:
        app: poolside-orchestrator
    spec:
      containers:
        - name: orchestrator
          image: registry.example.com/poolside/orchestrator:latest
          ports:
            - containerPort: 8081
              name: api
          env:
            - name: FM_ENDPOINT
              value: "http://poolside-fm.poolside-ai:8080"
            - name: POLICY_ENGINE_ENDPOINT
              value: "http://poolside-policy.poolside-ai:8082"
            - name: TRACE_ENDPOINT
              value: "http://poolside-traces.poolside-ai:4317"
            - name: MAX_AGENT_CONCURRENCY
              value: "20"
            - name: EXECUTION_TIMEOUT_SECONDS
              value: "300"
            - name: SANDBOX_RUNTIME
              value: "kubernetes"   # Spawn sandboxed pods for tool use
          volumeMounts:
            - name: agent-config
              mountPath: /etc/poolside/agents
      volumes:
        - name: agent-config
          configMap:
            name: poolside-agent-policies
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: poolside-agent-policies
  namespace: poolside-ai
data:
  policies.yaml: |
    agents:
      code-writer:
        description: "Generates and modifies source code"
        tools:
          - file_read
          - file_write
          - git_operations
          - terminal_execute
        sandboxed: true
        max_execution_time: 120s
        resource_limits:
          cpu: "2"
          memory: "4Gi"
        allowed_paths:
          - "/workspace/**"
        blocked_commands:
          - "rm -rf /"
          - "curl * | bash"
      code-reviewer:
        description: "Reviews code changes for quality and security"
        tools:
          - file_read
          - git_diff
          - linter_run
          - security_scan
        sandboxed: true
        read_only: true
      planner:
        description: "Decomposes tasks into subtasks for other agents"
        tools:
          - task_create
          - agent_delegate
          - progress_track
        sandboxed: false
        max_delegations: 10
    governance:
      require_approval_for:
        - git_push
        - deploy_to_production
        - database_write
      audit_all_actions: true
      trace_retention_days: 90

Sandboxed Execution Environment
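The sandbox namespace in this section enforces the `restricted` Pod Security Standard and a ResourceQuota, so any pod spawned there must run as non-root, drop all capabilities, use a seccomp profile, and declare CPU and memory requests and limits. The recipe does not show what a spawned pod looks like; the sketch below is one shape that would be admitted. The pod name, label, image, and UID are hypothetical, and the limits simply mirror the `code-writer` policy values.

```yaml
# Sketch of an ephemeral sandbox pod that passes "restricted" admission
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-example               # hypothetical; real names would be generated
  namespace: poolside-sandboxes
  labels:
    app: poolside-sandbox             # hypothetical label
spec:
  restartPolicy: Never
  automountServiceAccountToken: false # agent code gets no API credentials
  activeDeadlineSeconds: 120          # mirrors max_execution_time: 120s
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                  # arbitrary non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: tool-runner
      image: registry.example.com/poolside/sandbox-runner:latest  # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi
      volumeMounts:
        - name: workspace
          mountPath: /workspace       # matches allowed_paths in the policy
  volumes:
    - name: workspace
      emptyDir:
        sizeLimit: 2Gi
```

Setting `activeDeadlineSeconds` lets the kubelet terminate a runaway tool call even if the orchestrator itself is unavailable.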
# Poolside spawns ephemeral pods for agent tool execution.
# The namespace is declared first so the quota and policy below can be applied in one pass.
# Pod Security Standards: enforce the restricted profile on sandbox pods
apiVersion: v1
kind: Namespace
metadata:
  name: poolside-sandboxes
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: poolside-sandbox-quota
  namespace: poolside-sandboxes
spec:
  hard:
    pods: "50"
    requests.cpu: "100"
    requests.memory: "200Gi"
    limits.cpu: "200"
    limits.memory: "400Gi"
---
# NetworkPolicy: sandboxes can't reach production services
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-isolation
  namespace: poolside-sandboxes
spec:
  podSelector: {}   # All pods in namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only orchestrator pods in the poolside-ai namespace may connect.
    # A namespaceSelector matches namespace labels, so the pod label goes in podSelector.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: poolside-ai
          podSelector:
            matchLabels:
              app: poolside-orchestrator
  egress:
    # Allow DNS (UDP, plus TCP for large responses)
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow access to internal git only
    # (assumes the Gitea namespace carries the label app=gitea)
    - to:
        - namespaceSelector:
            matchLabels:
              app: gitea
      ports:
        - protocol: TCP
          port: 3000
    # Block all other egress (no internet from sandboxes)

Data & Knowledge Connectors
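The connector configuration in this section refers to Secrets by name (`git-credentials`, `confluence-token`, `db-readonly-creds`) that the recipe does not define. A sketch of one of them follows. The key names `username` and `token` are assumptions, since the key layout the connector expects is not specified here, and the values are placeholders.

```yaml
# Sketch: credentials for the internal-git connector (key names are assumed)
apiVersion: v1
kind: Secret
metadata:
  name: git-credentials          # matches auth.secretRef in connectors.yaml
  namespace: poolside-ai
type: Opaque
stringData:
  username: poolside-sync        # hypothetical read-only service account
  token: REPLACE_ME              # placeholder; never commit real tokens
```

In practice these Secrets are often provisioned from an external secret store rather than committed as manifests; a read-only token limits what a compromised connector can do.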
# Connect Poolside to internal repositories and data sources
apiVersion: v1
kind: ConfigMap
metadata:
  name: poolside-data-connectors
  namespace: poolside-ai
data:
  connectors.yaml: |
    sources:
      - name: internal-git
        type: git
        endpoint: "https://gitea.internal.example.com"
        auth:
          secretRef: git-credentials
        sync_interval: 5m
        repositories:
          - "org/backend-services"
          - "org/frontend-apps"
          - "org/infrastructure"
      - name: documentation
        type: confluence
        endpoint: "https://wiki.example.com"
        auth:
          secretRef: confluence-token
        spaces:
          - "ARCH"       # Architecture decisions
          - "RUNBOOKS"   # Operations runbooks
          - "API"        # API documentation
      - name: database-schema
        type: postgresql
        endpoint: "postgres.data.svc:5432"
        auth:
          secretRef: db-readonly-creds
        schema_only: true   # Only schema, no data
        databases:
          - "orders"
          - "inventory"
      - name: api-specs
        type: openapi
        sources:
          - "https://api.example.com/v1/openapi.json"
          - "https://payments.example.com/spec.yaml"
    boundaries:
      # Data never leaves these namespaces
      allowed_namespaces:
        - poolside-ai
        - poolside-sandboxes
      # PII detection and masking
      pii_detection: enabled
      mask_patterns:
        - "\\b\\d{3}-\\d{2}-\\d{4}\\b"             # SSN
        - "\\b[A-Z]{2}\\d{2}[A-Z0-9]{11,30}\\b"    # IBAN (15 to 34 characters in total)

Developer Surface Integration
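The Ingress in this section routes to a Service named `poolside-orchestrator` on port 8081, but the recipe only defines the orchestrator Deployment. A minimal Service that would satisfy that backend reference, assuming the pod labels and container port shown in the orchestrator Deployment, could be:

```yaml
# Sketch: Service backing the developer-facing Ingress
apiVersion: v1
kind: Service
metadata:
  name: poolside-orchestrator
  namespace: poolside-ai
spec:
  selector:
    app: poolside-orchestrator   # pod label from the orchestrator Deployment
  ports:
    - name: api
      port: 8081
      targetPort: 8081
```

The orchestrator's `POLICY_ENGINE_ENDPOINT` and `TRACE_ENDPOINT` values imply similar Services named `poolside-policy` and `poolside-traces`, which are likewise not shown in this recipe.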
# Expose Poolside API for IDE extensions and CLI tools
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: poolside-developer-api
  namespace: poolside-ai
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: poolside-api-auth
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"   # long-running agent requests
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  ingressClassName: nginx   # assumes the ingress-nginx class is named "nginx"
  tls:
    - hosts:
        - poolside.internal.example.com
      secretName: poolside-tls
  rules:
    - host: poolside.internal.example.com
      http:
        paths:
          - path: /v1/
            pathType: Prefix
            backend:
              service:
                name: poolside-orchestrator
                port:
                  number: 8081

# Developer CLI usage
poolside ask "Refactor the payment service to use event sourcing"
# Agent plans -> writes code -> runs tests -> creates PR
# IDE extension (VS Code / JetBrains)
# Connects to: poolside.internal.example.com/v1/
# Features: inline completion, chat, multi-file edit, terminal agent
# TUI for terminal-first developers
poolside tui
# Interactive terminal interface with agent delegation

Air-Gapped Deployment
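In an air-gapped or tightly regulated cluster, the "no external network dependencies" property described in this section can also be enforced at the network layer rather than only assumed. The sketch below restricts egress from the `poolside-ai` namespace to in-cluster pods plus one internal CIDR. The policy name and the `10.0.0.0/8` range are placeholders, and your CNI plugin must support NetworkPolicy for this to have any effect.

```yaml
# Sketch: block egress from poolside-ai to anything outside the internal network
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-external-egress      # hypothetical name
  namespace: poolside-ai
spec:
  podSelector: {}                 # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    # Pods in any namespace of this cluster (covers cluster DNS)
    - to:
        - namespaceSelector: {}
    # Internal services outside the cluster, e.g. git, wiki, or the API server
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8      # placeholder; narrow to your real ranges
```

Anything not matched by one of the two rules, including public internet addresses, is dropped. If the orchestrator talks to the Kubernetes API server, make sure the API server address falls inside an allowed range.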
# For disconnected/air-gapped environments:
# 1. Mirror images to internal registry
skopeo copy \
docker://registry.poolside.ai/poolside/foundation-model:latest \
docker://registry.airgap.example.com/poolside/foundation-model:latest
# 2. Transfer model weights via removable media
# Download on connected machine:
poolside-cli model download --model enterprise-fm --output /media/transfer/
# Copy to air-gapped cluster:
kubectl cp /media/transfer/enterprise-fm poolside-ai/model-loader:/models/
# 3. No external network dependencies
# - All inference runs locally
# - Knowledge connectors point to internal services only
# - No telemetry sent externally
# - License validated via offline key

Monitoring and Governance
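The counters and gauges named in this section lend themselves to alerting as well as dashboards. The following PrometheusRule is a sketch under several assumptions: it requires the Prometheus Operator CRDs, the thresholds (20% failures, 10 violations, 40 pods) are illustrative rather than recommended values, and the metric names are taken from this recipe, so they only work if the deployed components actually export them.

```yaml
# Sketch: alert rules built on the metrics listed in this section
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: poolside-alerts           # hypothetical name
  namespace: poolside-ai
spec:
  groups:
    - name: poolside.agents
      rules:
        - alert: PoolsideAgentFailureRateHigh
          expr: |
            sum(rate(poolside_agent_tasks_failed_total[10m]))
              /
            (
              sum(rate(poolside_agent_tasks_failed_total[10m]))
              + sum(rate(poolside_agent_tasks_completed_total[10m]))
            ) > 0.2
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "More than 20% of agent tasks are failing"
        - alert: PoolsidePolicyViolationsSpike
          expr: sum(increase(poolside_policy_violations_total[15m])) > 10
          labels:
            severity: warning
          annotations:
            summary: "Unusual number of governance policy blocks"
        - alert: PoolsideSandboxPodsNearQuota
          expr: max(poolside_sandbox_pods_active) > 40
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Active sandbox pods above 80% of the 50-pod quota"
```

Depending on how the Prometheus Operator is installed, the rule may need extra labels (for example a `release` label) before Prometheus picks it up.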
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: poolside-metrics
  namespace: poolside-ai
spec:
  # Selects Services (not pods) by label; the poolside-fm Service must carry app: poolside-fm
  selector:
    matchLabels:
      app: poolside-fm
  endpoints:
    - port: metrics
      interval: 15s
---
# Key metrics to monitor:
#   poolside_inference_requests_total     - total API calls
#   poolside_tokens_generated_total       - output tokens
#   poolside_agent_tasks_completed_total  - successful agent executions
#   poolside_agent_tasks_failed_total     - failed executions
#   poolside_sandbox_pods_active          - concurrent sandboxed executions
#   poolside_policy_violations_total      - governance policy blocks
#   poolside_tool_executions_total        - tool calls by type
#   poolside_e2e_task_duration_seconds    - end-to-end task completion time

Common Issues
Agent sandbox pod pending or rejected (no resources)
- Cause: ResourceQuota exhausted (the API server rejects new sandbox pods) or insufficient node capacity (pods stay Pending)
- Fix: Increase quota; or add node autoscaling for sandbox namespace
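A related pitfall: once a ResourceQuota constrains `requests.cpu`, `requests.memory`, and the matching limits, Kubernetes rejects any pod in that namespace that omits those fields. A LimitRange can supply defaults so sandbox pods without explicit resources are still admitted. The values below are a sketch; the defaults mirror the `code-writer` limits from the agent policy, and the name and `max` values are assumptions.

```yaml
# Sketch: default and maximum resources for sandbox containers
apiVersion: v1
kind: LimitRange
metadata:
  name: sandbox-defaults          # hypothetical name
  namespace: poolside-sandboxes
spec:
  limits:
    - type: Container
      defaultRequest:             # applied when a container sets no requests
        cpu: "1"
        memory: 2Gi
      default:                    # applied when a container sets no limits
        cpu: "2"
        memory: 4Gi
      max:                        # hard ceiling per container
        cpu: "4"
        memory: 8Gi
```

With these defaults, 50 sandbox pods request 50 CPUs and 100Gi of memory in total, which fits inside the quota of 100 CPUs and 200Gi shown earlier.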
Slow model loading (>5 min startup)
- Cause: Large model weights loading from networked storage (NFS)
- Fix: Use local NVMe PVs for model storage; or pre-load with init container
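The init-container option can be sketched as a pod-template fragment that copies the weights from the network-backed claim onto node-local disk once, then serves from the local copy. This is an illustration rather than a vendor-documented pattern: the busybox image location, the `/mnt/nvme/poolside-models` host path, and the `.complete` marker file are all hypothetical.

```yaml
# Fragment of the poolside-fm pod template (spec.template.spec), not a full manifest
initContainers:
  - name: preload-weights
    image: registry.example.com/tools/busybox:1.36   # hypothetical mirrored image
    command: ["sh", "-c"]
    args:
      - |
        # Copy only when the cache is empty, so restarts on the same node are fast
        if [ ! -f /cache/.complete ]; then
          cp -a /source/. /cache/ && touch /cache/.complete
        fi
    volumeMounts:
      - name: model-source
        mountPath: /source
        readOnly: true
      - name: model-cache
        mountPath: /cache
containers:
  - name: poolside-inference
    volumeMounts:
      - name: model-cache         # serve from the local copy instead of the PVC
        mountPath: /models
        readOnly: true
volumes:
  - name: model-source
    persistentVolumeClaim:
      claimName: poolside-model-pvc
  - name: model-cache
    hostPath:
      path: /mnt/nvme/poolside-models   # hypothetical node-local NVMe path
      type: DirectoryOrCreate
```

The first start on a node still pays the full copy cost; the benefit shows up on subsequent restarts. A `startupProbe` on the inference container is worth considering as well, so a slow load is not mistaken for a failed liveness check.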
Policy engine blocking legitimate actions
- Cause: Governance rules too restrictive for the use case
- Fix: Review policies.yaml; add specific allow rules; check trace logs for blocked action details
Knowledge connector stale data
- Cause: Sync interval too long or webhook not configured
- Fix: Reduce sync_interval; configure git webhooks for push-based updates
Best Practices
- Deploy model weights on local NVMe: network storage adds 2-5 min to cold starts
- Enforce strict NetworkPolicies on sandboxes: agents executing code should never reach production
- RBAC for agents AND humans: Poolside supports role-based access for both
- Audit everything: enable full trace collection; 90-day retention for compliance
- Start with read-only agents: rolling out the code reviewer before the code writer builds trust
- Enable PII detection: keep sensitive data from connectors from reaching the model
- Set resource quotas on the sandbox namespace: prevent runaway agent spawning
Key Takeaways
- Poolside AI: foundation models purpose-built for software engineering (code gen, planning, tool use)
- Deploy on-prem/VPC on Kubernetes: data never leaves your boundary
- Multi-agent orchestration with policy governance and end-to-end tracing
- Sandboxed execution: agents run tools in isolated pods with NetworkPolicies
- Knowledge layer connects to git, databases, wikis, API specs within strict boundaries
- Developer surfaces (IDE extensions, TUI, CLI) put intelligence where work happens
- Air-gap ready: offline model loading, no external dependencies, offline licensing
- Enterprise governance: audit trails, policy violations, role-based access for humans and agents

