HAProxy Keepalived Multi-Tenant GPU Ingress
Configure HAProxy with Keepalived VIPs for per-tenant GPU cluster ingress with Jinja2 templates and per-tenant logging.
💡 Quick Answer: Deploy HAProxy + Keepalived with per-tenant VIPs, NodePort backends, and rsyslog logging to /var/log/haproxy-<tenant>.log. Templatize with Jinja2 so that adding a tenant means adding one dict entry.
The Problem
In multi-tenant GPU clusters, each team needs its own ingress endpoint for model serving, notebooks, and APIs. Sharing a single ingress creates noisy-neighbor issues, makes per-tenant monitoring impossible, and complicates access control.
The Solution
HAProxy with Keepalived provides dedicated VIPs per tenant. Each tenant gets its own frontend, backend, and log file. Jinja2 templates reduce adding a tenant to a single new entry in the tenant list.
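To make the per-tenant mapping concrete, here is a minimal Python sketch that derives a tenant's frontend, backend, certificate path, and log file from one entry. The naming patterns follow the HAProxy template and rsyslog rules in this recipe; the `tenant_artifacts` helper is hypothetical and exists only for illustration.

```python
def tenant_artifacts(tenant: dict) -> dict:
    """Derive the per-tenant names used in this recipe from one tenant entry.

    Hypothetical helper: the patterns (ft_<name>, bk_<name>, ...) mirror the
    HAProxy template and rsyslog rules, but nothing in the recipe calls this.
    """
    name = tenant["name"]
    return {
        "frontend": f"ft_{name}",
        "backend": f"bk_{name}",
        "bind": f"{tenant['vip']}:443",
        "certificate": f"/etc/haproxy/certs/{name}.pem",
        "syslog_facility": f"local{tenant['log_facility']}",
        "log_file": f"/var/log/haproxy-{name}.log",
    }


if __name__ == "__main__":
    alpha = {"name": "alpha", "vip": "10.0.100.10", "nodeport": 30001, "log_facility": 1}
    for key, value in tenant_artifacts(alpha).items():
        print(f"{key}: {value}")
```

Everything a tenant needs is a function of its one entry, which is what keeps tenant onboarding a data change rather than a config rewrite.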
Keepalived Configuration
# keepalived.conf (managed via ConfigMap or Ansible)
vrrp_instance VI_APPS {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8sgpu
    }
    virtual_ipaddress {
        10.0.100.10/24  # tenant-alpha VIP
        10.0.100.11/24  # tenant-beta VIP
        10.0.100.12/24  # tenant-gamma VIP
    }
}
HAProxy Jinja2 Template
# haproxy.cfg.j2
global
    log /dev/log local0
    maxconn 4096
    daemon
defaults
    log global
    mode http
    option httplog
    timeout connect 5s
    timeout client 300s
    timeout server 300s
{% for tenant in tenants %}
# === Tenant: {{ tenant.name }} ===
frontend ft_{{ tenant.name }}
    bind {{ tenant.vip }}:443 ssl crt /etc/haproxy/certs/{{ tenant.name }}.pem
    log /dev/log local{{ tenant.log_facility }} info
    http-request set-header X-Tenant-ID {{ tenant.name }}
    # Per-tenant rate limiting: track each client IP so sc_http_req_rate(0) has data
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt {{ tenant.rate_limit | default(100) }} }
    default_backend bk_{{ tenant.name }}
backend bk_{{ tenant.name }}
    balance roundrobin
{% for node in gpu_nodes %}
    server {{ node.name }} {{ node.ip }}:{{ tenant.nodeport }} check
{% endfor %}
{% endfor %}
Tenant Configuration (Ansible vars)
# group_vars/all.yml
tenants:
  - name: alpha
    vip: 10.0.100.10
    nodeport: 30001
    log_facility: 1
    rate_limit: 200
    team: "ML Training"
  - name: beta
    vip: 10.0.100.11
    nodeport: 30002
    log_facility: 2
    rate_limit: 100
    team: "Inference Serving"
  - name: gamma
    vip: 10.0.100.12
    nodeport: 30003
    log_facility: 3
    rate_limit: 150
    team: "Research"
gpu_nodes:
  - name: gpu-worker-1
    ip: 10.0.1.101
  - name: gpu-worker-2
    ip: 10.0.1.102
  - name: gpu-worker-3
    ip: 10.0.1.103
Per-Tenant rsyslog Logging
# /etc/rsyslog.d/49-haproxy-tenants.conf
local1.* /var/log/haproxy-alpha.log
local2.* /var/log/haproxy-beta.log
local3.* /var/log/haproxy-gamma.log
NodePort Services in OpenShift
apiVersion: v1
kind: Service
metadata:
  name: model-serving
  namespace: tenant-alpha
spec:
  type: NodePort
  selector:
    app: inference-server
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30001
graph TD
A[Client] --> B[Keepalived VIP 10.0.100.10]
A --> C[Keepalived VIP 10.0.100.11]
B --> D[HAProxy frontend: tenant-alpha]
C --> E[HAProxy frontend: tenant-beta]
D --> F[NodePort 30001 on GPU nodes]
E --> G[NodePort 30002 on GPU nodes]
F --> H[tenant-alpha pods]
G --> I[tenant-beta pods]
D -->|rsyslog| J["/var/log/haproxy-alpha.log"]
E -->|rsyslog| K["/var/log/haproxy-beta.log"]
Common Issues
- VIP not reachable: verify that Keepalived VRRP is running; check ip addr show for the VIP; ensure the firewall allows VRRP (IP protocol 112)
- Backend health checks failing: the Service must be of type NodePort; verify that pods are running in the tenant namespace
- Per-tenant logs not splitting: rsyslog facility numbers must match the HAProxy config; restart rsyslog after adding rules
- SSL certificate mismatch: each tenant frontend needs its own certificate; use wildcard or per-tenant certs
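The facility mismatch in particular is easy to check mechanically. The following Python sketch compares each tenant's log_facility with the rsyslog rule for its log file. It assumes rules follow the localN.* /var/log/haproxy-<tenant>.log pattern used in this recipe; the `facility_mismatches` function name is hypothetical.

```python
import re

# Matches rules such as "local1.* /var/log/haproxy-alpha.log".
# An optional "-" before the path (rsyslog's async-write marker) is tolerated.
RULE = re.compile(r"^local(\d)\.\*\s+-?/var/log/haproxy-([\w-]+)\.log$")


def facility_mismatches(tenants, rsyslog_text):
    """Return human-readable problems; an empty list means the two sides agree."""
    rules = {}
    for line in rsyslog_text.splitlines():
        match = RULE.match(line.strip())
        if match:
            rules[match.group(2)] = int(match.group(1))

    problems = []
    for tenant in tenants:
        name, expected = tenant["name"], tenant["log_facility"]
        actual = rules.get(name)
        if actual is None:
            problems.append(f"{name}: no rsyslog rule found")
        elif actual != expected:
            problems.append(
                f"{name}: rsyslog uses local{actual}, HAProxy uses local{expected}"
            )
    return problems


if __name__ == "__main__":
    tenants = [
        {"name": "alpha", "log_facility": 1},
        {"name": "beta", "log_facility": 2},
    ]
    rsyslog_conf = (
        "local1.* /var/log/haproxy-alpha.log\n"
        "local3.* /var/log/haproxy-beta.log\n"
    )
    for problem in facility_mismatches(tenants, rsyslog_conf):
        print(problem)
```

Running a check like this before restarting rsyslog catches a mistyped facility before logs silently land in the wrong file.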
Best Practices
- One VIP per tenant for full traffic isolation and independent monitoring
- Jinja2 templates ensure consistent config: adding a tenant means adding a dict entry
- Per-tenant log files enable independent troubleshooting and SLO tracking
- Rate limiting per frontend prevents one tenant from overwhelming shared infrastructure
- Use health checks on all backend servers for automatic failover
- Keepalived provides HA: if the primary HAProxy node fails, the backup takes over the VIPs
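Because every tenant is just a dict entry, a copy-paste mistake (a reused VIP, NodePort, or log facility) can break isolation without any syntax error. A minimal Python sketch of a pre-render check is shown below. The `validate_tenants` name is hypothetical, the 30000-32767 range is the Kubernetes default NodePort range and may differ on your cluster, and the 1-7 limit follows from syslog offering only local0 through local7 with local0 already used by the global section here.

```python
from collections import Counter


def validate_tenants(tenants):
    """Return a list of error strings; an empty list means the entries look sane."""
    errors = []

    # Each of these fields must be unique across tenants to keep them isolated.
    for field in ("name", "vip", "nodeport", "log_facility"):
        counts = Counter(tenant[field] for tenant in tenants)
        for value, count in counts.items():
            if count > 1:
                errors.append(f"duplicate {field}: {value}")

    for tenant in tenants:
        # syslog only defines local0..local7; local0 is taken by the global log.
        if not 1 <= tenant["log_facility"] <= 7:
            errors.append(f"{tenant['name']}: log_facility must be between 1 and 7")
        # Assumption: the cluster uses the default NodePort range.
        if not 30000 <= tenant["nodeport"] <= 32767:
            errors.append(f"{tenant['name']}: nodeport outside 30000-32767")
    return errors


if __name__ == "__main__":
    tenants = [
        {"name": "alpha", "vip": "10.0.100.10", "nodeport": 30001, "log_facility": 1},
        {"name": "beta", "vip": "10.0.100.11", "nodeport": 30001, "log_facility": 2},
    ]
    for error in validate_tenants(tenants):
        print(error)
```

One consequence of the facility check: with this logging scheme a single HAProxy pair can separate at most seven tenants by syslog facility.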
Key Takeaways
- HAProxy + Keepalived provides per-tenant VIP isolation on bare metal
- Jinja2 templates make tenant addition a single dict entry change
- Per-tenant rsyslog splitting enables independent p50/p95 latency monitoring
- NodePort backends route to tenant-specific services in isolated namespaces
- Rate limiting per frontend prevents cross-tenant resource exhaustion
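As an illustration of the latency point, the Python sketch below extracts the total request time from HAProxy HTTP log lines and computes nearest-rank percentiles. It assumes the default option httplog layout, where five slash-separated timers precede the status code and the last timer is the total time in milliseconds. The sample lines, including the /v1/predict path, are made up for the example, and the function names are hypothetical.

```python
import math
import re

# Five slash-separated timers followed by the 3-digit HTTP status code.
# The last timer may carry a "+" prefix when "option logasap" is enabled.
TIMERS = re.compile(r" (-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+) (\d{3}) ")


def total_times_ms(log_lines):
    """Collect the total-time timer from each line that looks like an HTTP log."""
    times = []
    for line in log_lines:
        match = TIMERS.search(line)
        if match and int(match.group(5)) >= 0:
            times.append(int(match.group(5)))
    return times


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up sample lines in the shape of /var/log/haproxy-alpha.log entries.
    template = (
        'haproxy[1234]: 10.0.5.20:41122 [12/Mar/2025:10:00:01.120] ft_alpha~ '
        'bk_alpha/gpu-worker-1 0/0/1/{tr}/{total} 200 512 - - ---- 3/3/1/1/0 0/0 '
        '"POST /v1/predict HTTP/1.1"'
    )
    lines = [template.format(tr=t - 2, total=t) for t in (50, 80, 109, 200)]
    times = total_times_ms(lines)
    print("p50:", percentile(times, 50), "ms")
    print("p95:", percentile(times, 95), "ms")
```

Since each tenant has its own log file, the same script can be pointed at one file at a time to get per-tenant numbers without any filtering.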
