Networking • Advanced • ⏱ 15 minutes • K8s 1.28+

HAProxy Keepalived Multi-Tenant GPU Ingress

Configure HAProxy with Keepalived VIPs for per-tenant GPU cluster ingress with Jinja2 templates and per-tenant logging.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Deploy HAProxy + Keepalived with per-tenant VIPs, NodePort backends, and rsyslog logging to /var/log/haproxy-<tenant>.log. Templatize with Jinja2: adding a tenant means adding one dict entry.

The Problem

In multi-tenant GPU clusters, each team needs its own ingress endpoint for model serving, notebooks, and APIs. Sharing a single ingress creates noisy-neighbor issues, makes per-tenant monitoring hard, and complicates access control.

The Solution

Keepalived manages a dedicated VIP per tenant and moves the VIPs to a standby load balancer on failure; HAProxy gives each tenant its own frontend, backend, and log file. Jinja2 templates turn adding a tenant into adding one entry to the tenant list.

Keepalived Configuration

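```
# keepalived.conf (managed via ConfigMap or Ansible)
# Primary node shown; on the BACKUP node use "state BACKUP" and a lower priority (e.g. 90).
# Both nodes also need sysctl net.ipv4.ip_nonlocal_bind=1 so HAProxy can bind VIPs it does not hold.

# Lower this node's priority by 20 whenever no haproxy process is running
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight -20
}

vrrp_instance VI_APPS {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8sgpu
    }
    virtual_ipaddress {
        10.0.100.10/24    # tenant-alpha VIP
        10.0.100.11/24    # tenant-beta VIP
        10.0.100.12/24    # tenant-gamma VIP
    }
    track_script {
        chk_haproxy
    }
}
```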
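
Keepalived decides which node owns the VIPs through VRRP: every node advertises its priority, the highest priority wins, and a backup declares the master dead after roughly three missed advertisements (3 × advert_int plus a small skew). The Python sketch below models only that election rule, not keepalived itself; the node names lb-1/lb-2 and their 10.0.100.2/10.0.100.3 addresses are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VrrpNode:
    name: str          # hypothetical load balancer host name
    primary_ip: str    # the node's own address, used as the VRRP tie-breaker
    priority: int      # "priority" from that node's keepalived.conf
    advertising: bool  # False once the node stops sending VRRP advertisements


def ip_key(ip: str) -> tuple[int, ...]:
    """Compare IPv4 addresses numerically instead of as strings."""
    return tuple(int(octet) for octet in ip.split("."))


def elect_master(nodes: list[VrrpNode]) -> VrrpNode | None:
    """Pick the VIP owner: highest priority among advertising nodes,
    with ties going to the higher primary IP address."""
    live = [n for n in nodes if n.advertising]
    if not live:
        return None
    return max(live, key=lambda n: (n.priority, ip_key(n.primary_ip)))


lb1 = VrrpNode("lb-1", "10.0.100.2", priority=100, advertising=True)
lb2 = VrrpNode("lb-2", "10.0.100.3", priority=90, advertising=True)
print(elect_master([lb1, lb2]).name)  # lb-1 holds 10.0.100.10-12

lb1.advertising = False               # lb-1 crashes; lb-2 stops hearing adverts
print(elect_master([lb1, lb2]).name)  # lb-2 takes over the VIPs
```

Keepalived preempts by default, so when lb-1 returns with the higher priority it reclaims the VIPs; the nopreempt option changes that behavior.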

HAProxy Jinja2 Template

```jinja
# haproxy.cfg.j2
global
    log /dev/log local0
    maxconn 4096
    daemon

defaults
    log     global
    mode    http
    option  httplog
    timeout connect 5s
    timeout client  300s
    timeout server  300s

{% for tenant in tenants %}
# === Tenant: {{ tenant.name }} ===
frontend ft_{{ tenant.name }}
    bind {{ tenant.vip }}:443 ssl crt /etc/haproxy/certs/{{ tenant.name }}.pem
    # log_facility must be 1-7: syslog defines only local0-local7, and local0 is the global log
    log /dev/log local{{ tenant.log_facility }} info
    default_backend bk_{{ tenant.name }}
    http-request set-header X-Tenant-ID {{ tenant.name }}
    # Per-client-IP rate limit for this tenant: more than rate_limit requests in 10s gets a 429
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    # Without this tracking rule sc_http_req_rate(0) has no data and the deny never fires
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt {{ tenant.rate_limit | default(100) }} }

backend bk_{{ tenant.name }}
    balance roundrobin
{% for node in gpu_nodes %}
    server {{ node.name }} {{ node.ip }}:{{ tenant.nodeport }} check
{% endfor %}

{% endfor %}
```
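
HAProxy only counts requests in a stick table after a rule such as http-request track-sc0 src starts tracking the client; sc_http_req_rate(0) then reads that client's request rate over the stored 10-second period. The sketch below approximates the resulting per-client-IP behavior with fixed 10-second windows. HAProxy's real counter is a smoothed sliding window, and the FixedWindowLimiter class is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class FixedWindowLimiter:
    """Simplified stand-in for `track-sc0 src` + `deny if { sc_http_req_rate(0) gt LIMIT }`.

    Assumption: fixed 10 s windows instead of HAProxy's sliding-window counter.
    """

    def __init__(self, limit: int = 100, window_s: float = 10.0) -> None:
        self.limit = limit  # tenant.rate_limit, default(100) in the template
        self.window_s = window_s
        # client IP -> [window index, requests seen in that window]
        self.counters: dict[str, list[int]] = defaultdict(lambda: [-1, 0])

    def handle(self, client_ip: str, now_s: float) -> int:
        """Return the HTTP status the frontend would answer with."""
        window = int(now_s // self.window_s)
        entry = self.counters[client_ip]
        if entry[0] != window:  # new window: reset the count
            entry[0], entry[1] = window, 0
        entry[1] += 1           # counted before the deny check
        return 429 if entry[1] > self.limit else 200


beta = FixedWindowLimiter(limit=100)  # tenant-beta's rate_limit
statuses = [beta.handle("192.0.2.7", now_s=1.0) for _ in range(101)]
print(statuses.count(200), statuses.count(429))  # 100 1
print(beta.handle("192.0.2.8", now_s=1.0))        # 200: a different client IP
print(beta.handle("192.0.2.7", now_s=11.0))       # 200: next window
```

Because the key is the client IP, the limit caps each client of a tenant rather than the tenant's aggregate traffic.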

Tenant Configuration (Ansible vars)

```yaml
# group_vars/all.yml
tenants:
  - name: alpha
    vip: 10.0.100.10
    nodeport: 30001
    log_facility: 1
    rate_limit: 200
    team: "ML Training"
  - name: beta
    vip: 10.0.100.11
    nodeport: 30002
    log_facility: 2
    rate_limit: 100
    team: "Inference Serving"
  - name: gamma
    vip: 10.0.100.12
    nodeport: 30003
    log_facility: 3
    rate_limit: 150
    team: "Research"

gpu_nodes:
  - name: gpu-worker-1
    ip: 10.0.1.101
  - name: gpu-worker-2
    ip: 10.0.1.102
  - name: gpu-worker-3
    ip: 10.0.1.103
```
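
Because every tenant shares one HAProxy process, one syslog daemon, and one NodePort range, "adding a dict entry" only works if the new entry does not collide with an existing one. A pre-render check along these lines catches the common mistakes. validate_tenants is a hypothetical helper operating on a Python copy of group_vars/all.yml, and the 30000-32767 range assumes the Kubernetes default --service-node-port-range.

```python
from __future__ import annotations

# Python mirror of group_vars/all.yml, trimmed to the fields the templates use.
TENANTS = [
    {"name": "alpha", "vip": "10.0.100.10", "nodeport": 30001, "log_facility": 1, "rate_limit": 200},
    {"name": "beta", "vip": "10.0.100.11", "nodeport": 30002, "log_facility": 2, "rate_limit": 100},
    {"name": "gamma", "vip": "10.0.100.12", "nodeport": 30003, "log_facility": 3, "rate_limit": 150},
]

NODEPORT_RANGE = range(30000, 32768)  # assumed default service-node-port-range
LOG_FACILITIES = range(1, 8)          # local1-local7; local0 is the global HAProxy log


def validate_tenants(tenants: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the vars are safe to render."""
    problems: list[str] = []
    for key in ("name", "vip", "nodeport", "log_facility"):
        values = [t.get(key) for t in tenants]
        dupes = sorted({str(v) for v in values if values.count(v) > 1})
        if dupes:
            problems.append(f"duplicate {key}: {', '.join(dupes)}")
    for t in tenants:
        if t.get("nodeport") not in NODEPORT_RANGE:
            problems.append(f"{t.get('name')}: nodeport {t.get('nodeport')} outside 30000-32767")
        if t.get("log_facility") not in LOG_FACILITIES:
            problems.append(f"{t.get('name')}: log_facility {t.get('log_facility')} not in 1-7")
    return problems


print(validate_tenants(TENANTS))  # []
delta = {"name": "delta", "vip": "10.0.100.13", "nodeport": 30003, "log_facility": 8}
print(validate_tenants(TENANTS + [delta]))
# ['duplicate nodeport: 30003', 'delta: log_facility 8 not in 1-7']
```

In an Ansible play the same rules could be expressed as assert tasks that run before the template task.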

Per-Tenant rsyslog Logging

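```
# /etc/rsyslog.d/49-haproxy-tenants.conf
# "& stop" ends processing of a matched message so it is not duplicated into the default log files
local1.* /var/log/haproxy-alpha.log
& stop
local2.* /var/log/haproxy-beta.log
& stop
local3.* /var/log/haproxy-gamma.log
& stop
```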
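
rsyslog routes on the facility encoded in each message's PRI field, which is computed as facility × 8 + severity (local0 is facility 16, local7 is 23). The sketch below decodes that field the way the rules above match it; the ROUTES table mirrors the config, and it assumes HAProxy's traffic logs arrive at info severity.

```python
from __future__ import annotations

import re

LOCAL0 = 16  # syslog facility code of local0; local1..local7 follow
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Mirrors /etc/rsyslog.d/49-haproxy-tenants.conf
ROUTES = {
    "local1": "/var/log/haproxy-alpha.log",
    "local2": "/var/log/haproxy-beta.log",
    "local3": "/var/log/haproxy-gamma.log",
}


def encode_pri(local_n: int, severity: str) -> int:
    """PRI value at the start of a syslog line: facility * 8 + severity."""
    return (LOCAL0 + local_n) * 8 + SEVERITIES.index(severity)


def route(message: str) -> str | None:
    """Decode the <PRI> prefix and return the file the rules above would write to."""
    match = re.match(r"<(\d{1,3})>", message)
    if not match:
        return None
    facility, _severity = divmod(int(match.group(1)), 8)
    if not LOCAL0 <= facility <= LOCAL0 + 7:
        return None
    return ROUTES.get(f"local{facility - LOCAL0}")


pri = encode_pri(1, "info")                    # tenant alpha: "local1 info"
print(pri)                                     # 142
print(route(f"<{pri}>haproxy[1234]: example"))  # /var/log/haproxy-alpha.log
print(route(f"<{encode_pri(0, 'info')}>haproxy[1234]: example"))  # None: no local0 rule here
```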

NodePort Services in OpenShift

```yaml
apiVersion: v1
kind: Service
metadata:
  name: model-serving
  namespace: tenant-alpha
spec:
  type: NodePort
  selector:
    app: inference-server
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30001   # must match tenant alpha's nodeport in group_vars/all.yml
```
```mermaid
graph TD
    A[Client] --> B[Keepalived VIP 10.0.100.10]
    A --> C[Keepalived VIP 10.0.100.11]

    B --> D[HAProxy frontend: tenant-alpha]
    C --> E[HAProxy frontend: tenant-beta]

    D --> F[NodePort 30001 on GPU nodes]
    E --> G[NodePort 30002 on GPU nodes]

    F --> H[tenant-alpha pods]
    G --> I[tenant-beta pods]

    D -->|rsyslog| J["/var/log/haproxy-alpha.log"]
    E -->|rsyslog| K["/var/log/haproxy-beta.log"]
```
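
The HAProxy-to-NodePort hop in this flow uses balance roundrobin across every GPU node. Because NodePort Services default to externalTrafficPolicy: Cluster, a node without a local tenant pod still forwards the request. The sketch below models just the server rotation per VIP, skipping servers marked down; it ignores server weights and HAProxy's real health-check state machine, and build_backends and pick are hypothetical helpers.

```python
from __future__ import annotations

from itertools import cycle

# Trimmed copies of the Ansible vars: VIP -> (tenant, nodeport), plus the GPU nodes.
TENANTS = {"10.0.100.10": ("alpha", 30001), "10.0.100.11": ("beta", 30002)}
GPU_NODES = [
    ("gpu-worker-1", "10.0.1.101"),
    ("gpu-worker-2", "10.0.1.102"),
    ("gpu-worker-3", "10.0.1.103"),
]


def build_backends() -> dict[str, cycle]:
    """One round-robin rotation per VIP, mirroring one bk_<tenant> backend each."""
    return {
        vip: cycle([f"{node} {ip}:{nodeport}" for node, ip in GPU_NODES])
        for vip, (_tenant, nodeport) in TENANTS.items()
    }


def pick(backends: dict[str, cycle], vip: str, down: frozenset[str] | set[str] = frozenset()) -> str | None:
    """Next server for this VIP, skipping servers whose health check failed.
    Returns None when every server is down (HAProxy would answer 503)."""
    for _ in range(len(GPU_NODES)):
        server = next(backends[vip])
        if server.split()[0] not in down:
            return server
    return None


backends = build_backends()
for _ in range(4):
    print(pick(backends, "10.0.100.10"))
# gpu-worker-1, -2, -3 on port 30001, then back to gpu-worker-1
```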

Common Issues

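  • VIP not reachable: verify Keepalived is running and that the node in MASTER state holds the VIPs (ip addr show ens192); ensure the firewall allows VRRP (IP protocol 112) between the load balancers
  • Backend health checks failing: confirm the Service is type NodePort and its nodePort matches the tenant's nodeport value; verify the pods are running and Ready in the tenant namespace; check that the GPU nodes accept traffic on the NodePort range (30000-32767 by default)
  • Per-tenant logs not splitting: the rsyslog facility (local1, local2, ...) must match each tenant's log_facility; restart rsyslog after adding rules
  • SSL certificate mismatch: each frontend loads /etc/haproxy/certs/<tenant>.pem, which must contain the certificate chain and private key for that tenant's hostname; a wildcard certificate works if it is copied to every tenant's path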
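
When backend health checks fail, probing the NodePort directly from the load balancer separates HAProxy problems from Kubernetes problems. Without option httpchk, the check keyword on a server line performs a plain TCP connect, so this sketch does the same. tcp_check and check_tenant are hypothetical helper names, and the node list is copied from the Ansible vars.

```python
from __future__ import annotations

import socket


def tcp_check(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Connect-only probe, roughly what a bare `check` on a server line does."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_tenant(nodes: dict[str, str], nodeport: int, timeout_s: float = 2.0) -> dict[str, bool]:
    """Probe one tenant's NodePort on every GPU node (name -> reachable?)."""
    return {name: tcp_check(ip, nodeport, timeout_s) for name, ip in nodes.items()}


if __name__ == "__main__":
    gpu_nodes = {
        "gpu-worker-1": "10.0.1.101",
        "gpu-worker-2": "10.0.1.102",
        "gpu-worker-3": "10.0.1.103",
    }
    for name, ok in check_tenant(gpu_nodes, 30001, timeout_s=1.0).items():
        print(f"{name} :30001 {'UP' if ok else 'DOWN'}")
```

If every node reports DOWN, look at the Service and firewall; if the probe succeeds but HAProxy still marks servers down, look at the HAProxy side.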

Best Practices

  • One VIP per tenant gives each tenant its own endpoint, traffic path, and monitoring
  • Jinja2 templates keep the config consistent: adding a tenant means adding a dict entry
  • Per-tenant log files enable independent troubleshooting and SLO tracking
  • Rate limiting per frontend stops a single client from overwhelming a tenant's backends on shared infrastructure
  • Use health checks on all backend servers so HAProxy stops sending traffic to unhealthy GPU nodes
  • Keepalived provides HA: if the primary load balancer fails, the backup takes over the VIPs; add a track_script so that a dead HAProxy process also triggers failover

Key Takeaways

  • HAProxy + Keepalived provides per-tenant VIP isolation on bare metal
  • Jinja2 templates make tenant addition a single dict entry change
  • Per-tenant rsyslog splitting enables independent p50/p95 latency monitoring
  • NodePort backends route to tenant-specific services in isolated namespaces
  • Rate limiting per frontend reduces the risk of one tenant exhausting shared resources
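
The p50/p95 takeaway relies on the timers that option httplog writes into every line: TR/Tw/Tc/Tr/Ta, where Ta is the total active time of the request in milliseconds. The sketch below extracts Ta from log lines and computes nearest-rank percentiles. The regex assumes HAProxy's default HTTP log format, and the sample lines are generated, not real traffic.

```python
from __future__ import annotations

import math
import re
from collections.abc import Iterable

# TR/Tw/Tc/Tr/Ta timers followed by the 3-digit status code, as in HAProxy's
# default HTTP log format. Ta can carry a "+" prefix with `option logasap`,
# and -1 marks a phase that never completed.
TIMERS = re.compile(r"\s(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+)\s(\d{3})\s")


def total_times_ms(lines: Iterable[str]) -> list[int]:
    """Extract Ta (total active time in ms) from every line that carries it."""
    times = []
    for line in lines:
        match = TIMERS.search(line)
        if match and int(match.group(5)) >= 0:
            times.append(int(match.group(5)))
    return times


def percentile(values: list[int], p: float) -> int:
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ranked = sorted(values)
    rank = max(1, math.ceil(p * len(ranked) / 100))
    return ranked[rank - 1]


# Generated sample lines (not real traffic) with Ta = 1..100 ms.
sample = [
    f'10.0.2.5:51000 [01/Jan/2026:00:00:00.000] ft_alpha bk_alpha/gpu-worker-1 '
    f'0/0/1/{ta - 1}/{ta} 200 512 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"'
    for ta in range(1, 101)
]
times = total_times_ms(sample)
print(f"p50={percentile(times, 50)} ms p95={percentile(times, 95)} ms")  # p50=50 ms p95=95 ms
```

Since each tenant already has its own file, running total_times_ms over the lines of /var/log/haproxy-alpha.log yields alpha's latencies only.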
#haproxy #keepalived #multi-tenant #vip #ingress #load-balancer
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
