How to Implement Rate Limiting in Kubernetes
Protect your services with rate limiting. Configure rate limits using Ingress, service mesh, and API gateways to prevent abuse and ensure fair usage.
Rate limiting protects services from abuse, ensures fair resource usage, and prevents cascading failures. Implement rate limiting at the ingress, service mesh, or application level.
NGINX Ingress Rate Limiting
# rate-limited-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    # Limit requests per second per client IP
    nginx.ingress.kubernetes.io/limit-rps: "10"
    # Limit concurrent connections per client IP
    nginx.ingress.kubernetes.io/limit-connections: "5"
    # Burst multiplier: allowed burst = limit-rps * multiplier
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
    # NOTE: rate-limited requests get a 503 by default; set
    # limit-req-status-code: "429" in the ingress-nginx ConfigMap to return 429
    # Exempt these CIDRs from rate limiting
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8,192.168.1.0/24"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
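To check that the limit is actually enforced, send a quick burst of requests and count the rejections. A minimal sketch using the Python requests library; the hostname and the expected rejection status (503 by default, 429 if limit-req-status-code is configured) are assumptions based on the manifest above.
# rate_limit_probe.py - sketch: send a burst of requests and count rejected responses
import requests

URL = "http://api.example.com/"   # hypothetical endpoint behind the rate-limited Ingress
TOTAL = 100                       # send well above 10 requests to trip the limit

counts = {}
for _ in range(TOTAL):
    status = requests.get(URL, timeout=5).status_code
    counts[status] = counts.get(status, 0) + 1

# ingress-nginx rejects with 503 by default, or 429 if limit-req-status-code is set
print(counts)  # e.g. {200: 55, 503: 45}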
Rate Limit by Request Size
# size-limited-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: upload-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/limit-rate: "100" # throttle responses to 100 KB/s
    nginx.ingress.kubernetes.io/limit-rate-after: "1024" # start throttling after the first 1 MB
spec:
  ingressClassName: nginx
  rules:
  - host: upload.example.com
    http:
      paths:
      - path: /upload
        pathType: Prefix
        backend:
          service:
            name: upload-service
            port:
              number: 80
Istio Rate Limiting
# istio-rate-limit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit-filter
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            token_bucket:
              max_tokens: 100
              tokens_per_fill: 10
              fill_interval: 1s
            filter_enabled:
              runtime_key: local_rate_limit_enabled
              default_value:
                numerator: 100
                denominator: HUNDRED
            filter_enforced:
              runtime_key: local_rate_limit_enforced
              default_value:
                numerator: 100
                denominator: HUNDRED
            response_headers_to_add:
            - append: false
              header:
                key: x-rate-limit
                value: "true"
Istio with External Rate Limit Service
# rate-limit-service.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit-envoy-filter
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: production-ratelimit
          failure_mode_deny: true
          rate_limit_service:
            grpc_service:
              envoy_grpc:
                cluster_name: rate_limit_cluster
            transport_api_version: V3
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit-cluster
  namespace: istio-system
spec:
  configPatches:
  - applyTo: CLUSTER
    patch:
      operation: ADD
      value:
        name: rate_limit_cluster
        type: STRICT_DNS
        connect_timeout: 10s
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        load_assignment:
          cluster_name: rate_limit_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: ratelimit.default.svc.cluster.local
                    port_value: 8081
Kong Rate Limiting
# kong-rate-limit.yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit
plugin: rate-limiting
config:
  second: 5
  minute: 100
  hour: 1000
  policy: local
  fault_tolerant: true
  hide_client_headers: false
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    konghq.com/plugins: rate-limit
spec:
  ingressClassName: kong
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
Rate Limit by Header (API Key)
# kong-rate-limit-by-key.yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit-by-key
plugin: rate-limiting
config:
  second: 10
  policy: redis
  redis_host: redis.default.svc.cluster.local
  redis_port: 6379
  limit_by: header
  header_name: X-API-Key
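Clients should treat a 429 as a signal to back off rather than retry immediately. A minimal client-side sketch; the URL, the API key, and the Retry-After header are assumptions that depend on how the plugin and the upstream service are configured.
# client_backoff.py - sketch of a client that backs off when it receives 429
import time
import requests

def get_with_backoff(url, api_key, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.get(url, headers={"X-API-Key": api_key}, timeout=5)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After when present, otherwise use exponential backoff
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError(f"still rate limited after {max_retries} attempts")

# Example call (hypothetical endpoint behind the Kong proxy):
# resp = get_with_backoff("http://api.example.com/v1/users", api_key="demo-key")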
Application-Level Rate Limiting
# redis-for-rate-limiting.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
  selector:
    app: redis
# Python rate limiter example
from flask import Flask, request, jsonify
import redis
import time

app = Flask(__name__)
r = redis.Redis(host='redis', port=6379)

def rate_limit(key, limit, window):
    """Fixed-window rate limiter: one counter per time window."""
    current = time.time()
    window_key = f"rate:{key}:{int(current // window)}"
    count = r.incr(window_key)
    if count == 1:
        # First request in this window: expire the counter with the window
        r.expire(window_key, window)
    if count > limit:
        return False, 0
    return True, limit - count

@app.before_request
def check_rate_limit():
    client_ip = request.remote_addr
    allowed, remaining = rate_limit(client_ip, limit=100, window=60)
    if not allowed:
        return jsonify({"error": "Rate limit exceeded"}), 429
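The fixed-window counter above can admit up to twice the limit around a window boundary. A sliding-window variant avoids that by tracking individual request timestamps; here is a minimal sketch using a Redis sorted set (the key naming is illustrative).
# sliding_window.py - sketch of a sliding-window limiter using a Redis sorted set
import time
import redis

r = redis.Redis(host='redis', port=6379)

def sliding_window_limit(key, limit, window):
    """Allow at most `limit` requests in the trailing `window` seconds."""
    now = time.time()
    zkey = f"sliding:{key}"
    pipe = r.pipeline()
    pipe.zremrangebyscore(zkey, 0, now - window)   # drop timestamps outside the window
    pipe.zadd(zkey, {str(now): now})               # record this request
    pipe.zcard(zkey)                               # count requests still in the window
    pipe.expire(zkey, int(window) + 1)             # clean up idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit, max(limit - count, 0)

# Usage mirrors the fixed-window helper above:
# allowed, remaining = sliding_window_limit(request.remote_addr, limit=100, window=60)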
Rate Limit ConfigMap
# rate-limit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rate-limit-config
data:
  config.yaml: |
    # Must match the `domain` in the EnvoyFilter above
    domain: production-ratelimit
    descriptors:
      # Global rate limit
      - key: generic_key
        value: default
        rate_limit:
          unit: minute
          requests_per_unit: 1000
      # Per-path rate limits
      - key: path
        value: /api/v1/users
        rate_limit:
          unit: second
          requests_per_unit: 10
      # Per-user rate limits
      - key: user_id
        rate_limit:
          unit: minute
          requests_per_unit: 100
      # Different tiers
      - key: api_tier
        value: free
        rate_limit:
          unit: hour
          requests_per_unit: 100
      - key: api_tier
        value: premium
        rate_limit:
          unit: hour
          requests_per_unit: 10000
Envoy Rate Limit Service
# rate-limit-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
      - name: ratelimit
        image: envoyproxy/ratelimit:v1.4.0
        ports:
        - containerPort: 8080
        - containerPort: 8081
        - containerPort: 6070
        env:
        - name: RUNTIME_ROOT
          value: /data
        - name: RUNTIME_SUBDIRECTORY
          value: ratelimit
        - name: REDIS_SOCKET_TYPE
          value: tcp
        - name: REDIS_URL
          value: redis:6379
        volumeMounts:
        - name: config
          mountPath: /data/ratelimit/config
      volumes:
      - name: config
        configMap:
          name: rate-limit-config
---
apiVersion: v1
kind: Service
metadata:
  name: ratelimit
spec:
  ports:
  - name: http
    port: 8080
  - name: grpc
    port: 8081
  - name: debug
    port: 6070
  selector:
    app: ratelimit
Custom Response for Rate Limited Requests
# custom-error-response.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"
    nginx.ingress.kubernetes.io/custom-http-errors: "429"
    nginx.ingress.kubernetes.io/default-backend: rate-limit-backend
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
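The rate-limit-backend service referenced above has to exist and render the friendly error body; ingress-nginx forwards the original status code to the error backend in the X-Code header. A minimal sketch of such a backend in Flask; the service name and the JSON shape are assumptions.
# error_backend.py - sketch of a custom error backend for rate-limited requests
# ingress-nginx passes the original status in the X-Code header; the JSON shape here is illustrative.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def error_page(path):
    code = int(request.headers.get("X-Code", 429))
    body = {"error": "Rate limit exceeded", "status": code}
    if code == 429:
        body["hint"] = "Slow down and retry later"
    return jsonify(body), code

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)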
Monitor Rate Limiting
# Prometheus queries for rate limiting
# Rate-limited requests (429s; ingress-nginx returns 503 by default unless
# limit-req-status-code is set to 429)
sum(rate(nginx_ingress_controller_requests{status="429"}[5m])) by (ingress)
# Total request rate per ingress, to compare against configured limits
sum(rate(nginx_ingress_controller_requests[5m])) by (ingress)
# Share of requests that are rate limited
sum(rate(nginx_ingress_controller_requests{status="429"}[5m]))
/
sum(rate(nginx_ingress_controller_requests[5m]))
Best Practices
1. Layer Rate Limits
   - Global limits at the ingress
   - Per-service limits in the mesh
   - Per-user limits in the application
2. Use Sliding Windows
   - Prevents bursts at window boundaries
   - Distributes requests more fairly (see the sliding-window sketch in the application-level section above)
3. Return Helpful Headers (see the sketch after this list)
   - X-RateLimit-Limit
   - X-RateLimit-Remaining
   - X-RateLimit-Reset
4. Different Limits by Tier
   - Anonymous < Authenticated < Premium
5. Graceful Degradation
   - Don't reject traffic just because the rate limiter is down
   - Log the failure but allow traffic (fail open)
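Building on the Flask limiter from the application-level section, the headers from item 3 can be attached to every response. A minimal sketch that replaces the earlier before_request hook and adds an after_request hook; it assumes the app, jsonify, and rate_limit() names defined in that example.
# rate_limit_headers.py - sketch: attach X-RateLimit-* headers to every response.
# Assumes the Flask `app` and `rate_limit()` helper from the application-level example.
import time
from flask import g, request, jsonify

LIMIT, WINDOW = 100, 60

@app.before_request
def check_rate_limit():
    allowed, remaining = rate_limit(request.remote_addr, limit=LIMIT, window=WINDOW)
    g.remaining = remaining
    if not allowed:
        return jsonify({"error": "Rate limit exceeded"}), 429

@app.after_request
def add_rate_limit_headers(response):
    window_end = (int(time.time() // WINDOW) + 1) * WINDOW  # when the current window resets
    response.headers["X-RateLimit-Limit"] = str(LIMIT)
    response.headers["X-RateLimit-Remaining"] = str(getattr(g, "remaining", LIMIT))
    response.headers["X-RateLimit-Reset"] = str(window_end)
    return response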
Summary
Rate limiting protects services from abuse and overload. Use NGINX Ingress annotations for simple per-IP limits, a service mesh for fine-grained control, and an external rate limit service for complex rules. Implement multiple layers with different granularities. Always return helpful headers so clients can adjust their behavior, and monitor rate limit metrics to tune thresholds.