Security advanced ⏱ 15 minutes K8s 1.28+

Kyverno Webhook Topology and Admission Latency

Optimize Kyverno webhook topology for minimal admission latency: webhook configuration tuning, failure policies, timeout settings, and lessons from migrating from OPA/Gatekeeper.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Minimize Kyverno admission latency by consolidating webhook configurations, setting appropriate timeouts (5-10s), using failurePolicy: Ignore for non-critical policies, placing Kyverno Pods close to the API server, and migrating complex policies to CEL for compiled evaluation.

The Problem

Admission webhooks add latency to every API request:

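  • Each webhook call adds roughly 5-50ms (network round-trip plus policy evaluation)
  • Mutating webhooks run one after another, so their latencies add up; validating webhooks run in parallel, so the slowest one dominates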
  • Slow webhooks delay Pod creation, scaling, deployments
  • Webhook failures can block entire cluster operations
  • Migration from OPA/Gatekeeper to Kyverno needs careful planning

The Solution

Webhook Architecture

kubectl apply → API Server → Authentication → Authorization
  → Mutating Admission Webhooks (sequential):
    ├── Kyverno mutate (1st call)       ~5-15ms
    ├── Istio sidecar injection         ~10-20ms
    └── Other mutating webhooks
  → Object Schema Validation
  → Validating Admission Webhooks (parallel):
    ├── Kyverno validate (2nd call)     ~5-15ms
    ├── OPA/Gatekeeper (if present)     ~10-30ms
    └── Other validating webhooks
  → etcd Write

Total admission latency: sum of the mutating webhook calls + the slowest validating webhook call
Target: < 100ms total for simple resources
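As a worked example, take the ranges from the diagram and assume the four named webhooks are the only ones involved (no "other" webhooks and no mutating reinvocation). Mutating calls add up, while the parallel validating calls cost only as much as the slowest one. Here M is the set of mutating webhooks and V the set of validating webhooks:

```latex
\begin{aligned}
T_{\text{admission}} &\approx \sum_{i \in M} t_i + \max_{j \in V} t_j \\
\sum_{i \in M} t_i &= [5,15] + [10,20] = [15,35]\ \text{ms} \\
\max_{j \in V} t_j &= \max\big([5,15],\,[10,30]\big) = [10,30]\ \text{ms} \\
T_{\text{admission}} &\approx [25,65]\ \text{ms}
\end{aligned}
```

Under these assumptions the request stays inside the 100ms target, and the Gatekeeper call, not Kyverno's, sets the validating-phase cost.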

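Kyverno Webhook Configuration

Kyverno creates and reconciles its webhook configurations itself (recent releases split them into webhooks suffixed -fail and -ignore according to each policy's failurePolicy), so manual edits to these objects are overwritten. Treat the example below as an illustration of the fields that matter, and tune them through Kyverno's Helm values and policy settings.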

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: kyverno-resource-validating-webhook-cfg
webhooks:
  - name: validate.kyverno.svc
    clientConfig:
      service:
        name: kyverno-svc
        namespace: kyverno
        path: /validate
    rules:
      - apiGroups: ["", "apps", "batch"]
        apiVersions: ["v1"]        # pods, apps/v1 and batch/v1 are all served as v1
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "deployments", "statefulsets", "jobs"]
        scope: "Namespaced"
    failurePolicy: Fail        # Critical policies: block on webhook failure
    timeoutSeconds: 10         # Max wait before timeout
    sideEffects: None
    admissionReviewVersions: ["v1"]
    matchPolicy: Equivalent
    namespaceSelector:
      matchExpressions:
        - key: kyverno.io/exclude
          operator: DoesNotExist

Optimize Timeout Settings

# For critical security policies (must not be bypassed)
failurePolicy: Fail
timeoutSeconds: 10

# For non-critical policies (best-effort, don't block cluster)
failurePolicy: Ignore
timeoutSeconds: 5

# For background-only policies (no admission evaluation)
admission: false     # Kyverno 1.11+: skip this policy at admission time
background: true
# → Evaluated only in periodic background scans; results land in PolicyReports
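In a Kyverno policy these knobs live in the ClusterPolicy spec as failurePolicy and webhookTimeoutSeconds. Setting admission: false (Kyverno 1.11+) together with background: true makes a policy background-only. The sketch below shows one best-effort policy and one background-only policy. Both policy names and the label key team are hypothetical. Newer Kyverno releases deprecate spec.validationFailureAction in favor of a per-rule failureAction, so adjust that field to your version.

```yaml
# Best-effort policy: must not block the cluster if Kyverno is slow or down
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label        # hypothetical policy name
spec:
  validationFailureAction: Audit  # report violations instead of rejecting
  failurePolicy: Ignore           # admit the request if the webhook call fails
  webhookTimeoutSeconds: 5        # stop waiting for Kyverno after 5s
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label 'team' is required."
        pattern:
          metadata:
            labels:
              team: "?*"          # any non-empty value
---
# Background-only policy: skipped at admission, evaluated in periodic scans
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: audit-latest-tag          # hypothetical policy name
spec:
  validationFailureAction: Audit
  admission: false                # Kyverno 1.11+: do not evaluate during admission
  background: true                # evaluate existing resources in background scans
  rules:
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must not use the ':latest' tag."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

Violations of the second policy appear only in PolicyReports and never delay a kubectl apply.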

Reduce Webhook Calls with Scope Filtering

# BAD: Matches everything (unnecessary calls for non-relevant resources)
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    operations: ["*"]

# GOOD: Only match what your policies actually validate
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "deployments", "statefulsets"]
    operations: ["CREATE", "UPDATE"]
    scope: "Namespaced"
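Recent Kyverno releases build the rules of their webhook configurations from the resource kinds that policies match. In practice, you narrow the webhook by narrowing each policy's match block. The sketch below assumes Kyverno 1.10+ for the operations filter and uses a hypothetical policy name. Because this is a Pod rule, Kyverno's auto-gen also creates equivalent rules for Pod controllers such as Deployments and StatefulSets, so expect those kinds to appear in the generated webhook as well.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits   # hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Pod               # only Pods (plus auto-generated controller rules)
              operations:           # skip DELETE/CONNECT entirely
                - CREATE
                - UPDATE
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
```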

Kyverno Scaling for Low Latency

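# Kyverno Helm values for production
# HelmChart is the k3s/RKE2 Helm controller CRD; with plain Helm, save the
# valuesContent block as a values file and pass it to helm install instead.
# Value keys assume the Kyverno chart v3.x layout (Kyverno 1.10+); older v2
# charts used top-level replicaCount/resources/affinity keys.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kyverno
  namespace: kube-system
spec:
  repo: https://kyverno.github.io/kyverno/
  chart: kyverno
  targetNamespace: kyverno
  valuesContent: |
    admissionController:
      replicas: 3

      container:
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi

      # Spread across nodes for HA
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: admission-controller
              app.kubernetes.io/instance: kyverno

      # Prefer control-plane nodes for lowest latency to the API server.
      # Only effective on self-managed clusters; control-plane nodes are
      # usually tainted, hence the matching toleration.
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

    # Webhook configuration: never send system namespaces to Kyverno
    config:
      webhooks:
        - namespaceSelector:
            matchExpressions:
              - key: kubernetes.io/metadata.name
                operator: NotIn
                values:
                  - kube-system
                  - kyverno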

Migration from OPA/Gatekeeper

Migration strategy (zero-downtime):

Phase 1: Deploy Kyverno alongside Gatekeeper (Audit mode)
├── Install Kyverno with all policies in Audit mode
├── Compare PolicyReports with Gatekeeper violations
├── Fix any differences in policy logic
└── Duration: 2 weeks

Phase 2: Parallel enforcement
├── Enable Enforce on Kyverno for validated policies
├── Keep Gatekeeper running (redundant validation)
├── Monitor for false positives
└── Duration: 1 week

Phase 3: Disable Gatekeeper
├── Remove Gatekeeper constraints (one by one)
├── Verify no regressions
├── Remove Gatekeeper operator
└── Duration: 1 week

Phase 4: Optimize
├── Consolidate webhook configurations
├── Tune timeouts based on observed latency
├── Rewrite complex ported policies (originally Rego) as CEL rules
└── Measure improvement

Measure Admission Latency

# API server metrics for webhook duration
kubectl get --raw /metrics | grep apiserver_admission_webhook_admission_duration_seconds

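# Kyverno-specific metrics (separate metrics Service, plain HTTP on port 8000)
kubectl port-forward -n kyverno svc/kyverno-svc-metrics 8000:8000 &
curl -s http://localhost:8000/metrics | grep kyverno_admission_review_duration

# Trace a specific request (-v=8 logs each API call with its response time)
kubectl apply -f test-pod.yaml -v=8 2>&1 | grep -i "admission\|webhook\|milliseconds"

# Raw latency histogram buckets for Kyverno's validating webhooks
# (Kyverno names them validate.kyverno.svc-fail / -ignore; turn the buckets
#  into percentiles with histogram_quantile in Prometheus)
kubectl get --raw /metrics | grep 'apiserver_admission_webhook_admission_duration_seconds_bucket{name="validate.kyverno.svc'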
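To turn the API server histogram into a tracked percentile, a PrometheusRule can record the p99 per Kyverno webhook and alert when it exceeds the 100ms target. This sketch assumes the Prometheus Operator CRDs are installed and the API server metrics are already scraped. The rule name, namespace, alert name, and thresholds are hypothetical. Your Prometheus may also require a specific label on the rule (via its ruleSelector) before it loads it.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kyverno-admission-latency   # hypothetical name
  namespace: monitoring             # assumed Prometheus Operator namespace
spec:
  groups:
    - name: kyverno-admission
      rules:
        # p99 admission latency per Kyverno webhook over a 5-minute window
        - record: kyverno:admission_webhook_duration_seconds:p99
          expr: |
            histogram_quantile(0.99,
              sum by (le, name) (
                rate(apiserver_admission_webhook_admission_duration_seconds_bucket{name=~".*kyverno.*"}[5m])
              ))
        # Fire when p99 stays above the 100ms target for 10 minutes
        - alert: KyvernoAdmissionLatencyHigh
          expr: kyverno:admission_webhook_duration_seconds:p99 > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Kyverno webhook {{ $labels.name }} p99 admission latency above 100ms"
```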

Common Latency Patterns

Scenario                          Expected Latency    Notes
────────────────────────────────────────────────────────────────
Simple label check (YAML)         2-5ms              Fast pattern match
CEL expression (compiled)         1-3ms              Faster than YAML
Image verification (cosign)       50-200ms           Network call to registry
API call (context lookup)         20-100ms           Cross-namespace lookups
Complex Rego (if migrated)        10-50ms            Depends on complexity
Background policy                 0ms                No admission webhook call

Common Issues

Webhook timeout causes Pod creation failure

  • Cause: Kyverno overloaded or network partition
  • Fix: Use failurePolicy: Ignore for non-security policies; scale Kyverno

All cluster operations blocked after Kyverno crash

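  • Cause: failurePolicy: Fail + all Kyverno replicas down
  • Fix: Emergency: delete Kyverno's ValidatingWebhookConfiguration and MutatingWebhookConfiguration so the API server stops calling the webhook (Kyverno recreates them once it is healthy again); long-term: run 3+ replicas with a PodDisruptionBudget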
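A PodDisruptionBudget keeps enough Kyverno replicas serving during voluntary disruptions such as node drains. It does not help against crashes, which is why replica count still matters. The selector labels below assume the Kyverno chart v3 defaults for the admission controller, and the PDB name is hypothetical. Verify the labels with kubectl get pods -n kyverno --show-labels, and check whether your chart version can create the PDB for you.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kyverno-admission-controller   # hypothetical name
  namespace: kyverno
spec:
  maxUnavailable: 1    # with 3 replicas, at least 2 keep answering webhook calls
  selector:
    matchLabels:
      app.kubernetes.io/component: admission-controller
      app.kubernetes.io/instance: kyverno
```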

Latency spike during policy update

  • Cause: Policy compilation/caching after CRD update
  • Fix: Expected behavior; resolves within 5-10s after policy change

Namespace exclusion not working

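  • Cause: namespaceSelector not applied to the generated webhook configuration
  • Fix: Confirm the namespace carries the label your selector expects (for example kyverno.io/exclude); if you changed the selector in Kyverno's configuration, restart Kyverno so it regenerates the webhook configuration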

Best Practices

  1. 3+ replicas with PDB: webhook availability = cluster availability
  2. Scope webhook rules narrowly: don’t match resources that no policy targets
  3. CEL over YAML: compiled expressions are 2-5x faster
  4. Background-only for non-blocking checks: move audit-only policies off the critical path
  5. failurePolicy: Ignore for nice-to-have policies
  6. failurePolicy: Fail only for security-critical policies
  7. Measure before and after: use API server admission metrics
  8. Exclude system namespaces: kube-system, kyverno, cert-manager

Key Takeaways

  • Each webhook adds 2-200ms depending on policy complexity
  • CEL policies: 1-3ms; image verification: 50-200ms; API lookups: 20-100ms
  • Scope rules narrowly to minimize unnecessary webhook invocations
  • failurePolicy: Ignore prevents Kyverno outage from blocking the cluster
  • 3+ replicas with anti-affinity for HA; prefer control-plane nodes
  • Migration from Gatekeeper: 4-phase zero-downtime approach
  • Monitor apiserver_admission_webhook_admission_duration_seconds histogram
  • Background-only policies (admission: false, background: true) move non-critical checks off the admission hot path
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
