πŸ“šBook Signing at KubeCon EU 2026Meet us at Booking.com HQ (Mon 18:30-21:00) & vCluster booth #521 (Tue 24 Mar, 12:30-1:30pm) β€” free book giveaway!RSVP Booking.com Event
Observability intermediate ⏱ 20 minutes K8s 1.28+

Alertmanager Routing, Grouping, and Silences

Configure Alertmanager routing trees, receiver integrations, inhibition rules, silences, and alert grouping for production Kubernetes monitoring stacks.

By Luca Berton β€’ β€’ πŸ“– 5 min read

πŸ’‘ Quick Answer: Alertmanager receives alerts from Prometheus and routes them to receivers (Slack, PagerDuty, email) based on label matching. Configure routing trees with route.routes[], group related alerts with group_by, suppress duplicates with inhibit_rules, and temporarily mute known issues with silences. Use amtool CLI or the web UI to manage silences.

The Problem

Without proper routing, all alerts go to one channel and overwhelm on-call teams. Critical pages get buried in noise. Alertmanager solves this by routing alerts to the right people, grouping related alerts into single notifications, suppressing redundant alerts, and allowing temporary silences during maintenance.

flowchart TB
    PROM["Prometheus<br/>fires alerts"] --> AM["Alertmanager"]
    AM -->|"severity=critical<br/>+ team=platform"| PD["PagerDuty<br/>(page on-call)"]
    AM -->|"severity=warning<br/>+ team=platform"| SLACK["#platform-alerts<br/>(Slack)"]
    AM -->|"severity=info"| EMAIL["Email digest"]
    AM -->|"SILENCED<br/>(maintenance)"| DEV_NULL["Suppressed"]

The Solution

Routing Tree Configuration

# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/T00/B00/xxx'
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'

route:
  receiver: 'default-slack'
  group_by: ['alertname', 'namespace', 'service']
  group_wait: 30s                    # Wait before first notification
  group_interval: 5m                 # Wait between grouped updates
  repeat_interval: 4h                # Re-send if not resolved
  
  routes:
    # Critical β†’ PagerDuty (page immediately)
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      group_wait: 10s                # Page fast
      repeat_interval: 1h
      continue: false
    
    # Warning β†’ team-specific Slack channels
    - match:
        severity: warning
      routes:
        - match:
            team: platform
          receiver: 'slack-platform'
        - match:
            team: data
          receiver: 'slack-data'
        - match_re:
            namespace: 'prod-.*'
          receiver: 'slack-production'
    
    # Info β†’ email digest
    - match:
        severity: info
      receiver: 'email-digest'
      group_wait: 10m
      repeat_interval: 24h

receivers:
  - name: 'default-slack'
    slack_configs:
      - channel: '#alerts-general'
        send_resolved: true
        title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
        text: >-
          {{ range .Alerts }}
          *{{ .Labels.alertname }}* in {{ .Labels.namespace }}
          {{ .Annotations.summary }}
          {{ end }}
  
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key_file: '/etc/alertmanager/pagerduty-key'
        severity: critical
        description: '{{ .CommonLabels.alertname }}: {{ .CommonAnnotations.summary }}'
  
  - name: 'slack-platform'
    slack_configs:
      - channel: '#platform-alerts'
        send_resolved: true
  
  - name: 'slack-data'
    slack_configs:
      - channel: '#data-alerts'
  
  - name: 'email-digest'
    email_configs:
      - to: 'oncall@example.com'
        send_resolved: false

# Inhibition: suppress warning if critical exists for same alert
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'namespace']
  
  # Don't alert on pods if the node is down
  - source_match:
      alertname: 'NodeDown'
    target_match_re:
      alertname: 'Pod.*'
    equal: ['node']

Managing Silences

# Install amtool
# (comes with alertmanager binary)

# Create a silence (2-hour maintenance window)
amtool silence add \
  --alertmanager.url=http://alertmanager:9093 \
  --author="oncall" \
  --comment="Scheduled maintenance on gpu-node-3" \
  --duration=2h \
  'node="gpu-node-3"'

# List active silences
amtool silence query --alertmanager.url=http://alertmanager:9093

# Expire a silence early
amtool silence expire --alertmanager.url=http://alertmanager:9093 <silence-id>

# Query current alerts
amtool alert query --alertmanager.url=http://alertmanager:9093

Deploy on Kubernetes

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 3                        # HA: 3 replicas with mesh
  configSecret: alertmanager-config
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
---
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yaml: |
    # Paste routing config here

Test Routing with amtool

# Test which receiver an alert would route to
amtool config routes test \
  --alertmanager.url=http://alertmanager:9093 \
  severity=critical team=platform namespace=production

# Output: pagerduty-critical

# Show the routing tree
amtool config routes show --alertmanager.url=http://alertmanager:9093

Common Issues

IssueCauseFix
Alerts not routing to SlackWrong channel name or API URLTest with amtool config routes test
Duplicate notificationscontinue: true on parent routeSet continue: false on specific routes
Too many grouped alertsgroup_by too broadAdd specific labels to group_by
Silence not workingLabel mismatch in silence matcherCheck exact label values with amtool alert query
PagerDuty not receivingWrong service keyVerify key in PagerDuty integration settings

Best Practices

  • Route by severity first β€” critical β†’ page, warning β†’ Slack, info β†’ email
  • Group by alertname + namespace β€” reduces notification volume by 10Γ—
  • Use inhibition rules β€” suppress pod alerts when the node is down
  • Test routing before deploying β€” amtool config routes test catches misconfigurations
  • Set send_resolved: true for Slack β€” teams need to know when issues resolve
  • Create silences for planned maintenance β€” prevent unnecessary pages

Key Takeaways

  • Alertmanager routes alerts to receivers based on label matching
  • Routing trees enable team-specific channels: platform β†’ #platform-alerts
  • Grouping reduces noise: 50 pod alerts become 1 notification
  • Inhibition suppresses redundant alerts (warning suppressed when critical fires)
  • Silences temporarily mute alerts during maintenance windows
  • Always test routing with amtool before deploying config changes
#alertmanager #routing #silences #pagerduty #slack-alerts
Luca Berton
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.

Luca Berton Ansible Pilot Ansible by Example Open Empower K8s Recipes Terraform Pilot CopyPasteLearn ProteinLens