Troubleshooting • Advanced • ⏱ 15 minutes • K8s 1.28+

Debug etcd Performance Issues

Diagnose slow etcd causing API latency and leader election storms. Check disk IOPS, compaction, defrag, and network latency.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Slow etcd is almost always a disk I/O problem. Check etcdctl endpoint status for Raft index lag, iostat for disk latency, and the etcd_disk_wal_fsync_duration_seconds metric. Target: WAL fsync p99 < 10ms. Fix: use a dedicated SSD/NVMe for etcd data, defragment, and compact.

The Problem

The Kubernetes API server is slow. kubectl commands take 5-30 seconds instead of milliseconds. Pods take minutes to schedule. You see etcdserver: request timed out or etcdserver: leader changed in API server logs. The root cause is usually etcd performance degradation.
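To confirm etcd is the culprit, filter the API server logs for these two error strings. A minimal sketch, run here against a canned sample; in a real cluster you would pipe the output of kubectl logs for the kube-apiserver pod (or journalctl on the control-plane node) into the same grep:

```shell
# Filter API server log lines that point at etcd trouble.
# The sample below stands in for real kube-apiserver logs.
cat <<'EOF' > /tmp/apiserver-sample.log
E0101 12:00:01 etcdserver: request timed out
I0101 12:00:02 Trace: List /api/v1/pods took 8.2s
E0101 12:00:03 etcdserver: leader changed
I0101 12:00:04 healthz check passed
EOF
grep -E 'etcdserver: (request timed out|leader changed)' /tmp/apiserver-sample.log
```

If both errors appear together and repeat, suspect slow disks first and leader churn second.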

The Solution

Step 1: Check etcd Health

# On a master node or etcd pod
etcdctl endpoint health --cluster
# https://10.0.1.10:2379 is healthy: successfully committed proposal: took = 3.456ms
# https://10.0.1.11:2379 is healthy: successfully committed proposal: took = 2.123ms
# https://10.0.1.12:2379 is healthy: successfully committed proposal: took = 45.678ms ← SLOW

# Check endpoint status
etcdctl endpoint status --cluster -w table
# Shows: DB SIZE, LEADER, RAFT INDEX, RAFT APPLIED INDEX
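A healthy member applies Raft entries as fast as it receives them, so RAFT INDEX and RAFT APPLIED INDEX stay close together. The jq sketch below computes the per-member lag from the JSON form of the status command; a canned sample stands in for the live `etcdctl endpoint status --cluster -w json` output:

```shell
# Compute Raft apply lag per member from etcdctl's JSON status output.
# Canned sample; replace the file with the live command's output.
cat <<'EOF' > /tmp/etcd-status.json
[{"Endpoint":"https://10.0.1.10:2379","Status":{"raftIndex":90250,"raftAppliedIndex":90250}},
 {"Endpoint":"https://10.0.1.12:2379","Status":{"raftIndex":90250,"raftAppliedIndex":90112}}]
EOF
jq -r '.[] | "\(.Endpoint) apply-lag=\(.Status.raftIndex - .Status.raftAppliedIndex)"' /tmp/etcd-status.json
```

A persistently non-zero lag on one member usually means that member's disk cannot keep up.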

Step 2: Check Disk Performance

# On the slow etcd node
iostat -xz 1 5
# Look for: await > 10ms on the etcd disk → too slow
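iostat's column layout varies across sysstat versions, so the sketch below locates the write-await column from the header instead of hard-coding a position. A canned sample stands in for live iostat output:

```shell
# Flag devices whose write latency exceeds etcd's 10ms comfort zone.
# Canned iostat-style sample; the w_await column is found by name.
cat <<'EOF' > /tmp/iostat-sample.txt
Device r/s w/s w_await %util
nvme0n1 12.0 210.0 1.3 34.0
sdb 4.0 180.0 42.7 97.5
EOF
awk 'NR==1 {for (i=1; i<=NF; i++) if ($i == "w_await") c = i; next}
     $c > 10 {print $1, "w_await=" $c "ms -> too slow for etcd"}' /tmp/iostat-sample.txt
```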

# Benchmark the disk behind the etcd data directory
# (default on kubeadm and OpenShift: /var/lib/etcd; ideally point
# --directory at a scratch path on the same device, since fio writes test files)
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd --size=22m --bs=2300 --name=etcd-bench
# Target: fdatasync p99 < 10ms

Step 3: Check etcd Metrics

# Key metrics (via Prometheus or a direct curl; if etcd serves TLS,
# use https:// plus --cacert/--cert/--key)
# WAL fsync latency — most critical
curl -s http://localhost:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds

# Backend commit latency
curl -s http://localhost:2379/metrics | grep etcd_disk_backend_commit_duration_seconds

# Network latency between peers
curl -s http://localhost:2379/metrics | grep etcd_network_peer_round_trip_time_seconds
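The histogram's _sum and _count series give a quick average latency even without Prometheus. A sketch over sample scrape output (in a live cluster, pipe the curl above into the same awk):

```shell
# Average WAL fsync latency from the histogram's _sum/_count pair.
# Canned sample of two scraped series.
cat <<'EOF' > /tmp/metrics-sample.txt
etcd_disk_wal_fsync_duration_seconds_sum 12.5
etcd_disk_wal_fsync_duration_seconds_count 100000
EOF
awk '/_sum/ {s = $2} /_count/ {c = $2} END {printf "avg fsync %.3f ms\n", 1000 * s / c}' /tmp/metrics-sample.txt
```

Averages hide tail latency, so for the p99 target use histogram_quantile over the _bucket series in Prometheus.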

Step 4: Compact and Defragment

# Get current revision
REV=$(etcdctl endpoint status -w json | jq -r '.[0].Status.header.revision')

# Compact old revisions (note: the etcdctl subcommand is "compaction")
etcdctl compaction "$REV"

# Defragment each member (one at a time!)
etcdctl defrag --endpoints=https://10.0.1.10:2379
etcdctl defrag --endpoints=https://10.0.1.11:2379
etcdctl defrag --endpoints=https://10.0.1.12:2379

# Check DB size after
etcdctl endpoint status -w table

Step 5: Long-Term Fixes

# Dedicated SSD/NVMe for etcd (must be low-latency)
# On bare metal: separate physical disk
# On cloud: io2 EBS (AWS), pd-ssd (GCP), Premium SSD (Azure)

# Tune the snapshot count (Raft entries kept before a snapshot is taken);
# raising it reduces snapshot I/O at the cost of memory
# In etcd configuration (default since v3.2: 100000):
ETCD_SNAPSHOT_COUNT: "100000"   # raise only with ample memory

# Separate etcd network from pod network
# Use dedicated NICs for etcd peer communication

Common Issues

Leader Election Storms

If you see frequent leader changed messages, check network latency between etcd members:

# From each etcd node, ping the others
ping -c 10 <other-etcd-node>
# Latency should be < 2ms for etcd peers
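ping's summary line already carries the average RTT, so a small awk filter can compare it against the 2ms peer target. A canned sample stands in for real ping output here:

```shell
# Extract the average RTT from ping's rtt summary line and compare it
# to the 2ms peer target. Pipe real `ping -c 10` output instead.
cat <<'EOF' > /tmp/ping-sample.txt
rtt min/avg/max/mdev = 0.412/5.318/9.022/2.101 ms
EOF
awk -F'/' '/^rtt/ { if ($5 > 2) print "avg " $5 " ms exceeds the 2ms peer target" }' /tmp/ping-sample.txt
```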

DB Size Growing Continuously

# Check alarm status
etcdctl alarm list
# If a NOSPACE alarm is active: compact, defrag, then disarm
etcdctl compaction "$(etcdctl endpoint status -w json | jq -r '.[0].Status.header.revision')"
etcdctl defrag   # run against each member, one at a time
etcdctl alarm disarm
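To catch growth before NOSPACE fires, the dbSize field in the status JSON can be compared against a threshold. A sketch assuming the common 8GiB quota, with a canned sample in place of live `etcdctl endpoint status -w json` output:

```shell
# Warn when a member's DB size crosses 80% of an assumed 8GiB quota.
cat <<'EOF' > /tmp/etcd-dbsize.json
[{"Endpoint":"https://10.0.1.10:2379","Status":{"dbSize":7194070220}}]
EOF
jq -r '.[] | select(.Status.dbSize > 0.8 * 8 * 1024 * 1024 * 1024)
       | "\(.Endpoint): dbSize \(.Status.dbSize) bytes past 80% of 8GiB"' /tmp/etcd-dbsize.json
```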

Best Practices

  • Dedicated low-latency storage — NVMe or SSD with < 10ms p99 fsync
  • 3 or 5 etcd members — more members increase write latency
  • Monitor WAL fsync duration — alert if p99 > 10ms
  • Schedule regular compaction — etcd auto-compacts but defrag is manual
  • Keep etcd DB under 8GB — performance degrades with large databases
  • Separate etcd traffic — use dedicated network for peer communication
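The WAL fsync alert in the list above can be expressed as a Prometheus rule. A sketch assuming Prometheus already scrapes etcd with its standard metric names; the group and alert names are illustrative:

```yaml
groups:
  - name: etcd-disk              # illustrative group name
    rules:
      - alert: EtcdWALFsyncSlow  # illustrative alert name
        expr: |
          histogram_quantile(0.99,
            rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.01
        for: 10m
        annotations:
          summary: "etcd WAL fsync p99 above 10ms"
```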

Key Takeaways

  • etcd performance = disk I/O performance (WAL fsync is the bottleneck)
  • Target: WAL fsync p99 < 10ms, backend commit < 25ms
  • Compact + defragment to reclaim space and improve read performance
  • Leader election storms indicate network latency between members
  • Always use dedicated SSD/NVMe — shared storage kills etcd performance
#etcd #performance #latency #disk-io #cluster
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.

Kubernetes Recipes book cover

Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
