Validate CSI Storage Performance with FIO Kubernetes Job
Benchmark CSI storage performance using FIO inside a Kubernetes Job. Create a PVC backed by a CSI StorageClass, then run sequential and random read/write tests against it.
💡 Quick Answer: Create a PVC with your CSI StorageClass, then run FIO as a Kubernetes Job to benchmark IOPS, throughput, and latency. Compare results across storage classes (gp3 vs io2, standard vs premium) to choose the right backend for your workload.
The Problem
- You deployed a CSI driver but don't know its actual performance
- Storage vendor claims vs reality (advertised IOPS vs delivered)
- Need to compare storage classes before running production workloads
- CKA/CKAD: demonstrate understanding of PVC lifecycle and Jobs
The Solution
PVC for Benchmarking
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-bench-pvc
  namespace: storage-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # Your CSI StorageClass
  resources:
    requests:
      storage: 50Gi   # Large enough to avoid cache effects
FIO Benchmark Job
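Before sizing the Job, it helps to budget the disk space fio will use. With --directory, each of the numjobs workers creates its own file of --size, so one test needs numjobs × size, and tests whose files are left on the volume add up. The sketch below only does that arithmetic; fio_footprint_gib is a hypothetical helper name, not part of fio.

```shell
#!/bin/sh
# Hypothetical helper: GiB needed when every test keeps its files on the volume.
# Usage: fio_footprint_gib NUMJOBS SIZE_GIB TESTS
fio_footprint_gib() {
  echo $(( $1 * $2 * $3 ))
}

# One test with 4 jobs of 4 GiB each
fio_footprint_gib 4 4 1   # prints 16

# Five such tests whose files all stay on the volume
fio_footprint_gib 4 4 5   # prints 80
```

With 4 jobs of 4 GiB, a single test fits comfortably in a 50Gi claim, but five tests that keep their files would not. Either remove test files between runs or size the PVC accordingly.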
apiVersion: batch/v1
kind: Job
metadata:
  name: fio-benchmark
  namespace: storage-test
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: fio
          # The Nixery "shell" meta-package adds /bin/sh and coreutils next to fio
          image: nixery.dev/shell/fio
          command: ["/bin/sh", "-c"]
          args:
            - |
              # --unlink=1 removes each test's files when it finishes, so the
              # five 16 GiB tests (4 jobs x 4G) fit in the 50Gi PVC one at a time.
              echo "=== Sequential Write ==="
              fio --name=seq-write \
                --directory=/data \
                --ioengine=libaio \
                --direct=1 \
                --bs=128k \
                --size=4G \
                --unlink=1 \
                --numjobs=4 \
                --rw=write \
                --group_reporting \
                --runtime=60 \
                --time_based
              echo "=== Sequential Read ==="
              fio --name=seq-read \
                --directory=/data \
                --ioengine=libaio \
                --direct=1 \
                --bs=128k \
                --size=4G \
                --unlink=1 \
                --numjobs=4 \
                --rw=read \
                --group_reporting \
                --runtime=60 \
                --time_based
              echo "=== Random Write (4K) ==="
              fio --name=rand-write \
                --directory=/data \
                --ioengine=libaio \
                --direct=1 \
                --bs=4k \
                --size=4G \
                --unlink=1 \
                --numjobs=4 \
                --iodepth=32 \
                --rw=randwrite \
                --group_reporting \
                --runtime=60 \
                --time_based
              echo "=== Random Read (4K) ==="
              fio --name=rand-read \
                --directory=/data \
                --ioengine=libaio \
                --direct=1 \
                --bs=4k \
                --size=4G \
                --unlink=1 \
                --numjobs=4 \
                --iodepth=32 \
                --rw=randread \
                --group_reporting \
                --runtime=60 \
                --time_based
              echo "=== Mixed Random 70/30 Read/Write ==="
              fio --name=mixed-rw \
                --directory=/data \
                --ioengine=libaio \
                --direct=1 \
                --bs=4k \
                --size=4G \
                --unlink=1 \
                --numjobs=4 \
                --iodepth=32 \
                --rw=randrw \
                --rwmixread=70 \
                --group_reporting \
                --runtime=60 \
                --time_based
          volumeMounts:
            - name: bench-vol
              mountPath: /data
          resources:
            requests:
              cpu: "2"
              memory: "2Gi"
            limits:
              cpu: "4"
              memory: "4Gi"
      volumes:
        - name: bench-vol
          persistentVolumeClaim:
            claimName: fio-bench-pvc
Read Results
# Wait for Job to complete
kubectl wait --for=condition=complete job/fio-benchmark -n storage-test --timeout=600s
# View results
kubectl logs job/fio-benchmark -n storage-test
# Key metrics to look for:
#   IOPS: read: IOPS=45.2k      ← 4K random read IOPS
#   BW:   bw=1823MiB/s          ← sequential throughput
#   lat:  avg=2845.21usec       ← average latency (lower = better)
#   clat percentiles:
#     99.00th=[ 5342]           ← P99 latency in microseconds
Interpreting FIO Results:
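IOPS and bandwidth describe the same run from two angles: bandwidth is roughly IOPS multiplied by block size. That is why a 4K random test can post high IOPS yet modest MiB/s, while a 128K sequential test shows the opposite. A small sketch of the conversion, assuming the block size is given in KiB (iops_to_mib_s is an illustrative name, not an fio feature):

```shell
#!/bin/sh
# Hypothetical helper: convert IOPS at a given block size to MiB/s.
# Usage: iops_to_mib_s IOPS BLOCK_SIZE_KIB
iops_to_mib_s() {
  # LC_ALL=C keeps the decimal separator a dot regardless of locale
  LC_ALL=C awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1024 }'
}

iops_to_mib_s 45200 4   # prints 176.6
iops_to_mib_s 16000 4   # prints 62.5
```

Read the IOPS rows of a results table against the 4K tests and the bandwidth rows against the 128K sequential tests; comparing across block sizes is misleading.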
────────────────────────────────────────────────────────────────
Metric                 Good (NVMe SSD)   OK (gp3)    Poor (HDD)
────────────────────────────────────────────────────────────────
Seq Read BW            3000+ MB/s        500 MB/s    150 MB/s
Seq Write BW           2000+ MB/s        250 MB/s    100 MB/s
4K Random Read IOPS    100K+             16K         200
4K Random Write IOPS   50K+              5K          150
Avg Latency (4K)       <100 μs           <500 μs     >5 ms
P99 Latency (4K)       <500 μs           <2 ms       >20 ms
Compare Storage Classes
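Only the PVC changes between runs, so one way to keep everything else identical is to template it. The sketch below prints the benchmark PVC for a given StorageClass; render_pvc is a hypothetical helper, and the class names passed to it are placeholders for whatever your cluster offers.

```shell
#!/bin/sh
# Hypothetical helper: print the benchmark PVC manifest for one StorageClass.
# Usage: render_pvc STORAGE_CLASS_NAME
render_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-bench-pvc
  namespace: storage-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: $1
  resources:
    requests:
      storage: 50Gi
EOF
}

# Print the manifest for one class; pipe it to `kubectl apply -f -` to create it
render_pvc gp3
```

Because storageClassName cannot be changed on an existing claim, delete the previous Job and PVC before applying the manifest for the next class.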
# Run the same Job with different StorageClasses:
# 1. Change storageClassName in PVC
# 2. Record results in a table
# Example comparison:
# StorageClass   Seq Read    Rand IOPS   P99 Lat
# gp3            500 MB/s    16,000      1.2 ms
# io2            1000 MB/s   64,000      0.3 ms
# local-nvme     3200 MB/s   120,000     0.08 ms
# nfs-client     110 MB/s    800         12 ms
JSON Output for Automation
# Add --output-format=json for machine-parseable results
# (--group_reporting makes jobs[0] the aggregate of all 4 jobs)
args:
  - |
    fio --name=rand-read \
      --directory=/data \
      --ioengine=libaio \
      --direct=1 \
      --bs=4k \
      --size=4G \
      --numjobs=4 \
      --iodepth=32 \
      --rw=randread \
      --group_reporting \
      --runtime=60 \
      --time_based \
      --output-format=json \
      --output=/data/results.json
# Extract key metrics (requires jq wherever results.json is read)
jq '.jobs[0].read | {iops, bw_bytes, lat_ns: .lat_ns.mean}' /data/results.json
Cleanup
kubectl delete job fio-benchmark -n storage-test
kubectl delete pvc fio-bench-pvc -n storage-test
Common Issues
FIO reports 0 IOPS
- Cause: PVC not bound (StorageClass doesn't exist or no capacity)
- Fix: Check kubectl get pvc -n storage-test; status should be Bound
Very low IOPS compared to spec
- Cause: Missing --direct=1 (OS page cache hides real performance)
- Fix: Always use --direct=1 for O_DIRECT to bypass cache
Job OOMKilled
- Cause: FIO --size larger than container memory limit with buffered I/O
- Fix: Use --direct=1, or increase memory limits
Best Practices
- Use --direct=1: bypass OS cache for true storage performance
- Test size > RAM: prevents cache from inflating results
- Run 60s minimum: short tests miss steady-state behavior
- iodepth=32 for random I/O: saturates NVMe queue depth
- 4 numjobs: simulates realistic concurrent access
- Test the workload pattern you'll actually use: databases = random 4K; streaming = sequential 128K
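The iodepth and numjobs settings also fix how many I/Os are in flight, which ties IOPS to latency through Little's Law: average latency ≈ (numjobs × iodepth) / IOPS while the queue stays full. The sketch below is a rough consistency check under that assumption, not something fio provides; expected_latency_us is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: latency implied by Little's Law, in microseconds.
# Usage: expected_latency_us NUMJOBS IODEPTH MEASURED_IOPS
expected_latency_us() {
  # In-flight I/Os divided by completions per second gives seconds per I/O
  LC_ALL=C awk -v j="$1" -v d="$2" -v iops="$3" \
    'BEGIN { printf "%.0f\n", j * d / iops * 1000000 }'
}

# 4 jobs x iodepth 32 at 45.2k IOPS
expected_latency_us 4 32 45200   # prints 2832
```

For 4 jobs at iodepth 32 and 45.2k IOPS this gives about 2.8 ms, in line with the avg=2845.21usec sample in the results section. A measured average far from this estimate may mean the queue was not kept full, or that the IOPS and latency figures came from different tests.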
Key Takeaways
- Create PVC → run FIO Job → read logs for IOPS/BW/latency
- --direct=1 + --ioengine=libaio for accurate CSI benchmarks
- Compare storage classes with identical FIO parameters
- Key metrics: IOPS (random), BW (sequential), P99 latency
- --output-format=json for automated comparison pipelines
- Clean up PVC after benchmarking to release storage
