OpenShift crun vs runc Runtime Differences
Understand why pods behave differently on GPU vs CPU nodes in OpenShift. Compare crun and runc container runtimes, seccomp profiles, and syscall filtering.
💡 Quick Answer: OpenShift CPU nodes use crun (lightweight, cgroup2-native) while GPU nodes use runc (via the NVIDIA container runtime). The key difference: crun applies stricter seccomp filtering by default, blocking syscalls like io_setup (libaio), perf_event_open, and some ioctl variants that runc allows.

Key insight: When a workload runs on GPU nodes but fails silently on CPU nodes, the container runtime difference, not the hardware, is usually the cause.

Gotcha: Both runtimes report the same SCC, same SELinux context, and same mount points. The difference is invisible to standard oc describe pod output.
The Problem
You deploy the same workload across OpenShift nodes. It works perfectly on GPU nodes but silently fails or behaves differently on CPU nodes. Standard debugging shows identical configurations:
# Both show privileged SCC
oc get pod -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
# privileged
# Both show same SELinux context
oc exec pod-on-gpu -- cat /proc/1/attr/current
# system_u:system_r:spc_t:s0
oc exec pod-on-cpu -- cat /proc/1/attr/current
# system_u:system_r:spc_t:s0
# Both show same mounts
oc exec pod-on-gpu -- mount | wc -l
# 24
oc exec pod-on-cpu -- mount | wc -l
# 24

The Solution
Identify the Runtime
# Check runtime on each node
for node in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
echo "=== $node ==="
oc debug $node -- chroot /host crio config 2>/dev/null | grep -E "default_runtime|runtime_path" | head -3
done

Typical output:
=== node/worker-1 === # CPU node
default_runtime = "crun"
runtime_path = "/usr/bin/crun"
=== node/gpu-1 === # GPU node
default_runtime = "nvidia"
runtime_path = "/usr/bin/nvidia-container-runtime"Check seccomp Status Inside Pods
Check seccomp Status Inside Pods

# CPU node pod: seccomp active
oc exec -n test pod-on-cpu -- cat /proc/self/status | grep Seccomp
# Seccomp: 2
# Seccomp_filters: 1
# GPU node pod: seccomp inactive
oc exec -n test pod-on-gpu -- cat /proc/self/status | grep Seccomp
# Seccomp: 0
# Seccomp_filters: 0

Seccomp: 2 means filter mode is active (syscalls are checked against an allowlist). Seccomp: 0 means filtering is disabled.
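To survey this across a whole namespace rather than pod by pod, a small loop works. A sketch, assuming each image ships grep:

# Print the seccomp mode for every pod in the test namespace
for pod in $(oc get pods -n test -o jsonpath='{.items[*].metadata.name}'); do
  mode=$(oc exec -n test "$pod" -- grep '^Seccomp:' /proc/self/status 2>/dev/null)
  echo "$pod: ${mode:-could not read}"
done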
graph TD
A[Pod scheduled to node] --> B{Node type}
B -->|CPU worker| C[CRI-O uses crun]
B -->|GPU worker| D[CRI-O uses nvidia-container-runtime / runc]
C --> E[Default seccomp profile applied]
D --> F[NVIDIA runtime: permissive seccomp]
E --> G[io_setup blocked]
E --> H[perf_event_open blocked]
E --> I[Some ioctl variants blocked]
F --> J[All syscalls allowed]
G --> K[libaio fails silently]
J --> L[Everything works]

Key Differences Between crun and runc
| Feature | crun (CPU nodes) | runc/nvidia (GPU nodes) |
|---|---|---|
| Language | C | Go |
| cgroup support | cgroup2 native | cgroup1 + cgroup2 |
| seccomp default | Strict filtering | Permissive / disabled |
| io_setup | ❌ Blocked | ✅ Allowed |
| perf_event_open | ❌ Blocked | ✅ Allowed |
| Memory overhead | ~50KB | ~15MB |
| Startup time | Faster | Slower |
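To confirm which runtime binaries are actually installed on a given node, query their versions directly. Node names here are the hypothetical ones from earlier; the binaries live on the host's default PATH on RHCOS:

# crun on a CPU worker
oc debug node/worker-1 -- chroot /host crun --version 2>/dev/null | head -1
# runc behind the NVIDIA wrapper on a GPU worker
oc debug node/gpu-1 -- chroot /host runc --version 2>/dev/null | head -1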
Affected Workloads
Common tools that fail silently under crun's seccomp:

- fio with libaio: io_setup/io_submit blocked, producing empty output (see the sketch after this list)
- perf/eBPF tools: perf_event_open blocked, permission denied
- DPDK applications: certain ioctl variants blocked
- Custom kernel module loaders: init_module blocked
- Some Java NIO: native async I/O may fall back to sync
Workarounds
Option 1: Override seccomp per pod
spec:
  securityContext:
    seccompProfile:
      type: Unconfined
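In context, a complete (hypothetical) pod spec looks like this. Note that Unconfined disables syscall filtering for every container in the pod, so scope it narrowly:

apiVersion: v1
kind: Pod
metadata:
  name: fio-unconfined          # hypothetical name
  namespace: test
spec:
  securityContext:
    seccompProfile:
      type: Unconfined          # no syscall filtering for this pod
  containers:
  - name: workload
    image: quay.io/example/fio:latest   # placeholder image
    command: ["sleep", "infinity"]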
Option 2: Force runc on specific pods

Add a RuntimeClass that uses runc:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runc
handler: runc

Then reference it in your pod:
spec:
  runtimeClassName: runc
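After applying both manifests, verify that the handler name resolves to a runtime table in the node's CRI-O config and that the pod actually carries the override. A sketch; the grep pattern assumes CRI-O's usual TOML layout and mypod is a hypothetical pod name:

# The handler must match a [crio.runtime.runtimes.<name>] table on the node
oc debug node/worker-1 -- chroot /host crio config 2>/dev/null | grep -A2 'runtimes.runc'
# Confirm the pod was admitted with the override
oc get pod mypod -o jsonpath='{.spec.runtimeClassName}'
# runc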
Option 3: Use alternative syscalls

For fio specifically, switch the ioengine from libaio to psync or io_uring.
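For the same hypothetical fio job as above, the engine swap is a one-flag change, and fio --enghelp lists which engines your build supports:

# psync avoids the blocked io_setup/io_submit path entirely
oc exec -n test pod-on-cpu -- fio --name=test --ioengine=psync --rw=read --size=4M --filename=/tmp/fiotest
# List available engines before committing to io_uring
oc exec -n test pod-on-cpu -- fio --enghelp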
Common Issues
How to tell which runtime a pod actually used
# Check the CRI-O logs for the pod's container ID
oc debug node/worker-1 -- chroot /host journalctl -u crio --no-pager | grep "runtime" | tail -5
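An alternative, if the journal is noisy, is to ask CRI-O directly via crictl. This is a sketch with a hypothetical pod name, and the exact JSON field names in the inspect output vary across CRI-O versions:

# Find the pod's sandbox ID, then inspect it for runtime details
oc debug node/worker-1 -- chroot /host sh -c 'crictl pods --name my-pod -q | head -1 | xargs crictl inspectp | grep -i runtime'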
RuntimeClass not available in OpenShift

OpenShift supports RuntimeClass from 4.12+. Earlier versions require MachineConfig changes to the CRI-O configuration.
NVIDIA runtime is just a wrapper around runc
The nvidia-container-runtime modifies the OCI spec to inject GPU devices and driver libraries, then invokes runc. It therefore inherits runc's permissive seccomp behavior.
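You can see both layers on a GPU node, since the wrapper and the runtime it ultimately execs each answer --version (paths are typical for a GPU operator setup and may differ):

# The NVIDIA wrapper and the underlying runc it delegates to
oc debug node/gpu-1 -- chroot /host sh -c 'nvidia-container-runtime --version; runc --version'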
Best Practices
- Always test workloads on both CPU and GPU node types before declaring them production-ready
- Check the Seccomp field in /proc/self/status as the first debugging step for node-type-specific failures
- Use strace in privileged pods to identify which syscalls are blocked: strace -f your-command 2>&1 | grep EPERM (expanded sketch after this list)
- Document runtime requirements in your Helm charts or deployment manifests
- Prefer I/O paths built on allowlisted syscalls (psync, io_uring) over ones that depend on blocked syscalls such as libaio's io_setup
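An expanded version of the strace tip, which tallies each distinct denied syscall instead of just flagging the first. A sketch, assuming the image ships strace and the pod is permitted to ptrace; your-command is the placeholder from the list above:

# Tally seccomp denials: each blocked syscall shows up as "= -1 EPERM"
oc exec -n test pod-on-cpu -- sh -c 'strace -f your-command 2>&1 | grep "EPERM (Operation not permitted)" | cut -d"(" -f1 | sort | uniq -c | sort -rn'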
Key Takeaways
- OpenShift uses different container runtimes on different node types, and this is invisible to most debugging tools
- crun on CPU nodes enforces stricter seccomp than runc/NVIDIA runtime on GPU nodes
- SCC, SELinux, and mount outputs are identical; the difference is at the syscall filtering level
- Silent failures (no error, no crash) are the hallmark of seccomp-blocked syscalls
- Check Seccomp in /proc/self/status to quickly identify if filtering is active
