Kubernetes Taints and Tolerations Node Scheduling
Control pod scheduling with Kubernetes taints and tolerations. Dedicate nodes to specific workloads, prevent scheduling on control-plane nodes, implement GPU
Quick Answer: Taints on nodes repel pods; tolerations on pods allow them to schedule on tainted nodes. Taint a node:
kubectl taint nodes gpu-node nvidia.com/gpu=present:NoSchedule
Only pods with a matching toleration will schedule there. Three effects: NoSchedule (hard), PreferNoSchedule (soft), NoExecute (also evicts existing pods).
The Problem
- GPU nodes getting filled with non-GPU workloads
- Batch jobs scheduling on production nodes, consuming resources
- Need to drain a node for maintenance without killing critical pods
- Control-plane nodes running user workloads
- Want dedicated node pools for specific teams or workload types
The Solution
Taint a Node
# Add taint (NoSchedule: new pods won't schedule without a toleration)
kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule
# Add taint (NoExecute: also evicts existing pods without a toleration)
kubectl taint nodes node-maintenance dedicated=maintenance:NoExecute
# Add taint (PreferNoSchedule: soft preference, not a hard block)
kubectl taint nodes spot-node-1 cloud.example.com/spot=true:PreferNoSchedule
# Remove taint (trailing minus)
kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule-
# View taints on a node
kubectl describe node gpu-node-1 | grep -A5 Taints
Toleration in Pod Spec
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  # apps/v1 Deployments require a selector that matches the template labels
  # (the label app: ml-training is an illustrative choice)
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      # Tolerate the GPU node taint
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Equal"
          value: "present"
          effect: "NoSchedule"
      # Also use nodeSelector to ONLY schedule on GPU nodes
      nodeSelector:
        node-type: gpu
      containers:
        - name: training
          image: registry.example.com/ml-trainer:v1
          resources:
            limits:
              nvidia.com/gpu: 1
Taint Effects Explained
Effect           │ New Pods                    │ Existing Pods
─────────────────┼─────────────────────────────┼────────────────────────────────────
NoSchedule       │ Won't schedule without      │ Not affected
                 │ matching toleration         │ (keep running)
─────────────────┼─────────────────────────────┼────────────────────────────────────
PreferNoSchedule │ Tries to avoid scheduling   │ Not affected
                 │ (soft, not guaranteed)      │
─────────────────┼─────────────────────────────┼────────────────────────────────────
NoExecute        │ Won't schedule without      │ Evicted if no matching toleration
                 │ matching toleration         │ (or once tolerationSeconds expires)
─────────────────┴─────────────────────────────┴────────────────────────────────────
Toleration Operators
# Equal: key, value, and effect must all match
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
# Exists: matches any value for the key (omit the value field)
tolerations:
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"
# Tolerate all taints with a specific key (any effect)
tolerations:
  - key: "dedicated"
    operator: "Exists"
# Tolerate ALL taints (dangerous: the pod can schedule anywhere)
tolerations:
  - operator: "Exists"
NoExecute with Grace Period
# Pod will stay on tainted node for 300 seconds before eviction
tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300 # Evict after 5 minutes
Dedicated Node Pools Pattern
# Taint nodes by purpose
kubectl taint nodes -l node-pool=gpu nvidia.com/gpu=present:NoSchedule
kubectl taint nodes -l node-pool=highmem dedicated=highmem:NoSchedule
kubectl taint nodes -l node-pool=batch dedicated=batch:NoSchedule
# GPU workload
spec:
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Equal"
      value: "present"
      effect: "NoSchedule"
  nodeSelector:
    node-pool: gpu
---
# High-memory workload
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "highmem"
      effect: "NoSchedule"
  nodeSelector:
    node-pool: highmem
Node Maintenance (Drain)
# Cordon (prevent new scheduling) + drain (evict existing pods)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# Cordoning automatically adds the taint:
# node.kubernetes.io/unschedulable:NoSchedule
# After maintenance, uncordon
kubectl uncordon node-1
Built-in Taints (Added Automatically)
Taint                                  │ Added When
───────────────────────────────────────┼──────────────────────────────
node.kubernetes.io/not-ready           │ Node condition NotReady
node.kubernetes.io/unreachable         │ Node controller loses contact
node.kubernetes.io/memory-pressure     │ Node under memory pressure
node.kubernetes.io/disk-pressure       │ Node low on disk space
node.kubernetes.io/pid-pressure        │ Too many processes on node
node.kubernetes.io/unschedulable       │ kubectl cordon
node-role.kubernetes.io/control-plane  │ Control-plane node (kubeadm)
───────────────────────────────────────┴──────────────────────────────
Taint + Toleration + NodeSelector Pattern
# Best practice: use BOTH taint+toleration AND nodeSelector
# - Taint: repels unwanted pods FROM the node
# - NodeSelector: attracts the pod TO the specific node
# Without nodeSelector, a toleration just means "allowed", not "required"
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "ml"
      effect: "NoSchedule"
  nodeSelector:
    workload-type: ml # REQUIRED to ensure pod goes to ML nodes
Common Issues
Pod with toleration scheduling on wrong nodes
- Cause: A toleration allows scheduling on tainted nodes, but doesn't restrict the pod to them
- Fix: Add nodeSelector or nodeAffinity to force scheduling on specific nodes
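As one way to apply this fix, the sketch below pairs a toleration with a required nodeAffinity rule in place of nodeSelector. It assumes the taint dedicated=ml:NoSchedule and the node label workload-type=ml used elsewhere in this recipe. The Pod name ml-job is hypothetical, and the image is reused from the Deployment example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-job # hypothetical name, for illustration only
spec:
  # Toleration: allows the pod onto nodes tainted dedicated=ml:NoSchedule
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "ml"
      effect: "NoSchedule"
  # Required node affinity: restricts the pod to nodes labeled workload-type=ml
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload-type
                operator: In
                values:
                  - ml
  containers:
    - name: job
      image: registry.example.com/ml-trainer:v1
```

nodeAffinity is more verbose than nodeSelector but supports set-based operators such as In and NotIn, which can help when a workload may run on more than one pool.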
DaemonSet pods not running on tainted nodes
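- Cause: DaemonSet pods automatically tolerate the built-in node condition taints (not-ready, unreachable, the pressure taints, unschedulable), but they need explicit tolerations for custom taints and the control-plane taint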
- Fix: Add tolerations to DaemonSet pod spec (common for monitoring/logging DaemonSets)
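A minimal sketch of that fix follows, assuming the dedicated and nvidia.com/gpu taint keys from the node pool examples in this recipe. The DaemonSet name, label, and image are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-logger # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-logger
  template:
    metadata:
      labels:
        app: node-logger
    spec:
      tolerations:
        # Run on every dedicated pool, whatever the taint value (gpu, highmem, batch, ...)
        - key: "dedicated"
          operator: "Exists"
          effect: "NoSchedule"
        # Run on GPU nodes tainted nvidia.com/gpu=present:NoSchedule
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: logger
          image: registry.example.com/node-logger:v1 # hypothetical image
```

Listing the specific keys keeps the DaemonSet off nodes tainted for other reasons, which a bare operator: Exists toleration would not do.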
All pods evicted after adding NoExecute taint
- Cause: NoExecute evicts all existing pods without a matching toleration
- Fix: Use NoSchedule if you only want to prevent new pods, or add tolerations before tainting
Can't schedule system pods after tainting all nodes
- Cause: CoreDNS and metrics-server need to run somewhere
- Fix: System pods typically tolerate control-plane taints but not custom ones; leave at least some nodes untainted, or add matching tolerations to the system workloads
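If a cluster add-on should be allowed onto control-plane nodes, a toleration along these lines is one common approach. This is a pod spec fragment, not a complete manifest. It assumes the kubeadm-style taint listed in the built-in taints table, so check which taints your distribution actually applies before relying on it.

```yaml
# Pod spec fragment: goes under spec (Pod) or spec.template.spec (Deployment/DaemonSet)
tolerations:
  # Assumes control-plane nodes carry node-role.kubernetes.io/control-plane:NoSchedule
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
```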
Best Practices
- Taint + NodeSelector together: taint repels others; nodeSelector attracts yours
- NoSchedule over NoExecute: less disruptive for existing workloads
- Label nodes alongside taints: labels for selection, taints for repulsion
- Tolerate built-in taints in critical DaemonSets: monitoring, logging, CNI
- Use tolerationSeconds with NoExecute: graceful eviction, not immediate
- Don't tolerate all (operator: Exists with no key): defeats the purpose of taints
- Document your taint strategy: maintain a node pool design doc
Key Takeaways
- Taints on nodes repel pods; tolerations on pods allow scheduling on tainted nodes
- Three effects: NoSchedule (block new), PreferNoSchedule (soft), NoExecute (evict existing)
- Toleration = "I can schedule here", not "I must schedule here"; add nodeSelector for that
- operator: Equal matches a specific key+value; operator: Exists matches any value
- Built-in taints auto-applied: not-ready, unreachable, memory-pressure, disk-pressure
- kubectl drain = cordon + evict; kubectl uncordon = remove unschedulable taint
- Pattern: dedicated node pools with taint + label + nodeSelector + toleration

