Dell PowerScale NFS Access Zones for Kubernetes AI Storage
Configure Dell PowerScale (Isilon) access zones and SmartConnect pools for Kubernetes AI workloads. Covers groupnet/subnet/pool hierarchy, NFS export isolation per environment, and IP pool sizing for GPU training cluster storage.
💡 Quick Answer: Dell PowerScale (Isilon) uses a hierarchy of Groupnet → Subnet → Pool to organize network access to NFS exports. For Kubernetes AI clusters, create separate SmartConnect pools per environment (dev, staging, production) within a shared NFS subnet. Each pool gets a dedicated IP range and DNS name, enabling per-namespace PersistentVolume isolation without separate physical clusters.
The Problem
- Multiple Kubernetes environments (dev, staging, prod) need isolated NFS storage
- AI training jobs generate massive I/O and need dedicated bandwidth per workload
- IP address pools must be sized for concurrent NFS client connections
- SmartConnect DNS-based load balancing requires proper pool configuration
- Backup and management traffic must be separated from data traffic
The Solution
PowerScale Network Hierarchy
Groupnet (groupnet0)
│   DNS: ns1.example.com, ns2.example.com
│
├── Subnet: subnet-data (10.233.192.0/22)
│   │   Purpose: NFS data traffic for Kubernetes workloads
│   │
│   ├── Pool: pool-platform-nfs
│   │     IPs: 10.233.193.1 - 10.233.193.12 (12 IPs)
│   │     SmartConnect: platform-nfs.storage.example.com
│   │     Purpose: Platform services (registry, GitOps, monitoring)
│   │
│   ├── Pool: pool-dev-nfs
│   │     IPs: 10.233.193.13 - 10.233.193.24 (12 IPs)
│   │     SmartConnect: dev-nfs.storage.example.com
│   │     Purpose: Development workloads
│   │
│   ├── Pool: pool-staging-nfs
│   │     IPs: 10.233.193.37 - 10.233.193.48 (12 IPs)
│   │     SmartConnect: staging-nfs.storage.example.com
│   │     Purpose: Staging/pre-production
│   │
│   └── Pool: pool-prod-nfs
│         IPs: 10.233.195.37 - 10.233.195.48 (12 IPs)
│         SmartConnect: prod-nfs.storage.example.com
│         Purpose: Production training jobs
│
├── Subnet: subnet-smartconnect (10.233.209.0/24)
│   │   Purpose: SmartConnect service IPs (DNS delegation)
│   │
│   └── (SmartConnect zone IPs for DNS round-robin)
│
├── Subnet: subnet-mgmt (10.233.200.0/22)
│   │   Purpose: Cluster management, OneFS web UI, SSH
│   │
│   └── (Admin access IPs)
│
└── Subnet: subnet-backup (10.232.210.0/24)
      Purpose: Backup replication traffic (isolated)
Kubernetes StorageClass Per Pool
# Development StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerscale-dev
provisioner: csi-isilon.dellemc.com
parameters:
  ClusterName: "cluster1"
  AccessZone: "dev-zone"
  IsiPath: "/ifs/kubernetes/dev"
  AzServiceIP: "dev-nfs.storage.example.com"   # SmartConnect pool DNS name used as the NFS server
  RootClientEnabled: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
---
# Production StorageClass (Retain policy, hard mounts)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerscale-prod
provisioner: csi-isilon.dellemc.com
parameters:
  ClusterName: "cluster1"
  AccessZone: "prod-zone"
  IsiPath: "/ifs/kubernetes/prod"
  AzServiceIP: "prod-nfs.storage.example.com"  # SmartConnect pool DNS name used as the NFS server
  RootClientEnabled: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
  - hard
  - intr
IP Pool Sizing for AI Workloads
Pool Sizing Formula:
IPs needed = max_concurrent_nodes × connections_per_node × safety_factor
Example: 16-node GPU cluster, 4 mount points per node, 1.5× safety
IPs = 16 × 4 × 1.5 = 96 IPs (use /25 subnet minimum)
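The formula is easy to script when you want to compare several cluster sizes. The following is a minimal Python sketch, not a Dell tool: the function names `pool_ips_needed` and `smallest_prefix_for` are made up for illustration, and the prefix calculation assumes a conventional IPv4 subnet that reserves the network and broadcast addresses.

```python
import math


def pool_ips_needed(max_concurrent_nodes: int,
                    connections_per_node: int,
                    safety_factor: float = 1.5) -> int:
    """Apply the sizing formula, rounding up to a whole IP address."""
    if max_concurrent_nodes < 1 or connections_per_node < 1 or safety_factor < 1:
        raise ValueError("node and connection counts must be >= 1, safety_factor >= 1")
    return math.ceil(max_concurrent_nodes * connections_per_node * safety_factor)


def smallest_prefix_for(ip_count: int) -> int:
    """Longest IPv4 prefix whose usable host range still fits ip_count.

    Assumption: two addresses per subnet (network and broadcast) are unusable,
    so a /25 offers 126 usable hosts.
    """
    host_bits = (ip_count + 1).bit_length()  # smallest h with 2**h >= ip_count + 2
    return 32 - host_bits


ips = pool_ips_needed(16, 4, 1.5)
print(ips, f"/{smallest_prefix_for(ips)}")  # prints: 96 /25
```

Rounding up rather than down keeps the result on the safe side when the safety factor produces a fractional count.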
For SmartConnect load balancing:
- Round-robin distributes connections across pool IPs
- Each IP represents one OneFS node's NFS service
- More IPs = more OneFS nodes serving that pool = more bandwidth
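To get a feel for how evenly round-robin spreads mounts over a pool, here is a small Python simulation. It is only a model under simplifying assumptions: strict rotation, no DNS caching, no TTL effects, and every pool IP healthy. Real SmartConnect behaviour can differ. The helper names are hypothetical; the IP range is the dev pool from the hierarchy in this recipe.

```python
from collections import Counter
from ipaddress import IPv4Address
from itertools import cycle, islice


def pool_addresses(first: str, last: str) -> list[str]:
    """Expand an inclusive IP range such as 10.233.193.13 - 10.233.193.24."""
    start, end = int(IPv4Address(first)), int(IPv4Address(last))
    return [str(IPv4Address(n)) for n in range(start, end + 1)]


def round_robin_spread(addresses: list[str], mounts: int) -> Counter:
    """Count how many mounts land on each IP under strict round-robin."""
    return Counter(islice(cycle(addresses), mounts))


dev_pool = pool_addresses("10.233.193.13", "10.233.193.24")
spread = round_robin_spread(dev_pool, mounts=64)  # 16 nodes x 4 mounts each
print(len(dev_pool), min(spread.values()), max(spread.values()))  # prints: 12 5 6
```

With 64 mounts over 12 IPs, each address serves 5 or 6 mounts in this idealized model.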
Typical sizing:
Platform pool: 12 IPs (low traffic, metadata-heavy)
Dev pool: 12 IPs (moderate, bursty)
Staging pool: 12 IPs (mirrors production patterns)
Prod pool: 24+ IPs (high throughput, training data)
Access Zone Configuration
Access Zone: dev-zone
├── Base directory: /ifs/kubernetes/dev
├── Authentication: System (UID/GID mapping)
├── SmartConnect pool: pool-dev-nfs
├── NFS exports:
│   ├── /ifs/kubernetes/dev/datasets (read-only for training)
│   ├── /ifs/kubernetes/dev/checkpoints (read-write for model saves)
│   └── /ifs/kubernetes/dev/scratch (read-write, no snapshots)
└── Client restrictions: 10.128.0.0/14 (Kubernetes pod CIDR)

Access Zone: prod-zone
├── Base directory: /ifs/kubernetes/prod
├── Authentication: System + LDAP (audit trail)
├── SmartConnect pool: pool-prod-nfs
├── NFS exports:
│   ├── /ifs/kubernetes/prod/datasets (read-only, snapshotted)
│   ├── /ifs/kubernetes/prod/checkpoints (read-write, replicated)
│   └── /ifs/kubernetes/prod/models (read-only, published models)
└── Client restrictions: 10.128.0.0/14 (Kubernetes pod CIDR)
NFS Mount Options for GPU Training
# PersistentVolume for training datasets
apiVersion: v1
kind: PersistentVolume
metadata:
  name: training-dataset-prod
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadOnlyMany        # Datasets are read-only during training
  nfs:
    server: prod-nfs.storage.example.com   # SmartConnect DNS
    path: /ifs/kubernetes/prod/datasets
  mountOptions:
    - nfsvers=4.1         # NFSv4.1 for session trunking
    - rsize=1048576       # 1MB read chunks (large sequential reads)
    - wsize=1048576       # 1MB write chunks
    - hard                # Retry indefinitely (don't corrupt training)
    - intr                # Ignored by Linux kernels since 2.6.25; a hung NFS call can still be ended with SIGKILL
    - noatime             # Don't update access time (reduces metadata IO)
    - nodiratime          # Don't update directory access time (already implied by noatime)
Network Separation Architecture
┌─────────────────────────────────────────────────────────────┐
│                     PowerScale Cluster                      │
│                                                             │
│  ┌─────────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │ NFS Data Subnet │  │ SmartConnect  │  │ Mgmt Subnet   │  │
│  │ 10.233.192.0/22 │  │ 10.233.209/24 │  │ 10.233.200/22 │  │
│  │                 │  │               │  │               │  │
│  │ ◄── K8s pods    │  │ ◄── DNS SVC   │  │ ◄── Admins    │  │
│  │   (data I/O)    │  │ (round-robin) │  │ (Web UI/SSH)  │  │
│  └─────────────────┘  └───────────────┘  └───────────────┘  │
│                                                             │
│  ┌─────────────────┐                                        │
│  │ Backup Subnet   │                                        │
│  │ 10.232.210.0/24 │                                        │
│  │                 │                                        │
│  │ ◄── Replication │                                        │
│  │   (DR traffic)  │                                        │
│  └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────┘
Why separate subnets:
- Data traffic doesn't compete with backup replication
- Management access is firewall-restricted (admin only)
- SmartConnect needs its own IPs for DNS delegation
- Each subnet can have a different MTU (9000 for data, 1500 for mgmt)
Common Issues
SmartConnect DNS not resolving
- Cause: SmartConnect zone not delegated in corporate DNS
- Fix: Create NS delegation:
storage.example.com → PowerScale SmartConnect IPs
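A quick way to check the delegation from a cluster node is to resolve the SmartConnect name several times and see which IPs come back. This Python sketch uses only the standard library. `sample_resolutions` is a hypothetical helper, and the `resolve` parameter exists so the lookup can be swapped out in tests. Local DNS caches may hide rotation, so treat a single repeated answer as a hint rather than proof of a problem.

```python
import socket
from collections import Counter
from typing import Callable


def sample_resolutions(name: str, lookups: int = 12,
                       resolve: Callable[[str], str] = socket.gethostbyname) -> Counter:
    """Resolve name repeatedly and count how often each answer is returned."""
    seen: Counter = Counter()
    for _ in range(lookups):
        try:
            seen[resolve(name)] += 1
        except OSError as exc:
            # socket.gaierror is an OSError; failures here often point at a
            # missing or broken NS delegation for the SmartConnect zone.
            seen[f"error: {exc}"] += 1
    return seen
```

For example, `sample_resolutions("dev-nfs.storage.example.com")` run on a worker node should return addresses from the dev pool range. Only error entries suggest the zone is not delegated or the SmartConnect service IP is unreachable.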
NFS mounts hanging during GPU training
- Cause: IP pool exhausted, SmartConnect returning stale IPs
- Fix: Increase pool IP count; verify all OneFS nodes are healthy in pool
Cross-environment data leakage
- Cause: Access zones not properly restricting client IPs
- Fix: Set client restriction per zone to only allow Kubernetes pod CIDR
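Before changing export rules, it can help to test which client addresses a restriction would admit. The sketch below uses Python's `ipaddress` module. The `ZONE_CLIENTS` table is a hypothetical mirror of the client restrictions listed in this recipe, not something read from OneFS.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow-lists copied by hand from the access zone definitions
ZONE_CLIENTS = {
    "dev-zone": [ip_network("10.128.0.0/14")],
    "prod-zone": [ip_network("10.128.0.0/14")],
}


def client_allowed(zone: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside any network allowed for zone."""
    addr = ip_address(client_ip)
    return any(addr in net for net in ZONE_CLIENTS.get(zone, []))


print(client_allowed("dev-zone", "10.129.4.7"))    # prints: True
print(client_allowed("dev-zone", "10.233.200.5"))  # prints: False
```

Two caveats. Both zones in this example allow the same CIDR, so separation between dev and prod still depends on the distinct base directories and exports. Also, the address PowerScale sees is whatever source IP the mount originates from, which in many clusters is the node address rather than a pod address, so verify that in your environment.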
Unbalanced I/O across OneFS nodes
- Cause: SmartConnect using connection count balancing (not throughput)
- Fix: Switch SmartConnect policy to "Round Robin" or "CPU Usage" for AI workloads
Model checkpoint writes slow
- Cause: Small rsize/wsize (for example 32KB, set explicitly or negotiated down by the server) causing excessive NFS operations
- Fix: Set rsize=1048576,wsize=1048576 in mountOptions (1MB chunks)
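To confirm what a node actually negotiated, read the effective options from `/proc/mounts` rather than trusting the manifest. The following Python sketch parses that format. The function names and the sample line are illustrative assumptions; on a real node you would pass in the contents of `/proc/mounts`.

```python
def nfs_mount_options(mounts_text: str) -> dict[str, dict[str, str]]:
    """Map each NFS mount point to its effective mount options.

    Each /proc/mounts line has the form:
    <source> <mountpoint> <fstype> <options> <dump> <pass>
    """
    result: dict[str, dict[str, str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")  # flags like "hard" get an empty value
            options[key] = value
        result[fields[1]] = options
    return result


def undersized(options: dict[str, str], minimum: int = 1048576) -> list[str]:
    """List which of rsize/wsize are below minimum (a missing option counts as 0)."""
    return [key for key in ("rsize", "wsize") if int(options.get(key, 0)) < minimum]


sample = ("prod-nfs.storage.example.com:/ifs/kubernetes/prod/checkpoints "
          "/mnt/checkpoints nfs4 rw,noatime,vers=4.1,rsize=32768,wsize=32768,hard 0 0\n")
for mountpoint, opts in nfs_mount_options(sample).items():
    print(mountpoint, undersized(opts))  # prints: /mnt/checkpoints ['rsize', 'wsize']
```

On a node, `nfs_mount_options(open("/proc/mounts").read())` gives the same mapping for live mounts.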
Best Practices
- One SmartConnect pool per environment: isolates failure domains
- Separate subnets for data vs management: prevents contention
- Size IP pools for peak concurrency, not steady-state
- Use NFSv4.1: session trunking, better locking, delegation
- Large rsize/wsize (1MB) for training data: sequential reads benefit most
- ReadOnlyMany for datasets: prevents accidental corruption during training
- Hard mount + intr for training: never silently fail, but allow kill (modern Linux kernels ignore intr; SIGKILL still interrupts a hung mount)
- Snapshot datasets, not scratch: scratch dirs regenerate; datasets are precious
Key Takeaways
- PowerScale hierarchy: Groupnet → Subnet → Pool → Access Zone
- Each K8s environment gets its own SmartConnect pool (DNS name + IP range)
- Separate NFS data, SmartConnect, management, and backup on different subnets
- IP pool size determines max concurrent NFS clients (plan for all GPU nodes)
- Mount options critical for AI: NFSv4.1, 1MB chunks, hard+intr, noatime
- Access zones enforce path and client isolation between environments
- SmartConnect load balances connections across OneFS nodes in the pool
