Storage advanced ⏱ 15 minutes K8s 1.28+

Dell PowerScale NFS Access Zones for Kubernetes AI Storage

Configure Dell PowerScale (Isilon) access zones and SmartConnect pools for Kubernetes AI workloads. Covers groupnet/subnet/pool hierarchy, NFS export isolation per environment, and IP pool sizing for GPU training cluster storage.

By Luca Berton • 📖 6 min read

💡 Quick Answer: Dell PowerScale (Isilon) uses a hierarchy of Groupnet → Subnet → Pool to organize network access to NFS exports. For Kubernetes AI clusters, create separate SmartConnect pools per environment (dev, staging, production) within a shared NFS subnet. Each pool gets a dedicated IP range and DNS name, enabling per-namespace PersistentVolume isolation without separate physical clusters.

The Problem

  • Multiple Kubernetes environments (dev, staging, prod) need isolated NFS storage
  • AI training jobs generate massive I/O and need dedicated bandwidth per workload
  • IP address pools must be sized for concurrent NFS client connections
  • SmartConnect DNS-based load balancing requires proper pool configuration
  • Backup and management traffic must be separated from data traffic

The Solution

PowerScale Network Hierarchy

Groupnet (groupnet0)
│   DNS: ns1.example.com, ns2.example.com
│
├── Subnet: subnet-data (10.233.192.0/22)
│   │   Purpose: NFS data traffic for Kubernetes workloads
│   │
│   ├── Pool: pool-platform-nfs
│   │   IPs: 10.233.193.1 - 10.233.193.12  (12 IPs)
│   │   SmartConnect: platform-nfs.storage.example.com
│   │   Purpose: Platform services (registry, GitOps, monitoring)
│   │
│   ├── Pool: pool-dev-nfs
│   │   IPs: 10.233.193.13 - 10.233.193.24  (12 IPs)
│   │   SmartConnect: dev-nfs.storage.example.com
│   │   Purpose: Development workloads
│   │
│   ├── Pool: pool-staging-nfs
│   │   IPs: 10.233.193.37 - 10.233.193.48  (12 IPs)
│   │   SmartConnect: staging-nfs.storage.example.com
│   │   Purpose: Staging/pre-production
│   │
│   └── Pool: pool-prod-nfs
│       IPs: 10.233.195.37 - 10.233.195.48  (12 IPs)
│       SmartConnect: prod-nfs.storage.example.com
│       Purpose: Production training jobs
│
├── Subnet: subnet-smartconnect (10.233.209.0/24)
│   │   Purpose: SmartConnect service IPs (DNS delegation)
│   │
│   └── (SmartConnect zone IPs for DNS round-robin)
│
├── Subnet: subnet-mgmt (10.233.200.0/22)
│   │   Purpose: Cluster management, OneFS web UI, SSH
│   │
│   └── (Admin access IPs)
│
└── Subnet: subnet-backup (10.232.210.0/24)
        Purpose: Backup replication traffic (isolated)
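
Before entering ranges like these on the array, it can help to check them offline. The following is a minimal sketch, assuming the subnet and pool ranges shown in the hierarchy; the dictionary layout and the function names are illustrative and not part of any PowerScale tooling.

```python
import ipaddress

# Values copied from the hierarchy above; the data layout is an assumption.
DATA_SUBNET = ipaddress.ip_network("10.233.192.0/22")
POOLS = {
    "pool-platform-nfs": ("10.233.193.1", "10.233.193.12"),
    "pool-dev-nfs": ("10.233.193.13", "10.233.193.24"),
    "pool-staging-nfs": ("10.233.193.37", "10.233.193.48"),
    "pool-prod-nfs": ("10.233.195.37", "10.233.195.48"),
}


def pool_size(first, last):
    """Number of addresses in an inclusive IP range."""
    return int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1


def check_pools(subnet, pools):
    """Report ranges that leave the subnet and overlaps between neighbouring ranges."""
    problems = []
    spans = []
    for name, (first, last) in pools.items():
        lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
        if lo not in subnet or hi not in subnet:
            problems.append(f"{name}: range leaves {subnet}")
        spans.append((int(lo), int(hi), name))
    spans.sort()  # order by first address so neighbours can be compared
    for (_, prev_hi, prev_name), (cur_lo, _, cur_name) in zip(spans, spans[1:]):
        if cur_lo <= prev_hi:
            problems.append(f"{prev_name} overlaps {cur_name}")
    return problems


if __name__ == "__main__":
    for name, (first, last) in POOLS.items():
        print(name, pool_size(first, last), "IPs")
    print(check_pools(DATA_SUBNET, POOLS) or "no problems found")
```

The check only compares neighbouring ranges after sorting, which is enough to flag that some overlap exists but will not list every overlapping pair.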

Kubernetes StorageClass Per Pool

# Development StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerscale-dev
provisioner: csi-isilon.dellemc.com
parameters:
  ClusterName: "cluster1"
  AccessZone: "dev-zone"
  IsiPath: "/ifs/kubernetes/dev"
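  # SmartConnect pool DNS name. Check this key against your CSI driver version;
  # Dell's sample StorageClass uses AzServiceIP for the access zone address.
  NfsHost: "dev-nfs.storage.example.com"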
  RootClientEnabled: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576

---
# Production StorageClass (larger IO, retain policy)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerscale-prod
provisioner: csi-isilon.dellemc.com
parameters:
  ClusterName: "cluster1"
  AccessZone: "prod-zone"
  IsiPath: "/ifs/kubernetes/prod"
  NfsHost: "prod-nfs.storage.example.com"
  RootClientEnabled: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
  - hard
  - intr                  # Ignored since Linux 2.6.25; harmless, kept for older clients
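
Because the StorageClasses differ only in environment name and reclaim policy, they can be generated rather than copied by hand. This is a rough sketch, not the author's workflow: it emits JSON (which kubectl also accepts), the helper names are hypothetical, and the parameter keys are copied verbatim from the manifests above, so verify them against your CSI driver version. Any environment beyond dev and prod would follow the same naming pattern only by assumption.

```python
import json


def storage_class(env, reclaim_policy):
    """Build a StorageClass manifest dict mirroring the examples above."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": f"powerscale-{env}"},
        "provisioner": "csi-isilon.dellemc.com",
        "parameters": {
            # Keys copied from the source manifests; not validated here.
            "ClusterName": "cluster1",
            "AccessZone": f"{env}-zone",
            "IsiPath": f"/ifs/kubernetes/{env}",
            "NfsHost": f"{env}-nfs.storage.example.com",
            "RootClientEnabled": "true",
        },
        "reclaimPolicy": reclaim_policy,
        "allowVolumeExpansion": True,
        "mountOptions": ["nfsvers=4.1", "rsize=1048576", "wsize=1048576"],
    }


def render_list(policies):
    """Render one JSON document holding a StorageClass per environment."""
    items = [storage_class(env, policy) for env, policy in policies.items()]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(render_list({"dev": "Delete", "prod": "Retain"}))
```

Redirecting the output to a file and applying it with `kubectl apply -f` is one way to use it; keeping the reclaim policy as the only per-environment input makes the dev/prod difference explicit.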

IP Pool Sizing for AI Workloads

Pool Sizing Formula:
  IPs needed = max_concurrent_nodes × connections_per_node × safety_factor

Example: 16-node GPU cluster, 4 mount points per node, 1.5× safety
  IPs = 16 × 4 × 1.5 = 96 IPs (use /25 subnet minimum)

For SmartConnect load balancing:
  - Round-robin distributes connections across pool IPs
  - Each IP represents one OneFS node's NFS service
  - More IPs = more OneFS nodes serving that pool = more bandwidth

Typical sizing:
  Platform pool:  12 IPs (low traffic, metadata-heavy)
  Dev pool:       12 IPs (moderate, bursty)
  Staging pool:   12 IPs (mirrors production patterns)
  Prod pool:      24+ IPs (high throughput, training data)
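
The sizing formula above is simple enough to script. The sketch below reproduces the 16-node example and derives the smallest IPv4 prefix that fits the result; the function names are illustrative, and the prefix calculation assumes the usual two reserved addresses (network and broadcast) per subnet.

```python
import math


def pool_ips_needed(nodes, mounts_per_node, safety_factor):
    """IPs needed = max_concurrent_nodes x connections_per_node x safety_factor."""
    return math.ceil(nodes * mounts_per_node * safety_factor)


def smallest_prefix(ip_count):
    """Longest IPv4 prefix whose usable host count is at least ip_count."""
    for prefix in range(30, 0, -1):
        usable = 2 ** (32 - prefix) - 2  # minus network and broadcast addresses
        if usable >= ip_count:
            return prefix
    raise ValueError("ip_count too large for IPv4")


if __name__ == "__main__":
    ips = pool_ips_needed(nodes=16, mounts_per_node=4, safety_factor=1.5)
    print(ips, "IPs, fits in a /%d" % smallest_prefix(ips))
```

For the example values this yields 96 IPs and a /25, matching the figures above.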

Access Zone Configuration

Access Zone: dev-zone
  ├── Base directory: /ifs/kubernetes/dev
  ├── Authentication: System (UID/GID mapping)
  ├── SmartConnect pool: pool-dev-nfs
  ├── NFS exports:
  │   ├── /ifs/kubernetes/dev/datasets    (read-only for training)
  │   ├── /ifs/kubernetes/dev/checkpoints (read-write for model saves)
  │   └── /ifs/kubernetes/dev/scratch     (read-write, no snapshots)
  └── Client restrictions: 10.128.0.0/14 (Kubernetes pod CIDR)

Access Zone: prod-zone
  ├── Base directory: /ifs/kubernetes/prod
  ├── Authentication: System + LDAP (audit trail)
  ├── SmartConnect pool: pool-prod-nfs
  ├── NFS exports:
  │   ├── /ifs/kubernetes/prod/datasets    (read-only, snapshotted)
  │   ├── /ifs/kubernetes/prod/checkpoints (read-write, replicated)
  │   └── /ifs/kubernetes/prod/models      (read-only, published models)
  └── Client restrictions: 10.128.0.0/14 (Kubernetes pod CIDR)
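
The two isolation rules in these zones (exports stay under the zone's base directory, clients come from an allowed CIDR) can be expressed as small checks. This is a sketch under the assumptions in the listing above, with hypothetical function names; it does not query OneFS. Since NFS mounts are normally performed by the node on behalf of the pod, it is worth confirming which source addresses your exports actually see before settling on the CIDR.

```python
import ipaddress
from pathlib import PurePosixPath

# CIDR taken from the zone definitions above.
ALLOWED_CLIENTS = ipaddress.ip_network("10.128.0.0/14")


def client_allowed(client_ip, allowed=ALLOWED_CLIENTS):
    """True if the client address falls inside the allowed network."""
    return ipaddress.ip_address(client_ip) in allowed


def export_in_zone(export_path, base_dir):
    """True if export_path is the zone base directory or lies beneath it."""
    path, base = PurePosixPath(export_path), PurePosixPath(base_dir)
    return path == base or base in path.parents


if __name__ == "__main__":
    print(client_allowed("10.129.4.7"))      # inside 10.128.0.0/14
    print(client_allowed("10.233.193.13"))   # a pool IP, not a client address
    print(export_in_zone("/ifs/kubernetes/prod/models", "/ifs/kubernetes/dev"))
```

Comparing path components rather than string prefixes avoids treating a sibling such as `/ifs/kubernetes/dev-old` as part of the dev zone.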

NFS Mount Options for GPU Training

# PersistentVolume for training datasets
apiVersion: v1
kind: PersistentVolume
metadata:
  name: training-dataset-prod
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadOnlyMany          # Datasets are read-only during training
  nfs:
    server: prod-nfs.storage.example.com    # SmartConnect DNS
    path: /ifs/kubernetes/prod/datasets
  mountOptions:
    - nfsvers=4.1           # NFSv4.1 for session trunking
    - rsize=1048576         # 1MB read chunks (large sequential reads)
    - wsize=1048576         # 1MB write chunks
    - hard                  # Retry indefinitely instead of returning I/O errors to the training job
    - intr                  # Ignored since Linux 2.6.25 (SIGKILL can always interrupt); kept for older clients
    - noatime               # Don't update access time (reduces metadata IO)
    - nodiratime            # Don't update directory access time

Network Separation Architecture

┌────────────────────────────────────────────────────────────┐
│                     PowerScale Cluster                     │
│                                                            │
│  ┌─────────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │  NFS Data Subnet│  │ SmartConnect │  │  Mgmt Subnet  │  │
│  │  10.233.192.0/22│  │ 10.233.209/24│  │ 10.233.200/22 │  │
│  │                 │  │              │  │               │  │
│  │  ←── K8s pods   │  │ ←── DNS SVC  │  │ ←── Admins    │  │
│  │  (data I/O)     │  │ (round-robin)│  │ (Web UI/SSH)  │  │
│  └─────────────────┘  └──────────────┘  └───────────────┘  │
│                                                            │
│  ┌─────────────────┐                                       │
│  │  Backup Subnet  │                                       │
│  │  10.232.210.0/24│                                       │
│  │                 │                                       │
│  │  ←── Replication│                                       │
│  │  (DR traffic)   │                                       │
│  └─────────────────┘                                       │
└────────────────────────────────────────────────────────────┘

Why separate subnets:
  - Data traffic doesn't compete with backup replication
  - Management access is firewall-restricted (admin only)
  - SmartConnect needs its own IPs for DNS delegation
  - Each subnet can have different MTU (9000 for data, 1500 for mgmt)

Common Issues

SmartConnect DNS not resolving

  • Cause: SmartConnect zone not delegated in corporate DNS
  • Fix: Delegate the zone in corporate DNS with NS records: storage.example.com → PowerScale SmartConnect service IPs
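
One way to confirm the delegation works is to resolve a SmartConnect name several times from a cluster node and look at which addresses come back. The sketch below uses the system resolver through Python's `socket` module; the function names are illustrative, and a local DNS cache may return the same address on every attempt, so a single IP in the result is not proof of a problem.

```python
import socket


def resolve_ipv4(hostname):
    """Resolve a name to its IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(
        hostname, 2049, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def sample_smartconnect(hostname, attempts=10, resolver=resolve_ipv4):
    """Resolve hostname several times and collect the distinct IPs returned."""
    seen = set()
    for _ in range(attempts):
        try:
            seen.update(resolver(hostname))
        except socket.gaierror as exc:
            # Resolution failure usually points at a missing delegation.
            return {"ok": False, "error": str(exc), "ips": sorted(seen)}
    return {"ok": True, "error": None, "ips": sorted(seen)}


if __name__ == "__main__":
    print(sample_smartconnect("dev-nfs.storage.example.com"))
```

The `resolver` argument exists so the sampling logic can be exercised without real DNS; in normal use the default is enough.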

NFS mounts hanging during GPU training

  • Cause: IP pool exhausted, SmartConnect returning stale IPs
  • Fix: Increase pool IP count; verify all OneFS nodes are healthy in pool

Cross-environment data leakage

  • Cause: Access zones not properly restricting client IPs
  • Fix: Set client restriction per zone to only allow Kubernetes pod CIDR

Unbalanced I/O across OneFS nodes

  • Cause: SmartConnect using connection count balancing (not throughput)
  • Fix: Switch SmartConnect policy to "Round Robin" or "CPU Usage" for AI workloads
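
To see what a round-robin policy does to connection placement, a toy simulation is enough. This sketch models only how many connections land on each pool IP; it says nothing about bytes transferred, and it is not how SmartConnect is implemented. The pool and the connection count (16 nodes with 4 mounts each) reuse figures from earlier in this recipe.

```python
from collections import Counter
from itertools import cycle


def round_robin(pool_ips, connections):
    """Assign each new connection to the next pool IP in turn and count per IP."""
    assignment = Counter({ip: 0 for ip in pool_ips})
    for _, ip in zip(range(connections), cycle(pool_ips)):
        assignment[ip] += 1
    return assignment


if __name__ == "__main__":
    pool = [f"10.233.195.{host}" for host in range(37, 49)]  # 12 prod pool IPs
    counts = round_robin(pool, connections=16 * 4)
    print(min(counts.values()), max(counts.values()))
```

With 64 connections over 12 IPs the per-IP counts differ by at most one, which is the even spread the policy aims for; uneven throughput per connection can still leave nodes unequally loaded.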

Model checkpoint writes slow

  • Cause: Small rsize/wsize (for example 32KB from older clients or inherited mount settings) causing excessive NFS operations
  • Fix: Set rsize=1048576,wsize=1048576 in mountOptions (1MB chunks)
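
The effect of the transfer size is easy to quantify. The sketch below counts the minimum number of write requests needed for a checkpoint at 32KB versus 1MB; the 10 GiB checkpoint size is a hypothetical figure chosen for illustration, and real clients may coalesce or split requests differently.

```python
def nfs_write_ops(file_bytes, wsize):
    """Minimum number of write requests to send file_bytes in wsize chunks."""
    return -(-file_bytes // wsize)  # ceiling division without floats


GIB = 1024 ** 3
CHECKPOINT_BYTES = 10 * GIB  # hypothetical checkpoint size

if __name__ == "__main__":
    small = nfs_write_ops(CHECKPOINT_BYTES, 32 * 1024)
    large = nfs_write_ops(CHECKPOINT_BYTES, 1048576)
    print(small, large, small // large)
```

Under these assumptions the 1MB setting needs 32 times fewer requests for the same checkpoint.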

Best Practices

  1. One SmartConnect pool per environment: isolates failure domains
  2. Separate subnets for data vs management: prevents contention
  3. Size IP pools for peak concurrency, not steady-state
  4. Use NFSv4.1: session trunking, better locking, delegation
  5. Large rsize/wsize (1MB) for training data: sequential reads benefit most
  6. ReadOnlyMany for datasets: prevents accidental corruption during training
  7. Hard mounts for training: never silently fail (intr is ignored on Linux 2.6.25+, where SIGKILL can always interrupt a hung process)
  8. Snapshot datasets, not scratch: scratch dirs regenerate; datasets are precious

Key Takeaways

  • PowerScale hierarchy: Groupnet → Subnet → Pool → Access Zone
  • Each K8s environment gets its own SmartConnect pool (DNS name + IP range)
  • Separate NFS data, SmartConnect, management, and backup on different subnets
  • IP pool size determines max concurrent NFS clients (plan for all GPU nodes)
  • Mount options critical for AI: NFSv4.1, 1MB chunks, hard mounts, noatime
  • Access zones enforce path and client isolation between environments
  • SmartConnect load balances connections across OneFS nodes in the pool
#storage #nfs #networking #configuration #architecture
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.
