NetworkPolicy Deny-Default for GPU Tenants
Implement deny-by-default NetworkPolicy for GPU tenant namespaces with NCCL port exceptions and DNS egress on Kubernetes.
💡 Quick Answer: Apply a deny-all NetworkPolicy first, then add allow rules for intra-namespace traffic (including NCCL ports), DNS egress to the cluster DNS pods (kube-system on upstream Kubernetes, openshift-dns on OpenShift), and specific cross-namespace services. NCCL uses dynamic ports, so allow all ports within the namespace.
The Problem
Without NetworkPolicy, any pod can reach any other pod across namespaces. A compromised training job in tenant-alpha could scan and access services in tenant-beta. GPU workloads add complexity because NCCL distributed training uses dynamic high ports for inter-pod communication.
The Solution
Deny All (Base Policy)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: tenant-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Allow Intra-Namespace (Including NCCL)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}
      # No ports listed, so every port is allowed. NCCL opens ephemeral ports,
      # and PyTorch launchers default to 29500 (MASTER_PORT) and 29400 (torchrun rendezvous).
  egress:
    - to:
        - podSelector: {}
Allow DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: tenant-alpha
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Upstream Kubernetes: CoreDNS pods in kube-system listen on port 53
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # OpenShift: DNS pods in openshift-dns listen on 5353 (the Service maps 53 to 5353)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-dns
      ports:
        - protocol: UDP
          port: 5353
        - protocol: TCP
          port: 5353
Allow Ingress from HAProxy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress
  namespace: tenant-alpha
spec:
  podSelector:
    matchLabels:
      app: inference-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-ingress
      ports:
        - port: 8080
Allow Monitoring Scrape
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: tenant-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-monitoring
      ports:
        - port: 9090
        - port: 9400 # DCGM exporter
graph TD
    A[deny-all base] --> B[Block everything]
    C[allow-same-namespace] --> D[NCCL training pods]
    C --> E[Service mesh within tenant]
    F[allow-dns] --> G[DNS resolution only]
    H[allow-ingress] --> I[HAProxy to inference pods]
    J[allow-monitoring] --> K[Prometheus scrape]
    L[Tenant Alpha] -->|Blocked| M[Tenant Beta]
    L -->|Blocked| N[Tenant Gamma]
Common Issues
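- NCCL training fails: deny-all blocks inter-pod communication, so allow-same-namespace must cover all ports. NCCL opens ephemeral ports in addition to the launcher defaults (29500 for MASTER_PORT, 29400 for torchrun rendezvous)
- DNS resolution fails: on OpenShift the DNS pods run in openshift-dns and listen on 5353, not on 53 in kube-system; NetworkPolicy matches the pod port, not the Service port
- Prometheus can't scrape: add the allow-monitoring policy for the openshift-monitoring namespace
- Downloads from outside the cluster fail: image pulls are performed by the node's container runtime and are not subject to pod NetworkPolicy, but pods that fetch models, datasets, or packages at runtime need explicit egress rules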
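For pods that download weights or datasets at runtime, an egress rule along the following lines is one option. This is a sketch under assumptions: the policy name is hypothetical, and the excluded CIDRs are the OpenShift default pod network (10.128.0.0/14) and service network (172.30.0.0/16), which you should replace with your cluster's values. Excluding them keeps the broad 0.0.0.0/0 rule from reopening paths to other tenants' pods; node and machine-network ranges are not excluded here.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-https          # hypothetical name
  namespace: tenant-alpha
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.128.0.0/14         # assumed pod network (OpenShift default)
              - 172.30.0.0/16         # assumed service network (OpenShift default)
      ports:
        - protocol: TCP
          port: 443                   # HTTPS only; add other ports as needed
```

Name resolution for those external hosts still depends on the allow-dns policy.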
Best Practices
- Always start with deny-all, then add specific allows
- Allow all ports within the namespace for NCCL; restricting ports breaks distributed training
- Include DNS egress in every tenant namespace; pods can't resolve service names without it
- Test NetworkPolicy with kubectl exec and curl before deploying training jobs
- Use namespace labels (not IP ranges) for policy selectors; IPs change, labels don't
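One way to run the kubectl exec and curl check is a throwaway client pod in a scratch namespace that has no NetworkPolicy of its own, so a failed request can only come from tenant-alpha's rules. The pod name, the netcheck namespace, and the curlimages/curl image are assumptions for illustration; look up the inference pod IP with kubectl get pod -n tenant-alpha -o wide.

```yaml
# Hypothetical throwaway client for testing tenant-alpha's ingress rules.
# Assumed usage:
#   kubectl create namespace netcheck
#   kubectl apply -f netcheck.yaml
#   kubectl exec -n netcheck netcheck -- curl -sS --max-time 5 http://<inference-pod-ip>:8080/
#   kubectl delete namespace netcheck
apiVersion: v1
kind: Pod
metadata:
  name: netcheck
  namespace: netcheck                 # scratch namespace with no NetworkPolicy
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:latest   # assumption: any image that ships curl works
      command: ["sleep", "3600"]      # keep the pod alive for kubectl exec
```

A timeout (or a refused connection, depending on the CNI) is the expected result from netcheck, while the same request from a pod inside tenant-alpha should succeed because allow-same-namespace admits all ports.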
Key Takeaways
- Deny-by-default is the foundation of multi-tenant GPU security
- NCCL needs all ports allowed between pods in the same namespace
- DNS egress is required for basic pod functionality
- Cross-namespace traffic is blocked, so each tenant is isolated
- NetworkPolicy is enforced by the CNI plugin (OVN-Kubernetes on OpenShift)
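When a tenant genuinely needs one cross-namespace path, for example to a shared model registry, the same label-based approach can open it narrowly. In this sketch the namespace label tenant.example.com/shared-services, the app: model-registry pod label, the policy name, and port 8080 are all hypothetical. Because namespaceSelector and podSelector sit in the same entry under to, both must match.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-shared-services   # hypothetical name
  namespace: tenant-alpha
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in ONE entry: both must match
        - namespaceSelector:
            matchLabels:
              tenant.example.com/shared-services: "true"   # hypothetical label
          podSelector:
            matchLabels:
              app: model-registry                         # hypothetical pod label
      ports:
        - protocol: TCP
          port: 8080                                      # assumed service port
```

If the target namespace also runs deny-all, it needs a matching ingress policy that admits traffic from tenant-alpha (for example by selecting kubernetes.io/metadata.name: tenant-alpha), because a connection must be allowed on both ends.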

