Deployments intermediate ⏱ 15 minutes K8s 1.28+

CloudNativePG PostgreSQL Operator

Deploy highly available PostgreSQL clusters on Kubernetes using the CloudNativePG operator, with automated failover and backups.

By Luca Berton • 📖 5 min read

💡 Quick Answer: Install the CloudNativePG operator and create a Cluster CR to get a production-ready PostgreSQL cluster with streaming replication, automatic failover, and continuous backup to S3.

The Problem

Running PostgreSQL on Kubernetes with StatefulSets requires manual replication setup, failover scripting, backup orchestration, and connection pooling. A single misconfigured replica can cause data loss. You need an operator that handles the full PostgreSQL lifecycle natively.

The Solution

CloudNativePG (CNPG) manages the entire PostgreSQL lifecycle (provisioning, replication, failover, backup, and monitoring) through Kubernetes-native CRDs.

Install CloudNativePG Operator

# Install via Helm
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update

# monitoring.podMonitorEnabled needs the Prometheus Operator CRDs (PodMonitor) in the cluster
helm install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace \
  --set monitoring.podMonitorEnabled=true

# Verify operator is running
kubectl get pods -n cnpg-system
kubectl get crds | grep cnpg
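
Before creating any Cluster, it helps to wait until the operator has finished rolling out, and to have the kubectl cnpg plugin in place for the later health checks. The sketch below wraps both steps in shell functions. The function names are hypothetical, the Deployment name assumes the Helm release is called cnpg as above, and the plugin step assumes krew is already installed.

```shell
# Block until the operator Deployment has finished rolling out.
# Assumes the Helm release name "cnpg" used in the install command.
wait_for_operator() {
  local timeout=${1:-120s}
  kubectl rollout status deployment/cnpg-cloudnative-pg \
    --namespace cnpg-system --timeout="$timeout"
}

# Install the kubectl cnpg plugin through krew when it is not present yet.
# Assumes krew itself is already installed.
install_cnpg_plugin() {
  if kubectl cnpg version >/dev/null 2>&1; then
    echo "kubectl cnpg plugin already installed"
  else
    kubectl krew install cnpg
  fi
}
```

Run wait_for_operator first, then install_cnpg_plugin; both return a non-zero status if the underlying kubectl call fails.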

Basic PostgreSQL Cluster

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      work_mem: "8MB"
      maintenance_work_mem: "128MB"
      wal_buffers: "16MB"
      max_wal_size: "2GB"
      min_wal_size: "512MB"

  bootstrap:
    initdb:
      database: appdb
      owner: appuser
      secret:
        name: app-db-credentials

  storage:
    size: 50Gi
    storageClass: gp3-encrypted

  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi

  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
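
The memory parameters in this manifest line up with the 1Gi memory request: shared_buffers is a quarter of it and effective_cache_size three quarters. A minimal sketch of that arithmetic follows, assuming the common 25% / 75% rule of thumb (a tuning convention, not a CloudNativePG default); the function name is hypothetical.

```shell
# Derive memory-related PostgreSQL parameters from a memory request in MiB.
# Uses the 25% / 75% rule of thumb, which is an assumption to revisit
# against your own workload.
pg_memory_params() {
  local mem_mib=$1
  echo "shared_buffers: \"$(( mem_mib / 4 ))MB\""
  echo "effective_cache_size: \"$(( mem_mib * 3 / 4 ))MB\""
}

# The 1Gi request from the manifest above
pg_memory_params 1024
```

For 1024 MiB this prints the same 256MB and 768MB values used in the manifest, so the function can be rerun whenever the memory request changes.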

Database Credentials Secret

apiVersion: v1
kind: Secret
metadata:
  name: app-db-credentials
  namespace: production
type: kubernetes.io/basic-auth
stringData:
  username: appuser
  password: "change-me-to-a-strong-password"
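
To avoid committing a placeholder password to a manifest, the same Secret can be created from the command line with a generated value. This is a sketch under assumptions: the function names are hypothetical, and the password is hex-encoded random bytes read from /dev/urandom.

```shell
# Print a random hex password built from the given number of random bytes
# (16 bytes give 32 hex characters).
gen_password() {
  local bytes=${1:-16}
  od -An -N"$bytes" -tx1 /dev/urandom | tr -d ' \n'
}

# Create the basic-auth Secret without writing the password to a file.
create_db_secret() {
  local name=$1 namespace=$2 username=$3
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --type=kubernetes.io/basic-auth \
    --from-literal=username="$username" \
    --from-literal=password="$(gen_password 16)"
}
```

Calling create_db_secret app-db-credentials production appuser would produce a Secret with the same name, type, and keys as the manifest above.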

Continuous Backup to S3

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  bootstrap:
    initdb:
      database: appdb
      owner: appuser

  storage:
    size: 50Gi
    storageClass: gp3-encrypted

  backup:
    barmanObjectStore:
      destinationPath: s3://my-pg-backups/app-db/
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
    retentionPolicy: "30d"
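
The manifest references a Secret named s3-creds with the keys ACCESS_KEY_ID and SECRET_ACCESS_KEY, which has to exist before the first backup runs. One possible way to create it is sketched below. The function name is hypothetical, and reading the values from the standard AWS environment variables is an assumption about where your credentials live.

```shell
# Create the s3-creds Secret referenced by s3Credentials.
# Aborts with a message if either AWS variable is unset.
create_s3_creds() {
  local namespace=${1:-production}
  : "${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID first}"
  : "${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY first}"
  kubectl create secret generic s3-creds \
    --namespace "$namespace" \
    --from-literal=ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
    --from-literal=SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY"
}
```

With both variables exported, create_s3_creds production creates the Secret in the same namespace as the Cluster.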

Scheduled Backups

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: app-db-daily
  namespace: production
spec:
  schedule: "0 0 2 * * *"  # Daily at 2 AM
  backupOwnerReference: self
  cluster:
    name: app-db
  method: barmanObjectStore
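
The schedule above has six fields because CNPG expects a leading seconds field, unlike the five-field syntax of a Kubernetes CronJob. A small helper (hypothetical name) can convert a familiar five-field expression by prepending a zero-seconds field:

```shell
# Convert a standard five-field cron expression to the six-field form
# (seconds first) by prepending "0". Rejects anything that is not
# exactly five fields.
to_six_field_cron() {
  local -a fields
  read -r -a fields <<<"$1"
  if [ "${#fields[@]}" -ne 5 ]; then
    echo "expected 5 fields, got ${#fields[@]}" >&2
    return 1
  fi
  echo "0 ${fields[*]}"
}

to_six_field_cron "0 2 * * *"   # prints: 0 0 2 * * *
```

Pasting a five-field expression unchanged would shift every field by one position, so the conversion is worth checking whenever a schedule is copied from a CronJob.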

Restore from Backup (PITR)

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db-restored
  namespace: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  bootstrap:
    recovery:
      source: app-db-backup
      recoveryTarget:
        targetTime: "2026-03-13T07:00:00Z"

  externalClusters:
    - name: app-db-backup
      barmanObjectStore:
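        destinationPath: s3://my-pg-backups/app-db/
        # Backups were written under the source cluster's name; without this,
        # the external cluster name (app-db-backup) is used as the server name
        serverName: app-db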
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: SECRET_ACCESS_KEY

  storage:
    size: 50Gi
    storageClass: gp3-encrypted
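
The targetTime value is an RFC 3339 timestamp, and a value without an explicit offset is easy to misread as local time. The sketch below checks the shape of a timestamp before it goes into a manifest and prints the current UTC time in the same format. Both function names are hypothetical, and the check covers the format only, not whether the date is valid or covered by the archived WAL.

```shell
# Succeed if the argument looks like an RFC 3339 timestamp with an explicit
# offset, e.g. 2026-03-13T07:00:00Z or 2026-03-13T08:00:00+01:00.
is_rfc3339() {
  local re='^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$'
  [[ $1 =~ $re ]]
}

# Print the current time in UTC in the same format.
utc_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}
```

For example, is_rfc3339 "2026-03-13T07:00:00Z" succeeds, while a value such as "2026-03-13 07:00" is rejected.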

Connection Pooling with PgBouncer

apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: app-db-pooler-rw
  namespace: production
spec:
  cluster:
    name: app-db
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
      min_pool_size: "5"
  template:
    metadata:
      labels:
        app: app-db-pooler
    spec:
      containers:
        - name: pgbouncer
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: app-db-pooler-ro
  namespace: production
spec:
  cluster:
    name: app-db
  instances: 2
  type: ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "2000"
      default_pool_size: "50"
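
PgBouncer sizes its pools per database and user pair, and each Pooler instance runs its own PgBouncer, so the number of server connections a Pooler can open is roughly instances × default_pool_size × pairs. The sketch below (hypothetical function name) applies that arithmetic to the two Poolers above so the result can be compared with max_connections.

```shell
# Worst-case server connections a Pooler can open against PostgreSQL:
# pooler instances x default_pool_size x (database, user) pairs.
pooler_backend_conns() {
  local instances=$1 pool_size=$2 pairs=${3:-1}
  echo $(( instances * pool_size * pairs ))
}

rw=$(pooler_backend_conns 2 25)   # app-db-pooler-rw
ro=$(pooler_backend_conns 2 50)   # app-db-pooler-ro
echo "rw=$rw ro=$ro max_connections=200"
```

With one database and one user this gives 50 connections for the read-write Pooler and 100 for the read-only one, both under the max_connections value of 200 set on the Cluster.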

Application Connection

# Services created automatically by CNPG:
# app-db-rw   → primary (read-write)
# app-db-ro   → replicas (read-only)
# app-db-r    → any instance (round-robin)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          env:
            # Write connection via pooler
            - name: DATABASE_URL
              value: "postgresql://appuser@app-db-pooler-rw:5432/appdb"
            # Read connection via pooler
            - name: DATABASE_READ_URL
              value: "postgresql://appuser@app-db-pooler-ro:5432/appdb"
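            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  # Secret supplied in bootstrap.initdb.secret; CNPG generates
                  # app-db-app only when no secret is provided there
                  name: app-db-credentials
                  key: password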
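
The short host names in the connection strings resolve only from inside the production namespace. For clients in other namespaces, the namespace-qualified form is needed. A minimal sketch (hypothetical function name) that lists the qualified host names of the three Services CNPG creates for a cluster:

```shell
# Print the namespace-qualified DNS names of the rw, ro, and r Services
# that CNPG creates for a cluster.
cnpg_service_hosts() {
  local cluster=$1 namespace=$2 suffix
  for suffix in rw ro r; do
    echo "${cluster}-${suffix}.${namespace}.svc"
  done
}

cnpg_service_hosts app-db production
```

The same pattern applies to the Pooler Services, for example app-db-pooler-rw.production.svc.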

Monitoring with Prometheus

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: cnpg-default-monitoring
        key: queries

  storage:
    size: 50Gi
---
# Import CNPG Grafana dashboard
# Dashboard ID: 20417 (CloudNativePG)

Verify Cluster Health

# Cluster status
kubectl cnpg status app-db -n production

# Check replication lag
kubectl cnpg status app-db -n production --verbose

# Promote a replica (manual switchover)
kubectl cnpg promote app-db app-db-2 -n production

# List backups
kubectl get backups -n production

# Check WAL archiving
kubectl cnpg status app-db -n production | grep -A5 "WAL archiving"

# Connect to primary
kubectl cnpg psql app-db -n production -- -c "SELECT pg_is_in_recovery();"

# Benchmark
kubectl cnpg pgbench app-db -n production \
  --job-name=bench-init -- --initialize --scale=10
kubectl cnpg pgbench app-db -n production \
  --job-name=bench-run -- --time=60 --client=10 --jobs=2
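
In scripts and CI pipelines it can be handy to block until the cluster reports ready and then print which instance is primary, without relying on the plugin. The sketch below does this with plain kubectl. The function name is hypothetical, and it assumes the Cluster resource exposes a Ready condition and a status.currentPrimary field.

```shell
# Wait for the Cluster to report Ready, then print the current primary.
# The fully qualified resource name avoids clashes with other CRDs that
# are also called "cluster".
wait_for_cluster() {
  local cluster=$1 namespace=$2 timeout=${3:-300s}
  kubectl wait --for=condition=Ready \
    "clusters.postgresql.cnpg.io/${cluster}" \
    --namespace "$namespace" --timeout="$timeout" || return 1
  kubectl get "clusters.postgresql.cnpg.io/${cluster}" \
    --namespace "$namespace" \
    -o jsonpath='{.status.currentPrimary}{"\n"}'
}
```

For the cluster in this recipe the call would be wait_for_cluster app-db production.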

graph TD
    A[CNPG Operator] --> B[Cluster CR]
    B --> C[Primary Pod app-db-1]
    B --> D[Replica Pod app-db-2]
    B --> E[Replica Pod app-db-3]
    C -->|Streaming Replication| D
    C -->|Streaming Replication| E
    C --> F[app-db-rw Service]
    D --> G[app-db-ro Service]
    E --> G
    F --> H[PgBouncer Pooler RW]
    G --> I[PgBouncer Pooler RO]
    C -->|WAL Archive| J[S3 Backup]
    K[ScheduledBackup] -->|Daily| J

Common Issues

  • Cluster stuck in "Setting up primary": check that the StorageClass exists and the PVC can bind; verify with kubectl get pvc -n production
  • Replication lag increasing: check replica resource limits, disk I/O, and network bandwidth; raise max_wal_senders only if replicas fail to connect because all WAL sender slots are in use
  • Backup failing to S3: verify that the S3 credentials secret exists and the IAM role has s3:PutObject, s3:GetObject, s3:ListBucket
  • Failover not happening: failover is driven by the operator, so confirm the operator pod is running and check its logs with kubectl logs -n cnpg-system deploy/cnpg-cloudnative-pg
  • PgBouncer connection errors: ensure max_client_conn in the Pooler exceeds the total number of app connections; keep default_pool_size multiplied by the number of Pooler instances below PostgreSQL max_connections

Best Practices

  • Always deploy 3+ instances for HA with pod anti-affinity across nodes
  • Enable continuous WAL archiving to S3/GCS from day one, not just scheduled backups
  • Use the PgBouncer Pooler for connection management; it prevents connection exhaustion
  • Separate read-write and read-only traffic via app-db-rw and app-db-ro services
  • Set retentionPolicy to keep at least 7 days of backups
  • Install the kubectl cnpg plugin for cluster management
  • Enable PodMonitor for Prometheus metrics and import Grafana dashboard 20417
  • Test PITR recovery regularly in a staging environment

Key Takeaways

  • CNPG manages PostgreSQL lifecycle entirely through Kubernetes CRDs
  • Automatic failover over streaming replication, driven by the operator
  • Built-in continuous backup to S3/GCS/Azure with point-in-time recovery
  • PgBouncer Pooler CRD handles connection pooling natively
  • Three auto-created Services: -rw (primary), -ro (replicas), -r (any)
  • kubectl cnpg plugin provides status, failover, psql, and benchmark commands
#cnpg #postgresql #database #operator #high-availability
Written by Luca Berton

Principal Solutions Architect specializing in Kubernetes, AI/GPU infrastructure, and cloud-native platforms. Author of Kubernetes Recipes and creator of CopyPasteLearn courses.


Want More Kubernetes Recipes?

This recipe is from Kubernetes Recipes, our 750-page practical guide with hundreds of production-ready patterns.
