Database

S4E On-Prem uses PostgreSQL as its primary relational database. This guide covers deployment, configuration, schema management, and operational best practices.

Overview

PostgreSQL stores:

User accounts and authentication data
Asset inventories and metadata
Scan definitions and configurations
Vulnerability findings and scan results
Actions, playbooks, and audit logs
System configuration and feature flags

Deployment Options

Option 1: In-Cluster PostgreSQL (Helm Subchart)

The S4E umbrella Helm chart includes a PostgreSQL subchart that deploys a database instance within your Kubernetes cluster.

# s4e-values.yaml
postgresql:
  enabled: true
  auth:
    username: s4e_app
    password: "<strong-password>"
    database: s4e_production
  primary:
    persistence:
      enabled: true
      size: 100Gi
      storageClass: ssd
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 4000m
        memory: 8Gi

Suitable for

Small to medium deployments. For large-scale or high-availability requirements, consider an external managed database.

Option 2: External PostgreSQL

Connect S4E to an existing PostgreSQL instance (self-managed or cloud-hosted):

# s4e-values.yaml
postgresql:
  enabled: false

core:
  env:
    DB_HOST: "pg-cluster.database.internal"
    DB_PORT: "5432"
    DB_NAME: "s4e_production"
    DB_USER: "s4e_app"
    DB_SSL_MODE: "verify-full"
  secrets:
    DB_PASS: "<database-password>"

Supported external PostgreSQL services:

Provider	Service
Self-hosted	PostgreSQL 13+
AWS	Amazon RDS for PostgreSQL
Azure	Azure Database for PostgreSQL
GCP	Cloud SQL for PostgreSQL

Option 3: High-Availability Cluster

For production environments requiring HA, deploy PostgreSQL with streaming replication:

Use Patroni or CrunchyData PGO for automated failover.
Configure a primary with one or more read replicas.
Point S4E to the primary for writes and optionally to replicas for read-heavy operations.

core:
  env:
    DB_HOST: "pg-primary.s4e.svc.cluster.local"
    DB_READ_HOST: "pg-replica.s4e.svc.cluster.local"
    DB_PORT: "5432"

Schema Management

S4E uses Flask-Migrate (Alembic) for database schema migrations. Migrations are bundled with each release and applied during upgrades.

Applying Migrations

After installation or upgrade:

kubectl -n s4e exec -it deployment/s4e-core -- flask db upgrade

Checking Migration Status

kubectl -n s4e exec -it deployment/s4e-core -- flask db current
kubectl -n s4e exec -it deployment/s4e-core -- flask db history

Backup before migration

Always create a database backup before applying migrations, especially for major version upgrades. See the backup section below.

Connection Configuration

Connection Pooling

S4E services use SQLAlchemy connection pooling. Tune these parameters based on your replica count and database capacity:

Parameter	Environment Variable	Default	Guidance
Pool size	`DB_POOL_SIZE`	10	Number of persistent connections per service replica
Max overflow	`DB_MAX_OVERFLOW`	20	Additional connections allowed beyond pool size
Pool timeout	`DB_POOL_TIMEOUT`	30	Seconds to wait for a connection
Pool recycle	`DB_POOL_RECYCLE`	1800	Seconds before a connection is recycled

Total connection estimate:

Total connections = (DB_POOL_SIZE + DB_MAX_OVERFLOW) x total_replicas

For example, with 2 core replicas, 3 scan replicas, and 2 trigger replicas:

(10 + 20) x 7 = 210 maximum connections

PostgreSQL max_connections

Ensure your PostgreSQL max_connections setting exceeds the calculated total. The default PostgreSQL value of 100 is insufficient for most S4E deployments.

SSL Configuration

For encrypted connections to PostgreSQL:

core:
  env:
    DB_SSL_MODE: "verify-full"
    DB_SSL_CA: "/etc/ssl/certs/pg-ca.crt"

Mount the CA certificate as a Kubernetes Secret or ConfigMap.

Performance Tuning

PostgreSQL Configuration

Recommended postgresql.conf settings for S4E workloads:

# Memory
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 64MB
maintenance_work_mem = 512MB

# Write-Ahead Log
wal_buffers = 64MB
min_wal_size = 1GB
max_wal_size = 4GB

# Query Planning
random_page_cost = 1.1
effective_io_concurrency = 200

# Connections
max_connections = 300

# Logging
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on

Memory tuning

Set shared_buffers to approximately 25% of available RAM. Set effective_cache_size to approximately 75% of available RAM.

Index Maintenance

S4E migrations create the necessary indexes. For ongoing maintenance:

-- Analyze tables for query planner statistics
ANALYZE;

-- Reindex if query performance degrades
REINDEX DATABASE s4e_production;

-- Check for bloated tables
SELECT schemaname, tablename, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Backup and Restore

Automated Backups

Configure a CronJob for regular backups:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-backup
  namespace: s4e
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: pg-backup
              image: postgres:15
              command:
                - /bin/sh
                - -c
                - |
                  pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
                    -Fc -f /backups/s4e-$(date +%Y%m%d-%H%M%S).dump
              envFrom:
                - secretRef:
                    name: s4e-db-credentials
              volumeMounts:
                - name: backup-volume
                  mountPath: /backups
          volumes:
            - name: backup-volume
              persistentVolumeClaim:
                claimName: pg-backup-pvc
          restartPolicy: OnFailure

Manual Backup

kubectl -n s4e exec -it statefulset/postgresql -- \
  pg_dump -U s4e_app -d s4e_production -Fc -f /tmp/backup.dump

kubectl -n s4e cp postgresql-0:/tmp/backup.dump ./s4e-backup.dump

Restore

kubectl -n s4e cp ./s4e-backup.dump postgresql-0:/tmp/backup.dump

kubectl -n s4e exec -it statefulset/postgresql -- \
  pg_restore -U s4e_app -d s4e_production --clean --if-exists /tmp/backup.dump

Restore impact

Restoring a backup will overwrite all current data. Stop all S4E application services before restoring to prevent data corruption.

Monitoring

Key PostgreSQL metrics to monitor:

Metric	Warning Threshold	Critical Threshold
Active connections	> 80% of max_connections	> 95% of max_connections
Transaction rate	Baseline-dependent	Sudden drop to zero
Replication lag (HA)	> 10 seconds	> 60 seconds
Disk usage	> 80%	> 90%
Dead tuples ratio	> 20% of live tuples	> 50% of live tuples

Next Steps

RabbitMQ configuration -- set up the message broker.
Environment variables -- complete variable reference.
Recovery procedures -- disaster recovery planning.