Database
S4E On-Prem uses PostgreSQL as its primary relational database. This guide covers deployment, configuration, schema management, and operational best practices.
Overview
PostgreSQL stores:
- User accounts and authentication data
- Asset inventories and metadata
- Scan definitions and configurations
- Vulnerability findings and scan results
- Actions, playbooks, and audit logs
- System configuration and feature flags
Deployment Options
Option 1: In-Cluster PostgreSQL (Helm Subchart)
The S4E umbrella Helm chart includes a PostgreSQL subchart that deploys a database instance within your Kubernetes cluster.
# s4e-values.yaml
postgresql:
enabled: true
auth:
username: s4e_app
password: "<strong-password>"
database: s4e_production
primary:
persistence:
enabled: true
size: 100Gi
storageClass: ssd
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 4000m
memory: 8Gi
Suitable for
Small to medium deployments. For large-scale or high-availability requirements, consider an external managed database.
Option 2: External PostgreSQL
Connect S4E to an existing PostgreSQL instance (self-managed or cloud-hosted):
# s4e-values.yaml
postgresql:
enabled: false
core:
env:
DB_HOST: "pg-cluster.database.internal"
DB_PORT: "5432"
DB_NAME: "s4e_production"
DB_USER: "s4e_app"
DB_SSL_MODE: "verify-full"
secrets:
DB_PASS: "<database-password>"
Supported external PostgreSQL services:
| Provider | Service |
|---|---|
| Self-hosted | PostgreSQL 13+ |
| AWS | Amazon RDS for PostgreSQL |
| Azure | Azure Database for PostgreSQL |
| GCP | Cloud SQL for PostgreSQL |
Option 3: High-Availability Cluster
For production environments requiring HA, deploy PostgreSQL with streaming replication:
- Use Patroni or CrunchyData PGO for automated failover.
- Configure a primary with one or more read replicas.
- Point S4E to the primary for writes and optionally to replicas for read-heavy operations.
core:
env:
DB_HOST: "pg-primary.s4e.svc.cluster.local"
DB_READ_HOST: "pg-replica.s4e.svc.cluster.local"
DB_PORT: "5432"
Schema Management
S4E uses Flask-Migrate (Alembic) for database schema migrations. Migrations are bundled with each release and applied during upgrades.
Applying Migrations
After installation or upgrade:
Checking Migration Status
kubectl -n s4e exec -it deployment/s4e-core -- flask db current
kubectl -n s4e exec -it deployment/s4e-core -- flask db history
Backup before migration
Always create a database backup before applying migrations, especially for major version upgrades. See the backup section below.
Connection Configuration
Connection Pooling
S4E services use SQLAlchemy connection pooling. Tune these parameters based on your replica count and database capacity:
| Parameter | Environment Variable | Default | Guidance |
|---|---|---|---|
| Pool size | DB_POOL_SIZE |
10 | Number of persistent connections per service replica |
| Max overflow | DB_MAX_OVERFLOW |
20 | Additional connections allowed beyond pool size |
| Pool timeout | DB_POOL_TIMEOUT |
30 | Seconds to wait for a connection |
| Pool recycle | DB_POOL_RECYCLE |
1800 | Seconds before a connection is recycled |
Total connection estimate:
For example, with 2 core replicas, 3 scan replicas, and 2 trigger replicas:
PostgreSQL max_connections
Ensure your PostgreSQL max_connections setting exceeds the calculated total. The default PostgreSQL value of 100 is insufficient for most S4E deployments.
SSL Configuration
For encrypted connections to PostgreSQL:
Mount the CA certificate as a Kubernetes Secret or ConfigMap.
Performance Tuning
PostgreSQL Configuration
Recommended postgresql.conf settings for S4E workloads:
# Memory
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 64MB
maintenance_work_mem = 512MB
# Write-Ahead Log
wal_buffers = 64MB
min_wal_size = 1GB
max_wal_size = 4GB
# Query Planning
random_page_cost = 1.1
effective_io_concurrency = 200
# Connections
max_connections = 300
# Logging
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on
Memory tuning
Set shared_buffers to approximately 25% of available RAM. Set effective_cache_size to approximately 75% of available RAM.
Index Maintenance
S4E migrations create the necessary indexes. For ongoing maintenance:
-- Analyze tables for query planner statistics
ANALYZE;
-- Reindex if query performance degrades
REINDEX DATABASE s4e_production;
-- Check for bloated tables
SELECT schemaname, tablename, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
Backup and Restore
Automated Backups
Configure a CronJob for regular backups:
apiVersion: batch/v1
kind: CronJob
metadata:
name: pg-backup
namespace: s4e
spec:
schedule: "0 2 * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: pg-backup
image: postgres:15
command:
- /bin/sh
- -c
- |
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
-Fc -f /backups/s4e-$(date +%Y%m%d-%H%M%S).dump
envFrom:
- secretRef:
name: s4e-db-credentials
volumeMounts:
- name: backup-volume
mountPath: /backups
volumes:
- name: backup-volume
persistentVolumeClaim:
claimName: pg-backup-pvc
restartPolicy: OnFailure
Manual Backup
kubectl -n s4e exec -it statefulset/postgresql -- \
pg_dump -U s4e_app -d s4e_production -Fc -f /tmp/backup.dump
kubectl -n s4e cp postgresql-0:/tmp/backup.dump ./s4e-backup.dump
Restore
kubectl -n s4e cp ./s4e-backup.dump postgresql-0:/tmp/backup.dump
kubectl -n s4e exec -it statefulset/postgresql -- \
pg_restore -U s4e_app -d s4e_production --clean --if-exists /tmp/backup.dump
Restore impact
Restoring a backup will overwrite all current data. Stop all S4E application services before restoring to prevent data corruption.
Monitoring
Key PostgreSQL metrics to monitor:
| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| Active connections | > 80% of max_connections | > 95% of max_connections |
| Transaction rate | Baseline-dependent | Sudden drop to zero |
| Replication lag (HA) | > 10 seconds | > 60 seconds |
| Disk usage | > 80% | > 90% |
| Dead tuples ratio | > 20% of live tuples | > 50% of live tuples |
Next Steps
- RabbitMQ configuration -- set up the message broker.
- Environment variables -- complete variable reference.
- Recovery procedures -- disaster recovery planning.