Recovery

This guide covers backup strategies, restore procedures, and disaster recovery planning for S4E On-Prem deployments.

Backup Strategy

What to Back Up

Component	Data Type	Priority	Method
PostgreSQL	All application data (users, assets, scans, results)	Critical	`pg_dump` / continuous archiving
MongoDB	Crawl results, raw scan output	High	`mongodump`
RabbitMQ	Queue definitions (not message data)	Medium	Definition export
Redis	Cache data (ephemeral, can be rebuilt)	Low	RDB snapshot (optional)
Helm values	Deployment configuration	Critical	Git repository
TLS certificates	Ingress certificates	High	Certificate management system
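
The methods in the table can be sketched as one command per component. This is a minimal illustration, not an S4E-provided tool: the function name `backup_commands`, the file naming scheme, the database name, and the connection URI are all hypothetical. It assumes `pg_dump` reads its connection settings from the standard `PGHOST`, `PGUSER`, and `PGPASSWORD` environment variables, and that `rabbitmqctl` runs where it can reach the broker node. The function only builds argument lists; it does not execute anything.

```python
from pathlib import Path


def backup_commands(backup_dir: str, pg_database: str, mongo_uri: str, stamp: str) -> dict[str, list[str]]:
    """Return one argv list per component. Nothing is executed here."""
    root = Path(backup_dir)
    pg_file = root / f"postgres-{stamp}.dump"
    mongo_file = root / f"mongo-{stamp}.archive.gz"
    rabbit_file = root / f"rabbitmq-{stamp}.json"
    return {
        # Custom-format dump, restorable with pg_restore.
        "postgresql": ["pg_dump", "--format=custom", f"--file={pg_file}", pg_database],
        # Single compressed archive file instead of a dump directory.
        "mongodb": ["mongodump", f"--uri={mongo_uri}", "--gzip", f"--archive={mongo_file}"],
        # Exports definitions only (queues, exchanges, users), not messages.
        "rabbitmq": ["rabbitmqctl", "export_definitions", str(rabbit_file)],
    }
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failed backup raises instead of passing silently. Redis is left out because the table marks its snapshot as optional.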

Configuration as code

Store all Helm values, ArgoCD applications, and Kubernetes manifests in a Git repository. This is the most reliable backup for your deployment configuration.

Backup Schedule

Recommended practice

S4E does not manage backups on your behalf. The schedule below is a recommendation for on-prem deployments; implement it with your preferred backup tooling.

Component	Recommended Frequency	Recommended Retention
PostgreSQL (full)	Daily	30 days
PostgreSQL (WAL archiving)	Continuous	7 days
MongoDB	Daily	30 days
RabbitMQ definitions	Weekly	4 weeks
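
The retention column can be enforced with a small pruning step. The sketch below is one possible approach, assuming your backup tooling can list each backup with its component and creation time. The component keys and the `expired_backups` function are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from the table above, in days.
RETENTION_DAYS = {
    "postgresql-full": 30,
    "postgresql-wal": 7,
    "mongodb": 30,
    "rabbitmq-definitions": 28,  # 4 weeks
}


def expired_backups(backups, now):
    """Return the names of backups older than their component's retention window.

    `backups` is an iterable of (name, component, created_at) tuples,
    where created_at is a timezone-aware datetime.
    """
    expired = []
    for name, component, created_at in backups:
        limit = timedelta(days=RETENTION_DAYS[component])
        if now - created_at > limit:
            expired.append(name)
    return expired
```

Call it with `now=datetime.now(timezone.utc)` and delete only the returned names. Keeping the deletion separate from the selection makes a dry run straightforward.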

Recovery Verification Checklist

After any recovery operation, verify:

[ ] All pods are Running and Ready.
[ ] API health endpoint returns 200: `curl https://s4e.company.com/api/health/ready`
[ ] User login succeeds.
[ ] Asset list loads correctly.
[ ] A test scan can be initiated and completes.
[ ] Historical scan results are accessible.
[ ] RabbitMQ queues are created and consumers are connected.
[ ] Scheduled scans are still configured.
[ ] Monitoring and alerting are operational.
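
The first two checklist items lend themselves to automation. The following is a minimal sketch under stated assumptions, not part of S4E: `unready_pods` and `health_ok` are hypothetical helper names, and the pod check expects the parsed output of `kubectl get pods -o json` for the namespace your deployment uses.

```python
import urllib.request


def unready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not both Running and Ready.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    failing = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if status.get("phase") != "Running" or not ready:
            failing.append(pod["metadata"]["name"])
    return failing


def health_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Pods from completed Jobs report the phase `Succeeded` and will be listed as failing here, so filter them out first if your deployment runs Jobs. The remaining checklist items (login, asset list, test scan) still need a manual or application-level check.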

Next Steps

Common errors -- resolve specific error conditions.
Logs & debugging -- investigate issues during recovery.
Database configuration -- optimize post-recovery database settings.