This guide covers backup strategies, restore procedures, and disaster recovery planning for S4E On-Prem deployments.
Backup Strategy
What to Back Up
| Component | Data Type | Priority | Method |
|---|---|---|---|
| PostgreSQL | All application data (users, assets, scans, results) | Critical | pg_dump / continuous archiving |
| MongoDB | Crawl results, raw scan output | High | mongodump |
| RabbitMQ | Queue definitions (not message data) | Medium | Definition export |
| Redis | Cache data (ephemeral, can be rebuilt) | Low | RDB snapshot (optional) |
| Helm values | Deployment configuration | Critical | Git repository |
| TLS certificates | Ingress certificates | High | Certificate management system |
Configuration as code
Store all Helm values, ArgoCD applications, and Kubernetes manifests in a Git repository. This is the most reliable backup for your deployment configuration.
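As a rough illustration of the Method column above, the following sketch assembles the dump commands for PostgreSQL, MongoDB, and RabbitMQ. It is not an S4E-provided tool. The database name `s4e`, the MongoDB URI, the output file names, and the helper names `backup_commands` and `run_backups` are all assumptions made for the example. It also assumes the client tools are installed and can reach the services (for example through a port-forward or from inside the cluster).

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_commands(out_dir: Path, stamp: str) -> dict[str, list[str]]:
    """Return one argv list per component, keyed by component name.

    The database name "s4e" and the MongoDB URI are placeholders (assumptions);
    replace them with the values used in your deployment.
    """
    pg_file = out_dir / f"postgres-{stamp}.dump"
    mongo_file = out_dir / f"mongo-{stamp}.archive.gz"
    rabbit_file = out_dir / f"rabbitmq-{stamp}.json"
    return {
        # Custom-format dump, restorable with pg_restore.
        "postgresql": ["pg_dump", "--format=custom", f"--file={pg_file}", "s4e"],
        # Single compressed archive file instead of a dump directory.
        "mongodb": [
            "mongodump",
            "--uri=mongodb://localhost:27017",
            "--gzip",
            f"--archive={mongo_file}",
        ],
        # Exports queue, exchange, and binding definitions, not messages.
        "rabbitmq": ["rabbitmqctl", "export_definitions", str(rabbit_file)],
    }


def run_backups(out_dir: Path) -> None:
    """Run every backup command, stopping at the first failure."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir.mkdir(parents=True, exist_ok=True)
    for argv in backup_commands(out_dir, stamp).values():
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Calling `run_backups(Path("/var/backups/s4e"))` from a scheduled job would produce one timestamped file per component. Redis is left out because its data is ephemeral and its snapshot is optional.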
Backup Schedule
Recommended practice
S4E does not manage backups on your behalf. The schedule below is a recommendation for on-prem deployments — implement it using your preferred backup tooling.
| Component | Recommended Frequency | Recommended Retention |
|---|---|---|
| PostgreSQL (full) | Daily | 30 days |
| PostgreSQL (WAL archiving) | Continuous | 7 days |
| MongoDB | Daily | 30 days |
| RabbitMQ definitions | Weekly | 4 weeks |
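One way to enforce the retention column is to compare each backup's timestamp against a per-component cutoff. The sketch below encodes the table as a dictionary and returns the backups that have aged out. The component keys, the `expired` helper, and the idea of tracking backups as a name-to-timestamp mapping are assumptions for illustration. Your backup tooling may already provide its own retention handling.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the schedule table; the keys are hypothetical labels.
RETENTION = {
    "postgresql-full": timedelta(days=30),
    "postgresql-wal": timedelta(days=7),
    "mongodb": timedelta(days=30),
    "rabbitmq-definitions": timedelta(weeks=4),
}


def expired(backups: dict[str, datetime], component: str, now: datetime) -> list[str]:
    """Return the names of backups older than the component's retention window.

    `backups` maps a backup name to the time it was taken. Names are returned
    sorted so the result is deterministic.
    """
    cutoff = now - RETENTION[component]
    return sorted(name for name, taken_at in backups.items() if taken_at < cutoff)


# Example: with a 30-day window, only the January backup has aged out.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
backups = {
    "pg-2024-01-15.dump": datetime(2024, 1, 15, tzinfo=timezone.utc),
    "pg-2024-02-20.dump": datetime(2024, 2, 20, tzinfo=timezone.utc),
}
to_delete = expired(backups, "postgresql-full", now)
```

Keeping the selection separate from the deletion makes it possible to log or review `to_delete` before anything is removed.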
Recovery Verification Checklist
After any recovery operation, verify:
- [ ] All pods are Running and Ready.
- [ ] API health endpoint returns 200: `curl https://s4e.company.com/api/health/ready`
- [ ] User login succeeds.
- [ ] Asset list loads correctly.
- [ ] A test scan can be initiated and completes.
- [ ] Historical scan results are accessible.
- [ ] RabbitMQ queues are created and consumers are connected.
- [ ] Scheduled scans are still configured.
- [ ] Monitoring and alerting are operational.
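The first two checklist items lend themselves to automation. The sketch below is a minimal example, not an official S4E script: `health_ok` requests the health endpoint and reports whether it answered with HTTP 200, and `not_ready_pods` inspects the parsed output of `kubectl get pods -o json`. Both function names are hypothetical, and the remaining checklist items still need to be verified by hand or with your own tests.

```python
import urllib.error
import urllib.request


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True only if a GET request to the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; connection
        # failures and timeouts end up here as well.
        return False


def not_ready_pods(pod_list: dict) -> list[str]:
    """List pods that are not both Running and Ready.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    Pods from completed Jobs (phase "Succeeded") are reported too, so
    filter them out beforehand if your namespace contains any.
    """
    failing = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if status.get("phase") != "Running" or not ready:
            failing.append(pod["metadata"]["name"])
    return failing
```

For example, `health_ok("https://s4e.company.com/api/health/ready")` covers the health endpoint item, and passing `json.loads(...)` of the kubectl output to `not_ready_pods` should return an empty list after a successful recovery.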
Next Steps
- Common errors -- resolve specific error conditions.
- Logs & debugging -- investigate issues during recovery.
- Database configuration -- optimize post-recovery database settings.