Efficient RabbitMQ configuration is critical for S4E On-Prem performance. This guide covers queue tuning, consumer optimization, and strategies for handling high scan volumes.
Scan Throughput and Timing Balance
Each scan, AI analysis, action, and output is processed as an individual message by a dedicated microservice worker. Because every asset operation flows through the queue as a discrete unit, the number of active workers directly controls both throughput and system load.
This creates a fundamental trade-off:
| Direction | Symptom | Impact |
|---|---|---|
| Too many workers / too fast | Queue empties instantly; workers sit idle, then burst | Memory spikes, high CPU contention, noisy-neighbor effects across services |
| Too few workers / too slow | Queue depth grows continuously | Backlogs accumulate, RAM climbs, real-time monitoring view drifts from actual asset state |
The goal is a steady middle ground: queue depth stays near zero under normal load, workers stay comfortably utilized (60–80%), and no single worker type becomes a bottleneck that starves the others.
Practical Tuning Guidelines
- Start conservative. Begin with fewer replicas than you think you need, observe queue depth and worker CPU, then scale up gradually.
- Watch queue depth and memory first. A queue depth that grows without draining is the earliest signal of under-provisioning. RAM pressure follows shortly after as the backlog of queued messages accumulates on the broker.
- Avoid over-provisioning scan workers in isolation. Scan workers produce output messages consumed by action and output workers. Scaling scan workers without scaling downstream services shifts the bottleneck rather than removing it.
- Respect the timing window. Assets are expected to reflect their current exposure. If scan throughput falls behind the schedule cadence, the platform's view of asset risk becomes stale. Tune worker counts to keep cycle time within your acceptable monitoring interval.
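The timing-window guideline can be turned into a rough starting estimate for replica count. The sketch below is only an approximation under stated assumptions: each worker processes one scan at a time, `avg_scan_seconds` is an average you have measured yourself, and `target_utilization` is the share of each worker's time you want spent on useful work. All names here are hypothetical and are not part of S4E.

```python
import math


def estimate_workers(asset_count, avg_scan_seconds, interval_seconds,
                     target_utilization=0.7):
    """Estimate replicas needed to scan every asset once per interval.

    Assumption: one scan at a time per worker, uniform scan duration.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    # Total worker-seconds of scanning required per cycle.
    busy_seconds = asset_count * avg_scan_seconds
    # Useful worker-seconds a single replica contributes per cycle.
    capacity_per_worker = interval_seconds * target_utilization
    return max(1, math.ceil(busy_seconds / capacity_per_worker))


# 5000 assets, 30 s per scan, daily cadence, 70% target utilization.
print(estimate_workers(5000, 30, 24 * 3600))
```

Treat the result as a floor to start observing from, in line with the "start conservative" guideline, rather than as a final replica count.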
Finding the right balance
Monitor three numbers together: queue depth, worker CPU utilization, and scan cycle time. If depth is near zero, CPU is under 70%, and cycle time matches your scheduling interval, the current configuration is well-balanced.
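As an illustration, the three-number check could be expressed as a small function. This is a sketch, not an S4E tool: the `Snapshot` fields and the `depth_tolerance` value (a stand-in for "near zero") are assumptions, and you would feed it metrics from your own monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    queue_depth: int          # messages waiting in the queue
    cpu_utilization: float    # worker CPU as a fraction, 0.0 to 1.0
    cycle_seconds: float      # time to complete one full scan cycle
    interval_seconds: float   # configured scheduling interval


def assess(snapshot, depth_tolerance=10, cpu_limit=0.70):
    """Return a list of findings; an empty list means well-balanced."""
    findings = []
    if snapshot.queue_depth > depth_tolerance:
        findings.append("queue depth is above the near-zero tolerance")
    if snapshot.cpu_utilization >= cpu_limit:
        findings.append("worker CPU is at or above the limit")
    if snapshot.cycle_seconds > snapshot.interval_seconds:
        findings.append("scan cycle time exceeds the scheduling interval")
    return findings


print(assess(Snapshot(queue_depth=0, cpu_utilization=0.55,
                      cycle_seconds=3000, interval_seconds=3600)))
```

Checking the three values together matters because any one of them can look healthy in isolation while another is drifting.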
Queue Performance Fundamentals
Message Flow
Publisher ──► Exchange ──► Queue ──► Consumer ──► Acknowledge
                                                  │
                                                  ├── Success: ACK, remove from queue
                                                  ├── Retry: NACK + requeue
                                                  └── Failure: NACK + dead letter
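The three acknowledgement outcomes can be sketched as follows. The code assumes a pika-style channel object exposing `basic_ack` and `basic_nack`; the `Outcome` enum, the `settle` helper, and the `RecordingChannel` stand-in are hypothetical names used only for illustration, and the source does not state which client library the S4E workers use. Note that a rejected message is dead-lettered only if the queue has a dead-letter exchange configured; otherwise the broker discards it.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    RETRY = "retry"
    FAILURE = "failure"


def settle(channel, delivery_tag, outcome):
    """Acknowledge one delivery according to the processing outcome."""
    if outcome is Outcome.SUCCESS:
        # ACK: the broker removes the message from the queue.
        channel.basic_ack(delivery_tag=delivery_tag)
    elif outcome is Outcome.RETRY:
        # NACK + requeue: the broker puts the message back for redelivery.
        channel.basic_nack(delivery_tag=delivery_tag, requeue=True)
    else:
        # NACK without requeue: dead-lettered if a DLX is configured.
        channel.basic_nack(delivery_tag=delivery_tag, requeue=False)


class RecordingChannel:
    """Stand-in channel that records calls, so the sketch runs without a broker."""

    def __init__(self):
        self.calls = []

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag, None))

    def basic_nack(self, delivery_tag, requeue=True):
        self.calls.append(("nack", delivery_tag, requeue))


channel = RecordingChannel()
settle(channel, 1, Outcome.SUCCESS)
settle(channel, 2, Outcome.RETRY)
settle(channel, 3, Outcome.FAILURE)
print(channel.calls)
```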
The key performance metrics are:
- Publish rate -- messages entering the queue per second.
- Consume rate -- messages leaving the queue per second.
- Queue depth -- number of messages waiting; it grows whenever the publish rate exceeds the consume rate.
- Consumer utilization -- percentage of time consumers are busy.
Goal
Keep queue depth near zero during normal operations. Sustained queue growth indicates a scaling or performance problem.
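The relationship between publish rate, consume rate, and queue depth is simple arithmetic, and it can help to project how quickly a backlog builds or clears. The helpers below are a sketch that assumes both rates stay constant over the projection window, which real workloads rarely do; the function names are hypothetical.

```python
def project_depth(current_depth, publish_rate, consume_rate, seconds):
    """Projected queue depth after `seconds` at constant rates (messages/s)."""
    net_rate = publish_rate - consume_rate
    # Depth cannot go below zero once the queue is empty.
    return max(0.0, current_depth + net_rate * seconds)


def seconds_to_drain(current_depth, publish_rate, consume_rate):
    """Time to empty the queue, or None if consumers are not keeping up."""
    if current_depth <= 0:
        return 0.0
    drain_rate = consume_rate - publish_rate
    if drain_rate <= 0:
        return None  # the backlog holds steady or grows
    return current_depth / drain_rate


# A deficit of 20 messages per second sustained for five minutes.
print(project_depth(0, publish_rate=120, consume_rate=100, seconds=300))
# Clearing that backlog once consumers outpace publishers by 50 messages per second.
print(seconds_to_drain(6000, publish_rate=100, consume_rate=150))
```

Even a small sustained deficit adds up quickly, which is why sustained growth is treated as a scaling signal rather than something to wait out.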
Monitoring Queue Health
Key Alerts
| Condition | Severity | Action |
|---|---|---|
| Queue depth > 1000 for 5 min | Warning | Scale consumers |
| Queue depth > 5000 for 5 min | Critical | Scale consumers, check for failures |
| Consumer count drops to 0 | Critical | Check worker deployment status |
| Memory alarm triggered | Critical | Add RAM or reduce prefetch |
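The alert table can be read as a set of rules over recent metrics. The sketch below encodes it in plain Python, assuming the caller supplies queue depth samples that cover the last five minutes; a threshold counts as breached "for 5 min" only when every sample in that window exceeds it. The function name and input shape are hypothetical, and in practice these rules would normally live in your monitoring system rather than in application code.

```python
def evaluate_alerts(depth_samples, consumer_count, memory_alarm):
    """Return (severity, action) pairs for the conditions in the alert table.

    depth_samples: queue depth readings covering the last 5 minutes.
    """
    alerts = []

    # The lowest reading tells us what depth was sustained for the whole window.
    sustained_depth = min(depth_samples) if depth_samples else 0
    if sustained_depth > 5000:
        alerts.append(("critical", "Scale consumers, check for failures"))
    elif sustained_depth > 1000:
        alerts.append(("warning", "Scale consumers"))

    if consumer_count == 0:
        alerts.append(("critical", "Check worker deployment status"))
    if memory_alarm:
        alerts.append(("critical", "Add RAM or reduce prefetch"))
    return alerts


print(evaluate_alerts([1200, 1500, 1100], consumer_count=3, memory_alarm=False))
```

Using the minimum over the window means a brief spike that drains on its own does not raise a depth alert.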
Next Steps
- Horizontal scaling -- add worker replicas.
- Resource management -- ensure workers have adequate resources.
- RabbitMQ configuration -- broker-level settings.