Efficient RabbitMQ configuration is critical for S4E On-Prem performance. This guide covers queue tuning, consumer optimization, and strategies for handling high scan volumes.


Scan Throughput and Timing Balance

Each scan, AI analysis, action, and output is processed as an individual message by a dedicated microservice worker. Because every asset operation flows through the queue as a discrete unit, the number of active workers directly controls both throughput and system load.

This creates a fundamental trade-off:

Direction                   | Symptom                                           | Impact
Too many workers / too fast | Queue empties instantly; workers idle, then burst | Memory spikes, high CPU contention, noisy-neighbor effects across services
Too few workers / too slow  | Queue depth grows continuously                    | Backlogs accumulate, RAM climbs, real-time monitoring view drifts from actual asset state

The goal is a steady middle ground: queue depth stays near zero under normal load, workers stay comfortably utilized (busy 60–80% of the time), and no single worker type becomes a bottleneck that starves the others.
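The 60–80% band can be turned into rough sizing arithmetic. The sketch below assumes a steady arrival rate and a measured average processing time per message; both inputs are values you would collect yourself, and the function names are hypothetical. Utilization is arrival rate × service time ÷ worker count, so the worker range that keeps utilization inside the band follows directly.

```python
import math


def utilization(arrival_rate: float, service_time_s: float, workers: int) -> float:
    """Fraction of time workers are busy: offered load divided by worker count."""
    return arrival_rate * service_time_s / workers


def worker_range(arrival_rate: float, service_time_s: float,
                 low: float = 0.60, high: float = 0.80) -> tuple[int, int]:
    """Smallest and largest worker counts that keep utilization within [low, high].

    arrival_rate: messages per second entering the queue (measured).
    service_time_s: average seconds one worker spends on one message (measured).
    """
    offered_load = arrival_rate * service_time_s          # average number of busy workers
    min_workers = max(1, math.ceil(offered_load / high))  # fewer would push utilization above `high`
    max_workers = math.floor(offered_load / low)          # more would drop utilization below `low`
    if max_workers < min_workers:
        # Load too small for any integer count to land inside the band;
        # favor headroom and return the minimum for both bounds.
        max_workers = min_workers
    return min_workers, max_workers


# Example: 7 messages/s at 2 s per message keeps 14 workers busy on average.
low_n, high_n = worker_range(7, 2.0)
print(low_n, high_n)                         # 18 23
print(round(utilization(7, 2.0, low_n), 2))  # 0.78
```

Real traffic is burstier than a constant rate, which is one reason to start near the lower end of the range and scale up while watching queue depth.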

Practical Tuning Guidelines

  • Start conservative. Begin with fewer replicas than you think you need, observe queue depth and worker CPU, then scale up gradually.
  • Watch queue depth, then memory. A growing queue depth that doesn't drain is the earliest signal of under-provisioning. RAM pressure follows shortly after, as queued and unacknowledged messages accumulate on the broker.
  • Avoid over-provisioning scan workers in isolation. Scan workers produce output messages consumed by action and output workers. Scaling scan workers without scaling downstream services shifts the bottleneck rather than removing it.
  • Respect the timing window. Assets are expected to reflect their current exposure. If scan throughput falls behind the schedule cadence, the platform's view of asset risk becomes stale. Tune worker counts to keep cycle time within your acceptable monitoring interval.
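The downstream-scaling and timing-window guidelines can be checked with simple arithmetic. This is a minimal sketch under two simplifying assumptions: a stage's capacity is its worker count divided by its average per-message processing time, and every asset yields one message per stage per cycle. The stage names and numbers are hypothetical. End-to-end throughput is capped by the slowest stage, so adding scan workers alone stops helping once another stage is the limit.

```python
def stage_capacity(workers: int, service_time_s: float) -> float:
    """Messages per second a stage can sustain with every worker busy."""
    return workers / service_time_s


def pipeline_bottleneck(stages: dict[str, tuple[int, float]]) -> tuple[str, float]:
    """Return (stage name, capacity) for the slowest stage in the pipeline."""
    capacities = {name: stage_capacity(w, s) for name, (w, s) in stages.items()}
    slowest = min(capacities, key=capacities.get)
    return slowest, capacities[slowest]


def cycle_time_s(assets: int, stages: dict[str, tuple[int, float]]) -> float:
    """Approximate seconds to move one message per asset through every stage.

    Assumes a steady pipeline whose end-to-end throughput equals the
    bottleneck stage's capacity (pipeline fill time is ignored).
    """
    _, capacity = pipeline_bottleneck(stages)
    return assets / capacity


# Hypothetical sizing: stage name -> (workers, seconds per message).
stages = {"scan": (10, 2.0), "ai_analysis": (6, 1.0), "action": (2, 0.5), "output": (3, 0.5)}
print(pipeline_bottleneck(stages))     # ('action', 4.0)
print(cycle_time_s(3600, stages))      # 900.0

# Doubling scan workers alone does not shorten the cycle.
more_scan = dict(stages, scan=(20, 2.0))
print(pipeline_bottleneck(more_scan))  # ('action', 4.0)
print(cycle_time_s(3600, more_scan))   # 900.0
```

Under these assumptions, a 15-minute (900 s) monitoring interval leaves this configuration with no headroom, and it is action workers, not scan workers, that would buy it back.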

Finding the Right Balance

Monitor three numbers together: queue depth, worker CPU utilization, and scan cycle time. If depth is near zero, CPU is under 70%, and cycle time fits within your scheduling interval, the current configuration is well-balanced.
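The three-number check can be expressed as a small function. This sketch assumes you already collect the metrics. The `Snapshot` type and `balance_issues` function are hypothetical, and so is the "near zero" depth threshold of 100 messages, which you would adjust for your environment. The 70% CPU limit comes from the guidance above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    queue_depth: int       # messages waiting in the queue
    worker_cpu_pct: float  # average worker CPU utilization, 0 to 100
    cycle_time_s: float    # duration of the last full scan cycle
    interval_s: float      # acceptable monitoring / scheduling interval


NEAR_ZERO_DEPTH = 100  # hypothetical "near zero" threshold; tune per deployment
CPU_LIMIT_PCT = 70     # from the guidance above


def balance_issues(s: Snapshot) -> list[str]:
    """Return the checks that fail; an empty list means the configuration looks balanced."""
    issues = []
    if s.queue_depth > NEAR_ZERO_DEPTH:
        issues.append("queue depth not near zero: consumers may be under-provisioned")
    if s.worker_cpu_pct >= CPU_LIMIT_PCT:
        issues.append("worker CPU at or above 70%: little headroom for bursts")
    if s.cycle_time_s > s.interval_s:
        issues.append("cycle time exceeds interval: asset view will drift from actual state")
    return issues


print(balance_issues(Snapshot(12, 55.0, 3000, 3600)))          # []
print(len(balance_issues(Snapshot(4200, 85.0, 5400, 3600))))  # 3
```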


Queue Performance Fundamentals

Message Flow

Publisher ──► Exchange ──► Queue ──► Consumer ──► Acknowledge
                                      ├── Success: ACK, remove from queue
                                      ├── Retry: NACK + requeue
                                      └── Failure: NACK + dead letter
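The three consumer outcomes in the diagram correspond to the AMQP `basic.ack` and `basic.nack` operations. The sketch below uses the method names of the Python `pika` client's channel (`basic_ack`, and `basic_nack` with a `requeue` flag), but it takes the channel as a parameter so it can run without a broker. `TransientError`, `handle_delivery`, and the `RecordingChannel` stand-in are hypothetical. Dead-lettering on `requeue=False` only happens if the queue has a dead-letter exchange configured.

```python
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a downstream timeout."""


def handle_delivery(channel: Any, delivery_tag: int, body: bytes,
                    process: Callable[[bytes], None]) -> str:
    """Process one message, then ACK or NACK it according to the outcome.

    `channel` is anything exposing pika-style basic_ack / basic_nack methods.
    Returns the action taken so the caller can log or count it.
    """
    try:
        process(body)
    except TransientError:
        # Retry: NACK with requeue=True puts the message back on the queue.
        channel.basic_nack(delivery_tag=delivery_tag, requeue=True)
        return "requeued"
    except Exception:
        # Failure: NACK without requeue. The broker routes the message to the
        # queue's dead-letter exchange if one is configured, otherwise drops it.
        channel.basic_nack(delivery_tag=delivery_tag, requeue=False)
        return "dead-lettered"
    # Success: ACK tells the broker it can remove the message from the queue.
    channel.basic_ack(delivery_tag=delivery_tag)
    return "acked"


class RecordingChannel:
    """Stand-in for a real channel that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def basic_ack(self, delivery_tag: int, multiple: bool = False) -> None:
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag: int, multiple: bool = False,
                   requeue: bool = True) -> None:
        self.calls.append(("nack", delivery_tag, requeue))


def flaky(_: bytes) -> None:
    raise TransientError("scanner timed out")


def broken(_: bytes) -> None:
    raise ValueError("malformed payload")


ch = RecordingChannel()
print(handle_delivery(ch, 1, b"ok", lambda b: None))  # acked
print(handle_delivery(ch, 2, b"retry", flaky))        # requeued
print(handle_delivery(ch, 3, b"bad", broken))         # dead-lettered
print(ch.calls)  # [('ack', 1), ('nack', 2, True), ('nack', 3, False)]
```

With `pika`, a function like this would be called from the `on_message_callback(channel, method, properties, body)` registered with `basic_consume`, passing `method.delivery_tag`. A requeued message is redelivered right away, so an error that keeps recurring can loop indefinitely. Capping redeliveries (quorum queues, for instance, support a delivery limit) prevents such messages from cycling forever.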

The key performance metrics are:

  • Publish rate -- messages entering the queue per second.
  • Consume rate -- messages leaving the queue per second.
  • Queue depth -- number of messages waiting to be consumed; it grows whenever the publish rate exceeds the consume rate.
  • Consumer utilization -- percentage of time consumers are busy.
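The first three metrics are linked by simple rate arithmetic: queue depth changes by (publish rate − consume rate) every second. This is a minimal sketch assuming both rates stay constant over the window. The function names are hypothetical.

```python
import math


def projected_depth(depth: int, publish_rate: float, consume_rate: float,
                    seconds: float) -> float:
    """Queue depth after `seconds` at constant rates (cannot go below zero)."""
    change = (publish_rate - consume_rate) * seconds
    return max(0.0, float(depth + change))


def time_to_drain_s(depth: int, publish_rate: float, consume_rate: float) -> float:
    """Seconds until the queue is empty, or infinity if it never drains."""
    if depth == 0:
        return 0.0
    net_drain = consume_rate - publish_rate
    if net_drain <= 0:
        return math.inf  # consumers are not outpacing publishers
    return depth / net_drain


# 1,200 messages queued while publishers send 50 msg/s.
print(projected_depth(1200, 50, 40, 300))  # 4200.0 (falling behind by 10 msg/s)
print(time_to_drain_s(1200, 50, 40))       # inf
print(time_to_drain_s(1200, 50, 60))       # 120.0
```

The second case is the important one: even a small positive net drain rate empties a backlog eventually, while any shortfall, however small, grows it without bound.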

Goal

Keep queue depth near zero during normal operations. Sustained queue growth indicates a scaling or performance problem.
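Sustained growth is better judged from a trend than from a single reading, since a brief spike that drains on its own is normal. One way to do this, assuming you poll queue depth periodically, is to fit a least-squares line to recent (time, depth) samples and treat a slope above a small tolerance as sustained growth. The function names and the 0.5 messages-per-second tolerance are hypothetical.

```python
def depth_slope(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of queue depth over time, in messages per second.

    `samples` holds (timestamp_s, depth) pairs from periodic polling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("need samples from at least two distinct timestamps")
    return cov / var


def is_sustained_growth(samples: list[tuple[float, int]],
                        min_slope: float = 0.5) -> bool:
    """True if depth trends upward faster than `min_slope` messages per second."""
    return depth_slope(samples) > min_slope


growing = [(0, 100), (60, 400), (120, 700), (180, 1000)]
steady = [(0, 3), (60, 0), (120, 5), (180, 1)]
print(depth_slope(growing))          # 5.0
print(is_sustained_growth(growing))  # True
print(is_sustained_growth(steady))   # False
```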

Monitoring Queue Health

Key Alerts

Condition                    | Severity | Action
Queue depth > 1000 for 5 min | Warning  | Scale consumers
Queue depth > 5000 for 5 min | Critical | Scale consumers, check for failures
Consumer count drops to 0    | Critical | Check worker deployment status
Memory alarm triggered       | Critical | Add RAM or reduce prefetch
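The alert table can be expressed as rules over periodic samples. This sketch assumes you already collect queue depth, consumer count, and the broker's memory-alarm state at a regular interval. The `Sample` fields and function names are hypothetical. "For 5 min" is interpreted as the condition holding on every sample in the unbroken run that ends at the latest sample, for at least 300 seconds. In practice these rules would usually live in your existing monitoring system, but the logic is the same.

```python
from dataclasses import dataclass
from typing import Callable

WINDOW_S = 300  # "for 5 min" in the alert table


@dataclass
class Sample:
    t: float            # timestamp, seconds
    depth: int          # messages waiting in the queue
    consumers: int      # active consumers on the queue
    memory_alarm: bool  # broker memory alarm currently raised


def streak_s(samples: list[Sample], predicate: Callable[[Sample], bool],
             now: float) -> float:
    """Seconds the predicate has held on every sample in the run ending at the newest one."""
    start = None
    for s in reversed(samples):  # samples are ordered oldest to newest
        if not predicate(s):
            break
        start = s.t
    return 0.0 if start is None else now - start


def evaluate_alerts(samples: list[Sample], now: float) -> list[tuple[str, str]]:
    """Apply the alert table to the sample history; returns (severity, action) pairs."""
    latest = samples[-1]
    alerts = []
    if streak_s(samples, lambda s: s.depth > 5000, now) >= WINDOW_S:
        alerts.append(("critical", "scale consumers, check for failures"))
    elif streak_s(samples, lambda s: s.depth > 1000, now) >= WINDOW_S:
        alerts.append(("warning", "scale consumers"))
    if latest.consumers == 0:
        alerts.append(("critical", "check worker deployment status"))
    if latest.memory_alarm:
        alerts.append(("critical", "add RAM or reduce prefetch"))
    return alerts


# One sample per minute for six minutes, depth steady at 1,500.
history = [Sample(t=60.0 * i, depth=1500, consumers=4, memory_alarm=False) for i in range(7)]
print(evaluate_alerts(history, now=360.0))  # [('warning', 'scale consumers')]
```

The critical depth rule is checked first so that a queue above 5,000 raises one critical alert rather than both a warning and a critical.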

Next Steps