Proper resource allocation ensures S4E On-Prem runs reliably without wasting infrastructure capacity. This guide covers CPU and memory management, persistent storage sizing, and resource quota configuration.
Kubernetes Resource Requests and Limits
How Requests and Limits Work
| Concept | Purpose |
|---|---|
| Request | Minimum guaranteed resources. Used by the scheduler for pod placement. |
| Limit | Maximum allowed resources. Pods exceeding CPU limits are throttled; pods exceeding memory limits are OOM-killed. |
Always set both
Running without resource limits can cause a single misbehaving pod to starve other services. Always define both requests and limits for production workloads.
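The rule above can be checked mechanically before a manifest is deployed. The following is a minimal sketch, assuming the pod spec has already been loaded into a Python dictionary (for example from a rendered manifest); the function name `missing_resources` and the sample spec are illustrative and not part of S4E.

```python
from typing import Dict, List

SECTIONS = ("requests", "limits")
RESOURCES = ("cpu", "memory")


def missing_resources(pod_spec: Dict) -> List[str]:
    """Return 'container: section.resource' entries that are not set."""
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for section in SECTIONS:
            values = resources.get(section) or {}
            for resource in RESOURCES:
                if resource not in values:
                    name = container.get("name", "?")
                    problems.append(f"{name}: {section}.{resource}")
    return problems


# Hypothetical pod spec in which the memory limit was forgotten.
example_spec = {
    "containers": [
        {
            "name": "s4e-core",
            "resources": {
                "requests": {"cpu": "500m", "memory": "512Mi"},
                "limits": {"cpu": "2000m"},
            },
        }
    ]
}

for problem in missing_resources(example_spec):
    print(problem)
```

An empty result means every container in the spec defines CPU and memory for both requests and limits.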
Recommended Resource Allocations
API and Orchestration Services
| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| s4e-core | 500m | 2000m | 512Mi | 2Gi |
| s4e-web | 100m | 500m | 128Mi | 512Mi |
| s4e-trigger | 250m | 1000m | 256Mi | 1Gi |
| s4e-scan-adder | 250m | 1000m | 256Mi | 1Gi |
| s4e-scheduler | 250m | 500m | 256Mi | 512Mi |
| s4e-dispatcher | 250m | 1000m | 256Mi | 1Gi |
Worker Services
| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| s4e-scan | 500m | 2000m | 512Mi | 2Gi |
| s4e-crawler | 500m | 2000m | 512Mi | 2Gi |
Worker services are the most variable in resource usage. Adjust based on observed metrics:
- Port scans -- CPU-intensive, moderate memory.
- Web vulnerability scans -- CPU and memory intensive.
- Crawling (Katana) -- memory-intensive for large sites.
- Directory fuzzing (ffuf) -- CPU and network intensive.
Data Services
| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| PostgreSQL | 1000m | 4000m | 2Gi | 8Gi |
| Redis | 250m | 1000m | 512Mi | 2Gi |
| RabbitMQ | 500m | 2000m | 1Gi | 4Gi |
| MongoDB | 500m | 2000m | 1Gi | 4Gi |
Setting Resources in Helm
# s4e-values.yaml
core:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi
scan:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi
CPU Management
CPU Throttling
When a pod reaches its CPU limit, Kubernetes throttles it rather than killing it. This manifests as:
- Increased response latency for s4e-core.
- Slower scan execution for workers.
- Delayed message processing for consumers.
Detecting CPU Issues
# Check current CPU usage
kubectl -n s4e top pods
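# List configured CPU requests and limits to compare against the usage above
# (this does not measure throttling itself; only the first container of each pod is read)
kubectl -n s4e get pods -o json | jq '.items[] |
  {name: .metadata.name,
   cpu_request: .spec.containers[0].resources.requests.cpu,
   cpu_limit: .spec.containers[0].resources.limits.cpu}'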
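Prometheus query for CPU throttling (one common form, based on the cAdvisor CFS counters; it returns the fraction of CPU periods in which a container was throttled):
rate(container_cpu_cfs_throttled_periods_total{namespace="s4e"}[5m]) /
rate(container_cpu_cfs_periods_total{namespace="s4e"}[5m]) > 0.1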
Throttling threshold
If throttling exceeds 10% of total CPU time, increase the CPU limit or add replicas.
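The 10% rule can be evaluated from two samples of the throttling counters. This is a minimal sketch, assuming the samples come from the cAdvisor counters `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total`, and using the share of throttled periods as a proxy for throttled CPU time; the function names are illustrative.

```python
def throttle_ratio(throttled_start: float, throttled_end: float,
                   periods_start: float, periods_end: float) -> float:
    """Fraction of CFS periods that were throttled between two samples."""
    periods = periods_end - periods_start
    if periods <= 0:
        # No scheduling periods elapsed (or the counter reset): nothing to report.
        return 0.0
    return (throttled_end - throttled_start) / periods


def needs_more_cpu(ratio: float, threshold: float = 0.10) -> bool:
    """True when the throttled share exceeds the threshold (10% by default)."""
    return ratio > threshold


# Hypothetical samples: 60 of 400 periods were throttled in the window.
ratio = throttle_ratio(100, 160, 1000, 1400)
print(f"throttled share: {ratio:.0%}, action needed: {needs_more_cpu(ratio)}")
```

A result above the threshold suggests raising the CPU limit or adding replicas, as described above.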
Memory Management
OOM Kills
When a container exceeds its memory limit, it is terminated and Kubernetes reports the reason as OOMKilled. The container restarts automatically (subject to the pod's restart policy), but in-progress work is lost.
Detecting Memory Issues
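# Check for OOM-killed containers (last termination reason in pod status)
kubectl -n s4e get pods -o json | jq -r '.items[] |
  select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) |
  .metadata.name'
# Node-level OOM events, where reported, are attached to nodes rather than to the s4e namespace
kubectl get events -A --field-selector reason=OOMKilling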
# Check memory usage
kubectl -n s4e top pods --sort-by=memory
Prometheus query for memory pressure:
container_memory_working_set_bytes{namespace="s4e"} /
container_spec_memory_limit_bytes{namespace="s4e"} > 0.85
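The same 85% check can be done by hand when comparing a usage figure (for example from kubectl top) against a configured limit. This is a minimal sketch that handles only the binary suffixes used in this guide (Ki, Mi, Gi, Ti); the function names are illustrative.

```python
BINARY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '2Gi' to bytes."""
    suffix = quantity[-2:]
    if suffix in BINARY_UNITS:
        return int(float(quantity[:-2]) * BINARY_UNITS[suffix])
    return int(quantity)  # no suffix: value is already in bytes


def memory_pressure(usage: str, limit: str) -> float:
    """Usage as a fraction of the memory limit."""
    return parse_memory(usage) / parse_memory(limit)


def is_under_pressure(usage: str, limit: str, threshold: float = 0.85) -> bool:
    """True when usage exceeds the threshold share of the limit."""
    return memory_pressure(usage, limit) > threshold


# Hypothetical reading: a worker using 1843Mi against a 2Gi limit.
print(is_under_pressure("1843Mi", "2Gi"))
```

Pods that stay above the threshold are candidates for a higher memory limit or for the tuning strategies listed in the next section.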
Memory Tuning Strategies
| Service | Strategy |
|---|---|
| s4e-core | Tune DB_POOL_SIZE and DB_MAX_OVERFLOW to control connection memory |
| s4e-scan | Reduce SCAN_CONCURRENCY if memory is limited |
| s4e-crawler | Reduce CRAWLER_CONCURRENCY and CRAWLER_DEPTH |
| PostgreSQL | Adjust shared_buffers and work_mem |
| RabbitMQ | Lower vm_memory_high_watermark ratio |
| Redis | Set maxmemory and eviction policy |
Persistent Volume Sizing
Storage Requirements
| Component | Minimum | Medium (1K assets) | Large (10K+ assets) |
|---|---|---|---|
| PostgreSQL | 50Gi | 100Gi | 500Gi |
| MongoDB | 50Gi | 100Gi | 300Gi |
| RabbitMQ | 10Gi | 20Gi | 50Gi |
| Redis (persistent) | 5Gi | 10Gi | 20Gi |
| Elasticsearch | 100Gi | 500Gi | 2Ti |
Storage Class Selection
| Workload | Recommended Storage Type | Why |
|---|---|---|
| PostgreSQL | SSD (gp3, pd-ssd) | Random I/O performance |
| MongoDB | SSD | Write-heavy workloads |
| RabbitMQ | SSD | Message persistence |
| Elasticsearch | SSD for hot, HDD for warm | Cost optimization |
Monitoring Disk Usage
# List PVCs with their capacity and status (this does not show how full they are)
kubectl -n s4e get pvc
# Check actual disk usage inside pods
kubectl -n s4e exec -it postgresql-0 -- df -h /var/lib/postgresql
Alert when disk usage exceeds 80%:
- alert: S4EPVCNearFull
  expr: |
    kubelet_volume_stats_used_bytes{namespace="s4e"} /
    kubelet_volume_stats_capacity_bytes{namespace="s4e"} > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PVC {{ $labels.persistentvolumeclaim }} is over 80% full"
Expanding PVCs
If a PVC is running out of space and your storage class supports volume expansion:
kubectl -n s4e patch pvc postgresql-data --type merge -p \
'{"spec": {"resources": {"requests": {"storage": "200Gi"}}}}'
Volume expansion
Not all storage classes support volume expansion. Check your storage class with kubectl get sc <name> -o yaml and look for allowVolumeExpansion: true.
Resource Quotas
Namespace-Level Quotas
Apply quotas to prevent the S4E namespace from consuming excessive cluster resources:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: s4e-quota
  namespace: s4e
spec:
  hard:
    requests.cpu: "40"
    requests.memory: "80Gi"
    limits.cpu: "80"
    limits.memory: "160Gi"
    persistentvolumeclaims: "20"
    pods: "100"
Limit Ranges
Set default requests and limits for containers that do not specify them, and bound what any single container may ask for:
apiVersion: v1
kind: LimitRange
metadata:
  name: s4e-limits
  namespace: s4e
spec:
  limits:
    - type: Container
      default:
        cpu: "1"
        memory: 1Gi
      defaultRequest:
        cpu: 250m
        memory: 256Mi
      max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: 50m
        memory: 64Mi
Capacity Planning
Estimating Total Resources
Use this formula to estimate cluster capacity needs:
Total CPU = sum(service_cpu_request × replicas) × 1.15 (15% overhead)
Total RAM = sum(service_memory_request × replicas) × 1.15 (15% overhead)
Example: Medium Deployment
| Service | Replicas | CPU Request | Memory Request | Total CPU | Total Memory |
|---|---|---|---|---|---|
| s4e-core | 3 | 500m | 512Mi | 1500m | 1536Mi |
| s4e-web | 2 | 100m | 128Mi | 200m | 256Mi |
| s4e-trigger | 2 | 250m | 256Mi | 500m | 512Mi |
| s4e-scan | 5 | 500m | 512Mi | 2500m | 2560Mi |
| s4e-crawler | 4 | 500m | 512Mi | 2000m | 2048Mi |
| s4e-dispatcher | 2 | 250m | 256Mi | 500m | 512Mi |
| s4e-scheduler | 1 | 250m | 256Mi | 250m | 256Mi |
| PostgreSQL | 1 | 1000m | 2Gi | 1000m | 2048Mi |
| Redis | 1 | 250m | 512Mi | 250m | 512Mi |
| RabbitMQ | 1 | 500m | 1Gi | 500m | 1024Mi |
| MongoDB | 1 | 500m | 1Gi | 500m | 1024Mi |
| Subtotal | | | | 9700m | 12288Mi |
| + 15% overhead | | | | ~11.2 CPU | ~13.8Gi |
This fits comfortably on 3 nodes with 8 vCPU and 16 GB RAM each (24 vCPU, 48 GB total).
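The arithmetic for the medium deployment can be reproduced with a short script. This is a minimal sketch that applies the formula from this section to the requests and replica counts in the table; the parsing helpers are illustrative and handle only the `m`, `Mi`, and `Gi` suffixes used here.

```python
# (service, replicas, cpu_request, memory_request) from the medium example
SERVICES = [
    ("s4e-core", 3, "500m", "512Mi"),
    ("s4e-web", 2, "100m", "128Mi"),
    ("s4e-trigger", 2, "250m", "256Mi"),
    ("s4e-scan", 5, "500m", "512Mi"),
    ("s4e-crawler", 4, "500m", "512Mi"),
    ("s4e-dispatcher", 2, "250m", "256Mi"),
    ("s4e-scheduler", 1, "250m", "256Mi"),
    ("PostgreSQL", 1, "1000m", "2Gi"),
    ("Redis", 1, "250m", "512Mi"),
    ("RabbitMQ", 1, "500m", "1Gi"),
    ("MongoDB", 1, "500m", "1Gi"),
]
OVERHEAD = 0.15


def cpu_millicores(quantity: str) -> int:
    """'500m' -> 500, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """'512Mi' -> 512, '2Gi' -> 2048."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def estimate(services, overhead=OVERHEAD):
    """Return (cpu subtotal in m, memory subtotal in Mi, cpu total, memory total)."""
    cpu = sum(cpu_millicores(c) * replicas for _, replicas, c, _ in services)
    mem = sum(memory_mib(m) * replicas for _, replicas, _, m in services)
    return cpu, mem, cpu * (1 + overhead), mem * (1 + overhead)


cpu, mem, cpu_total, mem_total = estimate(SERVICES)
print(f"Subtotal: {cpu}m CPU, {mem}Mi memory")
print(f"With overhead: {cpu_total / 1000:.1f} CPU, {mem_total / 1024:.1f}Gi memory")
```

The subtotals come out at 9700m and 12288Mi. With 15% overhead the memory figure is about 14131Mi, which is roughly 13.8Gi in binary units. To size a different deployment, edit the replica counts or requests in `SERVICES` and rerun.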
Next Steps
- Horizontal scaling -- add capacity through replicas and nodes.
- Queue optimization -- ensure message processing matches capacity.
- Monitoring -- track resource utilization over time.