Resource Management

Proper resource allocation ensures S4E On-Prem runs reliably without wasting infrastructure capacity. This guide covers CPU and memory management, persistent storage sizing, and resource quota configuration.

Kubernetes Resource Requests and Limits

How Requests and Limits Work

Concept	Purpose
Request	Minimum guaranteed resources. Used by the scheduler for pod placement.
Limit	Maximum allowed resources. Containers that hit their CPU limit are throttled; containers that exceed their memory limit are OOM-killed.

Always set both

Without resource limits, a single misbehaving pod can starve other services on the same node. Always define both requests and limits for production workloads.
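
One way to enforce this rule before deploying is to scan each container's resources block for missing entries. The following is a minimal sketch, not part of S4E: the function name `missing_resource_fields` and the sample dictionaries are hypothetical, and the dictionary layout only mirrors the requests/limits structure of a Kubernetes container spec.

```python
REQUIRED = ("cpu", "memory")


def missing_resource_fields(resources):
    """Return the requests/limits entries that a container's resources block lacks."""
    missing = []
    for section in ("requests", "limits"):
        values = resources.get(section) or {}
        for name in REQUIRED:
            if not values.get(name):
                missing.append(f"{section}.{name}")
    return missing


# Hypothetical sample inputs: one complete block, one that sets requests only.
complete = {
    "requests": {"cpu": "500m", "memory": "512Mi"},
    "limits": {"cpu": "2000m", "memory": "2Gi"},
}
requests_only = {"requests": {"cpu": "500m", "memory": "512Mi"}}

print(missing_resource_fields(complete))       # []
print(missing_resource_fields(requests_only))  # ['limits.cpu', 'limits.memory']
```

A check like this could run in CI against rendered manifests so that an incomplete block is caught before it reaches the cluster.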

Recommended Resource Allocations

API and Orchestration Services

Service	CPU Request	CPU Limit	Memory Request	Memory Limit
s4e-core	500m	2000m	512Mi	2Gi
s4e-web	100m	500m	128Mi	512Mi
s4e-trigger	250m	1000m	256Mi	1Gi
s4e-scan-adder	250m	1000m	256Mi	1Gi
s4e-scheduler	250m	500m	256Mi	512Mi
s4e-dispatcher	250m	1000m	256Mi	1Gi

Worker Services

Service	CPU Request	CPU Limit	Memory Request	Memory Limit
s4e-scan	500m	2000m	512Mi	2Gi
s4e-crawler	500m	2000m	512Mi	2Gi

Worker services are the most variable in resource usage. Adjust their allocations based on observed metrics and the dominant scan type:

Port scans -- CPU-intensive, moderate memory.
Web vulnerability scans -- CPU and memory intensive.
Crawling (Katana) -- memory-intensive for large sites.
Directory fuzzing (ffuf) -- CPU and network intensive.

Data Services

Service	CPU Request	CPU Limit	Memory Request	Memory Limit
PostgreSQL	1000m	4000m	2Gi	8Gi
Redis	250m	1000m	512Mi	2Gi
RabbitMQ	500m	2000m	1Gi	4Gi
MongoDB	500m	2000m	1Gi	4Gi

Setting Resources in Helm

# s4e-values.yaml
core:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

scan:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

CPU Management

CPU Throttling

When a pod reaches its CPU limit, Kubernetes throttles it rather than killing it. This manifests as:

Increased response latency for s4e-core.
Slower scan execution for workers.
Delayed message processing for consumers.

Detecting CPU Issues

# Check current CPU usage
kubectl -n s4e top pods

# List each container's CPU request and limit to compare against the usage above
kubectl -n s4e get pods -o json | jq '.items[] | .metadata.name as $pod |
  .spec.containers[] |
  {pod: $pod, container: .name,
   cpu_request: .resources.requests.cpu,
   cpu_limit: .resources.limits.cpu}'

Prometheus query for CPU throttling:

rate(container_cpu_cfs_throttled_seconds_total{namespace="s4e"}[5m])

Throttling threshold

If the throttling rate stays above 0.1 (a container throttled for roughly 10% of wall-clock time), increase the CPU limit or add replicas.
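
The threshold can be checked by hand from two readings of the throttling counter. The sketch below is a simplified illustration of what `rate()` computes over a window; it is not how Prometheus implements it (Prometheus also extrapolates and handles counter resets). The function names and the sample readings are hypothetical.

```python
def throttle_rate(first, last):
    """Average throttled seconds per second between two (timestamp, counter) samples."""
    (t0, v0), (t1, v1) = first, last
    elapsed = t1 - t0
    if elapsed <= 0 or v1 < v0:  # bad ordering or counter reset: no usable rate
        return None
    return (v1 - v0) / elapsed


def should_scale_cpu(rate, threshold=0.10):
    """Apply the 10% rule: True when the container is throttled too often."""
    return rate is not None and rate > threshold


# Hypothetical readings of container_cpu_cfs_throttled_seconds_total, 5 minutes apart.
rate = throttle_rate((0, 120.0), (300, 165.0))
print(rate, should_scale_cpu(rate))  # 0.15 True
```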

Memory Management

OOM Kills

When a container exceeds its memory limit, the kernel's OOM killer terminates it and Kubernetes reports the termination reason as OOMKilled. The container restarts automatically, but in-progress work is lost.

Detecting Memory Issues

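# Check for OOM kills (last termination reason of each pod's containers)
kubectl -n s4e get pods -o custom-columns='POD:.metadata.name,LAST_TERMINATION:.status.containerStatuses[*].lastState.terminated.reason'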

# Check memory usage
kubectl -n s4e top pods --sort-by=memory
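
OOM kills are also recorded in each container's last termination state, so they can be picked out of the pod list in a script. The following sketch assumes input shaped like the output of `kubectl -n s4e get pods -o json`; the function name and the sample pod names are hypothetical.

```python
def oom_killed_containers(pod_list):
    """Return (pod, container, restarts) for containers whose last termination was an OOM kill."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, status["name"], status.get("restartCount", 0)))
    return hits


# Hypothetical sample: one worker pod that was OOM-killed, one healthy pod.
sample = {
    "items": [
        {
            "metadata": {"name": "s4e-scan-abc"},
            "status": {"containerStatuses": [
                {"name": "scan", "restartCount": 3,
                 "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
            ]},
        },
        {
            "metadata": {"name": "s4e-web-xyz"},
            "status": {"containerStatuses": [
                {"name": "web", "restartCount": 0, "lastState": {}},
            ]},
        },
    ]
}

print(oom_killed_containers(sample))  # [('s4e-scan-abc', 'scan', 3)]
```

Against a live cluster, the same function could be fed the parsed JSON from standard input instead of the sample dictionary.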

Prometheus query for memory pressure (containers above 85% of their memory limit; series without a limit are skipped):

container_memory_working_set_bytes{namespace="s4e"} /
(container_spec_memory_limit_bytes{namespace="s4e"} > 0) > 0.85

Memory Tuning Strategies

Service	Strategy
s4e-core	Tune `DB_POOL_SIZE` and `DB_MAX_OVERFLOW` to control connection memory
s4e-scan	Reduce `SCAN_CONCURRENCY` if memory is limited
s4e-crawler	Reduce `CRAWLER_CONCURRENCY` and `CRAWLER_DEPTH`
PostgreSQL	Adjust `shared_buffers` and `work_mem`
RabbitMQ	Lower `vm_memory_high_watermark` ratio
Redis	Set `maxmemory` and eviction policy

Persistent Volume Sizing

Storage Requirements

Component	Minimum	Medium (1K assets)	Large (10K+ assets)
PostgreSQL	50 Gi	100 Gi	500 Gi
MongoDB	50 Gi	100 Gi	300 Gi
RabbitMQ	10 Gi	20 Gi	50 Gi
Redis (persistent)	5 Gi	10 Gi	20 Gi
Elasticsearch	100 Gi	500 Gi	2 Ti

Storage Class Selection

Workload	Recommended Storage Type	Why
PostgreSQL	SSD (gp3, pd-ssd)	Random I/O performance
MongoDB	SSD	Write-heavy workloads
RabbitMQ	SSD	Message persistence
Elasticsearch	SSD for hot, HDD for warm	Cost optimization

Monitoring Disk Usage

# Check PVC usage
kubectl -n s4e get pvc

# Check actual disk usage inside pods
kubectl -n s4e exec -it postgresql-0 -- df -h /var/lib/postgresql

Alert when disk usage exceeds 80%:

- alert: S4EPVCNearFull
  expr: |
    kubelet_volume_stats_used_bytes{namespace="s4e"} /
    kubelet_volume_stats_capacity_bytes{namespace="s4e"} > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PVC {{ $labels.persistentvolumeclaim }} is over 80% full"

Expanding PVCs

If a PVC is running out of space and your storage class supports volume expansion, patch the claim with a larger requested size:

kubectl -n s4e patch pvc postgresql-data --type merge -p \
  '{"spec": {"resources": {"requests": {"storage": "200Gi"}}}}'

Volume expansion

Not all storage classes support volume expansion. Check your storage class with `kubectl get sc <name> -o yaml` and look for `allowVolumeExpansion: true`.
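
Because a PVC can only grow, never shrink, it can help to validate the new size before patching. The sketch below builds the same merge-patch body as the command in this section and refuses a size that is not larger than the current one. The function names are hypothetical, and the parser handles only the `Gi` and `Ti` suffixes used in this guide.

```python
import json


def storage_gib(quantity):
    """Convert "200Gi" or "2Ti" to GiB (only the suffixes used in this guide)."""
    if quantity.endswith("Ti"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    raise ValueError(f"unsupported storage quantity: {quantity}")


def expansion_patch(current, requested):
    """Return the JSON merge patch for a PVC resize, rejecting shrinks and no-ops."""
    if storage_gib(requested) <= storage_gib(current):
        raise ValueError(f"requested size {requested} must exceed current size {current}")
    return json.dumps({"spec": {"resources": {"requests": {"storage": requested}}}})


print(expansion_patch("100Gi", "200Gi"))
```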

Resource Quotas

Namespace-Level Quotas

Apply quotas to prevent the S4E namespace from consuming excessive cluster resources:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: s4e-quota
  namespace: s4e
spec:
  hard:
    requests.cpu: "40"
    requests.memory: "80Gi"
    limits.cpu: "80"
    limits.memory: "160Gi"
    persistentvolumeclaims: "20"
    pods: "100"

Limit Ranges

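Set default requests and limits for containers that do not specify them, and bound the values a container may ask for. Once a namespace quota covers CPU and memory, pods whose containers lack requests or limits are rejected unless a LimitRange supplies these defaults: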

apiVersion: v1
kind: LimitRange
metadata:
  name: s4e-limits
  namespace: s4e
spec:
  limits:
    - type: Container
      default:
        cpu: "1"
        memory: 1Gi
      defaultRequest:
        cpu: 250m
        memory: 256Mi
      max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: 50m
        memory: 64Mi
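
To make the effect of this LimitRange concrete, the sketch below mimics how defaults are filled in and how the min/max bounds are applied to a single container. It is a simplified model of the admission behavior, not the Kubernetes implementation; the function names are hypothetical, and the parser handles only the suffixes used in this guide.

```python
FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "m": 0.001}

LIMIT_RANGE = {
    "default": {"cpu": "1", "memory": "1Gi"},
    "defaultRequest": {"cpu": "250m", "memory": "256Mi"},
    "max": {"cpu": "4", "memory": "8Gi"},
    "min": {"cpu": "50m", "memory": "64Mi"},
}


def parse_quantity(value):
    """Convert "250m", "1", or "8Gi" to a float (cores or bytes)."""
    for suffix, factor in FACTORS.items():
        if value.endswith(suffix):
            return float(value[:-len(suffix)]) * factor
    return float(value)


def apply_limit_range(resources, lr=LIMIT_RANGE):
    """Fill in missing requests/limits, then enforce min, max, and request <= limit."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for name in ("cpu", "memory"):
        had_limit = name in limits
        limits.setdefault(name, lr["default"][name])
        if name not in requests:
            # A container that sets only a limit gets a request equal to that limit.
            requests[name] = limits[name] if had_limit else lr["defaultRequest"][name]
        req, lim = parse_quantity(requests[name]), parse_quantity(limits[name])
        if req < parse_quantity(lr["min"][name]):
            raise ValueError(f"{name} request {requests[name]} is below the minimum")
        if lim > parse_quantity(lr["max"][name]):
            raise ValueError(f"{name} limit {limits[name]} is above the maximum")
        if req > lim:
            raise ValueError(f"{name} request {requests[name]} exceeds limit {limits[name]}")
    return {"requests": requests, "limits": limits}


print(apply_limit_range({}))
print(apply_limit_range({"limits": {"cpu": "2"}}))
```

A container with no resources block ends up with the `defaultRequest` and `default` values, while one that sets only a CPU limit of 2 gets a matching CPU request of 2.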

Capacity Planning

Estimating Total Resources

Use this formula to estimate cluster capacity needs:

Total CPU = sum(service_cpu_request x replicas) + overhead (15%)
Total RAM = sum(service_memory_request x replicas) + overhead (15%)

Example: Medium Deployment

Service	Replicas	CPU Request	Memory Request	Total CPU	Total Memory
s4e-core	3	500m	512Mi	1500m	1536Mi
s4e-web	2	100m	128Mi	200m	256Mi
s4e-trigger	2	250m	256Mi	500m	512Mi
s4e-scan	5	500m	512Mi	2500m	2560Mi
s4e-crawler	4	500m	512Mi	2000m	2048Mi
s4e-dispatcher	2	250m	256Mi	500m	512Mi
s4e-scheduler	1	250m	256Mi	250m	256Mi
PostgreSQL	1	1000m	2Gi	1000m	2048Mi
Redis	1	250m	512Mi	250m	512Mi
RabbitMQ	1	500m	1Gi	500m	1024Mi
MongoDB	1	500m	1Gi	500m	1024Mi
Subtotal				9700m	12288Mi
+ 15% overhead				~11.2 CPU	~13.8 Gi

These requests fit comfortably on 3 nodes with 8 vCPU and 16 GB RAM each (24 vCPU, 48 GB total), leaving headroom for system daemons and for bursts toward the configured limits.
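
The estimate can be reproduced with a few lines of Python, which also makes it easy to try other replica counts. This is a sketch under the assumptions of the example: the service list is copied from the table, the helper names are hypothetical, and the parsers handle only the `m`, `Mi`, and `Gi` suffixes used here.

```python
# (service, replicas, cpu_request, memory_request), copied from the example table
SERVICES = [
    ("s4e-core", 3, "500m", "512Mi"),
    ("s4e-web", 2, "100m", "128Mi"),
    ("s4e-trigger", 2, "250m", "256Mi"),
    ("s4e-scan", 5, "500m", "512Mi"),
    ("s4e-crawler", 4, "500m", "512Mi"),
    ("s4e-dispatcher", 2, "250m", "256Mi"),
    ("s4e-scheduler", 1, "250m", "256Mi"),
    ("PostgreSQL", 1, "1000m", "2Gi"),
    ("Redis", 1, "250m", "512Mi"),
    ("RabbitMQ", 1, "500m", "1Gi"),
    ("MongoDB", 1, "500m", "1Gi"),
]


def cpu_millicores(quantity):
    """Convert "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity):
    """Convert "512Mi" or "2Gi" to MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def estimate(services, overhead=0.15):
    """Sum requests x replicas, then add the overhead percentage."""
    cpu = sum(replicas * cpu_millicores(c) for _, replicas, c, _ in services)
    mem = sum(replicas * memory_mib(m) for _, replicas, _, m in services)
    return {
        "cpu_millicores": cpu,
        "memory_mib": mem,
        "cpu_cores_with_overhead": cpu * (1 + overhead) / 1000,
        "memory_gib_with_overhead": mem * (1 + overhead) / 1024,
    }


plan = estimate(SERVICES)
print(plan["cpu_millicores"], plan["memory_mib"])  # 9700 12288
print(round(plan["cpu_cores_with_overhead"], 1),
      round(plan["memory_gib_with_overhead"], 1))  # 11.2 13.8
```

With overhead, the memory total is about 14131Mi, which is roughly 13.8Gi when divided by 1024.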

Next Steps

Horizontal scaling -- add capacity through replicas and nodes.
Queue optimization -- ensure message processing matches capacity.
Monitoring -- track resource utilization over time.