Proper resource allocation ensures S4E On-Prem runs reliably without wasting infrastructure capacity. This guide covers Kubernetes requests and limits, CPU and memory management, persistent storage sizing, resource quotas, and capacity planning.


Kubernetes Resource Requests and Limits

How Requests and Limits Work

| Concept | Purpose |
|---|---|
| Request | Minimum guaranteed resources. Used by the scheduler for pod placement. |
| Limit | Maximum allowed resources. Containers exceeding their CPU limit are throttled; containers exceeding their memory limit are OOM-killed. |

Always set both

Running without resource limits can cause a single misbehaving pod to starve other services. Always define both requests and limits for production workloads.
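To audit a running namespace against this rule, the minimal Python sketch below walks the parsed JSON output of kubectl -n s4e get pods -o json and reports every container that lacks a CPU or memory request or limit. The function name missing_resources and the sample pod are illustrative only; they are not part of S4E or kubectl.

```python
def missing_resources(pod_list: dict) -> list[str]:
    """Return one message per container that lacks a CPU/memory request or limit.

    pod_list is the parsed JSON from: kubectl -n s4e get pods -o json
    """
    problems = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"].get("containers", []):
            resources = container.get("resources", {})
            for kind in ("requests", "limits"):
                declared = resources.get(kind, {})
                for resource in ("cpu", "memory"):
                    if resource not in declared:
                        # kind[:-1] turns "requests" into "request", "limits" into "limit"
                        problems.append(f"{pod_name}/{container['name']}: no {resource} {kind[:-1]}")
    return problems


if __name__ == "__main__":
    # Hypothetical pod whose container sets everything except a memory limit.
    sample = {
        "items": [{
            "metadata": {"name": "s4e-web-7d9f"},
            "spec": {"containers": [{
                "name": "web",
                "resources": {"requests": {"cpu": "100m", "memory": "128Mi"},
                              "limits": {"cpu": "500m"}},
            }]},
        }]
    }
    for line in missing_resources(sample):
        print(line)  # prints: s4e-web-7d9f/web: no memory limit
```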

API and Orchestration Services

| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| s4e-core | 500m | 2000m | 512Mi | 2Gi |
| s4e-web | 100m | 500m | 128Mi | 512Mi |
| s4e-trigger | 250m | 1000m | 256Mi | 1Gi |
| s4e-scan-adder | 250m | 1000m | 256Mi | 1Gi |
| s4e-scheduler | 250m | 500m | 256Mi | 512Mi |
| s4e-dispatcher | 250m | 1000m | 256Mi | 1Gi |

Worker Services

| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| s4e-scan | 500m | 2000m | 512Mi | 2Gi |
| s4e-crawler | 500m | 2000m | 512Mi | 2Gi |

Worker services are the most variable in resource usage. Adjust based on observed metrics:

  • Port scans -- CPU-intensive, moderate memory.
  • Web vulnerability scans -- CPU and memory intensive.
  • Crawling (Katana) -- memory-intensive for large sites.
  • Directory fuzzing (ffuf) -- CPU and network intensive.
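One way to turn observed metrics into new values is sketched below in Python under a hypothetical policy: set the request at the 95th percentile of sampled CPU usage and the limit at the observed peak plus 25% headroom, both rounded up to 50m steps. The percentile, headroom, step, and sample values are assumptions to tune, not S4E recommendations; real samples would come from kubectl top or Prometheus.

```python
import math


def percentile(samples: list[int], p: int) -> int:
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # integer ceiling of p * n / 100
    return ordered[max(rank, 1) - 1]


def round_up(value: float, step: int) -> int:
    """Round up to the next multiple of step (e.g. 50 millicores)."""
    return int(math.ceil(value / step) * step)


def suggest_cpu(samples_m: list[int], request_pct: int = 95,
                limit_headroom: float = 1.25, step: int = 50) -> tuple[str, str]:
    """Suggest (request, limit) in millicores from CPU usage samples.

    Hypothetical policy: request = p95 usage, limit = peak usage * 1.25.
    """
    request = round_up(percentile(samples_m, request_pct), step)
    limit = max(round_up(max(samples_m) * limit_headroom, step), request)
    return f"{request}m", f"{limit}m"


if __name__ == "__main__":
    # Hypothetical s4e-crawler CPU samples in millicores, e.g. one per minute.
    crawler = [420, 380, 510, 950, 460, 1300, 470, 440, 490, 530,
               610, 400, 450, 720, 480, 390, 505, 560, 430, 880]
    print(suggest_cpu(crawler))  # prints ('950m', '1650m')
```

Anchoring the request to a high percentile rather than the mean keeps the scheduler's reservation close to what the worker actually needs most of the time, while the limit leaves room for occasional spikes such as crawling a large site.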

Data Services

| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| PostgreSQL | 1000m | 4000m | 2Gi | 8Gi |
| Redis | 250m | 1000m | 512Mi | 2Gi |
| RabbitMQ | 500m | 2000m | 1Gi | 4Gi |
| MongoDB | 500m | 2000m | 1Gi | 4Gi |

Setting Resources in Helm

# s4e-values.yaml
core:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

scan:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

CPU Management

CPU Throttling

When a pod reaches its CPU limit, Kubernetes throttles it rather than killing it. This manifests as:

  • Increased response latency for s4e-core.
  • Slower scan execution for workers.
  • Delayed message processing for consumers.
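Throttling comes from the Linux CFS bandwidth controller: a CPU limit becomes a quota of CPU time per scheduling period (100 ms by default), so a 2000m limit allows 200 ms of CPU time per 100 ms period, shared by all of the container's threads. The idealized Python model below (it ignores details such as CFS burst and per-CPU runtime slices) estimates how that quota stretches a multi-threaded burst; the workload figures are hypothetical.

```python
CFS_PERIOD_MS = 100  # Linux default cpu.cfs_period_us = 100000


def quota_ms(limit_millicores: int, period_ms: int = CFS_PERIOD_MS) -> float:
    """CPU time the container may use per period: 2000m -> 200 ms per 100 ms."""
    return limit_millicores / 1000 * period_ms


def wall_time_ms(cpu_work_ms: float, threads: int, limit_millicores: int,
                 period_ms: int = CFS_PERIOD_MS) -> float:
    """Idealized wall-clock time to finish cpu_work_ms of CPU work spread evenly over threads."""
    if cpu_work_ms <= 0:
        return 0.0
    # CPU time usable per period is capped by both the thread count and the quota.
    per_period = min(threads * period_ms, quota_ms(limit_millicores, period_ms))
    full_periods, remainder = divmod(cpu_work_ms, per_period)
    if remainder == 0:
        # The work finishes in the last full period as soon as its budget is used up.
        return (full_periods - 1) * period_ms + per_period / threads
    # Throttled for the rest of each full period, then the remainder runs unthrottled.
    return full_periods * period_ms + remainder / threads


if __name__ == "__main__":
    # Hypothetical s4e-core burst: 800 ms of CPU work across 8 threads.
    print(wall_time_ms(800, threads=8, limit_millicores=2000))  # 325.0
    print(wall_time_ms(800, threads=8, limit_millicores=8000))  # 100.0
```

In this model, a burst that would finish in 100 ms with enough CPU quota takes about 325 ms under a 2000m limit, because the container exhausts its 200 ms budget after 25 ms of each period and then waits for the next one. That waiting is the increased latency and slower scan execution listed above.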

Detecting CPU Issues

# Check current CPU usage
kubectl -n s4e top pods

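# Show configured CPU requests and limits for every container
# (kubectl cannot show throttling itself; use the Prometheus query for that)
kubectl -n s4e get pods -o json | jq '.items[] | .metadata.name as $pod |
  .spec.containers[] |
  {pod: $pod, container: .name,
   cpu_request: .resources.requests.cpu,
   cpu_limit: .resources.limits.cpu}'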

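Prometheus query for the fraction of CFS scheduling periods in which each container was throttled:

sum by (pod, container) (rate(container_cpu_cfs_throttled_periods_total{namespace="s4e", container!=""}[5m]))
  /
sum by (pod, container) (rate(container_cpu_cfs_periods_total{namespace="s4e", container!=""}[5m]))

Throttling threshold

If the ratio exceeds 0.10 (the container was throttled in more than 10% of its CPU periods), increase the CPU limit or add replicas.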
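The 10% rule can be checked from two scrapes of the cAdvisor counters container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total: divide the increase in throttled periods by the increase in total periods. The Python sketch below does that for one container; the class and function names and the counter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CfsCounters:
    """One scrape of the cAdvisor CFS counters for a single container."""
    periods: int            # container_cpu_cfs_periods_total
    throttled_periods: int  # container_cpu_cfs_throttled_periods_total


def throttle_ratio(earlier: CfsCounters, later: CfsCounters) -> float:
    """Fraction of CFS periods between two scrapes in which the container was throttled."""
    periods = later.periods - earlier.periods
    if periods <= 0:  # idle container, or counters reset by a restart
        return 0.0
    return (later.throttled_periods - earlier.throttled_periods) / periods


def needs_more_cpu(ratio: float, threshold: float = 0.10) -> bool:
    """True when throttling is above the 10% threshold."""
    return ratio > threshold


if __name__ == "__main__":
    # Hypothetical scrapes five minutes apart (3000 periods of 100 ms) for one s4e-scan container.
    before = CfsCounters(periods=120_000, throttled_periods=9_000)
    after = CfsCounters(periods=123_000, throttled_periods=9_450)
    ratio = throttle_ratio(before, after)
    print(f"{ratio:.0%} throttled, act: {needs_more_cpu(ratio)}")  # 15% throttled, act: True
```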

Memory Management

OOM Kills

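When a container exceeds its memory limit, the kernel OOM killer terminates it and Kubernetes reports the termination reason as OOMKilled. The kubelet restarts the container according to the pod's restartPolicy, but any in-progress work is lost.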

Detecting Memory Issues

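# Check for OOM kills (recorded in each container's last termination state)
kubectl -n s4e get pods -o json | jq -r '.items[] | .metadata.name as $pod |
  .status.containerStatuses[]? |
  select(.lastState.terminated.reason == "OOMKilled") |
  "\($pod)/\(.name) restarts=\(.restartCount)"'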

# Check memory usage
kubectl -n s4e top pods --sort-by=memory

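Prometheus query for memory pressure (working set above 85% of the memory limit; containers without a limit are excluded):

container_memory_working_set_bytes{namespace="s4e", container!=""}
  / (container_spec_memory_limit_bytes{namespace="s4e", container!=""} > 0)
  > 0.85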
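The check reduces to one ratio per container: working set divided by memory limit. The Python sketch below applies it to hypothetical readings and skips containers without a memory limit, for which cAdvisor typically reports a limit of 0, so a ratio against it would be meaningless. The function name and the readings are illustrative.

```python
GI = 1024 ** 3
MI = 1024 ** 2


def memory_pressure(working_set: dict[str, int], limits: dict[str, int],
                    threshold: float = 0.85) -> dict[str, float]:
    """Return {container: ratio} for containers whose working set exceeds threshold x limit.

    Both dicts map container name -> bytes. A missing or zero limit means "no limit"
    and the container is skipped.
    """
    flagged = {}
    for name, used in working_set.items():
        limit = limits.get(name, 0)
        if limit <= 0:
            continue
        ratio = used / limit
        if ratio > threshold:
            flagged[name] = round(ratio, 2)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings in bytes.
    usage = {"s4e-crawler": int(1.8 * GI), "s4e-core": int(0.9 * GI), "sidecar": 300 * MI}
    limits = {"s4e-crawler": 2 * GI, "s4e-core": 2 * GI, "sidecar": 0}
    print(memory_pressure(usage, limits))  # {'s4e-crawler': 0.9}
```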

Memory Tuning Strategies

| Service | Strategy |
|---|---|
| s4e-core | Tune DB_POOL_SIZE and DB_MAX_OVERFLOW to control connection-pool memory |
| s4e-scan | Reduce SCAN_CONCURRENCY if memory is limited |
| s4e-crawler | Reduce CRAWLER_CONCURRENCY and CRAWLER_DEPTH |
| PostgreSQL | Adjust shared_buffers and work_mem |
| RabbitMQ | Lower the vm_memory_high_watermark ratio |
| Redis | Set maxmemory and an eviction policy (maxmemory-policy) |
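As a starting point for the PostgreSQL and s4e-scan rows, the Python sketch below derives shared_buffers from the container memory limit using the PostgreSQL documentation's rule of thumb of 25% of system memory (applying it to the container limit is an assumption), and estimates the largest SCAN_CONCURRENCY that fits in the s4e-scan limit. The baseline and per-scan memory figures are hypothetical; measure them for your own scan mix first.

```python
def shared_buffers_mi(memory_limit_mi: int) -> int:
    """25% of the memory available to PostgreSQL, applied here to the container limit."""
    return memory_limit_mi // 4


def max_scan_concurrency(memory_limit_mi: int, baseline_mi: int,
                         per_scan_mi: int, headroom: float = 0.15) -> int:
    """Largest concurrency whose estimated memory stays below (1 - headroom) x limit.

    A 15% headroom keeps the pod under the 0.85 memory-pressure threshold.
    baseline_mi (idle worker) and per_scan_mi (one scan) are hypothetical measurements.
    """
    usable = memory_limit_mi * (1 - headroom) - baseline_mi
    return max(1, int(usable // per_scan_mi))


if __name__ == "__main__":
    print(shared_buffers_mi(8 * 1024))           # 2048, i.e. shared_buffers = 2048MB for an 8Gi limit
    print(max_scan_concurrency(2048, 256, 150))  # 9 for a 2Gi s4e-scan limit
```

The concurrency estimate never drops below 1, so a worker whose baseline already eats the budget still runs one scan at a time; in that case the memory limit itself needs raising.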

Persistent Volume Sizing

Storage Requirements

| Component | Minimum | Medium (1K assets) | Large (10K+ assets) |
|---|---|---|---|
| PostgreSQL | 50 Gi | 100 Gi | 500 Gi |
| MongoDB | 50 Gi | 100 Gi | 300 Gi |
| RabbitMQ | 10 Gi | 20 Gi | 50 Gi |
| Redis (persistent) | 5 Gi | 10 Gi | 20 Gi |
| Elasticsearch | 100 Gi | 500 Gi | 2 Ti |

Storage Class Selection

| Workload | Recommended Storage Type | Why |
|---|---|---|
| PostgreSQL | SSD (gp3, pd-ssd) | Random I/O performance |
| MongoDB | SSD | Write-heavy workloads |
| RabbitMQ | SSD | Message persistence |
| Elasticsearch | SSD for hot, HDD for warm | Cost optimization |

Monitoring Disk Usage

# List PVCs with their status and provisioned capacity (this does not show usage)
kubectl -n s4e get pvc

# Check actual disk usage inside pods
kubectl -n s4e exec -it postgresql-0 -- df -h /var/lib/postgresql

Alert when disk usage exceeds 80%:

- alert: S4EPVCNearFull
  expr: |
    kubelet_volume_stats_used_bytes{namespace="s4e"} /
    kubelet_volume_stats_capacity_bytes{namespace="s4e"} > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PVC {{ $labels.persistentvolumeclaim }} is over 80% full"
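Besides alerting at 80%, it helps to know how much time is left before the alert fires. The Python sketch below projects the days until a volume crosses the threshold, assuming linear growth at a rate you measure yourself (for example from two df -h readings a day apart); the figures in the example are hypothetical. Prometheus's predict_linear() function can apply a similar linear extrapolation inside an alert expression.

```python
def days_until_threshold(used_gi: float, capacity_gi: float,
                         growth_gi_per_day: float, threshold: float = 0.8) -> float:
    """Days until used space crosses threshold * capacity, assuming linear growth."""
    trigger_gi = capacity_gi * threshold
    if used_gi >= trigger_gi:
        return 0.0  # already past the threshold
    if growth_gi_per_day <= 0:
        return float("inf")  # not growing: never reaches the threshold
    return (trigger_gi - used_gi) / growth_gi_per_day


if __name__ == "__main__":
    # Hypothetical: 100Gi PostgreSQL PVC, 62Gi used, growing 1.5Gi per day.
    print(days_until_threshold(62, 100, 1.5))  # 12.0
```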

Expanding PVCs

If a PVC is running out of space and your storage class supports volume expansion:

kubectl -n s4e patch pvc postgresql-data --type merge -p \
  '{"spec": {"resources": {"requests": {"storage": "200Gi"}}}}'

Volume expansion

Not all storage classes support volume expansion. Check your storage class with kubectl get sc <name> -o yaml and look for allowVolumeExpansion: true.
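Before patching, a script can confirm that the storage class allows expansion and compute the new size. The Python sketch below reads a StorageClass object (for example the parsed output of kubectl get sc <name> -o json, where allowVolumeExpansion is a top-level field) and builds the same merge-patch body as the kubectl patch command above. The function name, the 1.5x default growth factor, and the 50Gi rounding step are hypothetical policy choices.

```python
import json
import math


def expansion_patch(storage_class: dict, current_gi: int,
                    growth: float = 1.5, step_gi: int = 50) -> str:
    """Return a PVC merge-patch body that grows the volume, or raise if the class cannot expand."""
    if not storage_class.get("allowVolumeExpansion", False):
        name = storage_class.get("metadata", {}).get("name", "<unknown>")
        raise ValueError(f"storage class {name} does not allow volume expansion")
    # Grow by the factor, then round up to a whole step (e.g. 150Gi, 200Gi).
    new_gi = math.ceil(current_gi * growth / step_gi) * step_gi
    return json.dumps({"spec": {"resources": {"requests": {"storage": f"{new_gi}Gi"}}}})


if __name__ == "__main__":
    sc = {"metadata": {"name": "gp3"}, "allowVolumeExpansion": True}
    print(expansion_patch(sc, current_gi=100, growth=2.0))
    # {"spec": {"resources": {"requests": {"storage": "200Gi"}}}}
```

The printed string can be passed to kubectl -n s4e patch pvc ... --type merge -p. PVCs can only grow, so keep the growth factor above 1.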

Resource Quotas

Namespace-Level Quotas

Apply quotas to prevent the S4E namespace from consuming excessive cluster resources:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: s4e-quota
  namespace: s4e
spec:
  hard:
    requests.cpu: "40"
    requests.memory: "80Gi"
    limits.cpu: "80"
    limits.memory: "160Gi"
    persistentvolumeclaims: "20"
    pods: "100"
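To check whether a planned deployment fits this quota, sum requests, limits, and pod counts across all replicas. The Python sketch below does that for the medium deployment described under Capacity Planning, using the per-service requests and limits from the tables in this guide. It only parses the quantity suffixes this guide uses (m, Mi, Gi), and the helper names are illustrative.

```python
from fractions import Fraction

MEM_FACTORS = {"Mi": 2**20, "Gi": 2**30}


def cpu_cores(q: str) -> Fraction:
    """Kubernetes CPU quantity -> cores: '500m' -> 1/2, '40' -> 40."""
    return Fraction(int(q[:-1]), 1000) if q.endswith("m") else Fraction(q)


def mem_bytes(q: str) -> int:
    """Memory quantity with a binary suffix -> bytes: '512Mi' -> 536870912."""
    return int(q[:-2]) * MEM_FACTORS[q[-2:]]


# (service, replicas, cpu request, memory request, cpu limit, memory limit)
MEDIUM = [
    ("s4e-core",       3, "500m",  "512Mi", "2000m", "2Gi"),
    ("s4e-web",        2, "100m",  "128Mi", "500m",  "512Mi"),
    ("s4e-trigger",    2, "250m",  "256Mi", "1000m", "1Gi"),
    ("s4e-scan",       5, "500m",  "512Mi", "2000m", "2Gi"),
    ("s4e-crawler",    4, "500m",  "512Mi", "2000m", "2Gi"),
    ("s4e-dispatcher", 2, "250m",  "256Mi", "1000m", "1Gi"),
    ("s4e-scheduler",  1, "250m",  "256Mi", "500m",  "512Mi"),
    ("postgresql",     1, "1000m", "2Gi",   "4000m", "8Gi"),
    ("redis",          1, "250m",  "512Mi", "1000m", "2Gi"),
    ("rabbitmq",       1, "500m",  "1Gi",   "2000m", "4Gi"),
    ("mongodb",        1, "500m",  "1Gi",   "2000m", "4Gi"),
]

# Mirrors the hard limits in the s4e-quota ResourceQuota (PVC count omitted).
QUOTA = {"requests.cpu": "40", "requests.memory": "80Gi",
         "limits.cpu": "80", "limits.memory": "160Gi", "pods": "100"}


def namespace_usage(rows) -> dict:
    """Total requests, limits, and pods across all replicas."""
    usage = {"requests.cpu": Fraction(0), "requests.memory": 0,
             "limits.cpu": Fraction(0), "limits.memory": 0, "pods": 0}
    for _name, n, req_cpu, req_mem, lim_cpu, lim_mem in rows:
        usage["requests.cpu"] += n * cpu_cores(req_cpu)
        usage["requests.memory"] += n * mem_bytes(req_mem)
        usage["limits.cpu"] += n * cpu_cores(lim_cpu)
        usage["limits.memory"] += n * mem_bytes(lim_mem)
        usage["pods"] += n
    return usage


def exceeded(usage: dict, quota: dict) -> list[str]:
    """Quota keys whose totals exceed the hard limit."""
    parse = {"requests.cpu": cpu_cores, "limits.cpu": cpu_cores,
             "requests.memory": mem_bytes, "limits.memory": mem_bytes, "pods": int}
    return [key for key, used in usage.items() if used > parse[key](quota[key])]


if __name__ == "__main__":
    usage = namespace_usage(MEDIUM)
    print(float(usage["limits.cpu"]), usage["limits.memory"] / 2**30, usage["pods"])  # 38.5 47.5 23
    print(exceeded(usage, QUOTA))  # []
```

Quota is enforced at admission for every pod, so leave room for the extra pods a rolling update creates (maxSurge) on top of these steady-state totals.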

Limit Ranges

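Set default requests and limits for containers that do not specify them. Because a compute quota like the one above can reject pods that omit requests or limits for quota-tracked resources, these defaults also keep such pods admissible: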

apiVersion: v1
kind: LimitRange
metadata:
  name: s4e-limits
  namespace: s4e
spec:
  limits:
    - type: Container
      default:
        cpu: "1"
        memory: 1Gi
      defaultRequest:
        cpu: 250m
        memory: 256Mi
      max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: 50m
        memory: 64Mi
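The defaulting rules are easy to misread, so the Python sketch below models how this LimitRange treats one container, following the behavior described in the Kubernetes documentation. A container that sets neither value gets both defaults. One that sets only a request gets the default limit. One that sets only a limit gets a request equal to that limit, not defaultRequest. The result must also fall between min and max. This is a simplified model for reasoning about the s4e-limits values, not the actual admission plugin; the function names are illustrative.

```python
from fractions import Fraction

MEM_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_cores(q: str) -> Fraction:
    return Fraction(int(q[:-1]), 1000) if q.endswith("m") else Fraction(q)


def mem_bytes(q: str) -> int:
    return int(q[:-2]) * MEM_FACTORS[q[-2:]]


PARSERS = {"cpu": cpu_cores, "memory": mem_bytes}

# Mirrors the s4e-limits LimitRange.
S4E_LIMITS = {
    "default":        {"cpu": "1",    "memory": "1Gi"},
    "defaultRequest": {"cpu": "250m", "memory": "256Mi"},
    "max":            {"cpu": "4",    "memory": "8Gi"},
    "min":            {"cpu": "50m",  "memory": "64Mi"},
}


def apply_limit_range(resources: dict, lr: dict = S4E_LIMITS) -> dict:
    """Return the container's effective requests/limits, or raise ValueError if rejected."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for res, parse in PARSERS.items():
        if res not in limits and res not in requests:
            limits[res], requests[res] = lr["default"][res], lr["defaultRequest"][res]
        elif res not in limits:
            limits[res] = lr["default"][res]
        elif res not in requests:
            requests[res] = limits[res]  # request defaults to the explicit limit
        req, lim = parse(requests[res]), parse(limits[res])
        if not parse(lr["min"][res]) <= req <= lim <= parse(lr["max"][res]):
            raise ValueError(f"{res}: request {requests[res]}, limit {limits[res]} rejected")
    return {"requests": requests, "limits": limits}


if __name__ == "__main__":
    print(apply_limit_range({}))
    # {'requests': {'cpu': '250m', 'memory': '256Mi'}, 'limits': {'cpu': '1', 'memory': '1Gi'}}
    print(apply_limit_range({"limits": {"cpu": "2", "memory": "2Gi"}}))
    # {'requests': {'cpu': '2', 'memory': '2Gi'}, 'limits': {'cpu': '2', 'memory': '2Gi'}}
```

Note the last case: a container that requests more than the default limit (for example a 2 CPU request with no limit) is rejected, because its defaulted 1 CPU limit would be lower than its request.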

Capacity Planning

Estimating Total Resources

Use this formula to estimate cluster capacity needs:

Total CPU = sum(service_cpu_request × replicas) × 1.15   (15% overhead)
Total RAM = sum(service_memory_request × replicas) × 1.15   (15% overhead)

Example: Medium Deployment

| Service | Replicas | CPU Request | Memory Request | Total CPU | Total Memory |
|---|---|---|---|---|---|
| s4e-core | 3 | 500m | 512Mi | 1500m | 1536Mi |
| s4e-web | 2 | 100m | 128Mi | 200m | 256Mi |
| s4e-trigger | 2 | 250m | 256Mi | 500m | 512Mi |
| s4e-scan | 5 | 500m | 512Mi | 2500m | 2560Mi |
| s4e-crawler | 4 | 500m | 512Mi | 2000m | 2048Mi |
| s4e-dispatcher | 2 | 250m | 256Mi | 500m | 512Mi |
| s4e-scheduler | 1 | 250m | 256Mi | 250m | 256Mi |
| PostgreSQL | 1 | 1000m | 2Gi | 1000m | 2048Mi |
| Redis | 1 | 250m | 512Mi | 250m | 512Mi |
| RabbitMQ | 1 | 500m | 1Gi | 500m | 1024Mi |
| MongoDB | 1 | 500m | 1Gi | 500m | 1024Mi |
| Subtotal | | | | 9700m | 12288Mi (12 Gi) |
| + 15% overhead | | | | ~11.2 CPU | ~13.8 Gi |
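The arithmetic behind the table can be reproduced with a short Python script that applies the capacity formula to the medium example. Exact fractions avoid rounding drift, and the output shows that 12288Mi (12 GiB) plus 15% overhead comes to about 13.8 GiB. The helper names are illustrative, and the parser handles only the m, Mi, and Gi suffixes used here.

```python
from fractions import Fraction

# (service, replicas, cpu request, memory request) from the medium deployment table
MEDIUM = [
    ("s4e-core", 3, "500m", "512Mi"),
    ("s4e-web", 2, "100m", "128Mi"),
    ("s4e-trigger", 2, "250m", "256Mi"),
    ("s4e-scan", 5, "500m", "512Mi"),
    ("s4e-crawler", 4, "500m", "512Mi"),
    ("s4e-dispatcher", 2, "250m", "256Mi"),
    ("s4e-scheduler", 1, "250m", "256Mi"),
    ("PostgreSQL", 1, "1000m", "2Gi"),
    ("Redis", 1, "250m", "512Mi"),
    ("RabbitMQ", 1, "500m", "1Gi"),
    ("MongoDB", 1, "500m", "1Gi"),
]
OVERHEAD = Fraction(115, 100)  # +15%


def millicores(q: str) -> int:
    return int(q[:-1]) if q.endswith("m") else int(q) * 1000


def mebibytes(q: str) -> int:
    return int(q[:-2]) * (1024 if q.endswith("Gi") else 1)  # only Mi and Gi handled


def subtotal(rows) -> tuple[int, int]:
    """Sum of request x replicas, in millicores and MiB."""
    cpu = sum(n * millicores(c) for _, n, c, _ in rows)
    mem = sum(n * mebibytes(m) for _, n, _, m in rows)
    return cpu, mem


cpu_m, mem_mi = subtotal(MEDIUM)
print(f"Subtotal: {cpu_m}m CPU, {mem_mi}Mi memory")
# Subtotal: 9700m CPU, 12288Mi memory
print(f"With overhead: {float(cpu_m * OVERHEAD / 1000):.1f} CPU, "
      f"{float(mem_mi * OVERHEAD / 1024):.1f} GiB")
# With overhead: 11.2 CPU, 13.8 GiB
```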

This fits comfortably on 3 nodes with 8 vCPU and 16 GB RAM each (24 vCPU, 48 GB total).
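Summed requests are only part of the question: every pod must also fit on a single node. As a rough check of the 3-node layout, the Python sketch below places each replica, largest first, on the node with the least CPU already requested. This only loosely resembles the Kubernetes scheduler's preference for less-allocated nodes and is not how it actually scores nodes. The per-node allocatable values (7500m CPU and 14Gi memory, leaving room for system daemons on an 8 vCPU / 16 GB node) are hypothetical; use the Allocatable figures from kubectl describe node.

```python
# Per-replica requests (millicores, MiB) for the medium deployment, one entry per pod.
PODS = ([(500, 512)] * 3 + [(100, 128)] * 2 + [(250, 256)] * 2     # core, web, trigger
        + [(500, 512)] * 5 + [(500, 512)] * 4 + [(250, 256)] * 2   # scan, crawler, dispatcher
        + [(250, 256)] + [(1000, 2048)] + [(250, 512)]             # scheduler, PostgreSQL, Redis
        + [(500, 1024)] * 2)                                       # RabbitMQ, MongoDB


def place(pods, nodes: int, cpu_m: int, mem_mi: int):
    """Return per-node (cpu, mem) request totals if every pod fits, else None."""
    used = [[0, 0] for _ in range(nodes)]
    for cpu, mem in sorted(pods, reverse=True):  # largest CPU (then memory) first
        fits = [n for n in used if n[0] + cpu <= cpu_m and n[1] + mem <= mem_mi]
        if not fits:
            return None  # no node has room for this pod
        node = min(fits, key=lambda n: n[0])  # least CPU already requested
        node[0] += cpu
        node[1] += mem
    return [tuple(n) for n in used]


if __name__ == "__main__":
    # Hypothetical allocatable per node: 7500m CPU, 14Gi memory.
    print(place(PODS, nodes=3, cpu_m=7500, mem_mi=14 * 1024))
    # [(3250, 4352), (3250, 4096), (3200, 3840)]
    print(place(PODS, nodes=1, cpu_m=7500, mem_mi=14 * 1024))  # None
```

In this model each node ends up with roughly 3.2 of 7.5 allocatable cores requested, which leaves room for limit bursts and for losing one node, while a single node cannot hold the deployment at all.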

Next Steps