S4E On-Prem provides comprehensive observability through metrics, logs, and health endpoints. This guide covers setting up monitoring infrastructure and configuring log collection for your deployment.


Observability Stack

S4E On-Prem is designed to integrate with standard Kubernetes observability tools:

| Component | Purpose | Recommended Tool |
|-----------|---------|------------------|
| Metrics | Performance monitoring, resource usage, scan throughput | Prometheus + Grafana |
| Logging | Centralized log aggregation and search | ELK stack (Elasticsearch, Logstash, Kibana) or Loki |
| Alerting | Proactive notifications for anomalies and failures | Grafana Alerting or Alertmanager |
| Tracing | Distributed request tracing across services | Jaeger (optional) |

Prometheus Metrics

Service Metrics

All S4E services expose Prometheus-compatible metrics at the /metrics endpoint. Key metrics include:

s4e-core

| Metric | Type | Description |
|--------|------|-------------|
| s4e_http_requests_total | Counter | Total HTTP requests by method, path, and status code |
| s4e_http_request_duration_seconds | Histogram | Request latency distribution |
| s4e_active_sessions | Gauge | Number of active user sessions |
| s4e_api_errors_total | Counter | API errors by type and endpoint |

Workers (scan, crawler, dispatcher)

| Metric | Type | Description |
|--------|------|-------------|
| s4e_scan_jobs_total | Counter | Total scan jobs processed |
| s4e_scan_jobs_active | Gauge | Currently running scan jobs |
| s4e_scan_duration_seconds | Histogram | Scan execution time by scan type |
| s4e_crawler_urls_discovered | Counter | Total URLs discovered by the crawler |
| s4e_queue_messages_consumed | Counter | Messages consumed from RabbitMQ |

Infrastructure

| Metric | Type | Description |
|--------|------|-------------|
| pg_stat_activity_count | Gauge | Active PostgreSQL connections |
| rabbitmq_queue_messages | Gauge | Messages in each RabbitMQ queue |
| redis_connected_clients | Gauge | Connected Redis clients |
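
The counters and histograms listed above are usually queried as rates and quantiles rather than raw values. The following is a minimal sketch of Prometheus recording rules that precompute a few such series. The rule names (s4e:...) and the 5-minute window are illustrative assumptions, not names shipped with S4E. The _bucket suffix is the standard series name Prometheus histograms expose.

```yaml
groups:
  - name: s4e-recording            # hypothetical group name
    rules:
      # Overall HTTP request rate (requests per second, 5-minute window).
      - record: s4e:http_requests:rate5m
        expr: sum(rate(s4e_http_requests_total[5m]))
      # 95th percentile scan duration, derived from the histogram buckets.
      - record: s4e:scan_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (le) (rate(s4e_scan_duration_seconds_bucket[5m])))
      # Rate at which workers consume messages from RabbitMQ.
      - record: s4e:queue_messages_consumed:rate5m
        expr: sum(rate(s4e_queue_messages_consumed[5m]))
```

Recording rules like these keep dashboard queries cheap; the same expressions can also be pasted directly into Grafana panels.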

Prometheus Configuration

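ServiceMonitor is a Prometheus Operator custom resource, so this step assumes the operator is installed in your cluster. Add a ServiceMonitor resource for each S4E service: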

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: s4e-core-monitor
  namespace: s4e
spec:
  selector:
    matchLabels:
      app: s4e-core        # must match the labels on the s4e-core Service
  endpoints:
    - port: http           # name of the Service port, not a port number
      path: /metrics
      interval: 30s

Helm integration

S4E Helm charts include optional ServiceMonitor templates. Enable them by setting metrics.serviceMonitor.enabled: true in your values file.
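
Written out as nested YAML, the dotted key above corresponds to the following values file fragment. Only this key comes from the text; any other ServiceMonitor options the chart may offer are not shown.

```yaml
# values.yaml (fragment)
metrics:
  serviceMonitor:
    enabled: true   # renders the chart's optional ServiceMonitor templates
```

The same setting can be passed on the command line with Helm's --set flag (--set metrics.serviceMonitor.enabled=true).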

Grafana Dashboards

S4E provides pre-built Grafana dashboard JSON files covering:

  • Platform Overview -- service health, request rates, error rates.
  • Scan Activity -- scan throughput, queue depths, worker utilization.
  • Infrastructure -- database connections, Redis memory, RabbitMQ message rates.
  • Resource Usage -- CPU, memory, and storage consumption per service.

Import the dashboard JSON files from the S4E release artifacts or configure them through ArgoCD.
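
If Grafana is deployed with the dashboard sidecar enabled (as in the Grafana and kube-prometheus-stack Helm charts), one common way to manage dashboards from Git, and therefore through ArgoCD, is a labelled ConfigMap. The sketch below is an assumption about your Grafana setup, not something the S4E release prescribes. The ConfigMap name, namespace, and file name are hypothetical, and the label must match whatever your sidecar is configured to watch.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: s4e-dashboard-platform-overview   # hypothetical name
  namespace: monitoring                    # assumption: namespace where Grafana runs
  labels:
    grafana_dashboard: "1"                 # default label watched by the Grafana sidecar
data:
  # Hypothetical file name. Replace the placeholder value with the
  # dashboard JSON taken from the S4E release artifacts.
  platform-overview.json: |
    {}
```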

Logging

Log Format

All S4E services emit structured JSON logs:

{
  "timestamp": "2025-01-15T10:23:45.123Z",
  "level": "INFO",
  "service": "s4e-core",
  "module": "auth",
  "message": "User login successful",
  "user_id": 42,
  "ip": "10.0.1.15",
  "request_id": "abc-123-def"
}

Log Levels

| Level | Usage |
|-------|-------|
| DEBUG | Detailed diagnostic information (disabled in production by default) |
| INFO | Normal operational events (startup, request processing, scan completion) |
| WARNING | Unexpected but recoverable situations |
| ERROR | Failures requiring attention (connection errors, scan failures) |
| CRITICAL | System-level failures requiring immediate action |

Configure the log level per service via the LOG_LEVEL environment variable.
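
As an illustration, the variable can be set on a container in the service's Deployment. This is a sketch of a pod template fragment only. The container name is illustrative, and the S4E Helm chart may expose its own values key for this setting, so check the chart before editing manifests directly.

```yaml
# Fragment of a Deployment (not a complete manifest)
spec:
  template:
    spec:
      containers:
        - name: s4e-core          # illustrative container name
          env:
            - name: LOG_LEVEL
              value: "DEBUG"      # one of DEBUG, INFO, WARNING, ERROR, CRITICAL
```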

ELK Stack Integration

Filebeat DaemonSet

Deploy Filebeat as a DaemonSet to collect container logs from all Kubernetes nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: logging
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat   # must match spec.selector.matchLabels
    spec:
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:8.12.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
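            # Docker runtime path. On containerd nodes, pod logs live under
            # /var/log/pods, which the varlog mount above already covers.
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true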
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
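
The DaemonSet above does not include a Filebeat configuration. The following is a minimal sketch of a filebeat.yml that reads container logs and decodes the S4E JSON payload into top-level fields. It assumes the Elasticsearch service address used elsewhere in this guide (elasticsearch.logging.svc:9200) and that the file is mounted into the Filebeat container, for example from a ConfigMap, which is not shown here.

```yaml
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log    # symlinks to per-pod log files on each node

processors:
  # Each S4E log line is a JSON document stored in the "message" field.
  # Decode it and merge the keys (level, service, request_id, ...) into the event root.
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
      add_error_key: true            # flag lines that fail to parse as JSON

output.elasticsearch:
  hosts: ["elasticsearch.logging.svc:9200"]   # assumption: same host as the Fluent Bit example
```

Index naming is left at Filebeat's defaults in this sketch, so writing to s4e-* indices would need additional index and template settings.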

Kibana Index Patterns

Create index patterns in Kibana for S4E logs:

  • s4e-core-* -- API and authentication logs
  • s4e-scan-* -- Scan execution logs
  • s4e-crawler-* -- Crawl pipeline logs
  • s4e-* -- All S4E service logs combined

Fluentd / Fluent Bit Alternative

If you use Fluent Bit instead of Filebeat as the log shipper, configure a filter to parse S4E JSON logs and an output that forwards them to Elasticsearch:

[FILTER]
    Name         parser
    Match        kube.s4e-*
    Key_Name     log
    Parser       json

[OUTPUT]
    Name         es
    Match        kube.s4e-*
    Host         elasticsearch.logging.svc
    Port         9200
    Index        s4e-logs

Health Checks

Liveness and Readiness Probes

All S4E services expose health endpoints used by Kubernetes probes:

| Endpoint | Purpose |
|----------|---------|
| /health/live | Liveness check -- is the process running? |
| /health/ready | Readiness check -- can the service handle requests? |

The readiness probe verifies connectivity to required dependencies (database, Redis, RabbitMQ) before marking the pod as ready.
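
For reference, a probe configuration that uses these endpoints might look like the sketch below. The paths come from the table above. The container name, the named port (http), and all timing values are assumptions to adjust for your deployment, and the S4E Helm charts may already define equivalent probes.

```yaml
# Fragment of a pod spec (not a complete manifest)
containers:
  - name: s4e-core                # illustrative container name
    livenessProbe:
      httpGet:
        path: /health/live
        port: http                # assumption: container port named "http"
      initialDelaySeconds: 10     # illustrative timings
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health/ready
        port: http
      periodSeconds: 10
      failureThreshold: 3
```

Because the readiness endpoint checks dependencies, a database or RabbitMQ outage removes the pod from Service endpoints without restarting it, while a liveness failure triggers a container restart.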

Monitoring Health

Create alerts for health check failures:

groups:
  - name: s4e-health
    rules:
      - alert: S4EServiceDown
        expr: up{namespace="s4e"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "S4E service {{ $labels.job }} is down"

Alerting thresholds

Tune alert thresholds to your deployment's normal behavior. Start with conservative (less sensitive) thresholds to limit alert noise, then tighten them as you establish baselines.
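
As a starting point for tuning, the sketch below adds two threshold-based rules built on the s4e-core metrics listed earlier. The alert names, the 1-second and 1-error-per-second thresholds, and the 10-minute hold time are placeholders, not recommended values. The namespace label is assumed to be attached at scrape time, as in the S4EServiceDown rule.

```yaml
groups:
  - name: s4e-thresholds           # hypothetical group name
    rules:
      - alert: S4EHighRequestLatency
        # p95 request latency over a 5-minute window; 1s is a placeholder threshold.
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(s4e_http_request_duration_seconds_bucket{namespace="s4e"}[5m]))
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "S4E p95 request latency is above 1s"
      - alert: S4EApiErrorRateHigh
        # Errors per second across all endpoints; 1 is a placeholder threshold.
        expr: sum(rate(s4e_api_errors_total{namespace="s4e"}[5m])) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "S4E API error rate is above 1 error per second"
```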

Best Practices

  1. Retain logs for compliance -- configure log retention policies that meet your regulatory requirements (typically 90-365 days).
  2. Use request IDs -- the request_id field lets you correlate log entries for a single request across services.
  3. Monitor queue depths -- a sustained rise in RabbitMQ queue depth usually means workers cannot keep up with incoming jobs.
  4. Set up PagerDuty or OpsGenie -- route critical alerts to your on-call rotation.
  5. Dashboard rotation -- display the Platform Overview dashboard on a wall monitor in your operations center.
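
The queue-depth practice above can be expressed as an alert rule on the rabbitmq_queue_messages metric. This is a sketch only. The queue label name depends on the RabbitMQ exporter in use, and the 1000-message threshold and 15-minute hold time are placeholders to replace once you know your normal backlog.

```yaml
groups:
  - name: s4e-queues               # hypothetical group name
    rules:
      - alert: S4EQueueBacklog
        # Fires when a queue stays above the threshold, which suggests
        # workers are not keeping up with incoming jobs.
        expr: sum by (queue) (rabbitmq_queue_messages) > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ queue {{ $labels.queue }} has held more than 1000 messages for 15 minutes"
```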