Monitoring & Logs

S4E On-Prem exposes Prometheus metrics, structured JSON logs, and health endpoints. This guide covers setting up monitoring infrastructure and configuring log collection for your deployment.

Observability Stack

S4E On-Prem is designed to integrate with standard Kubernetes observability tools:

Component	Purpose	Recommended Tool
Metrics	Performance monitoring, resource usage, scan throughput	Prometheus + Grafana
Logging	Centralized log aggregation and search	ELK stack (Elasticsearch, Logstash, Kibana) or Loki
Alerting	Proactive notifications for anomalies and failures	Grafana Alerting or Alertmanager
Tracing	Distributed request tracing across services	Jaeger (optional)

Prometheus Metrics

Service Metrics

All S4E services expose Prometheus-compatible metrics at the /metrics endpoint. Key metrics include:

s4e-core

Metric	Type	Description
`s4e_http_requests_total`	Counter	Total HTTP requests by method, path, and status code
`s4e_http_request_duration_seconds`	Histogram	Request latency distribution
`s4e_active_sessions`	Gauge	Number of active user sessions
`s4e_api_errors_total`	Counter	API errors by type and endpoint

Workers (scan, crawler, dispatcher)

Metric	Type	Description
`s4e_scan_jobs_total`	Counter	Total scan jobs processed
`s4e_scan_jobs_active`	Gauge	Currently running scan jobs
`s4e_scan_duration_seconds`	Histogram	Scan execution time by scan type
`s4e_crawler_urls_discovered`	Counter	Total URLs discovered by the crawler
`s4e_queue_messages_consumed`	Counter	Messages consumed from RabbitMQ

Infrastructure

Metric	Type	Description
`pg_stat_activity_count`	Gauge	Active PostgreSQL connections
`rabbitmq_queue_messages`	Gauge	Messages in each RabbitMQ queue
`redis_connected_clients`	Gauge	Connected Redis clients
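
To sanity-check what a service exposes before wiring up dashboards, you can read the /metrics output directly. The following is a minimal sketch, not an official S4E tool: it parses the Prometheus text format with a simplified regular expression and computes the share of 5xx responses from `s4e_http_requests_total`. The label name `status`, the path value, and the sample numbers are assumptions for illustration; check the real label names in your own /metrics output.

```python
import re

# Simplified matcher for Prometheus text-format samples: name{labels} value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, # HELP, or # TYPE
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        # Escape sequences inside label values are kept as-is in this sketch.
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def error_ratio(samples, metric="s4e_http_requests_total", status_label="status"):
    """Fraction of requests whose status label starts with 5 (label name assumed)."""
    total = errors = 0.0
    for name, labels, value in samples:
        if name != metric:
            continue
        total += value
        if labels.get(status_label, "").startswith("5"):
            errors += value
    return errors / total if total else 0.0


# Hypothetical scrape output; real services will expose more series.
SAMPLE = """\
# HELP s4e_http_requests_total Total HTTP requests
# TYPE s4e_http_requests_total counter
s4e_http_requests_total{method="GET",path="/api/scans",status="200"} 90
s4e_http_requests_total{method="GET",path="/api/scans",status="500"} 10
s4e_active_sessions 12
"""

print(f"5xx ratio: {error_ratio(parse_metrics(SAMPLE)):.1%}")
```

In production the same ratio is better computed in Prometheus over a rate window; this script is only for spot checks against a single scrape.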

Prometheus Configuration

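ServiceMonitor is a Prometheus Operator resource. If you run the Prometheus Operator, add one ServiceMonitor for each S4E service. The port value must match a named port on the service's Kubernetes Service: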

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: s4e-core-monitor
  namespace: s4e
spec:
  selector:
    matchLabels:
      app: s4e-core
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
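
If you maintain manifests by hand rather than through the Helm chart, one ServiceMonitor per service can be generated instead of copied. The sketch below mirrors the s4e-core manifest above for a list of services. The service names other than s4e-core, and the assumption that each Service carries an `app` label equal to its name and a port named `http`, are illustrative; replace them with the values from your deployment.

```python
import json


def service_monitor(app, namespace="s4e", port="http", interval="30s"):
    """Build a ServiceMonitor with the same shape as the s4e-core example."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": f"{app}-monitor", "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": {"app": app}},
            "endpoints": [{"port": port, "path": "/metrics", "interval": interval}],
        },
    }


# Hypothetical service list; only s4e-core is confirmed by the example above.
SERVICES = ["s4e-core", "s4e-scan", "s4e-crawler", "s4e-dispatcher"]


def render(services):
    """Wrap the monitors in a v1 List; kubectl accepts JSON manifests."""
    items = [service_monitor(name) for name in services]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(render(SERVICES))
```

Saved as a script, the output can be piped to `kubectl apply -f -`.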

Helm integration

S4E Helm charts include optional ServiceMonitor templates. Enable them by setting metrics.serviceMonitor.enabled: true in your values file.

Grafana Dashboards

S4E provides pre-built Grafana dashboard JSON files covering:

Platform Overview -- service health, request rates, error rates.
Scan Activity -- scan throughput, queue depths, worker utilization.
Infrastructure -- database connections, Redis memory, RabbitMQ message rates.
Resource Usage -- CPU, memory, and storage consumption per service.

Import the dashboard JSON files from the S4E release artifacts or configure them through ArgoCD.

Logging

Log Format

All S4E services emit structured JSON logs:

{
  "timestamp": "2025-01-15T10:23:45.123Z",
  "level": "INFO",
  "service": "s4e-core",
  "module": "auth",
  "message": "User login successful",
  "user_id": 42,
  "ip": "10.0.1.15",
  "request_id": "abc-123-def"
}
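
Because every record is a single JSON object with a request_id, logs from several services can be stitched into one timeline per request. The following sketch shows the idea on log lines held in memory. The second sample line (an s4e-scan record sharing the same request_id) is invented for illustration, and the script is not part of S4E.

```python
import json
from collections import defaultdict


def parse_log_lines(lines):
    """Yield one dict per valid JSON log line; skip anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stack trace or non-JSON startup banner
        if isinstance(record, dict):
            yield record


def group_by_request(records):
    """Map request_id -> records sorted by timestamp."""
    grouped = defaultdict(list)
    for record in records:
        request_id = record.get("request_id")
        if request_id:
            grouped[request_id].append(record)
    for events in grouped.values():
        # UTC ISO 8601 timestamps in one format sort correctly as strings.
        events.sort(key=lambda r: r.get("timestamp", ""))
    return dict(grouped)


# The first line is the record shown above; the second is hypothetical.
LINES = [
    '{"timestamp": "2025-01-15T10:23:46.500Z", "level": "INFO", "service": "s4e-scan", '
    '"module": "runner", "message": "Scan queued", "request_id": "abc-123-def"}',
    '{"timestamp": "2025-01-15T10:23:45.123Z", "level": "INFO", "service": "s4e-core", '
    '"module": "auth", "message": "User login successful", "user_id": 42, '
    '"ip": "10.0.1.15", "request_id": "abc-123-def"}',
    "not a json line",
]

for request_id, events in group_by_request(parse_log_lines(LINES)).items():
    print(request_id, [f'{e["service"]}: {e["message"]}' for e in events])
```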

Log Levels

Level	Usage
`DEBUG`	Detailed diagnostic information (disabled in production by default)
`INFO`	Normal operational events (startup, request processing, scan completion)
`WARNING`	Unexpected but recoverable situations
`ERROR`	Failures requiring attention (connection errors, scan failures)
`CRITICAL`	System-level failures requiring immediate action

Configure the log level per service via the LOG_LEVEL environment variable.
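
If you run custom tooling next to S4E and want its logs to land in the same pipeline, it helps to emit the same shape and honor the same variable. The sketch below is one way to do that with the Python standard library; it is an assumption about how a compatible emitter could look, not S4E's own logging code. The function and class names are hypothetical.

```python
import json
import logging
import os
import sys
from datetime import datetime, timezone

# The five levels documented above, mapped to Python's numeric levels.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, service, module):
        super().__init__()
        self.service = service
        self.module = module

    def format(self, record):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps({
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "service": self.service,
            "module": self.module,
            "message": record.getMessage(),
        })


def build_logger(service, module, stream=None):
    """Create a logger whose threshold comes from LOG_LEVEL (default INFO)."""
    level = LEVELS.get(os.environ.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
    logger = logging.getLogger(f"{service}.{module}")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter(service, module))
    logger.handlers = [handler]   # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("my-tool", "sync")
    log.debug("suppressed unless LOG_LEVEL=DEBUG")
    log.info("started")
```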

ELK Stack Integration

Filebeat DaemonSet

Deploy Filebeat as a DaemonSet to collect container logs from all Kubernetes nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: logging
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:8.12.0
          volumeMounts:
            # /var/log contains /var/log/containers and /var/log/pods
            - name: varlog
              mountPath: /var/log
              readOnly: true
            # Needed only on Docker nodes; containerd and CRI-O write under /var/log/pods
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers

Kibana Index Patterns

Create index patterns in Kibana for S4E logs:

s4e-core-* -- API and authentication logs
s4e-scan-* -- Scan execution logs
s4e-crawler-* -- Crawl pipeline logs
s4e-* -- All S4E service logs combined

Fluentd / Fluent Bit Alternative

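If you collect logs with Fluent Bit instead of Filebeat, configure a parser filter for the S4E JSON logs and an Elasticsearch output. The Match pattern assumes your input tags S4E container logs with a kube.s4e- prefix; adjust it to your tag scheme: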

[FILTER]
    Name         parser
    Match        kube.s4e-*
    Key_Name     log
    Parser       json

[OUTPUT]
    Name         es
    Match        kube.s4e-*
    Host         elasticsearch.logging.svc
    Port         9200
    Index        s4e-logs

Health Checks

Liveness and Readiness Probes

All S4E services expose health endpoints used by Kubernetes probes:

Endpoint	Purpose
`/health/live`	Liveness check -- is the process running?
`/health/ready`	Readiness check -- can the service handle requests?

The readiness probe verifies connectivity to required dependencies (database, Redis, RabbitMQ) before marking the pod as ready.
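
The readiness logic described above amounts to "ready only if every dependency answers". The sketch below illustrates that pattern with plain TCP connection checks. It is not S4E's implementation, which may use protocol-level checks instead, and the hostnames and ports in the usage example are placeholders.

```python
import json
import socket


def tcp_check(host, port, timeout=1.0):
    """Return a zero-argument check that succeeds if a TCP connection opens."""
    def check():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return check


def readiness(checks):
    """Run every check; return (HTTP status, JSON body).

    200 means all dependencies responded; 503 tells Kubernetes to keep
    the pod out of Service endpoints until the next probe succeeds.
    """
    results = {name: bool(check()) for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    body = json.dumps({"ready": status == 200, "dependencies": results})
    return status, body


if __name__ == "__main__":
    # Placeholder hostnames; substitute the addresses from your deployment.
    status, body = readiness({
        "database": tcp_check("postgres.s4e.svc", 5432),
        "redis": tcp_check("redis.s4e.svc", 6379),
        "rabbitmq": tcp_check("rabbitmq.s4e.svc", 5672),
    })
    print(status, body)
```

A liveness check should stay much simpler than this. If liveness also depended on the database, a database outage would restart every pod instead of only taking them out of rotation.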

Monitoring Health

Create alerts for health check failures:

groups:
  - name: s4e-health
    rules:
      - alert: S4EServiceDown
        expr: up{namespace="s4e"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "S4E service {{ $labels.job }} is down"

Alerting thresholds

Tune alert thresholds based on your deployment's normal behavior. Start with conservative thresholds and adjust as you establish baselines.
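
One common heuristic for turning a baseline into a threshold is the mean plus a few standard deviations of the metric over a quiet period. The sketch below applies that heuristic to a list of historical samples. The three-sigma multiplier and the sample values are assumptions, not S4E recommendations, and metrics with strongly skewed distributions may be better served by a high percentile.

```python
import statistics


def baseline_threshold(samples, sigmas=3.0):
    """Suggest an alert threshold as mean + sigmas * sample standard deviation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    return mean + sigmas * stdev


# Hypothetical hourly readings of s4e_scan_jobs_active over a normal period.
history = [10, 12, 11, 13, 9]
print(f"suggested threshold: {baseline_threshold(history):.1f}")
```

Recompute the value as more history accumulates, and keep the `for:` duration on the alert rule so that a single spike above the threshold does not page anyone.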

Best Practices

Retain logs for compliance -- configure log retention policies that meet your regulatory requirements (typically 90-365 days).
Use request IDs -- the request_id field enables end-to-end request tracing across services.
Monitor queue depths -- a RabbitMQ queue depth that keeps rising means workers are consuming messages more slowly than they arrive.
Set up PagerDuty or OpsGenie -- route critical alerts to your on-call rotation.
Dashboard rotation -- display the Platform Overview dashboard on a wall monitor in your operations center.
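
For the queue-depth practice above, the signal that matters is the trend rather than any single reading. The following sketch fits a least-squares line through (time, depth) samples, such as values of `rabbitmq_queue_messages` exported from Prometheus, and flags a queue whose depth grows faster than a chosen rate. The `min_slope` default and the sample data are illustrative assumptions.

```python
def depth_slope(samples):
    """Least-squares slope of (seconds, depth) pairs, in messages per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    numerator = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one timestamp")
    return numerator / denominator


def is_backing_up(samples, min_slope=0.5):
    """True when the queue grows faster than min_slope messages per second."""
    return depth_slope(samples) > min_slope


# Hypothetical readings taken one minute apart: the queue gains 60 messages per minute.
readings = [(0, 100), (60, 160), (120, 220)]
print(depth_slope(readings), is_backing_up(readings))
```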