Production Deployment

A checklist and deep dive for running Lelu reliably in production, covering HTTPS, secrets management, Engine scaling, and observability.

Pre-Launch Checklist

TLS terminated at load balancer or ingress for all services (ops)
LELU_API_KEY rotated from default and stored in a secret manager (ops)
DATABASE_URL uses sslmode=require in production
Redis uses TLS (rediss://) or a private network
Engine replicas ≥ 2 for high availability
Health checks configured on /healthz for Engine, Platform, MCP, and UI
Audit retention configured (S3/object-store lifecycle, 1+ year)
Structured logs exported to your log platform
OPA/Rego policies are version-controlled before deploy
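
The two connection-string items above can be sketched as a Compose fragment. This is an illustration under assumptions, not Lelu's documented configuration: `DATABASE_URL` comes from the checklist, while `REDIS_URL`, the hostnames, the database name, and the `${...}` variables are hypothetical placeholders.

```yaml
services:
  engine:
    environment:
      # sslmode=require makes the Postgres client refuse unencrypted connections.
      # It does not verify the server certificate; sslmode=verify-full does.
      DATABASE_URL: "postgres://lelu:${DB_PASSWORD}@db.internal:5432/lelu?sslmode=require"
      # The rediss:// scheme (double "s") selects TLS. REDIS_URL is an assumed
      # variable name; check which one your Lelu version reads.
      REDIS_URL: "rediss://:${REDIS_PASSWORD}@redis.internal:6379/0"
```

The passwords are left as variable references so that the values come from the environment or a secret manager rather than from a committed file.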

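In Docker Compose healthchecks, prefer 127.0.0.1 over localhost. Inside a container, localhost can resolve to the IPv6 address ::1 while the service listens only on IPv4, which makes an otherwise healthy container fail its check.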
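
A minimal sketch of such a healthcheck follows. It assumes the image ships `wget` and that the Engine listens on port 8080; neither is stated in this guide, so adjust both to your image. The timing values are illustrative.

```yaml
services:
  engine:
    healthcheck:
      # 127.0.0.1 avoids depending on how "localhost" resolves in the container.
      # Port 8080 and the presence of wget are assumptions.
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8080/healthz"]
      interval: 10s      # how often to probe
      timeout: 3s        # fail the probe if no response within 3 seconds
      retries: 5         # mark unhealthy after 5 consecutive failures
      start_period: 15s  # grace period while the service boots
```

If the image has `curl` instead, `["CMD", "curl", "-fsS", "http://127.0.0.1:8080/healthz"]` serves the same purpose.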

Scaling the Engine

The Engine is stateless: all state lives in Redis, so you can scale horizontally by running multiple replicas behind a load balancer.

docker-compose.override.yml
services:
  engine:
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "1"
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
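
For Kubernetes, a roughly equivalent setup might look like the Deployment below. It is a sketch under assumptions: the image name, labels, and port 8080 are hypothetical, and only the replica count, the resource limits, and the `/healthz` path are taken from this guide.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lelu-engine
spec:
  replicas: 3                      # checklist minimum is 2
  selector:
    matchLabels:
      app: lelu-engine
  template:
    metadata:
      labels:
        app: lelu-engine
    spec:
      containers:
        - name: engine
          image: example.com/lelu/engine:1.0.0   # hypothetical image reference
          ports:
            - containerPort: 8080                # assumed Engine port
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
          readinessProbe:
            # Traffic is routed only to replicas that answer /healthz.
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
```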

Secrets Management

Never store secrets in environment files committed to source control. Use one of these patterns in production:

AWS Secrets Manager

Store secrets in AWS Secrets Manager or SSM Parameter Store, and grant the workload's IAM role permission to read them so that values are injected at runtime.
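
One way to do this, assuming the Engine runs on ECS and is defined through CloudFormation (this guide does not say which AWS runtime you use), is the `Secrets` field of the container definition. The ARNs, account ID, and image name below are placeholders.

```yaml
Resources:
  EngineTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: lelu-engine
      # The execution role must be allowed to call secretsmanager:GetSecretValue
      # on the secret referenced below. Placeholder ARN.
      ExecutionRoleArn: arn:aws:iam::123456789012:role/lelu-engine-execution
      ContainerDefinitions:
        - Name: engine
          Image: example.com/lelu/engine:1.0.0   # hypothetical image reference
          Memory: 512
          Secrets:
            # ECS resolves the value at task start and exposes it to the
            # container as the LELU_API_KEY environment variable.
            - Name: LELU_API_KEY
              ValueFrom: arn:aws:secretsmanager:us-east-1:123456789012:secret:lelu/api-key
```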

Kubernetes Secrets

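Expose secrets to pods as environment variables from a Secret object. Kubernetes Secrets are only base64-encoded by default, so use Sealed Secrets or External Secrets Operator to keep plaintext values out of source control.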
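
With External Secrets Operator, the manifest you commit holds only a reference to the secret, not its value. The sketch below assumes a `ClusterSecretStore` named `aws-secrets-manager` already exists and that the remote secret `lelu/engine` has an `api_key` property; all of these names are hypothetical.

```yaml
# Newer operator releases also serve this resource as external-secrets.io/v1.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: lelu-engine
spec:
  refreshInterval: 1h              # re-sync from the backing store hourly
  secretStoreRef:
    name: aws-secrets-manager      # hypothetical store name
    kind: ClusterSecretStore
  target:
    name: lelu-engine-secrets      # Kubernetes Secret the operator creates
  data:
    - secretKey: LELU_API_KEY      # key inside the generated Secret
      remoteRef:
        key: lelu/engine           # hypothetical remote secret name
        property: api_key          # hypothetical JSON property in that secret
```

The Engine container can then load the generated Secret with `envFrom` and a `secretRef` pointing at `lelu-engine-secrets`.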

HashiCorp Vault

Use Vault Agent Injector to automatically inject secrets into pods at startup.
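
The injector is driven by pod annotations. The fragment below is a sketch that assumes a Vault Kubernetes auth role named `lelu-engine` and a KV v2 secret at `secret/data/lelu/engine` with an `api_key` field; these names are hypothetical.

```yaml
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "lelu-engine"   # hypothetical Vault role
        # Renders the secret to the file /vault/secrets/lelu-env in the pod.
        vault.hashicorp.com/agent-inject-secret-lelu-env: "secret/data/lelu/engine"
        # Template that formats the file as shell exports (KV v2 nests the
        # payload under .Data.data).
        vault.hashicorp.com/agent-inject-template-lelu-env: |
          {{- with secret "secret/data/lelu/engine" -}}
          export LELU_API_KEY="{{ .Data.data.api_key }}"
          {{- end }}
```

Note that the injector writes files rather than environment variables, so the container entrypoint would need to source `/vault/secrets/lelu-env` before starting the Engine.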

Observability

Key metrics to alert on
lelu_http_requests_total{method="POST",path="/v1/agent/authorize",status="200"}
  # Request volume and status-code anomalies

lelu_http_request_duration_seconds{method="POST",path="/v1/agent/authorize"}
  # Latency SLO / p95 / p99

lelu_auth_decisions_total{type="agent",allowed="false"}
  # Deny-rate spikes and confidence policy pressure
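
These metrics could be turned into Prometheus alerting rules along the following lines. This is a sketch, not a recommended policy: the thresholds and durations are placeholders to tune against your own traffic, and the latency rule assumes `lelu_http_request_duration_seconds` is exported as a histogram (with a `_bucket` series), which this guide does not confirm.

```yaml
groups:
  - name: lelu-engine
    rules:
      - alert: LeluAuthorizeErrorRateHigh
        # Share of authorize requests answered with a 5xx status.
        expr: |
          sum(rate(lelu_http_requests_total{path="/v1/agent/authorize",status=~"5.."}[5m]))
            /
          sum(rate(lelu_http_requests_total{path="/v1/agent/authorize"}[5m])) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "More than 1% of authorize requests are failing"

      - alert: LeluAuthorizeLatencyP99High
        # p99 latency in seconds, assuming a histogram metric.
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(lelu_http_request_duration_seconds_bucket{path="/v1/agent/authorize"}[5m]))
          ) > 0.25
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "Authorize p99 latency above 250 ms"

      - alert: LeluDenyRateSpike
        # Fraction of agent decisions that were denied.
        expr: |
          sum(rate(lelu_auth_decisions_total{type="agent",allowed="false"}[5m]))
            /
          sum(rate(lelu_auth_decisions_total{type="agent"}[5m])) > 0.2
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "More than 20% of agent decisions are denied"
```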