Overseer

Why This Exists

Nothing is more annoying than the shrug.

Something's broken in production. You ask what happened. You get a shrug. You ask when it started. Another shrug. You ask where the logs are. Shrug. You ask how we can fix it. The biggest fucking shrug you've ever seen.

If something is wrong, I want to know where, when, how, why, and how we can reconcile it. Christ! Is that too much to ask?

It shouldn't be so fucking hard to know what happened, when, where.

You work with Datadog until management decides to migrate to New Relic. Or you're a solo dev who just wants to see if your background jobs are running without paying enterprise prices. Overseer solves this: centralized monitoring that you own, built into every Aegis Stack project from day one.

What It Is

Overseer is a health monitoring dashboard built into your Aegis Stack application. It provides real-time visibility into component and service health through a web UI.

The dashboard displays:

Component Cards: Backend, Database, Worker, Scheduler health
Service Cards: Auth, AI, Comms health (when included)
Header: Overall health summary and theme toggle
Auto-refresh: Polls health endpoint every 30 seconds

Current Capabilities

Component health monitoring (Backend, Database, Worker, Scheduler)
Service health monitoring (Auth, AI, Comms)
System metrics (CPU, memory, disk usage)
Status hierarchy (Healthy, Warning, Unhealthy, Info)
Web dashboard with auto-refresh (30-second polling)
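
The 30-second auto-refresh can be pictured as a simple polling loop. The sketch below is illustrative only: the dashboard's real polling code is not shown on this page, and `fetch_health`, `poll_health`, their parameters, and the assumption that the endpoint returns JSON are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(url: str) -> dict:
    """GET the health endpoint and decode its body (assumed to be JSON)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def poll_health(
    fetch: Callable[[], dict],
    on_update: Callable[[dict], None],
    interval: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call fetch() every `interval` seconds and pass each result to on_update."""
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            on_update(fetch())
        except OSError as exc:
            # Network failures should not stop the loop; report and retry later.
            on_update({"error": str(exc)})
        polls += 1
        # Wait only if another poll will follow.
        if max_polls is None or polls < max_polls:
            sleep(interval)
```

With a locally running app, this might be called as `poll_health(lambda: fetch_health("http://localhost:8000/health/"), print)`, where the host and port are assumptions. Passing `fetch` and `sleep` in as arguments keeps the loop testable without a network or real delays.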

How It Works

sequenceDiagram
    participant C as Components/Services
    participant R as Health Registry
    participant E as /health/ Endpoint
    participant D as Dashboard UI

    Note over C,R: Startup: Registration Phase
    C->>R: register_health_check("backend", check_func)
    C->>R: register_health_check("database", check_func)
    C->>R: register_service_health_check("auth", check_func)

    Note over E,D: Runtime: Monitoring Phase
    D->>E: GET /health/ (every 30s)
    E->>R: Run all registered checks
    R->>C: Execute health check functions
    C->>R: Return ComponentStatus
    R->>E: Aggregate into SystemStatus
    E->>D: Return health data
    D->>D: Render component/service cards

The Flow:

Registration: During app startup, components and services register their health check functions with the health registry
Aggregation: The /health/ endpoint runs all registered checks and aggregates results into a hierarchical status tree
Polling: The dashboard polls the health endpoint every 30 seconds
Display: Component and service cards render with real-time status, metrics, and details
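
The registration and aggregation steps above can be sketched roughly as follows. This is a minimal sketch, assuming an in-memory registry and async check functions. Only the names `register_health_check` and `ComponentStatus` appear on this page; their signatures and fields here, along with `run_all_checks`, `backend_check`, and `database_check`, are hypothetical and not Overseer's actual implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class ComponentStatus:
    # Hypothetical shape; the real ComponentStatus may carry more fields.
    name: str
    healthy: bool
    message: str = ""


HealthCheck = Callable[[], Awaitable[ComponentStatus]]

# Hypothetical in-memory registry: component name -> check function.
_registry: Dict[str, HealthCheck] = {}


def register_health_check(name: str, check: HealthCheck) -> None:
    """Store a check function under a component name (called at startup)."""
    _registry[name] = check


async def run_all_checks() -> Dict[str, ComponentStatus]:
    """Run every registered check, roughly what a /health/ handler might do."""
    results: Dict[str, ComponentStatus] = {}
    for name, check in _registry.items():
        try:
            results[name] = await check()
        except Exception as exc:
            # A check that raises is reported as unhealthy instead of
            # failing the whole request.
            results[name] = ComponentStatus(name, healthy=False, message=str(exc))
    return results


async def backend_check() -> ComponentStatus:
    return ComponentStatus("backend", healthy=True, message="ok")


async def database_check() -> ComponentStatus:
    # Simulates a database that cannot be reached.
    raise ConnectionError("connection refused")


register_health_check("backend", backend_check)
register_health_check("database", database_check)

results = asyncio.run(run_all_checks())
overall_healthy = all(status.healthy for status in results.values())
```

Catching exceptions per check is a design choice in this sketch: one broken component then shows up as a failing card rather than taking the health endpoint down with it.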

Health Status Indicators

Each card displays a status indicator using the Overseer status hierarchy:

Status	Color	Visual	Meaning
✅ Healthy	Green	Solid green border	Component/service fully operational
ℹ️ Info	Blue	Solid blue border	Informational status, not a problem
⚠️ Warning	Yellow	Orange border	Operational but with issues
❌ Unhealthy	Red	Red border	Component/service down or failing

Status Propagation: Parent components inherit the worst child status:

Any child Unhealthy → Parent Unhealthy
Any child Warning (no unhealthy) → Parent Warning
Any child Info (no unhealthy/warning) → Parent Info
All children Healthy → Parent Healthy
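
The propagation rules amount to taking the worst status among the children. A minimal sketch, assuming a severity order of Healthy < Info < Warning < Unhealthy as listed above; the `Status` enum and `propagate` function are hypothetical names, not Overseer's API.

```python
from enum import IntEnum
from typing import Iterable


class Status(IntEnum):
    # Ordered from best to worst so that max() picks the worst status.
    HEALTHY = 0
    INFO = 1
    WARNING = 2
    UNHEALTHY = 3


def propagate(children: Iterable[Status]) -> Status:
    """Return the parent status: the worst status among the children."""
    # Assumption: a parent with no children is treated as healthy.
    return max(children, default=Status.HEALTHY)
```

For example, `propagate([Status.HEALTHY, Status.INFO, Status.WARNING])` evaluates to `Status.WARNING`, matching the second rule.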

The Story

Want to know how this came to be?

Read the full story → How Overseer evolved from solving real production problems to becoming part of Aegis Stack.

Next Steps

The Overseer Story - Evolution from Streamlit to Aegis Stack and vision
Integration Guide - Add health checks to custom components/services
