Chapter 18: Monitoring Subscription Health
After the DLQ incident, the team realizes they need proactive monitoring. They should know when a handler is falling behind or accumulating failures — not discover it from a customer complaint.
Checking Subscription Status
The `protean subscriptions status` command gives a dashboard of all subscriptions:
$ protean subscriptions status --domain=fidelis
Subscriptions - Fidelis
Handler Type Stream Lag Pending DLQ Status
AccountCommandHandler stream fidelis::account:cmd 0 0 - ok
AccountSummaryProjector stream fidelis::account 2 0 - ok
ComplianceAlertHandler stream fidelis::account 0 0 3 lagging
NotificationHandler stream fidelis::account 0 0 - ok
AccountReportProjector stream fidelis::account-fact 0 0 - ok
FundsTransferPM stream fidelis::transfer 0 0 - ok
6 subscription(s), 5 ok, 1 lagging, total lag: 2
Key metrics:
- Lag: how many messages the handler has not yet processed. High lag means the handler is falling behind.
- Pending: messages currently being processed (claimed but not acknowledged).
- DLQ: number of messages in the dead-letter queue.
- Status: `ok`, `lagging`, or `unknown`.
For machine-readable output:
$ protean subscriptions status --domain=fidelis --json
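The JSON form is convenient for scripted checks. The sketch below flags subscriptions that need attention. The JSON schema is not shown in this chapter, so the key names (`handler`, `lag`, `dlq`, `status`) and the sample payload are assumptions that mirror the table columns. Adjust them to the real output before relying on this.

```python
import json

# ASSUMPTION: the --json output is a list of objects with these keys.
# The real schema is not documented here; rename the keys as needed.
SAMPLE = """
[
  {"handler": "AccountSummaryProjector", "lag": 2, "dlq": null, "status": "ok"},
  {"handler": "ComplianceAlertHandler", "lag": 0, "dlq": 3, "status": "lagging"}
]
"""


def unhealthy(raw_json, max_lag=100):
    """Return handler names that are not ok, exceed max_lag, or have DLQ entries."""
    flagged = []
    for sub in json.loads(raw_json):
        lag = sub.get("lag") or 0  # treat a missing or null value as 0
        dlq = sub.get("dlq") or 0
        if sub.get("status") != "ok" or lag > max_lag or dlq > 0:
            flagged.append(sub["handler"])
    return flagged


if __name__ == "__main__":
    print(unhealthy(SAMPLE))
```

In a real check, `SAMPLE` would be replaced by the command's standard output, for example by reading `sys.stdin` and piping the CLI into the script.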
The Observatory
For real-time monitoring, launch the Observatory — Protean's built-in observability dashboard:
$ protean observatory --domain=fidelis --port=9000
Observatory running at http://0.0.0.0:9000
The Observatory provides:
- Live message traces: a real-time stream of `handler.started`, `handler.completed`, `handler.failed`, `message.acked`, and `message.dlq` events via Server-Sent Events.
- Subscription status: the same data as the CLI, auto-refreshing every 5 seconds.
- DLQ management: inspect, replay, and purge directly from the web interface.
- Stream health: queue depths per stream.
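Because the live traces arrive as Server-Sent Events, any client that can read a line stream can consume them. The sketch below is a minimal parser for the standard SSE wire format (`data:` lines terminated by a blank line). The JSON payload shape, including the `event` and `handler` field names, is an assumption for illustration and not the Observatory's documented format.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one optional leading space from the value.
            buffer.append(value[1:] if value.startswith(" ") else value)


# ASSUMPTION: hypothetical payloads; the real trace schema may differ.
SAMPLE_STREAM = [
    'data: {"event": "handler.started", "handler": "NotificationHandler"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"event": "handler.failed", "handler": "ComplianceAlertHandler"}\n',
    "\n",
]

failed = [
    trace["handler"]
    for trace in map(json.loads, iter_sse_data(SAMPLE_STREAM))
    if trace["event"] == "handler.failed"
]
```

Against a running Observatory, the same generator could be fed the decoded lines of an HTTP response from the trace stream instead of `SAMPLE_STREAM`.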
Observatory API Endpoints
| Endpoint | Description |
|---|---|
| `GET /` | Dashboard |
| `GET /stream` | SSE real-time trace stream |
| `GET /api/health` | Health check |
| `GET /api/subscriptions` | Subscription status |
| `GET /api/traces` | Recent trace history |
| `GET /api/streams` | Stream information |
| `GET /api/outbox` | Outbox status |
| `GET /metrics` | Prometheus metrics |
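The `/api/*` endpoints can be polled from a script with the standard library alone. The helper below is a sketch that assumes these endpoints return JSON bodies, which this chapter does not state explicitly. The `opener` parameter is a hypothetical hook added so the function can be exercised without a running server.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def get_json(base_url, path, opener=urlopen, timeout=5.0):
    """GET base_url + path and decode the response body as JSON.

    ASSUMPTION: the endpoint responds with a UTF-8 JSON document.
    """
    # Normalise slashes so both "http://host:9000" and "http://host:9000/" work.
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Observatory from the earlier example running, `get_json("http://localhost:9000", "/api/health")` would return whatever document the health check produces. The structure of that document is not covered here.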
Prometheus Metrics
The Observatory exposes Prometheus-compatible metrics at /metrics:
# HELP protean_subscription_lag Messages behind head position
# TYPE protean_subscription_lag gauge
protean_subscription_lag{domain="fidelis",handler="AccountSummaryProjector",stream="fidelis::account",type="stream"} 2
# HELP protean_subscription_dlq_depth Messages in dead-letter queue
# TYPE protean_subscription_dlq_depth gauge
protean_subscription_dlq_depth{domain="fidelis",handler="ComplianceAlertHandler",stream="fidelis::account",type="stream"} 3
# HELP protean_subscription_status Subscription status (1=ok, 0=error)
# TYPE protean_subscription_status gauge
protean_subscription_status{domain="fidelis",handler="AccountCommandHandler",stream="fidelis::account:command",type="stream"} 1
These metrics can be scraped by Grafana, Datadog, or any Prometheus-compatible monitoring tool. Set alerts on:
- `protean_subscription_lag > 100`: handler falling behind
- `protean_subscription_dlq_depth > 0`: failed messages accumulating
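To make the two thresholds concrete, the sketch below parses the text exposition format shown above and reports which series breach them. It is a simplified parser written for illustration (no escaping edge cases, no timestamps handling beyond ignoring them), not a replacement for a real Prometheus alert rule. The function and constant names are hypothetical.

```python
import re

# Matches: metric_name{label="value",...} 12.5   (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Thresholds taken from the alert suggestions in the text.
THRESHOLDS = {
    "protean_subscription_lag": 100,
    "protean_subscription_dlq_depth": 0,
}

METRICS_TEXT = '''
# HELP protean_subscription_lag Messages behind head position
# TYPE protean_subscription_lag gauge
protean_subscription_lag{domain="fidelis",handler="AccountSummaryProjector",stream="fidelis::account",type="stream"} 2
# TYPE protean_subscription_dlq_depth gauge
protean_subscription_dlq_depth{domain="fidelis",handler="ComplianceAlertHandler",stream="fidelis::account",type="stream"} 3
'''


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def breaches(text, thresholds=THRESHOLDS):
    """Return (metric, handler, value) for every series above its threshold."""
    return [
        (name, labels.get("handler", "?"), value)
        for name, labels, value in parse_samples(text)
        if name in thresholds and value > thresholds[name]
    ]
```

With the sample above, only the DLQ depth for `ComplianceAlertHandler` is reported: a lag of 2 is well under the threshold of 100, while any DLQ depth above zero counts as a breach.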
Trace Events
The engine emits structured trace events for every message processed:
- `handler.started`: handler began processing a message
- `handler.completed`: handler finished successfully
- `handler.failed`: handler threw an exception
- `message.acked`: message acknowledged (removed from pending)
- `message.nacked`: message negatively acknowledged (will retry)
- `message.dlq`: message moved to dead-letter queue
- `outbox.published`: outbox processor published a message to broker
- `outbox.failed`: outbox processor failed to publish
These events flow to the Observatory via Redis Pub/Sub in real time. When nobody is listening, the emitter short-circuits with zero overhead.
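The short-circuit idea can be sketched as follows. This is an illustration of the pattern only, not Protean's actual emitter: the class name, constructor arguments, and payload shape are all hypothetical. The point is that the subscriber check happens before any serialization or publishing work.

```python
import json
from typing import Callable


class TraceEmitter:
    """Hypothetical emitter that skips all work when nobody is subscribed."""

    def __init__(self, subscriber_count: Callable[[], int],
                 publish: Callable[[str], None]):
        self._subscriber_count = subscriber_count
        self._publish = publish

    def emit(self, event: str, **fields) -> bool:
        """Publish a trace event; return False if it was short-circuited."""
        if self._subscriber_count() == 0:
            return False  # no listeners: nothing is serialized or sent
        self._publish(json.dumps({"event": event, **fields}))
        return True


# Usage with in-memory stand-ins for the broker.
sent = []
quiet = TraceEmitter(subscriber_count=lambda: 0, publish=sent.append)
live = TraceEmitter(subscriber_count=lambda: 1, publish=sent.append)

quiet.emit("handler.started", handler="NotificationHandler")  # dropped
live.emit("handler.failed", handler="ComplianceAlertHandler")  # published
```

With a Redis client such as redis-py, the two callables could plausibly be backed by `pubsub_numsub` and `publish` on a chosen channel. A production emitter would likely cache the subscriber count for a short interval rather than query it for every event.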
What We Built
- `protean subscriptions status` for quick health checks.
- The Observatory for real-time monitoring and DLQ management.
- Prometheus metrics for production alerting.
- Understanding of trace events emitted by the engine.
With monitoring in place, we can detect problems early. In the next chapter, a bank acquisition triggers a massive migration — and we learn how to handle it without disrupting production.