Chapter 18: Monitoring Subscription Health
After the DLQ incident in Chapter 17, the team realizes they need to know about problems before customers call support. We need proactive monitoring of the message processing pipeline.
Checking Subscription Status
The quickest way to check health is the CLI:
$ protean subscriptions status --domain bookshelf
Subscription Stream Lag Pending DLQ
BookCommandHandler bookshelf::book:command 0 0 0
OrderCommandHandler bookshelf::order:command 0 0 0
BookEventHandler bookshelf::book 0 0 0
OrderEventHandler bookshelf::order 2 1 0
BookCatalogProjector bookshelf::book 0 0 0
BookReportProjector bookshelf::book-fact 0 0 3
Key metrics:
- Lag — messages waiting to be processed. High lag means the handler is falling behind.
- Pending — messages currently being processed (acknowledged but not yet completed).
- DLQ — messages that failed all retries.
The Observatory Dashboard
For real-time monitoring, use the Observatory:
$ protean observatory --domain bookshelf --host 0.0.0.0 --port 9000
Observatory running at http://0.0.0.0:9000
Monitoring domains: bookshelf
Open http://localhost:9000 in your browser to see:
- Subscription status — lag, pending, and DLQ counts for every handler.
- Processing rates — messages processed per second.
- Error rates — recent failures and their handlers.
- Live trace stream — server-sent events showing message processing in real time.
The Observatory can monitor multiple domains simultaneously:
$ protean observatory \
--domain bookshelf \
--domain inventory_service \
--host 0.0.0.0 --port 9000
Metrics Endpoint
The Observatory exposes a /metrics endpoint in Prometheus format:
$ curl http://localhost:9000/metrics
# HELP protean_subscription_lag Messages waiting to be processed
# TYPE protean_subscription_lag gauge
protean_subscription_lag{domain="bookshelf",handler="BookEventHandler"} 0
protean_subscription_lag{domain="bookshelf",handler="OrderEventHandler"} 2
...
Connect this to Prometheus and Grafana for production dashboards and alerting.
Setting Up Alerts
A basic Prometheus alert for subscription lag:
groups:
- name: bookshelf
rules:
- alert: HighSubscriptionLag
expr: protean_subscription_lag > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Handler {{ $labels.handler }} has high lag"
What We Built
protean subscriptions statusfor quick health checks.- The Observatory dashboard for real-time monitoring.
- A Prometheus metrics endpoint for production alerting.
- Understanding of key metrics: lag, pending, DLQ depth.
In the next chapter, we will tackle a bulk import scenario that requires priority lanes to avoid starving production traffic.