Logging
This page explains why Protean's logging works the way it does. For task-oriented instructions see the Logging guide; for a factual enumeration of config keys, framework loggers, and event schemas see the Logging reference.
Why structured logs?
The 12-factor treatment of logs as an event stream has been conventional wisdom for more than a decade, but most codebases still produce human-formatted strings and call it a day:
# The old way — looks readable, queries like grep
logger.info(f"Refunded ${amount} for order {order_id} because {reason}")
When a customer reports a missing refund, you open Loki or Elasticsearch
and search. You don't know whether people wrote "Refunded" or
"Refund issued" or "Processing refund"; you don't know whether the
amount comes before or after the order id; you don't know whether the
reason is free text or a code. Every query is an archaeology dig.
Structured logs invert that:
# The Protean way — event name is stable, fields are queryable
logger.info("payment_refunded", order_id="ord-123", amount=19.99, reason="customer_request")
event=payment_refunded AND reason=customer_request is a viable query.
avg(amount) by reason is a viable aggregation. count by hour draws a
time series. Strings are opaque; fields are data.
Protean commits to this pattern throughout — every hot-path event emitted
by the framework uses a stable name and structured fields, and the
get_logger() API encourages application code to do the same.
The wide event pattern
Structured logs alone still leave an observability gap: when something
goes wrong inside a handler, you typically have a dozen scattered log
lines — "started processing", "loaded aggregate", "validated input",
"raised event", "saved aggregate", "committed UoW" — and you have to
reconstruct the full story by filtering on correlation_id.
The wide event pattern replaces that stream of thin events with one rich event per unit of work. Stripe's engineering blog codified this as the canonical log line pattern:
We emit one log line for every request our application handles. The log line contains every field we might want to query on: request id, user id, response time, status code, billing plan, A/B test cohorts, and so on.
That one log line answers:
- "Which requests failed in the last 10 minutes?"
- "What's the p99 latency for premium users on the checkout endpoint?"
- "Which feature-flag cohort shows the highest error rate?"
…without joining log lines or re-running the query over a dozen thin events.
Designing fields for queryability
The practical insight: choose log fields by asking "what questions will operators ask at 3 AM?" not "what is my code doing right now." That reframes debugging from archaeology — grep for strings until a pattern emerges — to analytics — run an aggregation on a queryable schema.
Examples of queries wide events enable when the right fields are bound:
- "Show all payment failures for premium users in the last hour grouped by error code."
- "What's the p99 handler duration for
PlaceOrdercommands wherecoupon_appliedis not null?" - "Which aggregates have the highest
events_raised_count? (potential aggregate design smell.)" - "Which correlation chains span more than five handlers? (long-running flows worth inspecting.)"
If your wide event can answer those questions, you've picked the right fields. If it can't, you're still doing archaeology.
How Protean builds wide events automatically
No other Python framework can auto-populate domain fields like
aggregate, events_raised, or uow_outcome because no other framework
models those concepts at the framework layer. Protean builds the wide
event from two sources:
- A framework layer, always present with no code required — handler identity, duration, aggregate, events raised, repository operations, UoW outcome, correlation, and (when telemetry is enabled) trace context. See the reference for the exhaustive field list.
- An application layer, whatever the handler binds via
bind_event_context()— user tier, order total, feature flags, tenant plan.
The two layers merge into a single event emitted when the handler
returns. The framework keeps ownership of its fields: application keys
that collide with a framework-reserved name are dropped before emission,
so the meaning of handler or uow_outcome never depends on what the
caller happened to bind.
Query-oriented field design
When choosing what to bind_event_context(), favour fields that:
- Are queryable dimensions, not free text. Prefer
user_tier="premium"tonote="premium user doing checkout". - Are stable. If a value changes shape every deployment, it's a poor aggregation key.
- Are low enough cardinality to group by, high enough to be useful.
tenant_idis great;uuid_of_every_requestis not (unless your backend handles it — see below). - Name the business concept the operator will ask about.
coupon_applied=Truebeatsflags.coupon=1because the former reads cleanly in a filter.
Leave out fields that only matter inside the handler itself — those belong in DEBUG-level thin events, not the wide event.
High-cardinality fields and backend choice
Wide events produce data that legacy log backends struggle with. A single event carries a request id, a user id, an aggregate id, a correlation id, possibly a tenant id — every one of them high-cardinality by design, because they're exactly what an operator filters by.
Syslog-era stores built for low-cardinality tagging don't index these well; queries over them fall back to full scans. Modern columnar log stores — Loki with structured metadata, ClickHouse, BigQuery, CloudWatch Logs Insights, Datadog Log Analytics — handle high cardinality natively.
Protean does not recommend a specific vendor. The heuristic is: if your
logging aggregator makes filter user_id=X or top by aggregate_id
queries slow, the bottleneck is the backend, not the event shape. Fix
that first before trimming fields out of the wide event.
How Protean's logging differs from Django and FastAPI
| Feature | Django | FastAPI (ASGI + uvicorn) | Protean |
|---|---|---|---|
| Structured logs by default | No (stdlib messages) | No (uvicorn access log is plain-text) | Yes (structlog pipeline on by default) |
| Wide event per request | No | No | Yes (protean.access) |
| Domain context in logs | Manual | Manual | Auto (aggregate, events, UoW outcome, correlation) |
| Correlation ID in logs | Manual | Manual | Auto |
| OpenTelemetry trace context in logs | Manual | Manual | Auto when telemetry.enabled = true |
dictConfig support |
Yes | Yes | Yes (via --log-config or dict_config=) |
| Redaction | Manual | Manual | Built-in processor with default keys |
| Multi-worker hygiene | N/A (single process) | Manual | Built-in QueueHandler / QueueListener in --workers N mode |
The gap Protean closes is the domain-aware one: aggregate, events raised, UoW outcome. No other Python framework can populate those automatically because no other framework models them. The commodity features (structured logs, dictConfig, correlation filter) are also on by default — where Django and FastAPI leave them to the operator.
The four sources of context
Every resolved setting comes from one of four places. The precedence, from highest to lowest:
- Explicit keyword arguments to
Domain.configure_logging()orconfigure_logging(). Callers always win. - Environment variables (
PROTEAN_LOG_LEVEL,PROTEAN_ENV). Useful for container orchestration. domain.toml [logging]section. The declarative surface operators tune during deployment.- Environment-based defaults.
PROTEAN_ENV=productionpicks INFO + JSON;developmentpicks DEBUG + console;testpicks WARNING + console.
This precedence matches every other resolvable setting in Protean
(server, telemetry, databases) so operators don't need to memorize
a per-subsystem rule. It also means the right answer to "what will my
deployment see?" is always: check the kwargs, then the env vars, then
domain.toml, then the environment.
How correlation propagates to logs
The ProteanCorrelationFilter (stdlib) and protean_correlation_processor
(structlog) read g.message_in_context.metadata.domain.correlation_id
first, then fall back to g.correlation_id. The fallback is the
documented extension point for HTTP middleware, CLI commands, and
background jobs that tag context before any domain message exists.
Both are safe no-ops outside a domain context: they return "" for both
fields so formatters referencing %(correlation_id)s never raise
KeyError. That no-op semantics is the reason Protean can attach the
filter to the root logger unconditionally — it costs one attribute set
per record and changes nothing about the output shape when context is
absent.
For the full correlation story across HTTP, events, subscribers, and OTel spans, see Correlation and Causation IDs.
How trace context reaches logs
OpenTelemetry trace context (trace_id, span_id, trace_flags) is
injected by OTelTraceContextFilter and protean_otel_processor. Both
are installed only when telemetry.enabled = true — when telemetry is
disabled, the structlog chain has one fewer processor and the root
logger has one fewer filter, so the hot path pays zero cost.
The processor lazily resolves OpenTelemetry symbols on first access and
caches the result. If opentelemetry is not installed (the telemetry
extra is optional), the processor emits empty values and never retries
the import.
This design enables the log ↔ trace jump in every APM tool: click a
trace in Jaeger or Datadog, copy its trace_id, paste it into Loki or
Elasticsearch, see every log record from the same request. Without the
trace_id field on log records, that jump requires custom middleware.
See ADR-0007 and ADR-0008 for the OTel architecture this layer builds on.
Why redaction is processor-based
Call-site redaction — "be careful what you log!" — never survives contact with the real world. Someone, somewhere, forgets. A stacktrace inlines a request body. A new feature logs more than its predecessor. A well-meaning debug line ships to production.
The only pattern that survives is redaction as a pipeline stage: any mention of a sensitive key name in any log event is masked before rendering, regardless of where in the pipeline the value was introduced. Protean's redaction processor runs last in the structlog pipeline — after every caller-supplied processor — so operator-supplied processors cannot smuggle sensitive values past it, whether intentionally or by accident.
The redact list is unioned with the built-in defaults, not replaced.
Operators can only widen the list — never narrow it. You cannot turn off
password masking by supplying your own list.
Redaction is a hygiene filter, not a security boundary: don't store secrets where their redaction is the last line of defence. Authentication tokens belong in a vault, not a best-effort log filter.
Multi-worker hygiene
stdout writes from separate processes are not atomic beyond PIPE_BUF
bytes — typically 4096 on Linux, 512 on BSD. A structured JSON log
record easily exceeds either. Without coordination, concurrent worker
output interleaves mid-record, and the resulting lines are unparseable.
protean server --workers N (with N > 1) solves this the way Python's
stdlib recommends: each worker installs a QueueHandler as its sole
root handler; the supervisor runs a QueueListener that drains the
queue and forwards records to the supervisor's real handlers. Records
cross the process boundary through a multiprocessing.Queue, which
preserves record integrity.
The listener is stopped in a finally block on shutdown so buffered
records are flushed before the supervisor exits. Single-worker mode is
unchanged — no queue overhead.
Two layers of wide events
Protean emits wide events at the handler layer: one event per
handled command, event, query, or projector invocation, on the
protean.access logger.
For applications fronted by FastAPI, the HTTP request is the outermost unit of work, which may trigger zero, one, or several command or query dispatches. The wide event at the handler layer captures each dispatched domain operation; the HTTP envelope is captured separately by FastAPI's own middleware layer (see FastAPI Integration).
All layers share correlation_id, so operators can query correlation_id = X
and see the HTTP envelope plus every domain operation it triggered in a
single result set.
What Protean deliberately does not do
Some capabilities that adjacent frameworks ship with are out of scope for Protean's logging:
- Email-on-error handlers. Django's
AdminEmailHandleris historical baggage; paging belongs in an alerting layer (PagerDuty, Opsgenie, Alertmanager) reading from the log backend, not a handler buried in the logging library. - An Observatory log viewer. Logs belong in log aggregators. The Observatory's value is live traces and event timelines — fighting Loki, Elasticsearch, and Datadog on their home turf is a losing strategy.
- PII heuristics. Automatic PII detection is a research problem; the
current state of the art is too flaky to enable by default without
false positives that mask real signal. Operators explicitly list the
keys they care about via
[logging].redact. - Audit log persistence. Domain events already capture the append-only history of what happened. The logging layer is for debugging and observability, not business audit trails.
Keeping these out of scope is deliberate. A smaller, coherent logging story serves operators better than a large one that duplicates downstream tools.