ADR-0007: Domain-Scoped OpenTelemetry Providers
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-03-16 |
| Author | Subhash Bhushan |
| Epic | 6.1 OpenTelemetry Integration (#742) |
Context
Protean needed production-grade observability -- distributed tracing and
metrics -- to support ops teams shipping telemetry to backends like
Grafana, Datadog, and Jaeger. OpenTelemetry (OTEL) was the natural choice:
it is vendor-agnostic, widely adopted, and provides the W3C TraceContext
propagation that Protean's TraceParent value object already uses.
OTEL's recommended usage pattern is to register a single global
TracerProvider and MeterProvider at process startup via
trace.set_tracer_provider() and metrics.set_meter_provider(), then
obtain tracers and meters anywhere with trace.get_tracer() and
metrics.get_meter(). This works well for conventional applications that
have exactly one telemetry configuration per process.
Protean's Domain class, however, is designed to be instantiated multiple
times in the same process -- most commonly in tests, where each test case
may create its own Domain(name="Test") with different configuration.
The OTEL global registration model creates three problems in this context:
- Single-assignment semantics. OTEL globals are write-once: the first call to set_tracer_provider() wins, and subsequent calls are ignored, at most logging a warning. A second domain in the same process cannot override the provider with its own configuration.
- Shutdown leaves broken globals. Calling shutdown() on the global provider puts it into a terminal state. Any spans created afterward are silently dropped. In tests, where domains are created, activated, and shut down repeatedly, this means only the first test gets real spans.
- No per-domain configuration. Different domains may need different service names, exporters, or resource attributes. A single global provider cannot express this.
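The first two problems can be reproduced with a toy model. The sketch below does not use OTEL itself: ToyGlobalRegistry and ToyProvider are hypothetical stand-ins that only imitate the write-once and drop-after-shutdown behavior described above.

```python
import logging

logger = logging.getLogger("toy_otel")


class ToyGlobalRegistry:
    """Toy stand-in for a write-once global provider slot (not OTEL's code)."""

    def __init__(self):
        self._provider = None

    def set_provider(self, provider):
        # Only the first call takes effect; later calls are ignored with a warning.
        if self._provider is not None:
            logger.warning("Overriding the current provider is not allowed")
            return
        self._provider = provider

    def get_provider(self):
        return self._provider


class ToyProvider:
    """Minimal provider that drops spans once it has been shut down."""

    def __init__(self, service_name):
        self.service_name = service_name
        self.spans = []
        self._shutdown = False

    def record_span(self, name):
        # After shutdown, spans are dropped without raising.
        if self._shutdown:
            return False
        self.spans.append(name)
        return True

    def shutdown(self):
        self._shutdown = True


registry = ToyGlobalRegistry()

# Test 1 registers its provider, records a span, then shuts down.
first = ToyProvider("test-one")
registry.set_provider(first)
registry.get_provider().record_span("handle-command")
first.shutdown()

# Test 2 tries to install a fresh provider, but the slot is already taken.
second = ToyProvider("test-two")
registry.set_provider(second)

active = registry.get_provider()
recorded = active.record_span("handle-command")
```

In this model the second test still sees the first, already shut-down provider, so its span is dropped and its own configuration ("test-two") never takes effect.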
Protean already had an observability system -- the
Observatory -- which provides zero-config, real-time tracing for
developers running protean server locally. Observatory uses Redis
Streams and SSE, producing flat MessageTrace events. OTEL needed to
coexist with Observatory without replacing it: Observatory serves
developers; OTEL serves ops teams. The two systems share instrumentation
callsites but have independent emission paths, connected only at the
/metrics endpoint where OTEL's PrometheusMetricReader takes over
when available.
Decision
We store TracerProvider and MeterProvider as attributes on the
Domain instance, never setting OTEL globals. All access goes through
domain-scoped helpers:

    # In application code -- always safe, always returns something usable
    tracer = domain.tracer  # lazy property, triggers init on first access
    meter = domain.meter

    # Under the hood (src/protean/utils/telemetry.py)
    def get_tracer(domain, name="protean"):
        provider = getattr(domain, "_otel_tracer_provider", None)
        if provider is None:
            return _NoOpTracer()
        return provider.get_tracer(name)
Providers are created during init_telemetry(domain), which is called
lazily on first access to domain.tracer or domain.meter. A sentinel
flag (_otel_init_attempted) prevents repeated initialization attempts.
shutdown_telemetry(domain) flushes and destroys the providers, then
resets the sentinel so a new initialization cycle is possible -- critical
for test isolation.
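The init/shutdown cycle described above might look roughly like the following. This is a simplified sketch, not Protean's implementation: FakeProvider, the telemetry_enabled attribute, and the stripped-down Domain class are assumptions made for illustration, and init_telemetry is called directly here rather than through the lazy domain.tracer property.

```python
class FakeProvider:
    """Stand-in for an OTEL TracerProvider; only tracks shutdown state."""

    def __init__(self, service_name):
        self.service_name = service_name
        self.is_shutdown = False

    def shutdown(self):
        self.is_shutdown = True


class Domain:
    """Simplified domain that holds its own provider instead of a global."""

    def __init__(self, name, telemetry_enabled=False):
        self.name = name
        self.telemetry_enabled = telemetry_enabled  # hypothetical config flag
        self._otel_tracer_provider = None
        self._otel_init_attempted = False


def init_telemetry(domain):
    # The sentinel makes initialization a one-shot attempt per cycle,
    # even when telemetry is disabled and no provider is created.
    if domain._otel_init_attempted:
        return
    domain._otel_init_attempted = True
    if domain.telemetry_enabled:
        domain._otel_tracer_provider = FakeProvider(service_name=domain.name)


def shutdown_telemetry(domain):
    provider = domain._otel_tracer_provider
    if provider is not None:
        provider.shutdown()
    # Drop the provider and reset the sentinel so a later init starts clean.
    domain._otel_tracer_provider = None
    domain._otel_init_attempted = False


domain = Domain("Test", telemetry_enabled=True)
init_telemetry(domain)
first_provider = domain._otel_tracer_provider
init_telemetry(domain)  # no-op: the sentinel is already set
same_provider = domain._otel_tracer_provider is first_provider

shutdown_telemetry(domain)
init_telemetry(domain)  # a new cycle builds a fresh provider
second_provider = domain._otel_tracer_provider
```

Because the sentinel is reset on shutdown, the second cycle gets a live provider rather than reusing the one that was shut down.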
The Domain class exposes two lazy properties:
- domain.tracer -- returns a configured OTEL Tracer or a _NoOpTracer
- domain.meter -- returns a configured OTEL Meter or a _NoOpMeter
Both use deferred imports so the opentelemetry package is never loaded
unless telemetry is actually accessed.
Telemetry is disabled by default (telemetry.enabled: False in
domain config) and all OTEL packages are optional extras
(pip install protean[telemetry]). When disabled or when packages are
absent, all public functions return lightweight no-op objects that
implement the minimal OTEL interface, so instrumentation code never
needs conditional guards.
Consequences
Positive:
- Each domain instance owns its telemetry lifecycle. Tests create isolated domains with InMemorySpanExporter and assert on spans without cross-test contamination.
- Multiple domains in the same process (e.g., a multi-bounded-context deployment) can have independent service names, exporters, and resource attributes.
- Clean shutdown/re-init cycles work correctly -- no "zombie provider" problem where a shut-down global silently drops spans.
- Observatory and OTEL coexist cleanly at instrumentation callsites. The /metrics endpoint serves OTEL Prometheus exposition when available, falling back to the hand-rolled implementation when not.
Negative:
- Diverges from OTEL's documented best practice. Contributors familiar with OTEL will initially reach for trace.get_tracer() (the global) and need to learn the domain-scoped pattern.
- Every instrumented callsite must have access to the Domain instance (typically via current_domain context or direct reference) to obtain a tracer or meter.
- Third-party OTEL auto-instrumentation libraries that expect global providers (e.g., opentelemetry-instrumentation-requests) will not automatically participate in Protean's domain-scoped traces. Manual bridging would be needed.
Alternatives Considered
Global providers (OTEL default). Rejected because of the single-assignment, broken-shutdown, and no-per-domain-config problems described in Context. These are fundamental to OTEL's design and unlikely to change upstream.
Per-thread providers. Would solve test isolation for single-threaded test runners but not for async or multi-domain production scenarios. Also adds complexity without addressing the configuration problem.
Monkey-patching OTEL globals per test. Fragile, requires careful teardown ordering, and breaks if any library caches a reference to the old provider.
References
- OTEL Python SDK: TracerProvider
- OTEL specification: Global providers are set once
- Epic 6.1: OpenTelemetry Integration (#742)