Outbox Pattern

The outbox pattern ensures reliable message delivery by storing messages in the same database transaction as your business data, then publishing them to the message broker in a separate process. This guarantees that messages are never lost, even if the broker is temporarily unavailable.

Why Use the Outbox Pattern?

Without the outbox pattern, there's a risk of data inconsistency:

sequenceDiagram
    participant App as Application
    participant DB as Database
    participant Broker as Message Broker

    App->>DB: 1. Save order
    DB-->>App: Success
    App->>Broker: 2. Publish OrderCreated
    Note over Broker: Broker is down!
    Broker--xApp: Failed

    Note over App,Broker: Order saved but event lost!

With the outbox pattern:

sequenceDiagram
    participant App as Application
    participant DB as Database
    participant Outbox as Outbox Table
    participant OP as Outbox Processor
    participant Broker as Message Broker

    App->>DB: 1. Save order
    App->>Outbox: 2. Save event (same transaction)
    DB-->>App: Success (both committed)

    Note over OP,Broker: Later, asynchronously...

    OP->>Outbox: 3. Poll for messages
    OP->>Broker: 4. Publish OrderCreated
    Broker-->>OP: Success
    OP->>Outbox: 5. Mark as published

    Note over App,Broker: Both order and event are guaranteed!

How It Works

Three things happen, in order:

1. Event storage. When an aggregate is saved through a repository, any events it raised are written to the outbox table within the same transaction as the aggregate state. If the transaction rolls back, both disappear together; if it commits, both are durable.

2. Outbox processing. The OutboxProcessor runs as part of the Engine, polling the outbox table, publishing each row to the configured broker, and marking the row as PUBLISHED on success. Failures are retried with exponential backoff. Each poll pushes its status, lock-window, and retry-window predicates into the database so a healthy outbox issues a single SELECT per tick. When outbox.max_tick_interval is set, an idle processor backs off its poll interval (doubling after each empty poll, up to the cap) and snaps back to the base tick_interval as soon as a batch is found.

3. Message consumption. StreamSubscription consumers read from the broker stream, just as they would for any other message. They have no awareness of the outbox — it's an implementation detail of the publisher side.
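
The idle back-off described in step 2 can be sketched as a small pure function. This is an illustration only, not the processor's actual code: the function name `next_poll_interval` is hypothetical, and only `tick_interval` and `max_tick_interval` correspond to settings named above.

```python
def next_poll_interval(current, found_batch, tick_interval, max_tick_interval=None):
    """Return the delay in seconds before the next poll (hypothetical helper)."""
    if found_batch or max_tick_interval is None:
        # Work was found, or adaptive back-off is disabled: use the base interval.
        return tick_interval
    # Empty poll: double the interval, but never exceed the cap.
    return min(current * 2, max_tick_interval)


# Four empty polls, then a batch, then another empty poll.
interval = 1.0
history = []
for found in [False, False, False, False, True, False]:
    interval = next_poll_interval(
        interval, found, tick_interval=1.0, max_tick_interval=5.0
    )
    history.append(interval)

print(history)
```

The interval grows while the outbox is empty, stops at the cap, and returns to the base value on the first non-empty poll.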

The guarantee that holds this together: step 1 is transactional with the aggregate save. Steps 2 and 3 are eventually consistent but never lossy. A row written to the outbox is published at least once: if a processor stops after publishing but before marking the row PUBLISHED, the row is published again, so consumers should handle duplicates idempotently.
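
To make steps 1 and 2 concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of Protean's repositories. The table layout and the `place_order` and `publish_pending` helpers are assumptions made for illustration; they are not part of the framework.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, type TEXT, payload TEXT, status TEXT)"
)
conn.commit()


def place_order(order_id, total, fail=False):
    # `with conn` commits on success and rolls back if an exception escapes,
    # so the order row and the outbox row succeed or fail together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (type, payload, status) VALUES (?, ?, 'PENDING')",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")


def publish_pending(publish):
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE status = 'PENDING' ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # If publish() raises, the row stays PENDING and is picked up again later.
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute(
                "UPDATE outbox SET status = 'PUBLISHED' WHERE id = ?", (row_id,)
            )


place_order("o-1", 42.0)
try:
    place_order("o-2", 10.0, fail=True)  # rolled back: no order, no event
except RuntimeError:
    pass

sent = []
publish_pending(lambda event_type, body: sent.append((event_type, body)))
print(sent)
```

Only the committed order produces an event. The publish call and the status update are separate steps, which is where duplicate deliveries can come from.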

One outbox processor runs per database provider. A domain with one database has one processor; a domain with a default database and an analytics database has two, each publishing its own outbox to the configured broker.

See the Outbox Guide for enabling the outbox, creating the table, configuring retries and cleanup, and investigating abandoned messages.

The event store is the durable anchor

Step 1 hides one more ordering detail. A unit of work makes two durable writes to two independent datastores — the event store and the relational database — and no transaction spans both. Protean appends events to the event store first, and only then commits the relational transaction that carries aggregate state and the outbox rows. The event store, the source of truth for event-sourced replay, is the durable anchor.

This ordering trades an unrecoverable failure for a recoverable one. If the process dies between the two writes, the events are durable in the store but their outbox rows were never committed, so the events exist but would never be published. Reconciliation closes that gap: it re-derives the missing outbox rows from the event store, running automatically on server startup and on demand via protean outbox reconcile. The full rationale, including why this was chosen over two-phase commit, is in ADR-0015; the operational walkthrough is in Recover from a crash.
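
Conceptually, reconciliation is a set difference between the events in the store and the events the outbox already knows about. The sketch below assumes plain dictionaries with `id`, `type`, and `payload` keys; those shapes and the function name are hypothetical and do not describe how protean outbox reconcile is implemented.

```python
def missing_outbox_rows(event_store_events, outbox_message_ids):
    """Build PENDING outbox rows for events that never reached the outbox."""
    known = set(outbox_message_ids)
    return [
        {
            "message_id": event["id"],
            "type": event["type"],
            "payload": event["payload"],
            "status": "PENDING",
        }
        for event in event_store_events
        if event["id"] not in known
    ]


events = [
    {"id": "e1", "type": "OrderCreated", "payload": {"order_id": "o-1"}},
    {"id": "e2", "type": "OrderShipped", "payload": {"order_id": "o-1"}},
]
# Only e1 made it into the outbox before the crash.
rebuilt = missing_outbox_rows(events, ["e1"])
print([row["message_id"] for row in rebuilt])
```

Running the same function again after the rows are restored yields nothing, so the operation is safe to repeat.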

External Dispatch for Published Events

Events marked with published=True can be delivered to external brokers — other bounded contexts, partner systems, or analytics pipelines. When external_brokers is configured, the Unit of Work creates additional outbox rows for each external broker, alongside the internal row:

sequenceDiagram
    participant App as Application
    participant DB as Database
    participant Outbox as Outbox Table
    participant IOP as Internal Processor
    participant EOP as External Processor
    participant IBR as Internal Broker
    participant EBR as External Broker

    App->>DB: 1. Save aggregate
    App->>Outbox: 2a. Internal row (same txn)
    App->>Outbox: 2b. External row (same txn)
    DB-->>App: Success (all committed)

    Note over IOP,EBR: Asynchronously, independently...

    IOP->>Outbox: 3a. Poll (target_broker=default)
    IOP->>IBR: 4a. Publish (full metadata)

    EOP->>Outbox: 3b. Poll (target_broker=partner)
    EOP->>EBR: 4b. Publish (stripped metadata)

Each row is processed independently — if the external broker is down, the internal row publishes normally while the external row retries on its own schedule.
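
The fan-out can be pictured as one row per target broker. In this sketch the `target_broker` key mirrors the label used in the diagram, while the function name, the dictionary shape, and the "default" broker name are assumptions for illustration.

```python
def outbox_rows_for(event_id, published, external_brokers, internal_broker="default"):
    """Return one outbox row per broker that should receive the event."""
    # Every event gets an internal row.
    rows = [
        {"message_id": event_id, "target_broker": internal_broker, "status": "PENDING"}
    ]
    # Only published events get additional rows for external brokers.
    if published:
        rows += [
            {"message_id": event_id, "target_broker": name, "status": "PENDING"}
            for name in external_brokers
        ]
    return rows


rows = outbox_rows_for("evt-1", published=True, external_brokers=["partner"])
print([row["target_broker"] for row in rows])
```

Because each row carries its own status, a failure on the partner row does not hold back the internal one.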

External Envelope

External messages use a stripped envelope that removes internal-only fields (expected_version, asynchronous, priority, event store positions, checksum) while preserving fields external consumers need: headers for deduplication, domain context for routing, and user-provided extensions.
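
As a rough sketch of the stripping step, assume the metadata is a flat dictionary. The real envelope is likely nested, and the keys `position` and `global_position` are stand-ins for the event store positions, whose actual names are not given here.

```python
# Internal-only keys; the two position names are assumed for illustration.
INTERNAL_ONLY_FIELDS = {
    "expected_version",
    "asynchronous",
    "priority",
    "checksum",
    "position",
    "global_position",
}


def to_external_envelope(metadata):
    """Return a copy of the metadata without internal-only fields."""
    return {
        key: value
        for key, value in metadata.items()
        if key not in INTERNAL_ONLY_FIELDS
    }


metadata = {
    "headers": {"id": "evt-1"},
    "domain": {"kind": "EVENT"},
    "extensions": {"tenant": "acme"},
    "expected_version": 3,
    "asynchronous": True,
    "priority": 0,
    "checksum": "abc123",
}
external = to_external_envelope(metadata)
print(sorted(external))
```

The function builds a new dictionary, so the internal row keeps its full metadata.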

For setup, see Dispatching Published Events to External Brokers. For architectural trade-offs, see Publishing Events to External Brokers.

Outbox Message Lifecycle

Messages in the outbox go through several states:

stateDiagram-v2
    [*] --> PENDING: Event raised
    PENDING --> PROCESSING: Worker claims message
    PROCESSING --> PUBLISHED: Broker publish succeeds
    PROCESSING --> FAILED: Broker publish fails
    FAILED --> PENDING: Retry scheduled
    FAILED --> ABANDONED: Max retries exceeded
    PUBLISHED --> [*]: Cleanup removes
    ABANDONED --> [*]: Cleanup removes

Message States

State	Description
`PENDING`	Message waiting to be processed
`PROCESSING`	Message claimed by a worker
`PUBLISHED`	Successfully published to broker
`FAILED`	Publishing failed, may be retried
`ABANDONED`	Max retries exceeded, given up
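
The lifecycle above can be written down as a transition table. This is a sketch for reasoning about the states; the `OutboxStatus` enum and `can_transition` helper are defined here for illustration and are not assumed to match Protean's own classes.

```python
from enum import Enum


class OutboxStatus(Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    PUBLISHED = "PUBLISHED"
    FAILED = "FAILED"
    ABANDONED = "ABANDONED"


# Allowed moves, taken from the state diagram. Terminal states have none.
ALLOWED_TRANSITIONS = {
    OutboxStatus.PENDING: {OutboxStatus.PROCESSING},
    OutboxStatus.PROCESSING: {OutboxStatus.PUBLISHED, OutboxStatus.FAILED},
    OutboxStatus.FAILED: {OutboxStatus.PENDING, OutboxStatus.ABANDONED},
    OutboxStatus.PUBLISHED: set(),
    OutboxStatus.ABANDONED: set(),
}


def can_transition(current, target):
    """Return True if the diagram allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


print(can_transition(OutboxStatus.PENDING, OutboxStatus.PROCESSING))
print(can_transition(OutboxStatus.PENDING, OutboxStatus.PUBLISHED))
```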

Retry Mechanism

Failed messages are retried with exponential backoff:

Attempt 1: Immediate
Attempt 2: 60 seconds later (base_delay)
Attempt 3: 120 seconds later (base_delay * 2)
Attempt 4: 240 seconds later (base_delay * 4)
... up to max_backoff_seconds

With jitter enabled, delays are randomized by ±25% to prevent thundering-herd problems when a broker recovers and many messages become retry-eligible simultaneously.
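
The schedule above can be reproduced with a short function. This is a sketch under stated assumptions: the name `retry_delay` is hypothetical, the 3600-second cap is an example value rather than a documented default, and applying jitter after the cap is a choice made here.

```python
import random


def retry_delay(attempt, base_delay=60.0, max_backoff_seconds=3600.0,
                jitter=False, rng=random):
    """Seconds to wait before the given attempt (attempt 1 is immediate)."""
    if attempt <= 1:
        return 0.0
    # Attempt 2 waits base_delay, and each later attempt doubles the wait.
    delay = min(base_delay * 2 ** (attempt - 2), max_backoff_seconds)
    if jitter:
        # Randomize by +/-25% so retries do not all fire at the same moment.
        delay *= rng.uniform(0.75, 1.25)
    return delay


print([retry_delay(n) for n in range(1, 5)])
```

Passing a seeded `random.Random` instance as `rng` makes the jittered delays reproducible in tests.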

After max_attempts failures, the message is marked ABANDONED and stops retrying. It remains in the outbox table as a durable record of the failure until cleanup removes it.

For tuning retry attempts, backoff, and jitter, see the Outbox Guide.

Message Cleanup

PUBLISHED and ABANDONED rows are purged on a schedule so the outbox table doesn't grow unbounded. Published rows are kept long enough to serve as an audit trail; abandoned rows are kept long enough to investigate. The retention windows are separately configurable — see Outbox Guide: Configure cleanup.
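
A cleanup pass with separate retention windows might look like the sketch below. The `last_processed_at` key, the function name, and the 7-day and 30-day windows are assumptions for illustration; the real setting names are in the Outbox Guide.

```python
from datetime import datetime, timedelta, timezone


def rows_to_purge(rows, now, published_retention, abandoned_retention):
    """Select terminal rows that are older than their retention window."""
    retention = {
        "PUBLISHED": published_retention,
        "ABANDONED": abandoned_retention,
    }
    return [
        row
        for row in rows
        # Rows in non-terminal states are never purged.
        if row["status"] in retention
        and now - row["last_processed_at"] > retention[row["status"]]
    ]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
rows = [
    {"id": 1, "status": "PUBLISHED", "last_processed_at": now - timedelta(days=8)},
    {"id": 2, "status": "PUBLISHED", "last_processed_at": now - timedelta(days=1)},
    {"id": 3, "status": "ABANDONED", "last_processed_at": now - timedelta(days=8)},
    {"id": 4, "status": "PENDING", "last_processed_at": None},
]
purged = rows_to_purge(
    rows,
    now,
    published_retention=timedelta(days=7),
    abandoned_retention=timedelta(days=30),
)
print([row["id"] for row in purged])
```

The abandoned row survives because its window is longer, which leaves time to investigate it.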

Multi-Worker Support

When running with --workers N (see Multi-Worker Mode), each worker runs its own OutboxProcessor. Messages are claimed atomically at the database level to prevent duplicate publishing.

Database-Level Locking

The processor uses an atomic UPDATE...WHERE to claim messages. Under READ COMMITTED isolation (PostgreSQL, MSSQL), concurrent updates on the same row block until the first transaction commits, then re-evaluate the WHERE clause -- so only one worker succeeds:

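# Simplified view of claim_for_processing().
# `dao`, `message`, and `worker_id` are supplied by the surrounding processor.
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)  # Assumed here: a timezone-aware UTC timestamp
claimed_count = dao.query.filter(
    id=message.id,
    status__in=["pending", "failed"],   # Only eligible messages
).update_all(
    status="processing",
    locked_by=worker_id,
    locked_until=now + timedelta(minutes=5),  # Lock expires after 5 minutes
)
# claimed_count > 0 only for the winning worker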

This prevents the TOCTOU (Time-Of-Check-Time-Of-Use) race condition where two workers could both read a message as PENDING and both attempt to publish it.
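
The same claim can be tried out with plain SQL. The sketch below uses Python's `sqlite3` module and a hypothetical `claim` helper. It runs the two claims one after the other on a single connection, so it shows the guarded UPDATE and the row count check rather than true concurrency.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY, status TEXT, locked_by TEXT, locked_until TEXT)"
)
conn.execute("INSERT INTO outbox (id, status) VALUES (1, 'pending')")
conn.commit()


def claim(message_id, worker_id, lock_minutes=5):
    """Try to claim a message; return True only if this worker won."""
    locked_until = datetime.now(timezone.utc) + timedelta(minutes=lock_minutes)
    with conn:
        cursor = conn.execute(
            "UPDATE outbox SET status = 'processing', locked_by = ?, locked_until = ? "
            "WHERE id = ? AND status IN ('pending', 'failed')",
            (worker_id, locked_until.isoformat(), message_id),
        )
    # The WHERE clause no longer matches once another worker has claimed the row.
    return cursor.rowcount == 1


first = claim(1, "worker-a")
second = claim(1, "worker-b")
print(first, second)
```

The check and the write happen in one statement, so there is no window between reading the status and changing it.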

Lock Fields

Each outbox message carries lock metadata:

Field	Description
`locked_by`	Worker identifier that holds the lock
`locked_until`	When the lock expires (default 5 minutes)
`status`	Current processing state (`PROCESSING` while locked)

Lock Lifecycle

1. The worker fetches a batch of PENDING messages.
2. For each message, the worker calls claim_for_processing(), an atomic database operation that sets the status to PROCESSING and records the worker ID and lock expiry.
3. If the claim succeeds, the worker publishes the message to the broker and marks it as PUBLISHED. If publishing fails, the message is marked as FAILED for retry.
4. If the claim fails (another worker already claimed it), the worker skips that message and moves on.
5. All operations happen within a UnitOfWork, so the claim and the status update are atomic on the database side. The broker publish itself is not part of the database transaction.

Stale Lock Recovery

If a worker crashes while holding a lock, the lock expires after the configured duration (default 5 minutes). The message remains in PROCESSING status with an expired locked_until timestamp. Another worker detects the expired lock and the message becomes eligible for reprocessing.
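
The stale-lock check reduces to a comparison between `locked_until` and the current time. The sketch below assumes rows are dictionaries that carry the lock fields listed earlier; the helper name `is_lock_stale` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_lock_stale(row, now):
    """Return True if the row is still PROCESSING but its lock has expired."""
    return (
        row["status"] == "PROCESSING"
        and row["locked_until"] is not None
        and row["locked_until"] <= now
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
crashed = {
    "status": "PROCESSING",
    "locked_by": "worker-a",
    "locked_until": now - timedelta(minutes=1),
}
in_flight = {
    "status": "PROCESSING",
    "locked_by": "worker-b",
    "locked_until": now + timedelta(minutes=4),
}
print(is_lock_stale(crashed, now), is_lock_stale(in_flight, now))
```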

Operational Signals

Four signals characterize outbox health. Any of them drifting persistently is worth investigating:

Signal	Meaning	Likely cause when elevated
Pending count	Rows waiting to be processed	Worker throughput below publish rate; broker slow or unreachable
Failed count	Rows that failed at least once	Transient broker errors; downstream schema drift
Retry rate	Fraction of attempts that retry	Persistent broker issue; publish-side bug
Abandoned count	Rows that exhausted retries	Chronic failure — needs investigation before cleanup removes the evidence

Runtime visibility is available via Observatory's /api/outbox endpoint and via direct queries on the outbox repository. See Outbox Guide: Investigate abandoned messages.
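
If you load outbox rows directly, the four signals can be derived with a few lines of Python. This sketch assumes each row is a dictionary with a `status` and a hypothetical `attempts` counter, counts failed rows by their current status, and does not reflect the response format of the /api/outbox endpoint.

```python
from collections import Counter


def outbox_signals(rows):
    """Summarize outbox health from a list of row dictionaries."""
    counts = Counter(row["status"] for row in rows)
    total_attempts = sum(row["attempts"] for row in rows)
    # Every attempt after a row's first one counts as a retry.
    retries = sum(max(row["attempts"] - 1, 0) for row in rows)
    return {
        "pending": counts["PENDING"],
        "failed": counts["FAILED"],
        "abandoned": counts["ABANDONED"],
        "retry_rate": retries / total_attempts if total_attempts else 0.0,
    }


rows = [
    {"status": "PENDING", "attempts": 0},
    {"status": "PENDING", "attempts": 0},
    {"status": "FAILED", "attempts": 2},
    {"status": "ABANDONED", "attempts": 3},
    {"status": "PUBLISHED", "attempts": 1},
]
signals = outbox_signals(rows)
print(signals)
```

Tracking these values over time matters more than any single reading, since the table describes persistent drift as the thing to investigate.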

Related pages:

Using the Outbox: Enable the outbox, create the table, configure retries and cleanup, investigate abandoned messages.
Dispatching Published Events to External Brokers: Routing events to partner systems.
Subscription Types: How StreamSubscription consumes outbox-published messages.
Server Configuration: Full configuration reference.