Creating Identities Early
The Problem
In many systems, identity is an afterthought -- a database-generated auto-increment integer assigned at the moment of persistence. The aggregate doesn't know its own identity until it has been saved, and callers don't learn the identity until the save completes.
This creates a cascade of problems:
- Commands cannot reference the aggregate they intend to create. A PlaceOrder command cannot carry the order_id because the ID doesn't exist yet. The handler must create the aggregate, persist it, extract the generated ID, and return it. The caller is left waiting.
- API responses are coupled to the database round-trip. The client submits a POST request but cannot know the resource URL until the server persists the record and returns the generated ID. This blocks optimistic UI patterns and forces synchronous, tightly-coupled request flows.
- Idempotent creation is impossible without the identity. If the client retries a create request (network timeout, user double-click), the system has no way to detect the duplicate. Each retry generates a new auto-increment ID, producing duplicate records. The check-then-act pattern ("does this order already exist?") requires knowing the identity upfront.
- Events raised during creation cannot carry a stable identity. An OrderPlaced event should contain the order_id, but if identity is database-assigned, the event is raised before the ID is known -- or the event must be patched after persistence, complicating the flow.
- Distributed systems require central coordination. Auto-increment IDs depend on a single database sequence. In a distributed architecture with multiple nodes, services, or event-sourced aggregates, this becomes a bottleneck or a single point of failure.
These problems all stem from the same root cause: deferring identity generation to the persistence layer.
The Pattern
Generate the aggregate's identity at the point of creation -- or even earlier, at the system boundary where the intent originates.
Traditional flow:

```text
Client → API → Handler → Create aggregate → Persist → Get ID → Return ID
                                                        ↑
                                             Identity assigned here
                                                   (too late)
```

Early identity flow:

```text
Client → Generate ID → API (with ID) → Handler → Create aggregate → Persist
              ↑
   Identity assigned here
   (as early as possible)
```
The key insight from Domain-Driven Design is that identity is intrinsic to an entity, not a side effect of storage. An entity is defined by its identity. It should have that identity from the moment it comes into existence -- not from the moment it is persisted.
Why This Matters in DDD
Identity Is Foundational
In DDD, entities and aggregates are distinguished from value objects by one
defining characteristic: identity. Two Order instances with the same
attributes but different identities are different orders. Two with different
attributes but the same identity are the same order at different points in time.
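The identity rule above can be made concrete with a minimal plain-Python sketch. It uses a standard-library dataclass rather than a Protean aggregate, and the Order fields are illustrative; equality and hashing look only at order_id, so the attributes never decide "which one" an instance is.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Order:
    """Hypothetical plain-Python entity whose equality is defined by identity."""

    order_id: str
    customer_id: str
    total: float

    def __eq__(self, other: object) -> bool:
        # Attributes are ignored: only the identity decides "which order".
        if not isinstance(other, Order):
            return NotImplemented
        return self.order_id == other.order_id

    def __hash__(self) -> int:
        # Hash by identity so sets and dicts agree with __eq__.
        return hash(self.order_id)


# Same attributes, different identities: two different orders.
assert Order("ord-1", "cust-789", 149.99) != Order("ord-2", "cust-789", 149.99)

# Different attributes, same identity: one order at two points in time.
assert Order("ord-1", "cust-789", 149.99) == Order("ord-1", "cust-789", 199.99)
```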
If identity is foundational, it should be present from birth. An aggregate without an identity is incomplete -- it cannot participate in domain operations, cannot be referenced by commands or events, and cannot enforce invariants that depend on knowing "which one" it is.
Aggregates Are Consistency Boundaries
An aggregate is the boundary within which invariants are guaranteed. Commands target a specific aggregate instance by identity. Events reference the aggregate that changed by identity. Repositories load and persist by identity.
When identity is deferred to the database, there is a gap between the aggregate's creation and its ability to participate in these operations. Early identity generation closes this gap entirely.
Commands Carry Intent and Context
A well-designed command carries everything the handler needs, including the identity of the aggregate being acted upon. For creation commands, this means the caller -- not the handler, and not the database -- decides the identity:
```python
# The command carries the identity of the aggregate it will create.
# The caller generates this identity before submitting the command.
PlaceOrder(
    order_id="ord-a1b2c3d4",
    customer_id="cust-789",
    items=[...],
)
```
This makes the command self-contained. The handler doesn't need to generate an ID,
and it can use the same order_id in every event it raises, every repository
call it makes, and every response it returns.
Events Record Facts with Stable References
Domain events are immutable facts. Once raised, they become part of the system's history. Every downstream consumer -- event handlers, projectors, sagas -- relies on the identities embedded in these events to correlate, route, and process.
When identity is generated early, events carry stable, meaningful references from the start:
```python
# The event references the same order_id that the command carried.
# Downstream handlers can immediately correlate this event.
OrderPlaced(
    order_id="ord-a1b2c3d4",
    customer_id="cust-789",
    total=149.99,
)
```
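To see the identity flow from command to event without any framework, here is a minimal sketch using frozen dataclasses. PlaceOrder, OrderPlaced and handle_place_order are hypothetical stand-ins, not Protean's command and event classes, and the "ord-" prefix simply mirrors the example IDs in this section.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Hypothetical command: the caller supplies the identity."""

    order_id: str
    customer_id: str
    total: float


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event: references the identity carried by the command."""

    order_id: str
    customer_id: str
    total: float


def handle_place_order(command: PlaceOrder) -> OrderPlaced:
    # No ID generation here: the handler reuses the command's identity.
    return OrderPlaced(
        order_id=command.order_id,
        customer_id=command.customer_id,
        total=command.total,
    )


order_id = f"ord-{uuid.uuid4().hex[:8]}"  # generated by the caller
event = handle_place_order(PlaceOrder(order_id, "cust-789", 149.99))
print(event.order_id == order_id)  # True
```

Because the handler never invents an identity, the caller, the event and any downstream consumer all share the same value from the start.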
How Protean Supports Early Identity
Protean's identity system is designed around the principle that identities should be generated close to the point of element creation, without relying on external infrastructure like databases.
Automatic Identity Generation at Construction
When you create an aggregate or entity instance, Protean generates its identity immediately -- at construction time, not at persistence time:
```python
from protean.fields import Auto, Float, Identifier

# `domain` is assumed to be your Protean Domain instance.
@domain.aggregate
class Order:
    order_id = Auto(identifier=True)
    customer_id = Identifier()
    total = Float()

# Identity is assigned the moment the object is created
order = Order(customer_id="cust-789", total=149.99)
print(order.order_id)  # e.g. '9cf4ddc4-2919-4021-bd1a-c8083b5fdda7'
```
The Auto field generates a UUID immediately. No database round-trip, no
sequence query, no central coordinator. The aggregate has its identity from
the moment it exists.
If no identity field is explicitly declared, Protean automatically adds an Auto
field named id:
```python
@domain.aggregate
class Order:
    customer_id = Identifier()
    total = Float()

order = Order(customer_id="cust-789", total=149.99)
print(order.id)  # Auto-generated UUID
```
Caller-Supplied Identities
When the caller already has an identity -- because it was generated at the client, at the API layer, or carried in a command -- it can be supplied directly:
```python
# The caller provides the identity explicitly
order = Order(
    order_id="ord-a1b2c3d4",
    customer_id="cust-789",
    total=149.99,
)
print(order.order_id)  # 'ord-a1b2c3d4'
```
The Auto field accepts explicit values. When a value is provided, it is used
as-is. When omitted, a value is auto-generated. This means the same aggregate
definition supports both caller-supplied and framework-generated identities.
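The supplied-or-generated behaviour described above can be approximated in plain Python with a dataclass default_factory. This is an analogy for how Auto behaves from the caller's point of view, not Protean's implementation; new_identity is a hypothetical helper that produces a UUID string, matching the default "uuid" strategy and "string" type.

```python
import uuid
from dataclasses import dataclass, field


def new_identity() -> str:
    # Hypothetical helper: a random UUID rendered as a string.
    return str(uuid.uuid4())


@dataclass
class Order:
    customer_id: str
    total: float
    # A caller-supplied value is used as-is; otherwise one is generated
    # at construction time, never at persistence time.
    order_id: str = field(default_factory=new_identity)


supplied = Order(customer_id="cust-789", total=149.99, order_id="ord-a1b2c3d4")
generated = Order(customer_id="cust-789", total=149.99)
print(supplied.order_id)   # ord-a1b2c3d4
print(generated.order_id)  # a random UUID string, different on every run
```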
Identity Strategies and Types
Protean's identity system is configurable at two levels: domain-wide defaults and per-field overrides.
Domain-level configuration (in domain.toml):
```toml
identity_strategy = "uuid"  # "uuid" (default) or "function"
identity_type = "string"    # "string" (default), "integer", or "uuid"
```
UUIDs are the default and recommended strategy because they can be generated anywhere -- in the client, in the API layer, in the command handler -- without coordination. This is precisely what makes early identity generation possible.
Per-field override for aggregates with special requirements:
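```python
import time

from protean.fields import Auto, Float

def gen_epoch_id():
    # Milliseconds since the epoch. Two IDs generated within the same
    # millisecond would collide, so this only suits low-frequency creation.
    return int(time.time() * 1000)

@domain.aggregate
class Measurement:
    measurement_id = Auto(
        identifier=True,
        identity_strategy="function",
        identity_function=gen_epoch_id,
        identity_type="integer",
    )
    value = Float()
```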
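How a strategy and a type combine can be sketched as a small dispatcher. generate_identity is a hypothetical helper written for illustration, not part of Protean; it only mirrors the option values listed above ("uuid" or "function" for the strategy; "string", "integer" or "uuid" for the type).

```python
import time
import uuid
from typing import Callable, Optional, Union

IdentityValue = Union[str, int, uuid.UUID]


def generate_identity(
    strategy: str = "uuid",
    identity_type: str = "string",
    function: Optional[Callable[[], IdentityValue]] = None,
) -> IdentityValue:
    """Hypothetical helper: produce a raw value, then coerce it to a type."""
    if strategy == "uuid":
        raw: IdentityValue = uuid.uuid4()
    elif strategy == "function":
        if function is None:
            raise ValueError("the 'function' strategy needs an identity function")
        raw = function()
    else:
        raise ValueError(f"unknown identity strategy: {strategy!r}")

    # Coerce the raw value into the configured identity type.
    if identity_type == "string":
        return str(raw)
    if identity_type == "integer":
        # A UUID becomes its 128-bit integer value; anything else goes via int().
        return raw.int if isinstance(raw, uuid.UUID) else int(raw)
    if identity_type == "uuid":
        return raw if isinstance(raw, uuid.UUID) else uuid.UUID(str(raw))
    raise ValueError(f"unknown identity type: {identity_type!r}")


def gen_epoch_id() -> int:
    return int(time.time() * 1000)


print(type(generate_identity()))                                # <class 'str'>
print(generate_identity("function", "integer", gen_epoch_id))  # current epoch ms
```

Keeping the strategy (how a raw value is produced) separate from the type (how it is represented) means a custom function can be swapped in without changing how identities are stored or compared.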
The Identifier Field on Commands
Commands use the Identifier field to carry aggregate identities. Unlike Auto,
Identifier does not auto-generate values -- the caller must supply them:
```python
@domain.command(part_of=Order)
class PlaceOrder(BaseCommand):
    order_id = Identifier(identifier=True)
    customer_id = Identifier()
    items = List()
    total = Float()
```
This design reinforces the pattern: the identity originates at the caller and flows through the command into the aggregate.
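The difference between Auto and Identifier can be mirrored with a plain dataclass whose identity field has no default. This is a hypothetical stand-in, not Protean's BaseCommand: omitting order_id fails at construction, and the __post_init__ check additionally rejects an empty string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PlaceOrder:
    """Hypothetical command: order_id has no default, so callers must supply it."""

    order_id: str
    customer_id: str
    items: List[str] = field(default_factory=list)
    total: float = 0.0

    def __post_init__(self) -> None:
        # Unlike an auto-generating field, a blank identity is rejected outright.
        if not self.order_id:
            raise ValueError("order_id must be supplied by the caller")


PlaceOrder(order_id="ord-a1b2c3d4", customer_id="cust-789", total=149.99)  # ok
# PlaceOrder(customer_id="cust-789") raises TypeError (missing order_id)
```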
Applying the Pattern
At the API Boundary
The most common place to generate identities early is the API layer. When a client sends a creation request, the API endpoint generates (or accepts) the identity before constructing the command:
```python
import uuid

from fastapi import FastAPI

app = FastAPI()

# CreateOrderRequest is your request model (for example a Pydantic model with
# an optional order_id); domain and PlaceOrder come from your Protean domain.
@app.post("/orders", status_code=202)
async def create_order(request: CreateOrderRequest):
    # Option 1: Accept the identity from the client
    order_id = request.order_id

    # Option 2: Generate at the API layer if not provided
    if not order_id:
        order_id = str(uuid.uuid4())

    domain.process(
        PlaceOrder(
            order_id=order_id,
            customer_id=request.customer_id,
            items=request.items,
            total=request.total,
        )
    )

    # The identity is known before processing, so it can be returned
    # immediately; with async processing, before persistence completes.
    return {"order_id": order_id, "status": "accepted"}
```
The response includes the order_id immediately. If the command is processed
asynchronously, the client already knows the identity and can use it to poll
for status, navigate to the resource, or issue follow-up commands.
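Stripped of the web framework, the accept-or-generate step and the immediate response look roughly like this. It is a framework-free sketch under a few assumptions: create_order is a hypothetical handler that takes a plain dict, a queue.Queue stands in for asynchronous command processing, client-supplied identities are expected to be bare UUIDs (so a value like "ord-a1b2c3d4" is rejected), and the 202/422 status codes are illustrative choices.

```python
import queue
import uuid
from typing import Any, Dict

# Hypothetical stand-in for a message broker: commands are processed later.
command_queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()


def create_order(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Accept or generate the identity, enqueue the command, respond at once."""
    supplied = payload.get("order_id")
    if not supplied:
        # No identity from the client: generate one at the API layer.
        order_id = str(uuid.uuid4())
    else:
        try:
            # Validate and normalise a client-supplied UUID before trusting it.
            order_id = str(uuid.UUID(str(supplied)))
        except ValueError:
            return {"status": 422, "error": "order_id must be a UUID"}

    command_queue.put(
        {
            "type": "PlaceOrder",
            "order_id": order_id,
            "customer_id": payload.get("customer_id"),
            "items": payload.get("items", []),
        }
    )
    # 202 Accepted: processing happens later, but the identity is already final.
    return {"status": 202, "order_id": order_id, "location": f"/orders/{order_id}"}


response = create_order({"customer_id": "cust-789", "items": ["prod-1"]})
print(response["location"])  # /orders/<the generated UUID>
```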
At the Client
For the earliest possible identity generation, the client itself creates the ID:
```text
Frontend (browser/mobile):
  1. Generate UUID: "ord-a1b2c3d4-..."
  2. POST /orders { order_id: "ord-a1b2c3d4-...", items: [...] }
  3. Immediately navigate to /orders/ord-a1b2c3d4-...
  4. Display optimistic UI while the server processes

Server:
  1. Receive request with order_id already set
  2. Construct and process PlaceOrder command
  3. Aggregate created with the client-provided identity
```
This enables optimistic UI patterns: the client doesn't wait for the server to confirm creation before showing the new resource. A randomly generated (version 4) UUID makes collisions so unlikely that uniqueness does not depend on server coordination.
First Creation vs. Subsequent Commands
Early identity generation applies specifically to the creation command -- the first command that brings an aggregate into existence. After creation, every subsequent command naturally carries the identity because you need to know which aggregate to act on:
```python
import uuid

# Creation: identity generated at the caller
order_id = str(uuid.uuid4())
domain.process(PlaceOrder(order_id=order_id, items=[...]))

# Subsequent commands: identity is already known
domain.process(AddItemToOrder(order_id=order_id, product_id="prod-1", quantity=2))
domain.process(ConfirmOrder(order_id=order_id))
domain.process(ShipOrder(order_id=order_id, tracking_number="TRK-456"))
```
The pattern ensures there is never a moment when the caller lacks the identity needed to interact with the aggregate.
Enabling Idempotent Creation
Early identity generation is the foundation for idempotent creation commands. Without it, there is no reliable way to detect whether a create request is a duplicate.
The Check-Then-Act Pattern
When the creation command carries the aggregate's identity, the handler can check whether the aggregate already exists:
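```python
from protean.exceptions import ObjectNotFoundError

@domain.command_handler(part_of=Order)
class OrderCommandHandler(BaseCommandHandler):
    @handle(PlaceOrder)
    def place_order(self, command: PlaceOrder):
        repo = current_domain.repository_for(Order)

        # If the order already exists, this is a duplicate command.
        # repo.get() raises ObjectNotFoundError when nothing matches.
        try:
            repo.get(command.order_id)
            return  # Idempotent: no-op on duplicate
        except ObjectNotFoundError:
            pass  # First delivery: go ahead and create the order

        order = Order(
            order_id=command.order_id,
            customer_id=command.customer_id,
            items=command.items,
            total=command.total,
        )
        repo.add(order)
```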
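This pattern is simple and effective. It works without any additional infrastructure (no Redis, no idempotency keys) and provides handler-level safety even when framework-level deduplication is unavailable. Note that the check and the write are separate steps, so two deliveries processed concurrently can both pass the check.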
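A minimal in-memory sketch of check-then-act follows. InMemoryOrderRepository, DuplicateIdentity and place_order are hypothetical names, not Protean APIs. Besides the existence check, the sketch assumes a store that rejects a second insert with the same identity (much like a primary-key constraint), which covers the case where two concurrent deliveries both pass the check.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Dict


class DuplicateIdentity(Exception):
    """Raised when an aggregate with the same identity already exists."""


@dataclass
class Order:
    order_id: str
    total: float


class InMemoryOrderRepository:
    """Hypothetical repository; the dict key plays the role of a primary key."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}
        self._lock = threading.Lock()

    def exists(self, order_id: str) -> bool:
        return order_id in self._orders

    def add(self, order: Order) -> None:
        # Atomic insert-if-absent, like a unique constraint on the identity.
        with self._lock:
            if order.order_id in self._orders:
                raise DuplicateIdentity(order.order_id)
            self._orders[order.order_id] = order

    def count(self) -> int:
        return len(self._orders)


def place_order(repo: InMemoryOrderRepository, order_id: str, total: float) -> None:
    if repo.exists(order_id):
        return  # Check: duplicate delivery, nothing to do
    try:
        repo.add(Order(order_id=order_id, total=total))  # Act
    except DuplicateIdentity:
        return  # A concurrent delivery won the race; still a no-op


repo = InMemoryOrderRepository()
order_id = str(uuid.uuid4())  # generated once by the caller
for _ in range(3):  # e.g. the original delivery, a retry and a redelivery
    place_order(repo, order_id, 149.99)
print(repo.count())  # 1
```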
For stronger guarantees, combine early identity generation with Protean's framework-level idempotency keys. See the Command Idempotency pattern for the full treatment.
Why Database-Generated IDs Break Idempotency
Consider the same handler without early identity:
```python
# Anti-pattern: identity generated by the database
@handle(PlaceOrder)
def place_order(self, command: PlaceOrder):
    # No order_id in the command -- the database will assign one
    order = Order(items=command.items, total=command.total)
    repo.add(order)  # Database generates the ID on insert
```
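If this command is delivered twice (network retry, broker redelivery), the handler creates two separate orders with two different database-assigned IDs. There is no way to detect the duplicate because each execution looks like a fresh creation. Generating a UUID inside the handler has the same flaw: the handler runs once per delivery, so each delivery gets a new identity. The identity has to arrive with the command.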
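The difference is easy to reproduce with two toy handlers. In this hypothetical sketch, AutoIncrementTable imitates a database sequence, while the early-identity handler keys its rows by the order_id carried in the command; delivering the same command twice to each shows the outcome.

```python
import itertools
import uuid
from typing import Dict


class AutoIncrementTable:
    """Hypothetical table that assigns IDs on insert, like a database sequence."""

    def __init__(self) -> None:
        self._sequence = itertools.count(1)
        self.rows: Dict[int, dict] = {}

    def insert(self, row: dict) -> int:
        row_id = next(self._sequence)  # identity assigned at persistence time
        self.rows[row_id] = row
        return row_id


def place_order_late_identity(table: AutoIncrementTable, command: dict) -> int:
    # Anti-pattern: every delivery looks like a fresh creation.
    return table.insert({"total": command["total"]})


def place_order_early_identity(rows: Dict[str, dict], command: dict) -> None:
    # The command's identity lets a redelivery find the first attempt.
    rows.setdefault(command["order_id"], {"total": command["total"]})


late = AutoIncrementTable()
early: Dict[str, dict] = {}
command = {"order_id": str(uuid.uuid4()), "total": 149.99}
for _ in range(2):  # original delivery plus one redelivery
    place_order_late_identity(late, command)
    place_order_early_identity(early, command)

print(len(late.rows), len(early))  # 2 1
```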
Choosing the Right Identity Source
| Scenario | Generate Identity At | Rationale |
|---|---|---|
| Standard API creation | API endpoint | Simplest; identity available for immediate response |
| Optimistic UI | Client (browser/mobile) | Client navigates to the resource before server confirms |
| Async command processing | API endpoint or client | Caller needs the identity to correlate the eventual result |
| Saga-initiated creation | Saga/process manager | The saga tracks the identity for compensating actions |
| Event-sourced aggregates | Client or API endpoint | The event stream needs a stable identity from the first event |
| Internal service-to-service | Calling service | The caller tracks the identity for correlation across services |
| Batch/import processing | Import script | Each record gets an identity before the batch begins |
In every case, the principle is the same: whoever originates the intent generates the identity.
When Not to Use This Pattern
Early identity generation is the default recommendation, but there are situations where it may not apply:
- Natural identities from the domain: When the domain itself provides a unique identifier -- ISBN for books, email for user accounts, SSN for tax records -- you don't need to generate a synthetic identity. Use the identifier=True flag on the natural key field:
```python
@domain.aggregate
class Book:
    isbn = String(max_length=13, identifier=True)
    title = String(max_length=200, required=True)
```
The creation command carries this natural identity (isbn) from the caller, so the pattern's benefits still apply, just without the UUID generation step.
- Auto-increment requirements from external systems: Some integrations require sequential numeric IDs (invoice numbers, receipt numbers). Use the increment option on the Auto field for these, understanding that identity generation is deferred to the database:
```python
@domain.aggregate
class Invoice:
    invoice_number = Auto(identifier=True, increment=True)
    # ...
```
Even here, consider using a UUID as the internal aggregate identity and treating the sequential number as a separate domain attribute assigned during a specific workflow step.
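The split between an internal identity and a sequential business number can be sketched as follows. Invoice and issue() are hypothetical, and itertools.count stands in for a database sequence or whatever numbering source your system actually uses; the point is that invoice_id exists from construction, while invoice_number is assigned only at the workflow step that needs it.

```python
import itertools
import uuid
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Invoice:
    """Hypothetical aggregate: UUID identity from birth, number assigned later."""

    customer_id: str
    invoice_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    invoice_number: Optional[int] = None  # a domain attribute, not the identity

    def issue(self, numbers: Iterator[int]) -> None:
        # The workflow step that needs a sequential number assigns it here.
        if self.invoice_number is not None:
            raise ValueError("invoice already issued")
        self.invoice_number = next(numbers)


invoice_numbers = itertools.count(1001)  # stand-in for a database sequence

draft = Invoice(customer_id="cust-789")
print(draft.invoice_id)      # usable in commands and events right away
print(draft.invoice_number)  # None
draft.issue(invoice_numbers)
print(draft.invoice_number)  # 1001
```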
Summary
| Aspect | Database-Generated ID | Early Identity |
|---|---|---|
| When assigned | At persistence time | At creation time (or earlier) |
| Who decides | The database | The caller (client, API, saga) |
| Available in commands | No | Yes |
| Available in events | After persistence | Immediately |
| Supports idempotent creation | No | Yes (check-then-act) |
| Supports async processing | Poorly (caller must wait) | Well (caller has the ID immediately) |
| Supports optimistic UI | No | Yes |
| Distributed-friendly | No (central sequence) | Yes (UUIDs need no coordination) |
| Protean default | No | Yes (Auto field with UUID) |
The pattern is simple: generate identities at the origin of intent, not at the
point of storage. Protean's Auto field with UUID generation makes this the
default behavior. Commands carry the identity. Events reference it. Handlers use
it. The entire flow is simpler, more resilient, and naturally idempotent.