Design Projection Granularity Around Consumer Needs
The Problem
Developers new to CQRS face a fundamental design question: how many projections should I create, and what should each one contain? The answer usually falls into one of two extremes, both of which cause problems.
Extreme 1: Mirror the aggregate
The most common first instinct is to create one projection per aggregate that mirrors its structure:
# Anti-pattern: projection that mirrors the Order aggregate 1:1
@domain.aggregate
class Order:
order_id: Auto(identifier=True)
customer_id: Identifier(required=True)
customer_name: String(required=True)
items = HasMany(OrderItem)
status: String(default="draft")
total: Float(default=0.0)
placed_at: DateTime()
shipped_at: DateTime()
@domain.projection
class OrderProjection:
order_id: Identifier(identifier=True)
customer_id: Identifier()
customer_name: String()
status: String()
total: Float()
placed_at: DateTime()
shipped_at: DateTime()
# Where are the items? Need a separate projection for those,
# plus a join to combine them. Same complexity as the write side.
This defeats the purpose of CQRS. The read model mirrors the write model, so you get the same joins and complexity. The projection adds maintenance burden without any benefit.
The symptoms:
-
API handlers still need joins. The order list endpoint needs customer names, so it joins the order projection with a customer projection -- the same join the write side would have needed.
-
Projectors are trivial copy machines. Every projector just copies fields from the event to the projection. No transformation, no denormalization, no value added.
-
No read-side optimization. Queries against the projection are no faster than queries against the aggregate's table, because the data shape is identical.
Extreme 2: One projection per endpoint
The opposite instinct is to create a perfectly denormalized projection for every API endpoint or UI component:
# Anti-pattern: hyper-specific projections for every endpoint
@domain.projection
class OrderListItem: # For GET /orders
order_id: Identifier(identifier=True)
customer_name: String()
status: String()
total: Float()
@domain.projection
class OrderDetail: # For GET /orders/{id}
order_id: Identifier(identifier=True)
customer_name: String()
customer_email: String()
status: String()
total: Float()
item_count: Integer()
shipped_at: DateTime()
@domain.projection
class OrderForShipping: # For the shipping dashboard
order_id: Identifier(identifier=True)
customer_name: String()
status: String()
tracking_number: String()
# ... and a projector for each one, all listening to the same events
Now you have three projections and three projectors all maintaining overlapping data from the same events. The consequences:
-
Maintenance explosion. Every new event field must be propagated to every projection that needs it. Adding a
discount_amounttoOrderPlacedmeans updating four projectors. -
Rebuild cost. Rebuilding projections means replaying events through all four projectors. With millions of orders, this takes four times as long.
-
Staleness windows. Each projection updates independently. During high load, the list view might show "shipped" while the detail view still shows "placed" because its projector is lagging.
-
Schema proliferation. The database accumulates tables that are 80% identical, wasting storage and complicating migrations.
The root cause of both extremes: projections were designed around domain entities (one per aggregate) or infrastructure concerns (one per endpoint), rather than around what consumers actually need.
The Pattern
Design each projection around a consumer need -- a UI view, an API resource, or a query pattern -- not around a domain entity or an endpoint.
Wrong mental model:
"I have an Order aggregate, so I need an Order projection."
Also wrong:
"I have five endpoints that show orders, so I need five projections."
Right mental model:
"What distinct read patterns do my consumers have? Each pattern
gets one projection, shaped to serve it directly."
The decision framework
For every screen, API resource, or query pattern, ask these questions:
-
What data does the consumer need? List the fields. If they span multiple aggregates, the projection should combine them -- that is the whole point of a read model.
-
How is the data accessed? By primary key lookup? By filtered search with sorting? By key-value cache hit? This determines whether the projection is database-backed (needs queries) or cache-backed (needs fast key lookups).
-
How volatile is the data? If it changes every few seconds (dashboard counters, live status), consider a cache-backed projection. If it changes infrequently but needs complex queries (order history with filters), use database-backed.
-
Does another projection already serve 80% of this need? If two consumers need almost the same data, prefer one projection with a few optional fields over two separate projections. The API layer can select which fields to return.
Rules of thumb
| Situation | Guidance |
|---|---|
| Two views need the same data | One projection, two API endpoints |
| Two views share 80% of fields | One projection with optional fields |
| Two views need fundamentally different data | Two projections |
| Data is queried by filters and sorting | Database-backed (provider="default") |
| Data is looked up by ID for fast display | Cache-backed (cache="default") |
| Data spans multiple aggregates | One cross-aggregate projection with multi-aggregate projector |
| Data mirrors the aggregate exactly | You probably don't need a projection at all |
Applying the Pattern
Example 1: Cross-aggregate projection
An order summary page needs data from three aggregates: Order (status, total), Customer (name, email), and Product (item names). Instead of three projections with joins, build one denormalized projection:
@domain.projection
class OrderSummary:
"""Serves the order list page and the order detail page.
Combines data from Order, Customer, and Product aggregates
into a single denormalized read model.
"""
order_id: Identifier(identifier=True)
customer_id: Identifier()
customer_name: String()
customer_email: String()
status: String()
total: Float()
item_count: Integer()
placed_at: DateTime()
shipped_at: DateTime()
tracking_number: String()
The projector listens to events from multiple aggregates using
stream_categories:
from protean.core.projector import on
@domain.projector(
projector_for=OrderSummary,
stream_categories=["order", "customer"],
)
class OrderSummaryProjector:
@on(OrderPlaced)
def on_order_placed(self, event: OrderPlaced) -> None:
repo = current_domain.repository_for(OrderSummary)
repo.add(OrderSummary(
order_id=event.order_id,
customer_id=event.customer_id,
customer_name=event.customer_name,
customer_email=event.customer_email,
status="placed",
total=event.total,
item_count=len(event.items),
placed_at=event.placed_at,
))
@on(OrderShipped)
def on_order_shipped(self, event: OrderShipped) -> None:
repo = current_domain.repository_for(OrderSummary)
summary = repo.get(event.order_id)
summary.status = "shipped"
summary.shipped_at = event.shipped_at
summary.tracking_number = event.tracking_number
repo.add(summary)
@on(CustomerNameChanged)
def on_customer_name_changed(self, event: CustomerNameChanged) -> None:
"""Update all orders for this customer when their name changes."""
view = current_domain.view_for(OrderSummary)
orders = view.find_by(customer_id=event.customer_id)
repo = current_domain.repository_for(OrderSummary)
for order in orders:
order.customer_name = event.new_name
repo.add(order)
Cross-aggregate ordering
When a projector listens to multiple stream categories, rebuild_projection()
merges events by global_position so that cross-aggregate events are replayed
in the correct chronological order. This is critical for projections where
event ordering across aggregates matters.
The key insight: the OrderSummary projection does not mirror any single
aggregate. It combines data from Order and Customer into the shape that the
consumer (the order list/detail page) needs. The projector handles events from
both aggregates, keeping the projection current as either side changes.
Example 2: Cache-backed projection for a real-time dashboard
A warehouse dashboard shows live order counts by status. The data changes every few seconds as orders move through the pipeline. A database-backed projection would add unnecessary I/O for data that is only ever looked up by a single key:
@domain.projection(cache="default")
class WarehouseDashboard:
"""Real-time order status counts for the warehouse display.
Cache-backed because:
- Looked up by a single key (warehouse_id)
- Updates frequently (every order status change)
- Never queried with filters or sorting
- Acceptable to lose on cache eviction (rebuilt from events)
"""
warehouse_id: Identifier(identifier=True)
pending_count: Integer(default=0)
processing_count: Integer(default=0)
shipped_count: Integer(default=0)
delivered_count: Integer(default=0)
last_updated: DateTime()
from datetime import datetime, timezone
@domain.projector(
projector_for=WarehouseDashboard,
aggregates=[Order],
)
class WarehouseDashboardProjector:
@on(OrderPlaced)
def on_order_placed(self, event: OrderPlaced) -> None:
repo = current_domain.repository_for(WarehouseDashboard)
try:
dashboard = repo.get(event.warehouse_id)
except ObjectNotFoundError:
dashboard = WarehouseDashboard(
warehouse_id=event.warehouse_id,
)
dashboard.pending_count += 1
dashboard.last_updated = datetime.now(timezone.utc)
repo.add(dashboard)
@on(OrderShipped)
def on_order_shipped(self, event: OrderShipped) -> None:
repo = current_domain.repository_for(WarehouseDashboard)
dashboard = repo.get(event.warehouse_id)
dashboard.processing_count -= 1
dashboard.shipped_count += 1
dashboard.last_updated = datetime.now(timezone.utc)
repo.add(dashboard)
@on(OrderDelivered)
def on_order_delivered(self, event: OrderDelivered) -> None:
repo = current_domain.repository_for(WarehouseDashboard)
dashboard = repo.get(event.warehouse_id)
dashboard.shipped_count -= 1
dashboard.delivered_count += 1
dashboard.last_updated = datetime.now(timezone.utc)
repo.add(dashboard)
The API layer reads from the cache via ReadView:
view = domain.view_for(WarehouseDashboard)
dashboard = view.get("warehouse-east-1")
Cache-backed limitations
Cache-backed projections only support get(), count(), and exists() on
the ReadView. They do not support query or find_by() because cache
stores are key-value backends. If you need filtered queries, use a
database-backed projection.
Example 3: Database-backed projection for searchable order history
A customer service tool needs to search orders by date range, status, and customer name. This requires a database-backed projection:
@domain.projection(provider="default")
class OrderHistory:
"""Searchable order history for customer service.
Database-backed because it needs filtering, ordering, and pagination.
"""
order_id: Identifier(identifier=True)
customer_id: Identifier()
customer_name: String()
status: String()
total: Float()
placed_at: DateTime()
shipped_at: DateTime()
cancelled_at: DateTime()
cancellation_reason: String()
The projector follows the same pattern as Example 1 -- listening to order and
customer stream categories, handling each event type to create or update the
projection. The key difference is in how consumers query it:
view = domain.view_for(OrderHistory)
# Find all orders for a customer, newest first
orders = view.query.filter(customer_id="cust-123").order_by("-placed_at").all()
# Find cancelled orders in a date range
cancelled = (
view.query
.filter(status="cancelled")
.filter(cancelled_at__gte=start_date)
.filter(cancelled_at__lte=end_date)
.all()
)
# Count pending orders
pending_count = view.count(status="placed")
Database-backed projections give you the full ReadView query API: filter(),
order_by(), limit(), count(), and exists(). This is what makes them the
right choice when the consumer needs to search, sort, or paginate.
Example 4: Shared projection serving two similar API endpoints
The order list page and the order detail page need almost the same data. The detail page just needs a few extra fields (tracking number, cancellation reason). Instead of two projections, use one with optional fields:
@domain.projection
class OrderView:
"""Serves both the order list and order detail endpoints.
The list endpoint returns: order_id, customer_name, status, total, placed_at
The detail endpoint returns: all fields
One projection, two serializers in the API layer.
"""
order_id: Identifier(identifier=True)
customer_id: Identifier()
customer_name: String()
customer_email: String()
status: String()
total: Float()
item_count: Integer()
placed_at: DateTime()
# Detail-only fields (optional, populated when available)
shipped_at: DateTime()
delivered_at: DateTime()
tracking_number: String()
cancellation_reason: String()
The API layer selects which fields to expose:
# In your FastAPI or equivalent API layer
def get_orders():
"""GET /orders -- list view, subset of fields."""
view = domain.view_for(OrderView)
orders = view.query.order_by("-placed_at").limit(50).all()
return [
{
"order_id": o.order_id,
"customer_name": o.customer_name,
"status": o.status,
"total": o.total,
"placed_at": o.placed_at,
}
for o in orders
]
def get_order(order_id: str):
"""GET /orders/{id} -- detail view, all fields."""
view = domain.view_for(OrderView)
order = view.get(order_id)
return order.to_dict()
This approach works because:
- One projector maintains one projection. Adding a field means updating one projector, not two.
- One rebuild replays events once, not twice.
- Consistent staleness. Both views are always at the same version of the data.
- The API layer owns field selection, which is its responsibility anyway. The projection owns the data shape; the API owns the response shape.
The 80% rule
When two consumers share 80% or more of their fields, prefer one projection with optional fields. When they share less than 50%, they probably represent genuinely different read patterns and deserve separate projections. The 50-80% range is a judgment call -- lean toward fewer projections unless the optional fields are expensive to maintain.
Example 5: Knowing when NOT to use a projection
Sometimes the right answer is to not create a projection at all. If the read pattern matches the aggregate structure exactly and the data lives in one aggregate, query the aggregate's repository directly:
# No projection needed -- the aggregate is the read model
@domain.aggregate
class Product:
product_id: Auto(identifier=True)
name: String(required=True)
description: Text()
price: Float(required=True)
category: String()
is_active: Boolean(default=True)
# The product detail page needs exactly these fields.
# No cross-aggregate data, no denormalization needed.
# Query the aggregate repository directly.
repo = domain.repository_for(Product)
product = repo.get(product_id)
Create a projection only when the read model's shape differs from the write model -- because it combines multiple aggregates, precomputes derived data, or optimizes for a specific query pattern that the aggregate's table does not support well.
Anti-Patterns
The carbon copy projection
# Anti-pattern: projection is a field-for-field copy of the aggregate
@domain.aggregate
class Invoice:
invoice_id: Auto(identifier=True)
customer_id: Identifier(required=True)
amount: Float(required=True)
status: String(default="draft")
issued_at: DateTime()
due_at: DateTime()
@domain.projection
class InvoiceProjection:
invoice_id: Identifier(identifier=True)
customer_id: Identifier()
amount: Float()
status: String()
issued_at: DateTime()
due_at: DateTime()
# Identical structure. What's the point?
If the projection mirrors the aggregate, you are maintaining two copies of the same data with no benefit. Either add consumer-specific value (join in customer name, precompute derived fields) or remove the projection entirely.
The per-endpoint explosion
# Anti-pattern: separate projections for every slight variation
@domain.projection
class InvoiceListView: # list page
invoice_id: Identifier(identifier=True)
customer_name: String()
amount: Float()
status: String()
@domain.projection
class InvoiceDetailView: # detail page
invoice_id: Identifier(identifier=True)
customer_name: String()
customer_email: String()
amount: Float()
status: String()
issued_at: DateTime()
@domain.projection
class InvoiceForExport: # CSV export
invoice_id: Identifier(identifier=True)
customer_name: String()
amount: Float()
issued_at: DateTime()
# Three projections, three projectors, all overlapping.
These share most of their fields. Consolidate into one InvoiceView with
optional fields and let the API layer select what to return.
The projector that queries aggregates
# Anti-pattern: projector loads aggregates to fill projection fields
@domain.projector(projector_for=OrderSummary, aggregates=[Order])
class OrderSummaryProjector:
@on(OrderPlaced)
def on_order_placed(self, event: OrderPlaced) -> None:
# Event only has order_id -- must load the aggregate
order = current_domain.repository_for(Order).get(event.order_id)
customer = current_domain.repository_for(Customer).get(order.customer_id)
repo = current_domain.repository_for(OrderSummary)
repo.add(OrderSummary(
order_id=order.order_id,
customer_name=customer.name, # Loaded from Customer aggregate
total=order.total,
))
If the projector loads aggregates to get data, the events are too thin. Fix the events first -- they should carry enough context for the projector to work independently. See Design Events for Consumers.
The orphaned projection
A projection that no API endpoint or UI view actually reads. This happens when
endpoints are deleted or refactored but the projection and projector remain.
The projector keeps processing events, writing data that nobody reads. Audit
your projections periodically: if nothing calls view_for() or
repository_for() on a projection, remove it.
Summary
| Aspect | Mirror-the-Aggregate | Per-Endpoint | Consumer-Oriented (Pattern) |
|---|---|---|---|
| Number of projections | One per aggregate | One per endpoint | One per distinct read pattern |
| Data shape | Same as write model | Hyper-specific | Shaped for consumer need |
| Cross-aggregate data | Requires joins | Fully denormalized | Denormalized where needed |
| Projector count | Low but useless | High and overlapping | Moderate and focused |
| Rebuild cost | Fast but pointless | Expensive (N projectors) | Proportional to real needs |
| Maintenance burden | Low (trivial copy) | High (N projectors per event) | Moderate |
| Field selection | Consumer must filter | Perfect fit | API layer selects fields |
| Staleness | N/A (same as write) | Multiple windows | One window per projection |
The principle: design projections around consumer needs, not domain entities. Combine data from multiple aggregates into the shape the consumer requires. Use cache-backed projections for volatile key-value lookups and database-backed projections for complex queries. When two consumers need similar data, prefer one projection with optional fields over two separate ones.
Related reading
Patterns:
- Design Events for Consumers -- Events carry enough context for projectors to act independently.
- Design Small Aggregates -- Small aggregates affect projection design.
Concepts:
- Projections -- What projections are and how they fit in CQRS.
- Projectors -- How projectors maintain projections.
Guides:
- Projections -- Defining and configuring projections.
- Projectors -- Defining projectors and handling events.