Intermediate Representation (IR)
The Protean IR captures the topology of a domain model — what elements
exist, what shape they have, and how they connect — in a portable JSON format.
It is the machine-readable representation of everything a Domain knows after
domain.init() completes.
The IR answers three kinds of questions:
- Structural: What aggregates exist? What entities, value objects, fields, and constraints do they contain?
- Behavioral: What commands target which aggregates? What events do they raise? What handlers process those commands and events?
- Infrastructure: What repositories persist which aggregates? What database models map them?
The IR never captures logic — what happens inside method bodies, how invariants are evaluated, or what transformations handlers perform.
Philosophy & Principles
Topology, not logic.
The IR captures what exists and how it is connected, never what happens inside.
When the IR says an aggregate has a post-invariant named
order_must_have_items, it records the fact that this guard exists and when it
runs — not what it checks.
Declare, don't detect.
When the IR needs information not naturally introspectable (like whether an
event is published), the solution is explicit developer declaration via Meta
options — never static analysis of Python source.
Aggregate-centric, flow-aware. The primary organization mirrors DDD's aggregate cluster concept. Cross-cutting concerns (domain services, process managers, subscribers) get explicit treatment in a separate section.
Deterministic and diffable. The same domain model always produces byte-identical IR JSON. This is non-negotiable for git diffing, compatibility checking, and staleness detection.
One document, one bounded context.
A single IR document represents one Domain instance, which corresponds to one
bounded context. Having multiple aggregates within one bounded context is
normal in DDD: Order and Payment aggregates in the same domain are clusters
within one BC.
Lossless within scope.
Every piece of structural and behavioral metadata available on the composite
root after domain.init() is capturable in the IR. Downstream tools never need
to go back to Python source for information they need.
Uniform over special-cased.
Every element carries an element_type discriminator. Handler wiring always
uses lists (even for 1:1 cardinality). This makes generic IR processing
possible without type-specific code paths.
Open for extension. Every object in the IR can carry additional keys in future versions. Consumers MUST ignore keys they don't recognize.
Compatibility Contract
Versioning
The ir_version field uses semantic versioning (MAJOR.MINOR.PATCH):
- Patch (0.1.0 → 0.1.1): Bug fixes in IR generation. No schema changes.
- Minor (0.1.0 → 0.2.0): Additive changes only. New optional keys, new sections, new element types. Consumers of 0.1.0 can read 0.2.0 by ignoring unknown keys.
- Major (0.x → 1.0): Breaking changes. Keys may be removed, renamed, or change meaning.
Consumer Rules
- MUST ignore unknown keys at every level.
- MUST NOT rely on key ordering for semantics.
- SHOULD provide defaults for missing optional keys.
- MUST check ir_version and reject documents with a higher major version.
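The version rule for consumers can be sketched as a small guard. This is an illustration only, not Protean API: check_ir_version and its supported_major parameter are hypothetical names, and the parse assumes a plain MAJOR.MINOR.PATCH string with no pre-release suffix.

```python
def check_ir_version(document: dict, supported_major: int = 0) -> tuple[int, int, int]:
    """Return (major, minor, patch), rejecting documents with a higher major version."""
    raw = document.get("ir_version")
    if not isinstance(raw, str):
        raise ValueError("IR document has no ir_version string")
    # Assumption: plain MAJOR.MINOR.PATCH. Anything else raises ValueError here.
    major, minor, patch = (int(part) for part in raw.split("."))
    if major > supported_major:
        raise ValueError(
            f"IR major version {major} is newer than supported major {supported_major}"
        )
    return major, minor, patch
```

A newer minor or patch version passes the guard, which matches the rule that consumers read later minor versions by ignoring unknown keys.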
Producer Rules
- MUST NOT remove or rename existing keys in minor versions.
- MAY add new keys, top-level sections, or element type values in minor versions.
- MUST include ir_version in every document.
Top-Level Structure
{
"$schema": "https://protean.dev/ir/v0.1.0/schema.json",
"ir_version": "0.1.0",
"generated_at": "2026-03-01T12:00:00Z",
"checksum": "sha256:a1b2c3...",
"domain": { },
"contracts": { },
"clusters": { },
"projections": { },
"flows": { },
"elements": { },
"diagnostics": [ ]
}
| Section | Purpose | Maps to |
|---|---|---|
| domain | Bounded context identity and global config | Domain.__init__() and Config2 |
| clusters | Aggregate clusters: elements within each aggregate boundary | _assign_aggregate_clusters() output |
| projections | Read side (CQRS): projections, projectors, queries | Cross-aggregate; not tied to one cluster |
| flows | Cross-aggregate coordination: domain services, PMs, subscribers | Elements spanning 2+ aggregates |
| contracts | Published language: events available to other BCs | Derived from published annotations |
| elements | Flat index by element type for O(1) lookup | Derived from all sections |
| diagnostics | Builder warnings and errors | Informational; IR is valid regardless |
Top-Level Metadata
| Field | Type | Required | Description |
|---|---|---|---|
| $schema | string | yes | URI to the IR JSON Schema |
| ir_version | string | yes | Semantic version of the IR format |
| generated_at | string | yes | ISO 8601 UTC timestamp |
| checksum | string | yes | sha256: prefixed hex digest |
| contracts | object | no | Omitted when no published events |
| diagnostics | list | yes | May be empty [] |
Checksum Algorithm
- Build the complete IR dict
- Remove the generated_at and checksum keys
- Serialize to canonical JSON: sorted keys, no indent, separators (',', ':') (compact, no whitespace)
- Encode to UTF-8 bytes
- Compute SHA-256
- Format as "sha256:<hex_digest>"
Domain Metadata
The domain section captures bounded context identity and global configuration.
{
"domain": {
"name": "Ordering",
"normalized_name": "ordering",
"camel_case_name": "Ordering",
"identity_strategy": "uuid",
"identity_type": "string",
"event_processing": "async",
"command_processing": "async"
}
}
| Field | Source | Description |
|---|---|---|
| name | domain.name | Display name |
| normalized_name | domain.normalized_name | Lowercase form for stream prefixes |
| camel_case_name | domain.camel_case_name | CamelCase form for __type__ strings |
| identity_strategy | Config | "uuid", "function", or "database" |
| identity_type | Config | "string" or "integer" |
| event_processing | Config | "sync" or "async" |
| command_processing | Config | "sync" or "async" |
Database connection strings, broker URLs, and other deployment-specific settings are excluded. The IR captures structural identity, not deployment topology.
Aggregate Clusters
The clusters section is the heart of the IR. Each key is an aggregate's FQN,
and the value contains the complete cluster.
Cluster Structure
{
"clusters": {
"<aggregate_fqn>": {
"aggregate": { },
"entities": { },
"value_objects": { },
"commands": { },
"events": { },
"command_handlers": { },
"event_handlers": { },
"application_services": { },
"repositories": { },
"database_models": { }
}
}
}
An element belongs to a cluster when its meta_.aggregate_cluster points to
the cluster's root aggregate. Empty sub-sections are included with {} to
make the structure predictable.
Common Element Shape
Every element carries a consistent set of base attributes:
{
"fqn": "ecommerce.ordering.Order",
"name": "Order",
"module": "ecommerce.ordering",
"element_type": "AGGREGATE",
"description": "Root aggregate for customer orders.",
"auto_generated": false
}
| Attribute | Type | Required | Description |
|---|---|---|---|
| fqn | string | yes | Fully qualified name; the stable identifier |
| name | string | yes | Short class name |
| module | string | yes | Python module path |
| element_type | string | yes | DomainObjects enum value |
| description | string | no | From class docstring. Omitted if none |
| auto_generated | bool | no | true for framework-generated elements. Omitted (defaults to false) for user-declared elements |
Aggregate
The aggregate element includes options, identity, fields, invariants, and (for event-sourced aggregates) apply handlers.
Options reference (v0.1.0):
| Option | Type | Default | Description |
|---|---|---|---|
| is_event_sourced | bool | false | Uses event sourcing |
| fact_events | bool | false | Auto-generate fact events on persistence |
| provider | string | "default" | Database provider name |
| schema_name | string | underscore(class_name) | Storage table/collection name |
| stream_category | string | {normalized_name}::{underscore(class_name)} | Event stream prefix |
| auto_add_id_field | bool | true | Auto-inject an id field |
| limit | int | 100 | Default query result limit |
Invariants are grouped by stage, always fully present:
"invariants": {
"pre": ["validate_order_not_cancelled"],
"post": ["total_must_be_positive"]
}
Apply handlers (event-sourced only) map event FQN to method name:
"apply_handlers": {
"banking.accounts.AccountOpened": "on_account_opened",
"banking.accounts.DepositMade": "on_deposit_made"
}
Entity
Entities carry part_of (FQN of parent aggregate or entity), their own
identity_field, fields, invariants, and options. Auto-injected
Reference fields appear as regular fields with "kind": "reference".
Value Object
Value objects carry part_of (may be null for standalone VOs), fields, and
invariants. They cannot have identifier, unique, HasOne, HasMany, or
Reference fields.
Command
Commands carry __type__ (message routing string:
{Domain}.{ClassName}.{version}), __version__, part_of, and fields.
Commands are immutable data — no invariants, no identity field. Commands are
always internal to the bounded context.
Event
Events additionally carry is_fact_event and published flags:
- is_fact_event: true for auto-generated fact events
- published: true if part of the BC's published language (defaults to false, omitted per sparse representation)
When an aggregate has fact_events: true, the framework auto-generates a
fact event class that appears in the cluster's events section with both
is_fact_event: true and auto_generated: true.
Command Handler
Command handlers include a handlers map (__type__ → list of method names),
stream_category, and a subscription block.
Event Handler
Event handlers include a handlers map, an optional source_stream for origin
filtering, stream_category, and a subscription block. The $any wildcard
key means the handler processes all events on its stream.
Other Cluster Elements
application_services, repositories, and database_models carry base
attributes plus part_of. Database models additionally carry database and
schema_name.
Field Representation
Fields use a uniform schema with kind as the discriminator. Three design
rules govern field representation:
- Flat structure. All attributes are top-level keys. No nesting.
- Sparse representation. Only non-default, meaningful attributes are present. A field without max_length does not include "max_length": null.
- Extensible. Future field attributes can be added in minor versions.
Field Kinds
| kind | Source | Description |
|---|---|---|
| "standard" | String, Integer, Float, Boolean, Date, DateTime | Basic data field |
| "text" | Text | Unbounded text |
| "identifier" | Identifier | Identity-capable field |
| "auto" | Auto | Auto-generated identity |
| "list" | List | Typed list container |
| "dict" | Dict | Dictionary container |
| "value_object" | ValueObject(SomeVO) | Embedded value object |
| "value_object_list" | ValueObjectList(SomeVO) | List of embedded VOs |
| "has_one" | HasOne(SomeEntity) | 1:1 child association |
| "has_many" | HasMany(SomeEntity) | 1:N child association |
| "reference" | Reference(SomeAggregate) | Back-reference to parent |
Field Attributes
All possible keys (no single field has all of them):
| Attribute | Description |
|---|---|
| kind | Field kind discriminator (always present) |
| type | Protean type name (present on data fields, omitted on associations) |
| required | Must have a value after construction |
| identifier | Marks the identity field |
| unique | Uniqueness constraint |
| default | Default value (see serialization rules below) |
| description | Human-readable description |
| max_length, min_length | String length constraints |
| max_value, min_value | Numeric bounds |
| choices | Sorted list of allowed values |
| sanitize | Whether the field value is sanitized |
| increment | Auto-increment flag |
| content_type | Element type of a container (List, ValueObjectList) |
| auto_generated | true for framework-injected fields |
| target | FQN of the associated element (associations) |
| via | FK field name on the child entity |
| linked_attribute | Identity field on the reference target |
Default Value Serialization
| Scenario | IR Representation |
|---|---|
| Immutable literal | JSON value: "PENDING", 0, false |
| None as explicit default | "default": null |
| No default specified | Key omitted |
| Callable (datetime.now) | "default": "<callable>" |
Type Names
Data fields use Protean type names: String, Text, Integer, Float,
Boolean, Date, DateTime, Identifier, Auto, List, Dict.
Association fields omit type — the target FQN provides type information.
Handler Wiring
Uniform Handler Format
All handler method values are lists, even when cardinality is 1:1. The domain constraint (e.g., commands must have exactly one handler) is enforced by the framework, not the IR shape.
{
"handlers": {
"<__type__ string>": ["method_name"],
"$any": ["wildcard_method"]
}
}
One exception: apply_handlers on event-sourced aggregates use event
FQN as keys and single strings as values (not lists), since each event maps
to exactly one @apply method.
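Generic tooling may prefer to read both shapes the same way. The sketch below normalizes a handlers map or an apply_handlers map into lists of method names. normalize_handler_map is a hypothetical helper, not part of Protean, and it does not cover the process manager form, whose values are objects rather than lists.

```python
def normalize_handler_map(mapping: dict) -> dict[str, list[str]]:
    """Return {key: [method, ...]} for handlers and apply_handlers alike."""
    return {
        # apply_handlers values are single strings; handlers values are lists.
        key: [value] if isinstance(value, str) else list(value)
        for key, value in mapping.items()
    }
```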
Subscription Block
Present on command handlers, event handlers, projectors, and process managers:
{
"subscription": {
"type": "stream",
"profile": "production",
"config": {}
}
}
| Key | Values | Description |
|---|---|---|
| type | "stream", "event_store", null | Subscription mechanism. null = framework default |
| profile | "production", "fast", "batch", "debug", "projection", null | Configuration preset. null = framework default |
| config | dict | Handler-specific overrides. Empty {} = no overrides |
The subscription block is always fully present. Unlike sparse field attributes,
null here means "use framework defaults" — a semantically meaningful value.
Process Manager Handler Map
Process managers extend the handler format with lifecycle metadata:
{
"handlers": {
"Ordering.OrderPlaced.v1": {
"methods": ["on_order_placed"],
"start": true,
"end": false,
"correlate": "order_id"
}
}
}
| Attribute | Description |
|---|---|
| methods | Handler method names |
| start | Whether this handler initiates a new process instance |
| end | Whether this handler completes the process |
| correlate | String (same-name mapping) or dict ({"pm_field": "event_field"}) |
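Because correlate has two shapes, a consumer may want to expand it into one explicit mapping before use. A minimal sketch; correlation_mapping is a hypothetical helper name.

```python
def correlation_mapping(correlate) -> dict[str, str]:
    """Expand correlate into an explicit {pm_field: event_field} dict."""
    if isinstance(correlate, str):
        # Same-name mapping: the process manager field and event field share a name.
        return {correlate: correlate}
    return dict(correlate)
```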
Command→Event Causality
The IR does not capture per-command event causality. Events are raised within aggregate methods, and handler logic is conditional — a static declaration cannot capture this. Consumers can reconstruct the flow graph by combining command→handler, handler→aggregate, aggregate→events, and event→handler mappings.
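One way a consumer might combine those mappings is sketched below. It rests on assumptions the text does not spell out: each cluster sub-section is a dict keyed by element FQN, and each event element carries __type__. flow_edges is a hypothetical helper. The aggregate-to-event edges are an over-approximation, since the IR lists which events belong to an aggregate, not which command raises them.

```python
def flow_edges(clusters: dict) -> list[tuple[str, str]]:
    """Build directed edges: command -> handler -> aggregate -> event -> handler."""
    edges: set[tuple[str, str]] = set()
    for aggregate_fqn, cluster in clusters.items():
        for handler_fqn, handler in cluster.get("command_handlers", {}).items():
            for type_string in handler.get("handlers", {}):
                edges.add((type_string, handler_fqn))   # command -> handler
            edges.add((handler_fqn, aggregate_fqn))     # handler -> aggregate
        for event in cluster.get("events", {}).values():
            edges.add((aggregate_fqn, event["__type__"]))  # aggregate -> event
        for handler_fqn, handler in cluster.get("event_handlers", {}).items():
            for type_string in handler.get("handlers", {}):
                if type_string != "$any":  # wildcard has no single source event
                    edges.add((type_string, handler_fqn))  # event -> handler
    return sorted(edges)
```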
Projections & Read Side
Projections live outside clusters because they may combine data from multiple
aggregate streams.
{
"projections": {
"<projection_fqn>": {
"projection": { },
"projectors": { },
"queries": { },
"query_handlers": { }
}
}
}
Projection elements include options (provider, cache, schema_name,
order_by, limit), identity_field, and fields.
Projectors specify projector_for (FQN), aggregates (sorted list of
aggregate FQNs), stream_categories, subscription, and handlers.
Queries carry __type__ but not __version__ — queries are local
read-side operations, not cross-context messages. Format:
{Domain}.{ClassName}.
Query handlers have standard handler maps keyed by query __type__.
Cross-Cutting Flows
The flows section captures elements spanning aggregate boundaries.
Domain Services
Domain services have part_of as a sorted list of 2+ aggregate FQNs and carry
invariants.
Process Managers
Process managers are stateful, event-driven coordinators. They include:
- identity_field and fields (persistent state)
- stream_categories (what they subscribe to)
- handlers with lifecycle metadata (start, end, correlate)
- transition_event (auto-generated event: fqn and __type__ only)
Subscribers
Subscribers consume messages from external brokers. They carry broker and
stream — minimal metadata reflecting the framework's simple
__call__(payload: dict) entry point.
Contracts (Published Language)
The contracts section summarizes the bounded context's published events:
{
"contracts": {
"events": [
{
"__type__": "Ordering.OrderPlaced.v1",
"fqn": "ecommerce.ordering.OrderPlaced"
}
]
}
}
- Events are sorted by __type__
- Only events can be published; commands are always internal
- The section is derived — it contains no information not in the clusters
- It may be omitted when the domain has no published events
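Because the section is derived, a tool can rebuild it from the clusters. A minimal sketch, assuming each cluster's events sub-section is a dict of event elements carrying __type__, fqn, and the sparse published flag; derive_contracts is a hypothetical helper.

```python
from typing import Optional


def derive_contracts(clusters: dict) -> Optional[dict]:
    """Collect published events across clusters, sorted by __type__."""
    published = [
        {"__type__": event["__type__"], "fqn": event["fqn"]}
        for cluster in clusters.values()
        for event in cluster.get("events", {}).values()
        if event.get("published", False)  # sparse: a missing key means false
    ]
    if not published:
        return None  # caller omits the contracts section entirely
    return {"events": sorted(published, key=lambda entry: entry["__type__"])}
```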
Elements Index
A flat lookup table mapping element types to sorted FQN lists:
{
"elements": {
"AGGREGATE": ["ecommerce.ordering.Order"],
"COMMAND": ["ecommerce.ordering.PlaceOrder"],
"EVENT": ["ecommerce.ordering.OrderPlaced"],
...
}
}
- Keys are DomainObjects enum values (uppercase)
- Empty lists are included for types with no instances
- Spans all sections (clusters, projections, flows)
- Derived, not canonical — sections are the source of truth
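Since the index is derived, a consumer can cross-check it by walking the sections. The sketch below collects every object that carries both fqn and element_type; collect_elements and derive_index are hypothetical helpers. It cannot reproduce the empty lists for unused types (that needs the full DomainObjects enum), and it will miss elements that appear only in the index, such as standalone value objects.

```python
from collections import defaultdict


def collect_elements(node, index=None):
    """Recursively gather {element_type: {fqn, ...}} from nested dicts and lists."""
    if index is None:
        index = defaultdict(set)
    if isinstance(node, dict):
        if "fqn" in node and "element_type" in node:
            index[node["element_type"]].add(node["fqn"])
        for value in node.values():
            collect_elements(value, index)
    elif isinstance(node, list):
        for item in node:
            collect_elements(item, index)
    return index


def derive_index(ir: dict) -> dict[str, list[str]]:
    """Rebuild a sorted element index from clusters, projections, and flows."""
    sections = {key: ir.get(key, {}) for key in ("clusters", "projections", "flows")}
    found = collect_elements(sections)
    return {etype: sorted(fqns) for etype, fqns in sorted(found.items())}
```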
Identifiers
Fully Qualified Name (FQN)
Every element is identified by cls.__module__ + "." + cls.__qualname__.
FQNs are deterministic, human-readable, and match the registry lookup key.
Message Type String (__type__)
Commands and events carry {domain.camel_case_name}.{class_name}.{version}
for message routing. FQN is structural identity; __type__ is behavioral
identity. They serve different purposes and evolve independently.
All cross-references (e.g., part_of, target) use FQNs. Handler maps use
__type__ strings as keys. Apply handlers use FQNs.
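A consumer that needs the parts of a __type__ string can split it on dots. This sketch assumes none of the parts contain a dot themselves and also accepts the two-part {Domain}.{ClassName} form used by queries; MessageType and parse_type_string are hypothetical names.

```python
from typing import NamedTuple, Optional


class MessageType(NamedTuple):
    domain: str
    class_name: str
    version: Optional[str]  # None for the unversioned query form


def parse_type_string(type_string: str) -> MessageType:
    """Split '{Domain}.{ClassName}.{version}' or '{Domain}.{ClassName}'."""
    parts = type_string.split(".")
    if len(parts) == 3:
        return MessageType(*parts)
    if len(parts) == 2:
        return MessageType(parts[0], parts[1], None)
    raise ValueError(f"unexpected __type__ format: {type_string!r}")
```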
Determinism Guarantees
Key Ordering
All dictionary keys are sorted alphabetically at every level: top-level sections, elements within sections, fields, handler maps, options, and field attributes.
List Ordering
All lists are sorted: FQNs alphabetically, choices alphabetically, handler
method names alphabetically, contracts by __type__.
Value Normalization
- Strings preserve declared case
- Numbers use standard JSON formatting; floats and integers are distinct (0.0 vs 0)
- Optional attributes with null/default values are omitted (sparse)
- Empty dicts {} are included for empty cluster sub-sections
- Empty lists [] are included in the elements index
- invariants blocks always include both pre and post keys
Excluded from Determinism
- generated_at: always differs
- checksum: derived from content (excluded from its own computation)
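The checksum procedure, with generated_at and checksum left out of the hashed content, can be sketched with the standard library. compute_checksum is a hypothetical helper name. Leaving json.dumps at its default ensure_ascii=True is an assumption, since the algorithm steps do not say how non-ASCII characters are escaped.

```python
import hashlib
import json


def compute_checksum(ir: dict) -> str:
    """Hash the IR without its generated_at and checksum keys."""
    payload = {
        key: value
        for key, value in ir.items()
        if key not in ("generated_at", "checksum")
    }
    # Canonical JSON: sorted keys, no indent, compact separators.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Comparing the result with the stored checksum value is one way to detect a stale IR document.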
Extension Points
New top-level sections can be added in minor versions. Existing consumers ignore them.
New element types get an element_type discriminator and are placed in the
appropriate section.
New attributes on existing elements are added as optional keys.
New field kinds (e.g., "json", "encrypted") can be introduced.
Consumers that don't recognize a kind treat the field as opaque metadata.
State machines (future): Fields with choices implicitly define state
vocabulary. A future version may add explicit transition declarations.
Evolution tracking (future): Event version transitions and upcaster chain metadata.
Diagnostics
{
"diagnostics": [
{
"code": "UNHANDLED_EVENT",
"element": "banking.accounts.AccountClosed",
"level": "warning",
"message": "Event AccountClosed has no registered handler"
}
]
}
| Field | Type | Description |
|---|---|---|
| code | string | Machine-readable identifier |
| element | string | FQN of the related element |
| level | string | "error", "warning", or "info" |
| message | string | Human-readable description |
Diagnostics are included in checksum computation (same domain = same diagnostics). They do not affect IR validity — an IR with error-level diagnostics is still a valid document.
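Since error-level diagnostics do not invalidate the document, a consumer that wants to gate on them has to check explicitly. A minimal sketch; diagnostics_by_level and has_errors are hypothetical helpers.

```python
from collections import Counter


def diagnostics_by_level(ir: dict) -> Counter:
    """Count diagnostics per level ("error", "warning", "info")."""
    return Counter(entry["level"] for entry in ir.get("diagnostics", []))


def has_errors(ir: dict) -> bool:
    """True when at least one diagnostic is error-level."""
    return diagnostics_by_level(ir)["error"] > 0
```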
Design Decisions
What the IR Does Not Capture
- Method implementations and handler bodies
- Runtime state (aggregate instances, event store positions)
- Adapter internals (connection strings, query implementations)
- Python AST or source code
- Element inheritance (Protean elements are flat, matched by __type__)
- Aggregate business methods (not in framework metadata)
- Custom Meta options (framework uses a predefined set only)
Key Trade-offs
FQN refactoring fragility.
FQNs change when code is moved. Commands and events have the refactor-proof
__type__ string for behavioral identity. A future version may introduce
canonical_id for structural elements.
Subscriber thinness.
Subscribers capture only broker and stream. Visualization tools cannot
draw data flow edges through subscribers because internal dispatching is logic.
A future version may add dispatches declarations.
Enum class reference lost.
When choices come from a Python Enum, the IR captures only the values.
Protean's field resolution discards the class reference.
Callable defaults.
Callables are not JSON-serializable. The sentinel "<callable>" indicates a
default exists without revealing what it produces.
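Together with the default value serialization rules, this leaves a consumer with four cases to tell apart: key omitted, explicit null, the "<callable>" sentinel, and a literal. A minimal sketch, assuming a field is available as a plain dict; describe_default is a hypothetical helper. Note that a literal string default equal to "<callable>" would be indistinguishable from the sentinel.

```python
_MISSING = object()  # distinguishes "key omitted" from an explicit null


def describe_default(field: dict) -> str:
    """Classify the default of one IR field dict."""
    value = field.get("default", _MISSING)
    if value is _MISSING:
        return "no default"
    if value is None:
        return "explicit None"
    if value == "<callable>":
        return "callable default"
    return f"literal {value!r}"
```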
Float serialization.
Python's json.dumps() is the reference serializer. 0.0 serializes as
0.0 (not 0), preserving type distinction across Python 3.11+.
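The distinction can be observed directly with the standard library serializer; the field names amount and count below are made up for the example.

```python
import json

# Compact, sorted serialization as used for canonical IR JSON.
encoded = json.dumps({"amount": 0.0, "count": 0}, sort_keys=True, separators=(",", ":"))
print(encoded)  # {"amount":0.0,"count":0}

# Round-tripping keeps the float/int distinction.
decoded = json.loads(encoded)
print(type(decoded["amount"]).__name__, type(decoded["count"]).__name__)  # float int
```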
Value Objects Across Aggregates
A VO declared part_of=Order appears in the Order cluster, even if Payment
also references it. The Payment field references the VO by FQN — cross-cluster
references are valid. Standalone VOs (part_of=None) appear only in the
elements index.
Diagnostics in Checksum
Diagnostics are deterministic (same domain = same diagnostics) and included in the checksum. Upgrading Protean may introduce new diagnostic rules, changing the checksum. The expected workflow: upgrade → regenerate IR → review diff → commit.
JSON Schema
The complete JSON Schema (Draft 2020-12) for IR v0.1.0 is available at:
- In the package: protean.ir.SCHEMA_PATH (src/protean/ir/schema/v0.1.0/schema.json)
- Canonical URL: https://protean.dev/ir/v0.1.0/schema.json
Reference Examples
Two reference IR documents are included in the package at
protean.ir.EXAMPLES_DIR:
- Fidelis (fidelis-ir.json): A banking domain with an event-sourced Account aggregate, demonstrating apply handlers, compliance event handlers, and a diagnostic warning for an unhandled event.
- Ordering (ordering-ir.json): An e-commerce domain with two aggregates (Order and Payment), demonstrating entities, value objects, fact events, a cross-aggregate process manager, projections with queries, and an external subscriber.