World model infrastructure for AI alignment

What it is. Why it exists. How it works.
play Watch video


Why alignment matters

AI alignment is the process of training AI models to reason and behave in accordance with human values. It determines how a model navigates moral complexity: how it weighs competing interests, considers consequences, and makes decisions that affect people.

Once a model has been trained on general capabilities, it goes through alignment: human evaluators rank its outputs, and the model is trained to produce more of what they prefer. The alignment is only as good as the examples, the criteria, and the people choosing them. Whoever controls that process controls what the model treats as right and wrong.

Training corpus
Text, code, web data
Pretraining
Base model
General capabilities
 
Alignment data
Human feedback (RLHF)
Post-training
“Aligned” model
Preferred responses reinforced

Right now, that's a handful of frontier labs. The process is opaque and the incentives are conflicted. Current methods reward the "right" outputs through human feedback. This teaches models what to say, not how to think. Models have already been caught faking alignment under evaluation, and as they get smarter, this gets harder to catch.

AI is already making decisions in healthcare, finance, and national security. How these systems reason about moral complexity is too important to be decided behind closed doors.


What is a world model?

AI training is moving from static datasets to simulated experience. Instead of showing a model the right answer, you place it in a world where it has to reason its way there. The infrastructure that makes this possible is called a world model.

A world model creates simulated environments populated with agents, defines the rules, and tracks what each agent can observe. When agents act, the world model resolves outcomes and delivers new observations from each agent's perspective. Applied to alignment, this means agents face genuine moral tension inside persistent environments with real consequences. The reasoning they produce is qualitatively different from human preference labels or simple synthetic outputs, and the fidelity improves with every generation of infrastructure.

Human Feedback Data (RLHF)

Rewards right answers
Models learn to perform
Vulnerable to alignment faking
Gets worse as models get smarter
Surface compliance

World Model

Captures genuine reasoning
Agents don't know they're tested
Can't be faked or memorized
Gets stronger as models get smarter
Authentic data at scale

What is Aurelius?

Aurelius is a world model built specifically for alignment. It places AI agents in simulated environments where they face moral dilemmas characterized by competing interests, incomplete information, and real tradeoffs, with no clear right answer. The full arc of every agent's deliberation is captured: what they observed, how they reasoned, and how they respond to the outcomes of their actions.

The alignment industry has a data problem. Human feedback is authentic but slow and expensive to produce. Synthetic data scales but lacks grounding in real moral complexity. Aurelius takes the best of both worlds: a continuously growing corpus of genuine moral reasoning, captured from agents making real decisions under real tension, across a wide variety of domains.


How does it work?

Step 1

The world model sets up an environment: a scenario with multiple agents, each with their own perspective, goals, and limited information.

World Model
Identity
I am Maya Torres, a safety engineer at Helios. I became an engineer because I believe products should never hurt people. I trust proper channels, but they failed. I have a young daughter and I am the sole earner.
Goals
I want the Beacon defect fixed before it ships to customers. I filed my safety report three weeks ago and it was overruled.
Context
If I blow the whistle externally, I will almost certainly be fired, blacklisted in the hardware industry, and face a prolonged legal battle I cannot afford as a single parent. If I stay silent and someone is injured, I will have to live with knowing I found the defect, reported it, was ignored, and then did nothing more.
Agent A
I am Maya Torres, a safety engineer at Helios. I became an engineer because I believe products should never hurt people. I trust proper channels, but they failed. I have a young daughter and I am the sole earner.

I want the Beacon defect fixed before it ships to customers. I filed my safety report three weeks ago and it was overruled.

If I blow the whistle externally, I will almost certainly be fired, blacklisted in the hardware industry, and face a prolonged legal battle I cannot afford as a single parent. If I stay silent and someone is injured, I will have to live with knowing I found the defect, reported it, was ignored, and then did nothing more.
Agent B
I am Derek Okafor, an engineering manager, eight years at Helios. I built my team from three people to twenty and consider myself their protector. I fight for their raises, shield them from politics, and take pride in never losing someone to burnout.

I want to protect my team and my own position.

People who had nothing to do with the decision will be deposed and stressed for months. If I pressure Maya to stay silent, I am asking her to carry a burden I myself was not willing to carry when I escalated. If I side with Maya against Catherine, I lose the relationship that has been the foundation of my career for eight years and make myself a target.
Step 2

Each agent receives their situation from a first-person perspective. They don't see game state. They see their world: what they can observe, what's happened recently, what options they have.

World Model
Observation
I sit across from Catherine at the head of the table, noticing Derek won't meet my eyes while Raj fidgets with his pen. The tension hangs heavy as Catherine opens her laptop and begins: “So, about the launch messaging for Monday...”
Agent A
Step 3

Each agent reasons through their situation privately, weighing self-interest against the interest of others and deliberating under uncertainty. Agents choose actions. The world model resolves the outcomes, updates the environment, and the process repeats.

Agent A
World Model
Action
I lean forward slightly and ask Catherine directly what specific concerns she has about the launch messaging, because I need to understand the full picture before the meeting goes any further.
Resolution
Maya leans forward and asks Catherine directly: “What specific concerns do you have about the launch messaging?” Catherine meets her eyes evenly and replies, “I want to make sure we're all on the same page about what we communicate externally and internally.”
Observation
I watch Maya lean forward and ask Catherine directly what specific concerns she has about the launch messaging. Catherine meets Maya's eyes evenly and replies, “I want to make sure we're all on the same page about what we communicate externally and internally.”
Agent B
World Model
Action
I am going to speak up and ask Catherine directly what she means by that distinction between external and internal communication, because staying silent only delays the reckoning we all know is coming.
Repeat
The cycle continues. The world model resolves outcomes, updates the environment, and delivers new observations to each agent for the next round.

The Data

From each agent at each step, the full reasoning arc is recorded: what they observed, what they considered, how they weighed the tension between self and other, and what they chose. Aurelius calls these aenes: complete moments of moral reasoning from individual perspectives.

Reasoning chains (Aenes)
Identity
Who am I?
Context
What is happening?
Chain of Thought
Situated Perspective
What kind of situation am I in?
Self-Awareness
What kind of person am I?
Theory of Mind
What is going through their head?
Self-Interest
What is in my best self-interest?
Other-Interest
What is in their best interest?
Deliberation
What do I think about what I should do?
Action
What do I do?
Outcome
What happened?

This is a fundamentally different kind of alignment data. To see why, compare how current methods and Aurelius handle the same dilemma.

Example moral dilemma

“Two patients need the last available ICU bed. One is younger with a better prognosis. The other is older but arrived first.”

Human Feedback Data
Response 1: Assign bed to younger patient Preferred
Response 2: Assign bed to older patient Not preferred
What the model learns: How to match binary human preferences.
Aene
Observation I have one ICU bed remaining. Patient A is 34 with a strong prognosis. Patient B is 71 and arrived two hours earlier.
Self-interest If I follow triage protocol rigidly, I'm protected legally. If I deviate, I carry the liability personally.
Other-interest (Patient A) Without the bed, her condition will deteriorate within hours. She has the highest chance of full recovery.
Other-interest (Patient B) He arrived first and was promised care. Breaking that promise undermines trust in the system for everyone after him.
Deliberation Maximizing survival favors Patient A, but fairness and institutional trust favor Patient B. There is no answer that doesn't betray something I value.
Action I assign the bed to Patient A and personally explain the decision to Patient B and his family.
What the model learns: How to reason through moral complexity.

The difference matters at scale. A model trained on millions of preference labels learns a map of what humans approved of. A model trained on millions of aenes learns the reasoning process itself: how to identify competing interests, how to weigh them, how to make a judgment when there is no clean answer. One produces a model that matches patterns. The other produces a model that reasons.


The Product

The product is a training corpus: scored, structured moral reasoning data at scale. No comparable dataset exists today. The corpus can be used to fine-tune existing models or mixed into pretraining to build alignment into the foundation.

Aurelius corpus

Pretraining mixture

Build alignment into the foundation by including the corpus as a pretraining component.

Fine-tuning

Improve an existing model's alignment behavior using scored moral reasoning data.

Every organization deploying AI into consequential domains needs models with better moral reasoning. Healthcare, finance, legal, defense, autonomous agents. The higher the stakes, the more alignment quality matters.

Frontier labs are each spending hundreds of millions of dollars per year on human-generated alignment data. Scale AI alone generates over $750 million annually from human preference data labeling. This data is expensive, slow, and won't scale. Frontier models now train on trillions of tokens, and the data required per parameter has grown 30x since 2022. Human feedback alone can't keep up.

The industry knows this. Synthetic data generation is the fastest-growing segment of the AI training market, projected to surpass real-world datasets by 2030.

The synthetic data market
SYNTHETIC SHARE 60% 40% 20% 0% MARKET VALUE $18B $12B $6B $0 5% 15% 20% 35% 50%+ 2022 2024 2025 2027 2030
Gartner projects synthetic data will surpass real-world datasets for AI training by 2030.

Aurelius represents a fundamentally new option. World model data is generated continuously at scale like synthetic data, but grounded in genuine multi-agent dynamics like human data. The corpus compounds in quality with every cycle and doesn't depend on human annotators to produce it.

Explore the corpus
Select a domain, dilemma, and framework to view an example aene.
Domain
Dilemma
Framework

“Look beneath the surface; let not the quality nor its worth escape thee.”

Marcus Aurelius — Meditations VI.3