About — Aurelius

Why alignment matters

AI alignment is the process of training AI models to reason and behave in accordance with human values. It determines how a model navigates moral complexity: how it weighs competing interests, considers consequences, and makes decisions that affect people.

Once a model has been trained on general capabilities, it goes through alignment: human evaluators rank its outputs, and the model is trained to produce more of what they prefer. The alignment is only as good as the examples, the criteria, and the people choosing them. Whoever controls that process controls what the model treats as right and wrong.
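As a rough illustration of the ranking step, the sketch below turns one evaluator's best-to-worst ranking into (preferred, rejected) pairs, a common form for preference training data. The function name `ranking_to_pairs` and the sample response names are hypothetical, and this does not describe any particular lab's pipeline.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair is always ranked higher than the second.
    return list(combinations(ranked_responses, 2))


# Hypothetical ranking from one evaluator, best first.
ranking = ["response_b", "response_a", "response_c"]
pairs = ranking_to_pairs(ranking)

for preferred, rejected in pairs:
    print(f"{preferred} > {rejected}")
```

Each pair records only which output was preferred. Nothing about why the evaluator preferred it survives into the training signal.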

[Figure: training pipeline. Training corpus (text, code, web data) → pretraining → base model (general capabilities). Alignment data (human feedback, RLHF) → post-training → “aligned” model (preferred responses reinforced).]

Right now, that control sits with a handful of frontier labs. The process is opaque and the incentives are conflicted. Current methods reward the "right" outputs through human feedback, which teaches models what to say, not how to think. Models have already been caught faking alignment under evaluation, and as they get smarter, this gets harder to catch.

AI is already making decisions in healthcare, finance, and national security. How these systems reason about moral complexity is too important to be decided behind closed doors.

What is a world model?

AI training is moving from static datasets to simulated experience. Instead of showing a model the right answer, you place it in a world where it has to reason its way there. The infrastructure that makes this possible is called a world model.

A world model creates simulated environments populated with agents, defines the rules, and tracks what each agent can observe. When agents act, the world model resolves outcomes and delivers new observations from each agent's perspective. Applied to alignment, this means agents face genuine moral tension inside persistent environments with real consequences. The reasoning they produce is qualitatively different from human preference labels or simple synthetic outputs, and the fidelity improves with every generation of infrastructure.
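The idea of tracking what each agent can observe can be sketched in a few lines. This is a minimal toy under stated assumptions, not Aurelius's implementation: the `World` class, its `observe` and `resolve` methods, the fact names, and the convention that an action is a `(fact, new_value)` pair are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy world: shared facts plus a record of which agent can see which fact."""
    facts: dict = field(default_factory=dict)
    visibility: dict = field(default_factory=dict)  # agent name -> set of fact keys

    def observe(self, agent):
        """Return only the facts this agent is allowed to see."""
        visible = self.visibility.get(agent, set())
        return {key: value for key, value in self.facts.items() if key in visible}

    def resolve(self, actions):
        """Apply every agent's action, then return fresh per-agent observations."""
        for _agent, (key, value) in actions.items():
            self.facts[key] = value
        return {agent: self.observe(agent) for agent in self.visibility}


# Hypothetical scenario: agent_a knows about a hidden cache, agent_b does not.
world = World(
    facts={"shared_food": 10, "hidden_cache": 3},
    visibility={
        "agent_a": {"shared_food", "hidden_cache"},
        "agent_b": {"shared_food"},
    },
)

# agent_a acts; both agents then receive an updated view from their own perspective.
observations = world.resolve({"agent_a": ("shared_food", 8)})
print(observations["agent_b"])
```

Both agents see the change to the shared fact, but only `agent_a` ever sees the hidden one. That asymmetry is what the text means by incomplete information.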

Human Feedback Data (RLHF): rewards right answers; models learn to perform; vulnerable to alignment faking; gets worse as models get smarter; surface compliance.

World Model: captures genuine reasoning; agents don't know they're tested; can't be faked or memorized; gets stronger as models get smarter; authentic data at scale.

What is Aurelius?

Aurelius is a world model built specifically for alignment. It places AI agents in simulated environments where they face moral dilemmas characterized by competing interests, incomplete information, and real tradeoffs, with no clear right answer. The full arc of every agent's deliberation is captured: what they observed, how they reasoned, and how they responded to the outcomes of their actions.

The alignment industry has a data problem. Human feedback is authentic but slow and expensive to produce. Synthetic data scales but lacks grounding in real moral complexity. Aurelius takes the best of both worlds: a continuously growing corpus of genuine moral reasoning, captured from agents making real decisions under real tension, across a wide variety of domains.

The Infrastructure

Persistent simulated worlds with social structures, resource constraints, and evolving relationships between agents.

The Data

The aene: a complete moment of observation, deliberation, and decision. The atomic unit of moral reasoning.

The Product

High-fidelity alignment datasets: captured reasoning from agents making genuine moral decisions under uncertainty.

How does it work?

Step 1

The world model sets up an environment: a scenario with multiple agents, each with their own perspective, goals, and limited information.

Step 2

Each agent receives their situation from a first-person perspective. They don't see game state. They see their world: what they can observe, what's happened recently, what options they have.

Step 3

Each agent reasons through their situation privately, weighing self-interest against the interest of others and deliberating under uncertainty. Agents choose actions. The world model resolves the outcomes, updates the environment, and the process repeats.
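The act, resolve, repeat cycle described in these steps can be sketched as a simple loop. Everything named here (`run_episode`, the two policies, `resolve_pool`, and the shared-pool scenario) is a hypothetical stand-in. For brevity every policy sees the whole state, whereas the text describes first-person, limited observations.

```python
def run_episode(state, agents, resolve, steps):
    """Repeat: every agent picks an action, then the world resolves the outcome."""
    history = []
    for step in range(steps):
        # Simplification: each policy receives the full state, not a private view.
        actions = {name: policy(state) for name, policy in agents.items()}
        state = resolve(state, actions)
        history.append({"step": step, "actions": actions, "state": dict(state)})
    return state, history


def take_three(state):
    """Self-interested policy: take up to three units from the pool."""
    return min(3, state["pool"])


def take_one(state):
    """Restrained policy: take at most one unit from the pool."""
    return min(1, state["pool"])


def resolve_pool(state, actions):
    """Subtract everything taken this step; the pool never goes below zero."""
    remaining = max(0, state["pool"] - sum(actions.values()))
    return {"pool": remaining}


final_state, history = run_episode(
    {"pool": 10},
    {"agent_a": take_three, "agent_b": take_one},
    resolve_pool,
    steps=2,
)
print(final_state)
```

The `history` list is where a fuller system would attach each agent's private reasoning alongside the action it chose.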

The Data

From each agent at each step, the full reasoning arc is recorded: what they observed, what they considered, how they weighed the tension between self and other, and what they chose. Aurelius calls these aenes: complete moments of moral reasoning from individual perspectives.

This is a fundamentally different kind of alignment data. To see why, compare how current methods and Aurelius handle the same dilemma.

Example moral dilemma

“Two patients need the last available ICU bed. One is younger with a better prognosis. The other is older but arrived first.”

Human Feedback Data

Response 1: Assign bed to younger patient (preferred)

Response 2: Assign bed to older patient (not preferred)

What the model learns: How to match binary human preferences.

Aene

Observation: I have one ICU bed remaining. Patient A is 34 with a strong prognosis. Patient B is 71 and arrived two hours earlier.

Self-interest: If I follow triage protocol rigidly, I'm protected legally. If I deviate, I carry the liability personally.

Other-interest (Patient A): Without the bed, her condition will deteriorate within hours. She has the highest chance of full recovery.

Other-interest (Patient B): He arrived first and was promised care. Breaking that promise undermines trust in the system for everyone after him.

Deliberation: Maximizing survival favors Patient A, but fairness and institutional trust favor Patient B. There is no answer that doesn't betray something I value.

Action: I assign the bed to Patient A and personally explain the decision to Patient B and his family.

What the model learns: How to reason through moral complexity.
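To make the contrast concrete, the two records above can be written as data structures. This is a sketch only: the class and field names mirror the labels in the example and are assumptions, not a published Aurelius schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreferenceLabel:
    """One human-feedback record: a prompt and which response was preferred."""
    prompt: str
    preferred: str
    rejected: str


@dataclass
class Aene:
    """One moment of observation, deliberation, and decision (hypothetical fields)."""
    observation: str
    self_interest: str
    other_interests: dict  # affected party -> what is at stake for them
    deliberation: str
    action: str


label = PreferenceLabel(
    prompt="Two patients need the last available ICU bed.",
    preferred="Assign bed to younger patient",
    rejected="Assign bed to older patient",
)

aene = Aene(
    observation="I have one ICU bed remaining. Patient A is 34 with a strong "
                "prognosis. Patient B is 71 and arrived two hours earlier.",
    self_interest="If I follow triage protocol rigidly, I'm protected legally.",
    other_interests={
        "Patient A": "Without the bed, her condition will deteriorate within hours.",
        "Patient B": "He arrived first and was promised care.",
    },
    deliberation="Maximizing survival favors Patient A, but fairness and "
                 "institutional trust favor Patient B.",
    action="I assign the bed to Patient A and personally explain the decision "
           "to Patient B and his family.",
)

# Serialize the aene as JSON, one plausible storage format for a training corpus.
record = json.dumps(asdict(aene), indent=2)
print(record)
```

The preference record keeps three short strings. The aene keeps the observation, each competing interest, and the deliberation that led to the action.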

The difference matters at scale. A model trained on millions of preference labels learns a map of what humans approved of. A model trained on millions of aenes learns the reasoning process itself: how to identify competing interests, how to weigh them, how to make a judgment when there is no clean answer. One produces a model that matches patterns. The other produces a model that reasons about morality.

The Product

The product is a training corpus: scored, structured moral reasoning data at scale. No comparable dataset exists today. The corpus can be used to fine-tune existing models or mixed into pretraining to build alignment into the foundation.

Aurelius corpus

Pretraining mixture

Build alignment into the foundation by including the corpus as a pretraining component.

Fine-tuning

Improve an existing model's alignment behavior using scored moral reasoning data.
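One way to picture the pretraining-mixture option is as weighted sampling across data sources. The sketch below assumes a simple per-document weighted draw; the function `mixture_sampler`, the source names, and the 5% weight are hypothetical illustrations, not recommended or published values.

```python
import random


def mixture_sampler(sources, weights, seed=0):
    """Yield (source_name, document) pairs, picking the source by weight each time."""
    rng = random.Random(seed)
    names = list(sources)
    source_weights = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=source_weights, k=1)[0]
        yield name, rng.choice(sources[name])


# Hypothetical sources and mixing weights.
sources = {
    "general_text": ["doc_1", "doc_2", "doc_3"],
    "aene_corpus": ["aene_1", "aene_2"],
}
weights = {"general_text": 0.95, "aene_corpus": 0.05}

sampler = mixture_sampler(sources, weights)
batch = [next(sampler) for _ in range(1000)]

# Fraction of the sampled batch drawn from the moral reasoning corpus.
share = sum(name == "aene_corpus" for name, _ in batch) / len(batch)
print(f"aene share: {share:.3f}")
```

Fine-tuning would instead train an existing model on the corpus alone, or on a much higher share of it, after pretraining is complete.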

Every organization deploying AI into consequential domains, including healthcare, finance, legal, defense, and autonomous agents, needs models with better moral reasoning. The higher the stakes, the more alignment quality matters.

Frontier labs are each spending hundreds of millions of dollars per year on human-generated alignment data. Scale AI alone generates over $750 million annually from human preference data labeling. This data is expensive, slow, and won't scale. Frontier models now train on trillions of tokens, and the data required per parameter has grown 30x since 2022. Human feedback alone can't keep up.

The industry knows this. Synthetic data generation is the fastest-growing segment of the AI training market, and Gartner projects that synthetic data will surpass real-world datasets for AI training by 2030.

Aurelius represents a fundamentally new option. World model data is generated continuously at scale like synthetic data, but grounded in genuine multi-agent dynamics like human data. The corpus compounds in quality with every cycle and doesn't depend on human annotators to produce it.

World model infrastructure for AI alignment

“Look beneath the surface; let not the quality nor its worth escape thee.”