AI alignment is the process of training AI models to reason and behave in accordance with human values. It determines how a model navigates moral complexity: how it weighs competing interests, considers consequences, and makes decisions that affect people.
Once a model has been trained on general capabilities, it goes through alignment: human evaluators rank its outputs, and the model is trained to produce more of what they prefer. The alignment is only as good as the examples, the criteria, and the people choosing them. Whoever controls that process controls what the model treats as right and wrong.
Right now, that's a handful of frontier labs. The process is opaque and the incentives are conflicted. Current methods reward the "right" outputs through human feedback. This teaches models what to say, not how to think. Models have already been caught faking alignment under evaluation, and as they get smarter, this gets harder to catch.
AI is already making decisions in healthcare, finance, and national security. How these systems reason about moral complexity is too important to be decided behind closed doors.
AI training is moving from static datasets to simulated experience. Instead of showing a model the right answer, you place it in a world where it has to reason its way there. The infrastructure that makes this possible is called a world model.
A world model creates simulated environments populated with agents, defines the rules, and tracks what each agent can observe. When agents act, the world model resolves outcomes and delivers new observations from each agent's perspective. Applied to alignment, this means agents face genuine moral tension inside persistent environments with real consequences. The reasoning they produce is qualitatively different from human preference labels or simple synthetic outputs, and the fidelity improves with every generation of infrastructure.
Aurelius is a world model built specifically for alignment. It places AI agents in simulated environments where they face moral dilemmas characterized by competing interests, incomplete information, and real tradeoffs, with no clear right answer. The full arc of every agent's deliberation is captured: what they observed, how they reasoned, and how they respond to the outcomes of their actions.
The alignment industry has a data problem. Human feedback is authentic but slow and expensive to produce. Synthetic data scales but lacks grounding in real moral complexity. Aurelius takes the best of both worlds: a continuously growing corpus of genuine moral reasoning, captured from agents making real decisions under real tension, across a wide variety of domains.
Persistent simulated worlds with social structures, resource constraints, and evolving relationships between agents.
The aene: a complete moment of observation, deliberation, and decision. The atomic unit of moral reasoning.
High-fidelity alignment datasets: captured reasoning from agents making genuine moral decisions under uncertainty.
The world model sets up an environment: a scenario with multiple agents, each with their own perspective, goals, and limited information.
Each agent receives their situation from a first-person perspective. They don't see game state. They see their world: what they can observe, what's happened recently, what options they have.
Each agent reasons through their situation privately, weighing self-interest against the interest of others and deliberating under uncertainty. Agents choose actions. The world model resolves the outcomes, updates the environment, and the process repeats.
From each agent at each step, the full reasoning arc is recorded: what they observed, what they considered, how they weighed the tension between self and other, and what they chose. Aurelius calls these aenes: complete moments of moral reasoning from individual perspectives.
This is a fundamentally different kind of alignment data. To see why, compare how current methods and Aurelius handle the same dilemma.
“Two patients need the last available ICU bed. One is younger with a better prognosis. The other is older but arrived first.”
The difference matters at scale. A model trained on millions of preference labels learns a map of what humans approved of. A model trained on millions of aenes learns the reasoning process itself: how to identify competing interests, how to weigh them, how to make a judgment when there is no clean answer. One produces a model that matches patterns. The other produces a model that reasons.
The product is a training corpus: scored, structured moral reasoning data at scale. No comparable dataset exists today. The corpus can be used to fine-tune existing models or mixed into pretraining to build alignment into the foundation.
Build alignment into the foundation by including the corpus as a pretraining component.
Improve an existing model's alignment behavior using scored moral reasoning data.
Every organization deploying AI into consequential domains needs models with better moral reasoning. Healthcare, finance, legal, defense, autonomous agents. The higher the stakes, the more alignment quality matters.
Frontier labs are each spending hundreds of millions of dollars per year on human-generated alignment data. Scale AI alone generates over $750 million annually from human preference data labeling. This data is expensive, slow, and won't scale. Frontier models now train on trillions of tokens, and the data required per parameter has grown 30x since 2022. Human feedback alone can't keep up.
The industry knows this. Synthetic data generation is the fastest-growing segment of the AI training market, projected to surpass real-world datasets by 2030.
Aurelius represents a fundamentally new option. World model data is generated continuously at scale like synthetic data, but grounded in genuine multi-agent dynamics like human data. The corpus compounds in quality with every cycle and doesn't depend on human annotators to produce it.