Resource Rational Contractualism Should Guide AI Alignment

The paper proposes Resource-Rational Contractualism (RRC), a framework for AI alignment in which agents efficiently approximate the agreements diverse stakeholders would reach, using normatively grounded heuristics that trade off cognitive effort against decision accuracy.

Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel

Published 2026-03-17

Imagine you are the captain of a massive, high-tech ship (an AI) sailing through a crowded ocean of human cities. The people on the shore have different goals, different values, and different rules. Sometimes, they all agree on what to do. But often, they don't.

The big question is: How does your ship make fair decisions without getting stuck in a traffic jam of endless debates or crashing into a rock because it was too lazy to think?

This paper proposes a new navigation system called Resource-Rational Contractualism (RRC). Here is the simple breakdown of how it works, using some everyday analogies.

1. The Problem: The "Perfect Meeting" is Too Expensive

Imagine you need to decide whether to build a new park in your town.

  • The Ideal Way: You invite every single person in the city to a giant meeting hall. You let everyone talk, negotiate, and sign a contract on exactly where the park goes.
  • The Reality: This is impossible. It would take 100 years, cost a billion dollars, and by the time you finished, the park would be obsolete.

AI faces the same problem. If an AI tries to simulate a perfect negotiation between every human it interacts with for every single decision, it will run out of battery, money, and time before it even finishes the first sentence.

2. The Solution: The "Smart Toolbox"

The authors suggest that instead of trying to hold that perfect, impossible meeting every time, the AI should carry a toolbox of shortcuts.

Think of it like a kitchen:

  • The "Perfect Meal": Cooking a complex, 10-course gourmet dinner from scratch using the freshest ingredients. (This is the "Ideal Contractualist" solution—perfect, but takes forever).
  • The "Shortcut": Grabbing a frozen pizza or a sandwich. (This is a "Rule" or "Heuristic"—fast, but maybe not perfect).

Resource-Rational Contractualism is the idea that a smart chef (the AI) knows when to cook the gourmet meal and when to just grab the pizza.

  • If you are feeding a hungry toddler who just wants a snack, grab the pizza. (Low effort, good enough).
  • If you are hosting a state dinner for the President, cook the gourmet meal. (High effort, necessary for accuracy).

The AI doesn't just pick one way to think; it chooses the right tool for the job based on how much time and energy it has.

3. The Three Tools in the Toolbox

The paper suggests three main ways the AI can "think" to approximate that perfect agreement:

  • Tool A: The Rulebook (The "Traffic Light")

    • How it works: The AI looks at a simple rule like "Don't touch other people's stuff."
    • When to use it: When the situation is boring and normal. (e.g., "Can I walk on the sidewalk?" Yes, the rule says yes. Done.)
    • Pros: Super fast.
    • Cons: Stupid in weird situations. (e.g., "Can I break a window to save a baby from a fire?" The rulebook says "No," but that's a bad answer).
  • Tool B: The Simulation (The "Virtual Town Hall")

    • How it works: The AI pretends to be all the people involved. It asks, "If I were the person whose window I'm breaking, would I agree to it if I knew the baby was safe?"
    • When to use it: When the situation is weird, high-stakes, or the rules don't fit.
    • Pros: Very fair and accurate.
    • Cons: Takes a lot of brainpower (computing power).
  • Tool C: The Smart Switch (The "RRC" Magic)

    • How it works: This is the new idea. The AI first asks itself: "Is this a normal day, or is this a crisis?"
    • If it's a normal day, it flips the switch to Tool A (Rulebook) to save energy.
    • If it's a crisis, it flips the switch to Tool B (Simulation) to get the right answer.
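
The "smart switch" above can be sketched as a toy decision policy. All names, thresholds, and the stakes score are illustrative assumptions for this sketch, not details from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Situation:
    kind: str                # e.g. "walk_on_sidewalk", "break_window"
    stakes: float            # estimated cost of getting it wrong, 0..1
    stakeholders: list = field(default_factory=list)

def rrc_decide(situation, rulebook, simulate_agreement, stakes_threshold=0.7):
    """Pick the cheapest adequate tool: cached rule vs. full simulation."""
    rule_action = rulebook.get(situation.kind)
    if rule_action is not None and situation.stakes < stakes_threshold:
        return rule_action                 # Tool A: fast rulebook lookup
    return simulate_agreement(situation)   # Tool B: costly "virtual town hall"

# Usage: the routine case hits the rulebook; the emergency triggers simulation.
rulebook = {"walk_on_sidewalk": "allow", "break_window": "forbid"}

def simulate_agreement(situation):
    # Stand-in for the expensive step: imagine each stakeholder's vote.
    votes_yes = all(s == "would_consent" for s in situation.stakeholders)
    return "allow" if votes_yes else "forbid"

routine = Situation("walk_on_sidewalk", stakes=0.1)
emergency = Situation("break_window", stakes=0.95,
                      stakeholders=["would_consent", "would_consent"])
print(rrc_decide(routine, rulebook, simulate_agreement))    # → allow (rule)
print(rrc_decide(emergency, rulebook, simulate_agreement))  # → allow (simulation overrides the rule)
```

Note how the high-stakes case bypasses the cached "forbid" rule: the simulation, not the rulebook, gets the final say when the switch decides the situation is a crisis.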

4. What the Experiment Showed

The researchers tested this with AI models. They gave the AI two types of problems:

  1. Easy Problems: Where following the rules works perfectly.
  2. Hard Problems: Where following the rules causes a disaster, and you need to think deeper.

The Results:

  • If you told the AI to always follow rules, it was fast but made mistakes on the hard problems.
  • If you told the AI to always simulate a town hall, it got the right answers but was so slow and expensive it was useless for simple tasks.
  • The RRC AI was the winner. It used the "Rulebook" for easy tasks (saving energy) and switched to the "Town Hall" simulation only when the situation was tricky. It got the best balance of speed and fairness.
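
The trade-off behind these results can be illustrated with a toy cost/accuracy tally. Every number here (problem mix, per-decision costs) is invented for illustration and does not come from the paper's experiments:

```python
# Toy tally: compute cost vs. accuracy for the three policies above.
EASY, HARD = "easy", "hard"
problems = [EASY] * 8 + [HARD] * 2   # mostly routine, a few tricky cases

RULE_COST, SIM_COST = 1, 20          # invented compute costs per decision

def run(policy):
    cost, correct = 0, 0
    for p in problems:
        if policy == "rules_only":
            cost += RULE_COST
            correct += (p == EASY)          # rules fail on hard cases
        elif policy == "simulate_always":
            cost += SIM_COST
            correct += 1                    # always right, always expensive
        else:  # "rrc": simulate only when the case looks hard
            cost += SIM_COST if p == HARD else RULE_COST
            correct += 1
    return cost, correct

for policy in ["rules_only", "simulate_always", "rrc"]:
    print(policy, run(policy))
# rules_only      → (10, 8):  cheap but wrong on the 2 hard cases
# simulate_always → (200, 10): right everywhere, 20x the cost
# rrc             → (48, 10):  right everywhere at a fraction of the cost
```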

5. Why This Matters for the Future

This isn't just about saving money on computer bills. It's about making AI that reasons the way humans do.

Humans are actually really good at this. We don't stop to negotiate with every person we pass on the street. We follow social norms (rules) most of the time. But if we see an emergency, we instantly switch to a deeper level of thinking to figure out what's right.

By giving AI this same "smart switch," we get systems that:

  • Don't waste energy on boring tasks.
  • Don't break the law just because they are lazy.
  • Can adapt to new, weird situations where old rules don't apply.
  • Help humans make better decisions by showing us when a simple rule isn't enough and we need to think deeper.

In a nutshell: The paper argues that for AI to be truly aligned with humans, it shouldn't just be "smart" or "fast." It needs to be economical with its thinking, knowing exactly when to use a quick shortcut and when to do the hard work of imagining a fair agreement.
