What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty

This paper establishes quantitative "selection theorems" demonstrating that low average-case regret in structured prediction tasks necessitates that capable agents implement structured, predictive internal states—such as belief-like memory or world models—even without assuming optimality, determinism, or explicit model access.

Aran Nayebi

Published 2026-03-04

This is an AI-generated explanation of a preprint that has not been peer-reviewed.

Imagine you are training a robot to navigate a complex, foggy maze. You don't tell the robot how to build a map or how to remember where it's been. You only tell it: "Your goal is to get through the maze with as few mistakes as possible."

This paper asks a fascinating question: If the robot is truly good at getting through the maze (even when it's foggy and confusing), does it have to build a mental map and a memory, even if we didn't explicitly program it to?

The answer, according to this research, is a resounding yes.

Here is the breakdown of the paper's findings using simple analogies.

1. The Core Idea: "Selection Theorems"

Think of evolution. If you put a species in a harsh environment, only the ones with specific traits (like sharp claws or thick fur) survive. The environment "selects" for those traits.

The authors call this "Selection Theorems" for AI. They argue that if an AI agent is forced to perform well on a wide variety of difficult tasks (specifically, predicting what happens next), the environment "selects" for a specific internal structure. The agent cannot succeed without building a predictive model of the world and a memory of what it has seen.

2. The "Betting" Game

To prove this, the authors turn the problem into a betting game.

Imagine the robot is at a fork in the road. It has to bet on whether a monster will appear on the Left path or the Right path.

  • The Setup: The robot doesn't know the answer for sure. It has to guess based on what it has seen before.
  • The Rule: If the robot bets wrong often, it loses points (this is called "regret").
  • The Discovery: The authors prove mathematically that if the robot wants to keep its "losses" (regret) very low, it must have a clear internal distinction between the two paths. It can't just be guessing randomly or relying on luck. It must have an internal "state" that says, "Ah, I've seen this foggy pattern before; it usually means a monster is on the Left."

If the robot tries to cheat by treating two genuinely different situations as if they were the same (the paper calls this "aliasing"), it will eventually lose the bet. To win consistently, it must build a mental model.
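The betting intuition can be sketched in a few lines of code. This is a toy simulation, not the paper's construction: the two "foggy patterns", the monster probabilities in `P_LEFT`, and the round count are all invented for illustration. It compares an agent that keeps separate tallies for each pattern against an "aliased" agent that pools everything into one tally.

```python
import random

random.seed(0)

# Two "foggy patterns" (contexts). In pattern 0 the monster is usually on the
# Left; in pattern 1 it is usually on the Right. Probabilities are made up.
P_LEFT = {0: 0.9, 1: 0.1}

def play(rounds=10_000):
    counts = {0: [1, 1], 1: [1, 1]}  # per-context [left, right] tallies
    pooled = [1, 1]                  # the "aliased" agent pools both contexts
    wrong_aware, wrong_aliased = 0, 0
    for _ in range(rounds):
        ctx = random.randint(0, 1)
        monster_left = random.random() < P_LEFT[ctx]
        # Each agent bets on the side it has seen more often.
        aware_bets_left = counts[ctx][0] >= counts[ctx][1]
        aliased_bets_left = pooled[0] >= pooled[1]
        wrong_aware += aware_bets_left != monster_left
        wrong_aliased += aliased_bets_left != monster_left
        side = 0 if monster_left else 1
        counts[ctx][side] += 1
        pooled[side] += 1
    return wrong_aware, wrong_aliased

wrong_aware, wrong_aliased = play()
```

Because the two patterns cancel out when pooled, the aliased agent is wrong roughly half the time, while the context-aware agent quickly converges to the ~10% error floor. Keeping a distinct internal state per pattern is exactly the "regret" gap the theorems quantify.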

3. The Foggy Maze (Partial Observability)

In the real world, we rarely see everything perfectly. We have "foggy" sensors.

  • The Problem: Two different situations might look exactly the same to the robot's sensors (e.g., a dark hallway could be a dead end or a trap).
  • The Requirement: The paper proves that to avoid losing bets in these foggy situations, the robot must have a memory. It needs to remember, "I turned left three steps ago," to distinguish the dead end from the trap.
  • The Metaphor: It's like playing a game of chess where you can't see the whole board. If you want to win, you can't just look at the current square; you have to remember the moves that got you there. The paper proves that a robot forced to win will naturally develop this memory, even if you didn't tell it to.
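The dead-end-versus-trap situation above can be made concrete with a tiny toy world (an illustration, not the paper's formal setup; all observation and action names here are invented). Both situations end in the same observation, `"dark"`, so any memoryless policy must answer them identically, while one step of memory tells them apart.

```python
# Toy aliased world: the observation "dark" occurs in two different underlying
# situations, and the right action depends on what was seen one step earlier.
EPISODES = [
    (("turn_left", "dark"), "go_back"),   # dark after a left turn: dead end
    (("turn_right", "dark"), "stop"),     # dark after a right turn: trap
]

def score(policy):
    """Number of episodes in which the policy picks the correct action."""
    return sum(policy(history) == correct for history, correct in EPISODES)

# A memoryless policy sees only the final observation, which is "dark" in both
# episodes, so it answers the same thing twice and gets at most one right.
best_memoryless = max(score(lambda h, act=act: act) for act in ("go_back", "stop"))

# One step of memory disambiguates the two situations.
def with_memory(history):
    return "go_back" if history[-2] == "turn_left" else "stop"

memory_score = score(with_memory)
```

Here `best_memoryless` is 1 out of 2 no matter which action the memoryless agent commits to, while `memory_score` is 2 out of 2. Scaled up, that per-episode gap is the unavoidable regret the paper attributes to any agent without memory.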

4. The "Modular" Brain

The paper also looks at what happens if the robot faces many different types of tasks (e.g., some tasks are about speed, others about stealth).

  • The Finding: To be good at all of them, the robot's brain naturally organizes itself into modules.
  • The Analogy: Think of a Swiss Army Knife. If you need a tool that can do everything, it's better to have separate, specialized implements (a screwdriver, a knife, a saw) than one giant, messy blob of metal that tries to do everything poorly. The "pressure" to perform well on different tasks forces the AI to develop specialized internal parts that handle specific types of information.
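A minimal version of the Swiss-Army-Knife point (again an invented illustration, not the paper's proof): two tasks ask different questions about the same input, so an agent that compresses each input down to a single shared number cannot answer both, while two specialized "modules" can.

```python
# Two inputs that look identical to a "total" module but not to a "largest"
# module. All names here are illustrative.
a, b = [1, 5], [2, 4]

total = sum      # module specialized for the "what is the total?" task
largest = max    # module specialized for the "what is the largest?" task

# The inputs share the same total but differ in their largest element, so any
# agent that keeps only the total must answer the "largest" question
# identically for both inputs, and therefore wrongly for one of them.
same_total = total(a) == total(b)
different_max = largest(a) != largest(b)
```

Since `same_total` is true while `different_max` is also true, no function of the total alone can solve the "largest" task: performing well on both tasks forces the agent to keep at least two separate summaries, which is the modularity pressure in miniature.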

5. Why This Matters for the Future

This is a big deal for understanding both AI and the human brain.

  • For AI: It suggests that as we build smarter, more capable AI, we shouldn't be surprised if they start developing "world models" (understanding how the world works) and "memories." They aren't just copying us; they are forced to do it by the math of being good at their jobs.
  • For Neuroscience: It explains why human brains look the way they do. We have memory centers, predictive areas, and modular sections. This paper suggests that our brains evolved this way not by accident, but because to survive in an uncertain world, you must have these structures. If you didn't, you would make too many mistakes and "lose the bet."

The Bottom Line

You don't need to tell a smart agent, "Build a map of the world." If you simply demand that it perform well in a complex, uncertain world, the math guarantees that it will build a map, a memory, and a structured way of thinking.

Competence creates structure. If you want a robot that acts like a competent human, you don't need to program the human-like structure; you just need to give it hard enough problems to solve, and the structure will emerge on its own.
