Imagine you are teaching a robot to walk, drive a car, or even design a new medicine. For decades, the standard way to do this was Reinforcement Learning (RL). Think of traditional RL like a student taking a multiple-choice test where they are only told "Right" or "Wrong" (a scalar reward). The student tries to guess the single best answer to get the highest score.
The problem? Real life isn't a multiple-choice test. There isn't just one way to walk across a room or drive through traffic. Sometimes you walk left, sometimes right. Sometimes you brake hard, sometimes you slow down gently. Traditional RL often gets stuck trying to find that "one perfect answer," leading to rigid, robotic behavior that breaks when things get messy.
This paper proposes a massive shift in thinking: Generative Decision Making. Instead of guessing the single best answer, the robot learns to generate a whole bundle of possible futures and pick the best one from that crowd. It's like moving from a student memorizing one answer to an artist who can paint a thousand different versions of a sunset and choose the most beautiful one.
Here is the paper broken down into simple concepts and analogies:
1. The Big Idea: From "Point" to "Picture"
- Old Way (Scalar Maximization): Imagine trying to hit a bullseye on a dartboard. You throw one dart, get a score, and try to hit that exact spot again. If the wind changes, you miss. This is "Point Optimization."
- New Way (Distribution Matching): Imagine instead of throwing one dart, you throw a net that covers the whole board, capturing all the ways a human might throw a dart. You learn the shape of the crowd of throws. This is "Distribution Matching."
- Why it matters: Humans are messy and creative. We have many ways to solve a problem. Generative models (like the ones that make AI art) are great at capturing this messiness. They don't just predict the action; they predict all the likely actions.
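The dartboard contrast above can be sketched in a few lines. This is a toy illustration (not code from the paper): the "old way" always returns the single highest-scoring action, while the "new way" samples from a learned distribution over the actions humans actually take. The action names and numbers are made up.

```python
import random

random.seed(0)  # make the sampling reproducible

ACTIONS = ["hard_left", "soft_left", "straight", "soft_right", "hard_right"]

def point_policy(scores):
    """Scalar maximization: one fixed answer, every time."""
    return max(scores, key=scores.get)

def distribution_policy(probs):
    """Distribution matching: sample from the learned spread of behavior."""
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Toy numbers: several actions are nearly as good as each other.
scores = {"hard_left": 0.10, "soft_left": 0.80, "straight": 0.70,
          "soft_right": 0.75, "hard_right": 0.05}
probs = {"hard_left": 0.02, "soft_left": 0.35, "straight": 0.28,
         "soft_right": 0.33, "hard_right": 0.02}

print(point_policy(scores))                               # always the same action
print({distribution_policy(probs) for _ in range(200)})   # a variety of actions
```

Notice that the point policy throws away the information that `straight` and `soft_right` were almost as good; the distribution keeps all of that "messiness."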
2. The Four Roles of the AI Team
The authors realized that instead of just looking at how the AI is built (its architecture), we should look at what job it is doing. They break the decision-making process into four distinct roles, like a movie production crew:
- The Controller (The Director):
- Job: Decides what action to take right now based on the current scene.
- Analogy: The Director shouting "Action!" and telling the actor exactly what to do. In the old days, the Director only knew one script. Now, the Director can improvise and offer three different ways to say a line, letting the actor choose the most natural one.
- The Modeler (The Special Effects Team):
- Job: Predicts what will happen next. "If I turn left, what will the world look like?"
- Analogy: This is the "World Simulator." Instead of crashing a real car to see what happens, the AI builds a realistic dream world inside its head. It can "daydream" a thousand scenarios to see which ones end in a crash and which ones lead to a goal.
- The Evaluator (The Critic):
- Job: Judges how good a plan is. "Is this a safe path? Is this a good move?"
- Analogy: The Film Critic. Instead of just saying "Good/Bad," this critic gives a detailed review. It can say, "This path is 90% safe, but that one looks risky." It helps filter out bad ideas before the robot tries them.
- The Optimizer (The Editor):
- Job: Takes a rough draft and polishes it.
- Analogy: Imagine the Director has a rough sketch of a scene. The Editor goes in, frame by frame, smoothing out the jerky movements and fixing the timing. The AI starts with a messy, random idea and slowly "denoises" it into a perfect, smooth plan.
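To make the division of labor concrete, here is a minimal sketch (my framing, not the paper's code) of the four roles as plain functions cooperating on one decision in a toy 1-D world. The goal position, candidate count, and dynamics are all invented for illustration.

```python
import random

random.seed(0)
GOAL = 10.0

def modeler(state, action):
    """The Special Effects Team: predict the next state ('daydream' one step)."""
    return state + action  # toy 1-D world: actions just shift the position

def evaluator(state):
    """The Critic: score a state (closer to the goal = better review)."""
    return -abs(GOAL - state)

def optimizer(state, candidates):
    """The Editor: keep the candidate whose imagined outcome scores best."""
    return max(candidates, key=lambda a: evaluator(modeler(state, a)))

def controller(state):
    """The Director: improvise several plausible actions, let the team choose."""
    candidates = [random.uniform(-2, 2) for _ in range(16)]
    return optimizer(state, candidates)

state = 0.0
for _ in range(8):
    state = modeler(state, controller(state))
print(round(state, 2))  # should end up near the goal of 10
```

The point of the sketch is the interface, not the math: each role could be swapped for a diffusion model, a Transformer, or a learned critic without the others changing.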
3. How They Work Together (The "Control as Inference" Framework)
The paper argues that all these generative models (Diffusion models, Transformers, GANs) are just different tools for doing these four jobs. The "Control as Inference" framing is the glue: instead of asking "what is the single best action?", it asks "which actions are probable, given that things turn out well?" That turns planning into a sampling problem, which is exactly what generative models are built for.
- Diffusion Models are like a sculptor chipping away stone. They start with a block of noise (random ideas) and slowly chip away the bad parts until a perfect statue (a good plan) remains. This is great for Optimizing complex paths.
- Transformers (like the brain behind ChatGPT) are like a storyteller reading a book. They look at the past and predict the next word (or action). This is great for Controlling a robot to follow a long sequence of instructions.
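The sculptor analogy can be shown with a deliberately simplified toy (this is not a real diffusion model; a real one learns the denoising direction from data, whereas here the "statue" is a known straight-line path to the goal). We start a 1-D trajectory as pure noise and repeatedly nudge every step toward the smooth path.

```python
import random

random.seed(0)

def smooth_target(i, n, start=0.0, goal=10.0):
    """The 'statue' hidden in the block: a straight line from start to goal."""
    return start + (goal - start) * i / (n - 1)

n_steps = 20
plan = [random.gauss(0, 5) for _ in range(n_steps)]  # block of noise

for _ in range(50):  # chip away a little per iteration
    plan = [0.9 * p + 0.1 * smooth_target(i, n_steps)
            for i, p in enumerate(plan)]

print([round(p, 1) for p in plan])  # roughly evenly spaced values from 0 to 10
```

After 50 small denoising steps the original noise has almost vanished (its weight is 0.9^50, about half a percent), leaving a smooth plan, which is the whole trick: complex paths emerge from many tiny corrections rather than one big guess.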
4. Where This Is Used (Real World Examples)
The paper looks at three high-stakes areas where this new approach is vital:
- Robots (Embodied AI):
- Problem: Robots often break because they are too rigid.
- Solution: Generative models let robots learn from human videos. If a human demonstrates opening a door by pushing, pulling, or sliding, the robot learns all those ways, not just one.
- Self-Driving Cars:
- Problem: Cars need to handle "corner cases" (rare, weird situations like a ball rolling into the street followed by a child).
- Solution: The AI can generate millions of "what-if" scenarios in its head to practice for rare events without ever crashing a real car.
- Science & Medicine:
- Problem: Designing a new drug is like finding a needle in a haystack of billions of molecules.
- Solution: The AI generates thousands of potential drug structures, evaluates which ones are safe and effective, and optimizes the design.
5. The Dangers (The "Hallucination" Risk)
Just because an AI can generate a beautiful picture doesn't mean it's real.
- The Risk: The AI might "hallucinate." It might generate a plan that looks perfect on paper but is physically impossible (e.g., a car driving through a wall because the AI's dream world forgot that walls are solid).
- The Fix: The paper suggests a "Safety Guard" system. The Generative AI proposes the crazy, creative ideas, but a strict, rule-based safety filter (like a human supervisor) checks them before the robot actually moves. "You can dream it, but you can't do it until I say it's safe."
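The "propose, then filter" pattern is simple to sketch. In this illustration (the speed-limit rule, plan format, and function names are all invented), the generative side dreams up many candidate speed profiles, a hard rule-based guard vetoes the unsafe ones, and an evaluator picks the best survivor.

```python
import random

random.seed(0)
SPEED_LIMIT = 30.0  # the guard's hard constraint (made up for illustration)

def generate_plans(n):
    """Generative model: dream up many candidate 5-step speed profiles."""
    return [[random.uniform(0, 45) for _ in range(5)] for _ in range(n)]

def is_safe(plan):
    """Rule-based safety guard: reject anything breaking a hard constraint."""
    return all(speed <= SPEED_LIMIT for speed in plan)

def comfort(plan):
    """Evaluator: prefer plans with gentle speed changes."""
    return -sum(abs(a - b) for a, b in zip(plan, plan[1:]))

candidates = generate_plans(200)
safe = [p for p in candidates if is_safe(p)]          # "You can dream it..."
best = max(safe, key=comfort) if safe else None       # "...if I say it's safe."
```

The key design choice is that the guard sits outside the generative model: creativity comes from sampling, but safety comes from rules that cannot be hallucinated away.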
The Bottom Line
This paper is a roadmap for the future of AI. It says: Stop trying to force AI to be a rigid calculator. Start letting it be a creative generator.
By treating decision-making as "generating possibilities" rather than "calculating the one right answer," we can build robots and systems that are more flexible, more human-like, and better at handling the messy, unpredictable real world. The goal is to create Generalist Physical Intelligence—AI that can understand the physical world, dream up solutions, and act safely in it.