Imagine you are teaching a robot how to play a complex board game, such as chess, but you aren't allowed to give it the rulebook. Instead, you just show it thousands of videos of people playing the game—some moves are legal, and some are illegal (like moving a knight as if it were a rook).
Your goal is for the robot to watch these videos, figure out the hidden rules on its own, and then use those rules to solve brand-new puzzles it has never seen before.
This paper is about a team of researchers who tried to teach a specific type of AI (called a Transformer, the same technology behind modern chatbots) to do exactly this. They wanted to see if the AI could learn a "World Model"—an internal understanding of how the world works—just by predicting the next move in a sequence.
Here is the breakdown of their journey, using some everyday analogies.
The Problem: The "Black Box" vs. The "Rulebook"
Most modern AIs are like brilliant parrots. They are amazing at mimicking patterns. If you show them enough videos of a game, they can guess the next move with high accuracy. But do they actually understand the rules? Or are they just memorizing that "Knights usually jump over two squares"?
The researchers wanted to know: Can we force the AI to learn the actual rulebook (the logic) so it can plan ahead, even for situations it has never seen?
The Two Students: The "Symbolic" vs. The "Intuitive" Learner
To test this, they created two different types of AI students to learn the rules of games described in STRIPS (a standard language for specifying logic puzzles and planning problems in computer science).
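To make "learning the rules" concrete: a STRIPS action is just three sets of facts—preconditions that must be true, facts the action adds, and facts it deletes. Here is a minimal sketch in Python; the `pickup_b` operator and the fact names are illustrative Blocksworld-style examples, not the paper's exact encoding.

```python
# Minimal STRIPS machinery: a state is a set of true facts, and an
# action lists preconditions, facts it adds, and facts it deletes.
# The "pickup_b" operator is an illustrative Blocksworld-style example.
pickup_b = {
    "pre": {"clear(b)", "on_table(b)", "hand_empty"},
    "add": {"holding(b)"},
    "del": {"clear(b)", "on_table(b)", "hand_empty"},
}

def applicable(state, action):
    """An action is legal when all its preconditions hold in the state."""
    return action["pre"] <= state

def apply_action(state, action):
    """Applying an action removes its delete list, then adds its add list."""
    return (state - action["del"]) | action["add"]
```

Learning the "rulebook" means recovering the `pre`, `add`, and `del` sets of every action just from watching which moves succeed and which fail.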
1. The "Rule-Follower" (The STRIPS Transformer)
This student was built with a very specific instruction manual baked into its brain.
- The Analogy: Imagine a student who is given a blank notebook and told, "You must write down every rule exactly as it is written in the book. You have a specific column for 'Preconditions' and a specific column for 'Effects'."
- How it worked: The researchers forced the AI to align its internal math directly with the logical structure of the game.
- The Result: It was very hard to train. It was like trying to teach a child to write perfectly by forcing them to hold a pen in a specific, awkward way. It needed a massive amount of data to get it right, and sometimes it just gave up. But when it did learn, it was very precise.
2. The "Intuitive" Learner (The Stick-Breaking Transformer)
This student was a standard AI, but with a special trick called "Stick-Breaking Attention."
- The Analogy: Imagine a student who is told, "Don't worry about writing down the rules. Just pay attention to the most recent thing that happened that matters."
- Think of a stick of length 1. If a new event happens, you break off a piece of the stick to represent that event. If an even newer, more important event happens, you break off the rest of the stick. The AI learns to focus on the "rightmost" (most recent) relevant piece of history.
- The Result: This student was a natural. It learned the patterns incredibly fast, needed less data, and was much easier to train. It didn't have the "rulebook" built-in, but it figured out the logic on its own.
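For the curious, the stick-breaking idea can be sketched in a few lines of NumPy. Instead of a softmax, each query walks backwards from the most recent key, breaking off a sigmoid-sized piece of whatever stick remains—so nearby history gets first claim on the attention budget. This is a plain, unvectorized sketch; the paper's exact formulation may differ in details.

```python
import numpy as np

def stick_breaking_attention(scores):
    """scores[i, j]: raw attention logit of query i toward key j (j <= i).
    Each query starts with a stick of length 1 and walks from the most
    recent key backwards, breaking off a sigmoid fraction of what's left."""
    n = scores.shape[0]
    betas = 1.0 / (1.0 + np.exp(-scores))      # sigmoid, not softmax
    weights = np.zeros_like(scores)
    for i in range(n):
        remaining = 1.0
        for j in range(i, -1, -1):             # most recent key first
            weights[i, j] = betas[i, j] * remaining
            remaining *= 1.0 - betas[i, j]
    return weights
```

Note the built-in recency bias: if the most recent key has a high score, it claims nearly the whole stick and everything earlier is effectively ignored, which matches the "focus on the rightmost relevant event" intuition.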
The Big Test: Can They Plan?
The real test wasn't just guessing the next move; it was Planning.
- The Scenario: Imagine you teach the AI how to move blocks in a small room (5 blocks).
- The Challenge: Can the AI now solve a puzzle in a huge room with 100 blocks, or a room with a completely different layout it has never seen?
- The Magic: Once each student was trained, the researchers could extract a "Rulebook" (a symbolic model) from its brain. They handed this rulebook to a standard planning computer program.
- The Outcome:
- The Intuitive Learner (Stick-Breaking) was the winner. It learned the rules so well that it could solve puzzles with exponentially more complexity than what it was trained on. It could handle millions of unseen starting positions and goals.
- The Rule-Follower struggled to learn in the first place.
- The Surprise: They also tested standard AI models (without the special "stick-breaking" trick). These models were great at memorizing short videos but failed miserably when the videos got long. They couldn't generalize. However, if you took a standard AI trained on short videos, you could still extract a decent rulebook from it. But the Stick-Breaking model was the only one that could handle long, complex sequences and still produce a perfect rulebook.
The "Setup" Trick
One clever part of the experiment was how they taught the AI about the "state" of the world (e.g., "Is the block on the table?").
- They added special "setup moves" to the training videos.
- Init-p: A move that says, "Hey, remember, this block is currently on the table."
- Test-p: A move that asks, "Is the block still on the table?"
- This forced the AI to keep track of the truth of every single fact in the game, ensuring it didn't just guess the next move but actually tracked the state of the world.
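A rough sketch of what such a training sequence might look like (the token spellings here are made up for illustration; the paper's actual vocabulary may differ):

```python
def make_training_sequence(init_facts, actions, queried_facts):
    """Weave 'setup moves' into an action trace: init(...) tokens declare
    the starting state, do(...) tokens are the observed moves, and
    test(...) tokens quiz the model on whether a fact still holds.
    Token spellings are illustrative, not the paper's exact vocabulary."""
    seq = [f"init({p})" for p in sorted(init_facts)]
    seq += [f"do({a})" for a in actions]
    seq += [f"test({p})" for p in sorted(queried_facts)]
    return seq
```

Because the model must predict whether each `test(...)` token is consistent with everything before it, it can't get away with only guessing plausible next moves; it has to carry the full state forward.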
The Takeaway
The paper shows that next-token prediction (guessing the next word/move) can indeed lead to a true World Model, but you need the right architecture to make it stick.
- Don't over-engineer: Trying to force the AI to be "symbolic" from the start (the Rule-Follower) made it harder to learn.
- Focus on the recent past: The "Stick-Breaking" method, which forces the AI to focus on the most relevant recent history, allowed it to learn the underlying logic naturally.
- Generalization is real: These models didn't just memorize; they learned the rules of the game. Once they knew the rules, they could solve problems that were exponentially harder than anything they had seen during training.
In short: You don't need to give the robot a rulebook. If you show it enough examples and give it the right way to pay attention, it can write the rulebook itself, and then use it to become a master planner.