From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers

This paper demonstrates that three minimal inductive biases (spatial smoothness, stability, and temporal locality) let generic Transformers evolve from mere curve-fitters into agents capable of discovering fundamental physical laws like Newtonian forces, thereby bridging the gap between high predictive accuracy and causal understanding.

Original authors: Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias

Published 2026-02-09

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a super-smart robot student. You want to teach it how planets move around the sun. You give it a massive history book of where the planets have been, and you ask it to guess where they will be next.

The big question this paper asks is: Can this robot student just memorize the path, or can it actually understand the laws of physics that cause the movement?

The authors found that without some special "training wheels" (which they call inductive biases), the robot is a brilliant memorizer but a terrible physicist. It learns to draw the path perfectly but has no idea why the planet is moving that way.

Here is the story of how they fixed the robot, broken down into three simple lessons.

The Problem: The Robot is a "Curve-Fitter," Not a "Physicist"

Think of the robot's brain as a giant library.

  • The Kepler Approach (What the robot did naturally): The robot looks at the last 1,000 points of a planet's journey. It says, "Aha! I see the pattern. It's an oval shape. I will just keep drawing the oval." It's like a child tracing a picture. It gets the picture right, but if you ask, "Why is it an oval?" or "What force is pulling it?", the robot has no answer. It just knows the shape.
  • The Newton Approach (What we want): We want the robot to say, "The sun is pulling the planet with gravity. If I know the planet's current speed and position, I can calculate the pull and predict the next step." This is understanding the cause, not just the effect.
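
To make the "calculator" idea concrete, here is a minimal sketch of a Newton-style predictor in Python. It is an illustration, not the paper's model: the unit choice GM = 1, the function names, and the semi-implicit Euler update are all assumptions made for this example.

```python
import math

GM = 1.0  # assumed units: gravitational constant times the sun's mass


def gravity(x, y):
    """Acceleration from an inverse-square pull toward the sun at the origin."""
    r = math.hypot(x, y)
    return -GM * x / r**3, -GM * y / r**3


def newton_step(x, y, vx, vy, dt):
    """Predict the next state from the current state only (semi-implicit Euler)."""
    ax, ay = gravity(x, y)
    vx, vy = vx + ax * dt, vy + ay * dt  # the pull changes the velocity
    x, y = x + vx * dt, y + vy * dt      # the velocity changes the position
    return x, y, vx, vy


# Circular orbit of radius 1: the matching speed is sqrt(GM / r) = 1.
state = (1.0, 0.0, 0.0, 1.0)
for _ in range(1000):
    state = newton_step(*state, dt=0.001)

radius = math.hypot(state[0], state[1])
print(f"radius after 1000 steps: {radius:.4f}")
```

Notice that newton_step never sees the orbit's history. Everything it needs is in the planet's current position and velocity, which is exactly the kind of reasoning the Kepler-style "tracer" lacks.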

The paper shows that standard AI models (Transformers) naturally become "tracers" (Kepler) and fail to become "calculators" (Newton). To fix this, the authors added three specific "training wheels."


Lesson 1: The "Pixelated Map" Problem (Spatial Smoothness)

The Analogy: Imagine you are trying to teach a robot to navigate a city.

  • The Mistake: You give the robot a map where every single street corner is a completely different, random color. "Red" is the corner of 1st and Main. "Blue" is the corner of 1st and 2nd. Even though these corners are right next to each other, the robot sees them as totally unrelated. It has to relearn the relationship between "Red" and "Blue" from scratch every time.
  • The Diagnosis: The authors realized that when they chopped the planet's position into tiny "bins" (like pixels) and gave each bin its own unrelated code, they broke the natural smoothness of space.
  • The Solution: They made the "bins" bigger (fewer colors) or stopped using bins entirely and just gave the robot the exact coordinates (like a GPS). This allowed the robot to see that "Point A" is right next to "Point B," helping it build a real mental map of space instead of a confusing jumble of random codes.
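
The binning problem can be sketched in a few lines of Python. This is only an illustration under assumed settings: the coordinate range, the bin counts, and the helper name to_token are hypothetical and are not taken from the paper.

```python
def to_token(x, lo=-2.0, hi=2.0, n_bins=100):
    """Map a coordinate in [lo, hi) to an integer bin index (a discrete token)."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp values outside the range


# Two positions that are almost the same point in space.
a, b = 0.50013, 0.50027

# Tiny bins: the neighbours get different token IDs. To the model these are
# just two unrelated symbols until training teaches it otherwise.
fine = (to_token(a, n_bins=100_000), to_token(b, n_bins=100_000))

# Bigger bins: both positions share one token.
coarse = (to_token(a, n_bins=100), to_token(b, n_bins=100))

# Continuous coordinates: closeness is visible directly in the numbers.
gap = abs(a - b)

print("fine tokens:  ", fine)
print("coarse tokens:", coarse)
print("continuous gap:", gap)
```

The trade-off is that bigger bins blur the position while tiny bins hide which points are neighbours. Feeding the raw coordinates sidesteps both issues, which matches the "GPS" option described above.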

Lesson 2: The "Domino Effect" Problem (Spatial Stability)

The Analogy: Imagine playing a game of "Telephone" where you whisper a number to the next person.

  • The Mistake: If the first person whispers "50.1" and the second person hears "50.2," the third person might hear "50.5," and by the time it gets to the end, the number is "100." In physics, if the robot makes a tiny mistake predicting the planet's position, that mistake gets bigger and bigger with every step, until the planet flies off into deep space or crashes into the sun.
  • The Diagnosis: The authors realized that standard AI training is too "perfect." The robot only ever reads flawless past data, so it never practices correcting its own small errors.
  • The Solution: They started "breaking" the robot's training data on purpose. They added a little bit of random noise (like static on a radio) to the history the robot was reading. This forced the robot to learn how to recover from small mistakes, making it robust enough to predict the future without the errors piling up.
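
Here is one way the noise trick might look in Python. Treat it as a sketch under assumptions: the summary does not say what kind of noise the authors used or how strong it was, so the Gaussian noise, the sigma value, and the function name make_training_pair are choices made for this example.

```python
import math
import random


def make_training_pair(trajectory, t, context_len, sigma, rng):
    """Return (noisy history, clean target) for predicting trajectory[t]."""
    context = trajectory[t - context_len:t]
    # Assumption: independent Gaussian jitter on every input coordinate.
    noisy = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
             for x, y in context]
    target = trajectory[t]  # the answer the robot must produce stays clean
    return noisy, target


# Toy history: points on a circular orbit.
traj = [(math.cos(0.01 * k), math.sin(0.01 * k)) for k in range(50)]

noisy, target = make_training_pair(traj, t=10, context_len=4, sigma=0.01,
                                   rng=random.Random(0))
clean, _ = make_training_pair(traj, t=10, context_len=4, sigma=0.0,
                              rng=random.Random(0))
```

Only the inputs are perturbed. Because the target stays exact, the model is rewarded for pulling a slightly wrong history back toward the true path, which is the "recovering from small mistakes" behaviour described above.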

Lesson 3: The "Long Memory" vs. "Short Memory" Problem (Temporal Locality)

The Analogy: This is the most important lesson. Imagine two robots trying to guess where a rollercoaster cart will go next.

  • The Long Memory (Kepler): Imagine a robot that remembers everything that happened in the last hour. When it tries to guess what happens next, it looks at the whole hour of history to draw a giant curve. It's like looking at a whole rollercoaster track to guess where the cart is going next. It works for the curve, but it doesn't understand the physics.
  • The Short Memory (Newton): Now, imagine a robot that is only allowed to remember the last two seconds. It can't see the whole track. It must look at where the cart is right now and how fast it's going right now to figure out where it goes next.
  • The Solution: The authors forced the robot to have a short memory. They told it, "You can only look at the immediate past."
  • The Result: Because the robot couldn't rely on the "big picture" curve anymore, it was forced to figure out the rules of the game. It had to calculate the invisible "pull" (gravity) acting on the planet right now to predict the next step. Suddenly, the robot stopped drawing ellipses and started calculating forces. It became a physicist.
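
Why is a short memory enough? A small Python sketch shows that just three consecutive positions already reveal the "pull." The setup is assumed for illustration (an exact circular orbit of radius 1 with GM = 1, and a hypothetical helper local_acceleration); it is not the paper's experiment.

```python
import math


def local_acceleration(p_prev, p_curr, p_next, dt):
    """Second finite difference: acceleration from three consecutive positions."""
    return tuple((n - 2 * c + p) / dt**2
                 for p, c, n in zip(p_prev, p_curr, p_next))


dt = 0.01
# Assumed data: circular orbit, radius 1, angular speed 1 (so GM = 1).
orbit = [(math.cos(k * dt), math.sin(k * dt)) for k in range(3)]

ax, ay = local_acceleration(*orbit, dt)
x, y = orbit[1]

# Newton's inverse-square law predicts a = -GM * r / |r|^3, which is (-x, -y)
# on this orbit. The short-window estimate should land very close to it.
print("estimated pull:", (ax, ay))
print("Newton's law:  ", (-x, -y))
```

No long history was needed: the most recent few points carry enough information to read off an acceleration that points at the sun. A model restricted to a short window has little else to work with, so learning this local rule is its best route to good predictions.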

The Big Takeaway

The paper concludes that how you design the AI's brain determines what it learns.

  • If you let it look at everything and use a pixelated map, it becomes a curve-fitter (Kepler). It draws pretty pictures but doesn't understand the universe.
  • If you give it a smooth map, teach it to handle mistakes, and force it to have a short memory, it becomes a physicist (Newton). It discovers the laws of gravity on its own.

The authors show that you don't need to program the laws of physics into the AI. You just need to give it the right "inductive biases" (the right training constraints), and it will discover the laws itself.
