Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures

This paper presents a unifying theoretical framework showing that gradient descent exhibits a simplicity bias across diverse neural network architectures by following saddle-to-saddle dynamics: training evolves near a sequence of invariant manifolds, progressively learning solutions of increasing complexity, such as higher rank, more kinks, or additional kernels and attention heads.

Yedi Zhang, Andrew Saxe, Peter E. Latham

Published 2026-03-12

Imagine you are teaching a child to draw. You don't start by asking them to draw a masterpiece with perfect shading and complex details. Instead, you start with a stick figure. Then, maybe they add a circle for a head. Then, they add arms. Finally, they add details like fingers and clothes.

This paper argues that neural networks (AI brains) learn in much the same way. They don't just "get smarter" smoothly; they go through distinct stages, starting with very simple solutions and gradually adding complexity, one piece at a time.

Here is the breakdown of the paper's big ideas using simple analogies:

1. The "Saddle-to-Saddle" Hike

Imagine a mountain range where the valleys are the "best" answers (low error) and the peaks are the "worst" answers. Usually, we think of learning as sliding down a hill into a valley.

But this paper says learning is more like hiking across a series of mountain passes (saddles).

  • The Plateau: The AI gets stuck on a flat, high part of the mountain (a "saddle point"). It's not moving much, and the error (loss) stays high. This is a "pause" in learning.
  • The Jump: Suddenly, the AI finds a way to slide down into a slightly lower valley.
  • The Repeat: It gets stuck on the next flat spot, then jumps down again.

The paper calls this "Saddle-to-Saddle Dynamics." It explains why training curves often look like a staircase: long flat periods (plateaus) followed by sudden drops in error.
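You can see this staircase in a few lines of NumPy. This is a toy sketch of my own (not code from the paper): a two-layer linear network `W2 @ W1` is trained to match a target map with two well-separated singular values (3.0 and 0.5), starting from tiny random weights. The numbers and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy staircase demo (illustrative, not the paper's code): a two-layer
# linear network W2 @ W1 fits a target with singular values 3.0 and 0.5.
rng = np.random.default_rng(0)
A = np.diag([3.0, 0.5, 0.0, 0.0])        # target linear map
scale = 1e-4                              # tiny initialization
W1 = scale * rng.standard_normal((4, 4))
W2 = scale * rng.standard_normal((4, 4))

lr, losses = 0.02, []
for _ in range(5000):
    E = W2 @ W1 - A                       # residual
    losses.append(0.5 * np.sum(E ** 2))
    W1, W2 = W1 - lr * (W2.T @ E), W2 - lr * (E @ W1.T)

# Steps spent near the middle plateau: loss ~ 0.5 * 0.5**2 = 0.125,
# where the big mode (3.0) is learned but the small one (0.5) is not yet.
plateau_steps = sum(1 for L in losses if 0.11 < L < 0.15)
```

Plotting `losses` gives the staircase: a long stretch near 4.6 (nothing learned), a drop to about 0.125 (the dominant pattern learned), a second plateau, then a drop to near zero.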

2. What is "Simple" to an AI?

In this paper, "simple" doesn't mean "easy to understand." It means "using fewer building blocks."

  • In a standard brain, a "block" is a neuron.
  • In a convolutional network (like those that see images), a "block" is a filter (a pattern detector).
  • In a Transformer (like the one powering this chat), a "block" is an attention head (a focus mechanism).

The AI starts by using zero blocks (it just guesses the average answer). Then it wakes up one block. Then two. It keeps recruiting new blocks only when it absolutely needs to solve a harder part of the puzzle.

3. The "Invisible Tracks" (Invariant Manifolds)

Why does the AI stick to these simple steps? Why doesn't it just jump straight to a complex solution?

The authors discovered that the AI's math creates "invisible tracks" (called invariant manifolds).

  • Imagine the AI is a train. Even though the train has 100 cars (units), the tracks force it to behave as if it only has 1 car, then 2 cars, then 3.
  • The AI gets "locked" onto a track where it can only express simple ideas. It stays there until it gathers enough momentum to switch tracks to a slightly more complex one.
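For two-layer linear networks, one such "track" can be written down exactly: under gradient flow, the matrix `Q = W1 @ W1.T - W2.T @ W2` never changes (a standard balancedness result; the toy problem below is my own illustrative setup). Tiny initialization means `Q ≈ 0`, and training stays pinned near that balanced, low-complexity manifold even as the weights grow large.

```python
import numpy as np

# Conserved quantity ("invisible track") for a two-layer linear net:
# Q = W1 W1^T - W2^T W2 is invariant under gradient flow. With a small
# learning rate, discrete gradient descent preserves it almost exactly.
rng = np.random.default_rng(0)
A = np.diag([3.0, 0.5, 0.0, 0.0])        # illustrative target map
scale = 1e-3
W1 = scale * rng.standard_normal((4, 4))
W2 = scale * rng.standard_normal((4, 4))
Q0 = W1 @ W1.T - W2.T @ W2               # ~ 0 for tiny init

lr = 0.002                                # small step ~ gradient flow
for _ in range(12000):
    E = W2 @ W1 - A
    W1, W2 = W1 - lr * (W2.T @ E), W2 - lr * (E @ W1.T)

Q = W1 @ W1.T - W2.T @ W2
drift = np.linalg.norm(Q - Q0)           # stays small, even though...
wnorm = np.linalg.norm(W1 @ W1.T)        # ...the weights grow to O(1)
```

The weights change by orders of magnitude during training, yet `Q` barely moves: the network is locked to the track it started on.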

4. Two Different Ways to Switch Tracks

The paper found that there are two different "engines" that push the AI from one simple stage to the next, depending on the type of AI:

  • Engine A: The Data Push (Linear Networks)

    • Analogy: Imagine a group of rowers. The water (the data) has a strong current in one direction. All the rowers naturally start rowing in that direction first. Once that direction is mastered, the current shifts slightly, and they adjust.
    • Result: The AI learns low-rank solutions. It finds the most important patterns in the data first.
  • Engine B: The Initialization Push (Quadratic/Attention Networks)

    • Analogy: Imagine a race where everyone starts with a tiny, random head start. One runner happens to be slightly faster at the start. Because of the way the race is set up, that one runner pulls ahead massively while the others stay behind. Once that one runner is dominant, the next fastest one starts to pull ahead, and so on.
    • Result: The AI learns sparse solutions. It activates one specific unit (neuron/head) at a time, leaving the others dormant.
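The "race" behind Engine B can be caricatured in a few lines. This is a made-up minimal model (not the paper's architecture): fit the target `y = 1` with `f = sum_i r_i * a_i**2`, where each unit's speed `r_i` is an assumed constant and every unit starts equally tiny. Because growth is multiplicative, the log-scale head start of the fastest unit gets amplified enormously before anyone reaches full size.

```python
import numpy as np

# Hypothetical "Engine B" caricature: units with different speeds r_i
# race to fit a single target. The fastest unit absorbs the whole
# residual; the slower ones freeze at tiny values (a sparse solution).
r = np.array([1.0, 0.7, 0.5, 0.3])        # per-unit speeds (assumed)
a = np.full(4, 1e-6)                      # identical tiny head starts
y, lr = 1.0, 0.01

for _ in range(5000):
    resid = y - np.sum(r * a ** 2)
    a += lr * 2 * r * a * resid           # gradient step on 0.5*resid**2

contrib = r * a ** 2                      # one unit carries the target
```

At the end, the first unit contributes essentially all of the output while the others stay dormant, even though all four started at exactly the same size.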

5. Why Does This Matter?

This theory solves a mystery: Why do some AI models learn in "stages" while others learn smoothly?

  • If you start with tiny weights: The AI follows the "invisible tracks," learning simple solutions first and more complex ones later. This is the "Saddle-to-Saddle" behavior.
  • If you start with huge weights: The AI skips the tracks. It jumps straight to a complex solution (or gets stuck in a messy place). It loses the "simplicity bias."
  • If the data is messy: The "tracks" might be broken, and the AI might not learn in clean stages.
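The initialization effect is easy to check numerically. Below is a self-contained toy (my own illustrative setup, not the paper's experiments): the same two-layer linear network is trained twice, once from tiny weights and once from large ones, and we count how long each run sits on the intermediate plateau.

```python
import numpy as np

def train(scale, steps=8000, lr=0.01, seed=0):
    """Train a toy two-layer linear net W2 @ W1 toward a fixed target
    and return the loss curve (illustrative setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    A = np.diag([3.0, 0.5, 0.0, 0.0])
    W1 = scale * rng.standard_normal((4, 4))
    W2 = scale * rng.standard_normal((4, 4))
    losses = []
    for _ in range(steps):
        E = W2 @ W1 - A
        losses.append(0.5 * np.sum(E ** 2))
        W1, W2 = W1 - lr * (W2.T @ E), W2 - lr * (E @ W1.T)
    return losses

tiny = train(scale=1e-4)   # follows the "tracks": staircase curve
big = train(scale=1.0)     # skips them: no long plateau

def stuck(ls):
    # Steps spent near the intermediate plateau (loss ~ 0.125, where
    # only the dominant mode has been learned).
    return sum(1 for L in ls if 0.11 < L < 0.15)
```

Both runs end at near-zero loss, but only the tiny-init run spends a long stretch parked at the intermediate saddle; the large-init run slides past it, losing the stage-by-stage simplicity bias.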

The Big Takeaway

Neural networks aren't magic black boxes that instantly become geniuses. They are like construction crews that build a skyscraper floor by floor.

  1. They lay the foundation (zero complexity).
  2. They hit a pause while they figure out how to build the first floor (Saddle 1).
  3. They build the first floor (Simple solution).
  4. They hit another pause (Saddle 2).
  5. They build the second floor (Slightly more complex).

This paper gives us the blueprint for why they build it this way, and how we can control the speed of construction by changing how we start the project (initialization) or what materials we give them (data).