Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement

This paper proposes a computationally efficient, performance-independent metric grounded in low-rank bias to measure dynamical richness in neural networks, enabling the analysis of training factors and their relationship to representation learning without relying on predictive accuracy.

Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Niclas Goring, Ouns El Harzli, Abdurrahman Hadi Erturk, Soufiane Hayou, Ard A. Louis

Published 2026-03-03

Imagine you are trying to teach a robot to recognize cats and dogs. You have two main ways to look at how the robot learns:

  1. The "Scorecard" View: Did it get the right answer? (High accuracy = Good learning).
  2. The "Brain Structure" View: How did it rewire its internal connections to get there? (Did it simplify its thinking, or did it just memorize every single detail?).

For a long time, scientists assumed that if the robot got a high score, it must have developed a "rich" and efficient brain structure. This paper argues that assumption is wrong. You can get a high score by memorizing (lazy learning), or you can get a low score by overthinking (rich learning).

The authors introduce a new tool called DLR (Dynamical Low-Rank measure) to measure how the robot thinks, completely ignoring whether it got the answer right or wrong.

Here is a breakdown of their ideas using simple analogies:

1. The Problem: The "Rich" vs. "Lazy" Trap

In machine learning, there are two modes of learning:

  • Lazy Training: The robot barely changes its internal brain. It just tweaks the final "decision button" (the last layer) to fit the data. It's like a student who doesn't study the textbook but just memorizes the answer key for the specific test questions.
  • Rich Training: The robot fundamentally reorganizes its internal features. It learns the concept of a cat or dog. It's like a student who actually reads the book, understands the biology, and can identify a cat even if it's wearing a hat.

The Catch: Usually, we think "Rich = Good." But this paper shows that sometimes, being "Rich" (overthinking) makes you perform worse on a specific test, while being "Lazy" (memorizing the right features) makes you perform better.
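The lazy-vs-rich split above can be sketched with a toy two-layer network: "lazy" training freezes the hidden layer and only fits the final readout, while "rich" training updates everything, so the internal features actually move. This tiny numpy setup is purely illustrative — it is not the paper's experimental setup, and the architecture and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))                     # toy inputs
y = np.sign(X[:, 0] + X[:, 1])[:, None]          # simple target

def train(lazy: bool, steps=200, lr=0.05):
    """Tiny 2-layer net. 'lazy' freezes the hidden layer (only the
    readout learns); 'rich' updates everything. Illustrative only."""
    W1 = rng.normal(size=(8, 16)) / np.sqrt(8)   # hidden layer ("the brain")
    W2 = rng.normal(size=(16, 1)) / np.sqrt(16)  # readout ("the decision button")
    H0 = np.tanh(X @ W1)                         # features at initialization
    for _ in range(steps):
        H = np.tanh(X @ W1)
        err = H @ W2 - y                         # MSE gradient pieces
        if not lazy:
            gH = err @ W2.T * (1 - H**2)         # backprop through tanh
            W1 -= lr * (X.T @ gH / len(X))
        W2 -= lr * (H.T @ err / len(X))
    # How far the internal features moved from their starting point:
    return np.linalg.norm(np.tanh(X @ W1) - H0)

lazy_shift = train(lazy=True)    # features never move: shift is exactly 0
rich_shift = train(lazy=False)   # features reorganize: shift is positive
```

The feature-shift norm at the end is the classic "distance from initialization" signal: zero for lazy training, positive for rich training.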

2. The Solution: A New "Brain Scan" (DLR)

Previously, to measure if a robot was learning "richly," scientists had to look at how much the robot's brain changed compared to its starting point, or how complex its math was. These methods were slow, expensive, and often entangled with the robot's final score.

The authors created DLR, a new metric that acts like a structural MRI scan of the robot's brain.

  • How it works: It looks at the "features" (the internal signals) the robot uses right before making a decision.
  • The Analogy: Imagine a chef making a soup.
    • Rich Dynamics (Low DLR): The chef uses only 3 essential ingredients (e.g., salt, pepper, tomato) to create a complex flavor. The recipe is simple, efficient, and focused.
    • Lazy Dynamics (High DLR): The chef dumps in 50 different ingredients, hoping the flavor works out. The recipe is messy and unfocused.
  • The Magic: DLR measures how many ingredients are actually doing the work. It doesn't care if the soup tastes good (accuracy); it only cares if the recipe is efficient (richness).
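The paper's exact DLR formula isn't reproduced here, but the "how many ingredients are actually doing the work" idea can be sketched with a standard effective-rank computation over the features right before the decision. The entropy-based `effective_rank` helper below is a common low-rank measure in the same spirit, not the authors' definition.

```python
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    """Effective rank via the entropy of the normalized singular values.

    `features` is an (n_samples, n_features) matrix of activations from
    the layer just before the decision. exp(entropy) roughly counts how
    many directions ("ingredients") carry real weight.
    """
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()                      # normalize singular values
    p = p[p > 0]                         # drop exact zeros for the log
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))

rng = np.random.default_rng(0)
# "3 essential ingredients": a rank-1 feature matrix (very focused)
focused = np.outer(rng.normal(size=100), rng.normal(size=50))
# "50 ingredients dumped in": a full-rank random feature matrix
scattered = rng.normal(size=(100, 50))
```

Here `effective_rank(focused)` comes out near 1, while `effective_rank(scattered)` is far larger — the metric counts recipe complexity, and never looks at whether the soup (the prediction) was any good.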

3. Key Discoveries (The "Aha!" Moments)

A. Richness ≠ Success
The authors ran an experiment where they gave the robot a "trick" test.

  • Scenario: They trained the robot on pictures where the correct label ("cat" or "dog") was secretly encoded in the first 10 pixels of each image, alongside the real image content.
  • Result:
    • The Rich robot (which tried to understand the whole picture) got confused by the hidden labels and failed the test.
    • The Lazy robot (which just looked at the specific pixels where the labels were) got a perfect score.
  • Lesson: Being "smart" (rich dynamics) doesn't always mean you will win the game. Sometimes, a simple, focused approach wins.

B. The "Grokking" Mystery
"Grokking" is a phenomenon where a robot suddenly goes from failing a math problem to solving it perfectly after a long time of training.

  • Using DLR, the authors showed that this sudden jump happens exactly when the robot switches from "Lazy" (memorizing) to "Rich" (understanding the pattern).
  • This shows that DLR can detect when a robot is truly learning, even before the test scores improve.

C. The Secret Sauce: Batch Normalization
They tested a common tool called "Batch Normalization" (a technique to stabilize training).

  • Without it: The robot was "Lazy" and performed poorly.
  • With it: The robot became "Rich" and performed much better.
  • Why it matters: This helps explain why this tool works. It forces the robot to reorganize its brain into a more efficient, rich structure.

4. The Visualization: Seeing the Invisible

To make this easier to understand, the authors created a visual tool. Imagine a graph showing the "importance" of every single neuron in the robot's brain.

  • In a Rich Robot: The graph looks like a steep mountain. Only the top 10 neurons are huge; the rest are tiny. The robot is focused.
  • In a Lazy Robot: The graph looks like a gentle hill. Hundreds of neurons are all slightly active. The robot is scattered and unfocused.
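The steep-mountain vs. gentle-hill picture can be sketched by asking how much "importance" the top few directions carry in each profile. The two spectra below are hypothetical stand-ins (a heavy-tailed Pareto profile for "rich," a near-flat uniform profile for "lazy"), not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "neuron importance" profiles, sorted largest first:
rich = np.sort(rng.pareto(2.0, 200))[::-1]         # steep mountain: a few dominate
lazy = np.sort(rng.uniform(0.5, 1.0, 200))[::-1]   # gentle hill: all comparable

def top_k_share(spectrum, k=10):
    """Fraction of total 'importance' carried by the k largest values."""
    s = np.sort(spectrum)[::-1]
    return s[:k].sum() / s.sum()
```

In the "rich" profile the top 10 directions carry a large chunk of the total importance; in the "lazy" profile, 10 out of 200 near-equal values carry only about 5% — which is exactly the focused-vs-scattered contrast the visualization shows.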

Summary

This paper gives us a new way to look at AI. Instead of just asking, "Did it get the answer right?" we can now ask, "How efficiently did it think?"

  • Old Way: Check the test score.
  • New Way (DLR): Check the "recipe" the AI used.

This tool helps researchers understand why some AI models learn fast, why some get stuck, and how to build robots that don't just memorize, but actually understand. It separates the "richness" of the learning process from the "score" of the result, showing us that sometimes, the most efficient path to a solution isn't the one that looks the smartest on paper.
