Spectral Reach: Understanding Neural Scaling as Progress into the Spectral Tail

This paper introduces "spectral position" to demonstrate that larger neural models achieve superior performance by extending their learning capacity into the spectral tail of the empirical neural tangent kernel, a capability enabled by feature learning that adaptively amplifies gradients to access weak signals inaccessible to smaller models.

Original authors: Konstantin Nikolaou, Jonas Scheunemann, Sven Krippendorf, Samuel Tovey, Christian Holm

Published 2026-06-01

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: Why Bigger Models Learn Better

Imagine you are trying to learn a new language.

  • Small models are like students who only learn the most obvious, common words (like "hello," "cat," "run"). Once they know these, they stop improving because they can't understand the complex grammar or rare idioms.
  • Large models are like students who not only know the common words but also keep digging deeper to learn obscure vocabulary, complex sentence structures, and subtle nuances.

This paper asks: Why do larger models keep learning while smaller ones stop?

The authors found that larger models have an ability they call "Spectral Reach." It's like having a longer ladder. Small models can only reach the nearest rungs (the easy, obvious patterns), while large models can keep climbing to the farthest rungs (the tiny, hidden, difficult patterns) and keep improving.


The Core Concept: The "Spectral Tail"

To understand this, imagine the learning process as a giant library of books, where each book represents a different pattern in the data.

  • The Bestsellers (The Head): These are the popular, easy-to-learn patterns. They are loud, clear, and easy to hear. Every model, big or small, learns these first.
  • The Obscure Archives (The Tail): These are the quiet, faint, and difficult patterns. They are buried deep in the library.

The Problem: As a model trains, it finishes reading the "Bestsellers" first. Once it's done, it needs to move to the "Archives" to keep getting better.

  • Small models hit a wall. They run out of "brainpower" to read the faint books in the archives. They get stuck.
  • Large models have a "super-ear." They can hear the faint whispers in the archives. They keep reading, learning the subtle details that others miss. This ability to reach deep into the "spectral tail" is Spectral Reach.

The New Tool: The "Spectral Position" Meter

The authors invented a new tool called Spectral Position (or $\chi_{pos}$). Think of this as a GPS tracker for the model's learning journey.

  • High GPS Value (Close to 1): The model is currently reading the "Bestsellers." It's learning the big, easy patterns.
  • Low GPS Value (Close to 0): The model has moved deep into the "Archives." It is now learning the tiny, difficult patterns.

What they found:

  1. Progress over time: As training goes on, the GPS value drops. The model naturally moves from easy patterns to hard ones.
  2. The Size Difference: Bigger models drop their GPS value much lower than smaller models. They go deeper into the archives. This explains why they end up with lower errors (better performance): they learned more of the hidden details.
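The drop in the GPS value over training can be illustrated with a toy calculation. The sketch below is not the paper's code. It assumes a fixed kernel with a hypothetical power-law spectrum, and it assumes spectral position can be modeled as the Rayleigh quotient of the current error (residual) under the kernel, divided by the largest eigenvalue; the paper's exact definition may differ. With a fixed kernel and gradient flow, each residual mode decays as exp(-λt), so the head modes vanish first and the quotient slides toward the tail.

```python
import numpy as np


def spectral_position(K, r):
    """Assumed definition: Rayleigh quotient of r under K, divided by lambda_max.

    Returns a value in (0, 1]: near 1 when r lies along the top eigenvector,
    near 0 when r lies along small-eigenvalue (tail) directions.
    """
    lam_max = np.linalg.eigvalsh(K)[-1]  # eigvalsh returns ascending eigenvalues
    return float(r @ K @ r) / (lam_max * float(r @ r))


rng = np.random.default_rng(0)
n = 50

# Hypothetical kernel spectrum: eigenvalue k decays like 1 / k^2.
eigvals = 1.0 / np.arange(1, n + 1) ** 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal eigenbasis
K = (Q * eigvals) @ Q.T                            # K = Q diag(eigvals) Q^T
K = 0.5 * (K + K.T)                                # remove round-off asymmetry

# Hypothetical initial residual: most of its weight sits on head modes.
coeffs = 1.0 / np.arange(1, n + 1)


def residual_at(t):
    """Residual at time t under gradient flow with the fixed kernel K."""
    return Q @ (np.exp(-eigvals * t) * coeffs)


times = [0.0, 1.0, 10.0, 100.0, 1000.0]
positions = [spectral_position(K, residual_at(t)) for t in times]

for t, pos in zip(times, positions):
    print(f"t = {t:7.1f}   spectral position = {pos:.5f}")
```

The printed values shrink as t grows. A fixed kernel only captures the "easy patterns first" part of the story; it says nothing about how model size or feature learning changes how far a model can go.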

The Secret Ingredient: Feature Learning

You might ask, "Why can big models hear the faint whispers?"

The paper tested this by freezing the "brain" of a model (preventing it from changing its internal features) and only letting the final layer learn.

  • Frozen Models: These models stopped learning early. They couldn't reach the deep archives.
  • Active Models: These models kept changing their internal "features" (how they see the world).

The Analogy: Imagine trying to listen to a faint radio station.

  • A frozen model is like a radio with a broken antenna. No matter how much you turn the volume up, you can't hear the faint station.
  • A learning model is like a radio that builds a better antenna while you are listening. As it learns, it reshapes its internal structure to amplify those faint signals. This "antenna building" (feature learning) allows the model to sustain its progress even when the signals get very weak.
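A frozen model can be sketched as a fixed feature map with a trainable linear readout. The toy below is an illustration under assumptions, not the paper's experiment: the random tanh features, the sizes, the learning rate, and the step count are all hypothetical. Because the features never change, the kernel K = Φ Φᵀ is fixed, and each gradient descent step shrinks every eigenmode of the error by a constant factor (1 - lr · λ). Modes with small eigenvalues (the tail) therefore shrink slowly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 40, 5, 200                   # samples, input dim, feature count (hypothetical)

X = rng.standard_normal((n, d))
W = rng.standard_normal((d, p))        # frozen first layer: never updated
Phi = np.tanh(X @ W) / np.sqrt(p)      # frozen features, shape (n, p)
y = rng.standard_normal(n)             # toy regression targets

# Kernel seen by last-layer-only training. It stays fixed because Phi is frozen.
K = Phi @ Phi.T
eigvals, eigvecs = np.linalg.eigh(K)   # ascending: index 0 = tail, index -1 = head

steps = 200
lr = 1.0 / eigvals[-1]                 # step size set by the largest eigenvalue
w = np.zeros(p)
for _ in range(steps):
    residual = Phi @ w - y
    w -= lr * Phi.T @ residual         # gradient step on 0.5 * ||Phi w - y||^2

residual = Phi @ w - y

# Fraction of each eigenmode's error predicted to remain after training.
shrink = np.abs(1.0 - lr * eigvals) ** steps
measured = np.abs(eigvecs.T @ residual)
predicted = shrink * np.abs(eigvecs.T @ y)

print("fraction of error left in the head mode:", shrink[-1])
print("fraction of error left in the tail mode:", shrink[0])
```

In this setup the head mode is fitted almost immediately, while the tail mode keeps a visible share of its error. A model that learns features is not bound by these fixed rates, because its kernel changes during training.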

The "LNP" Decomposition: Breaking Down the Math

The authors created a formula to measure this without needing to do impossible calculations. They broke the learning process into three parts, like a recipe:

  1. Loss Scale ($\chi_{loss}$): How "loud" the mistake is right now. (If the model is wrong, this is high.)
  2. Network Scale ($\chi_{net}$): How sensitive the model is to changes. (Big models can build stronger "antennas" here.)
  3. Spectral Position ($\chi_{pos}$): The GPS value. Where in the library is the model reading?

The Magic: They found that as the model gets deeper into the "Archives" (Spectral Position drops), the "Network Scale" (the antenna strength) actually increases in big models. This extra strength compensates for the faintness of the signals, allowing the model to keep learning. Small models don't get this boost, so they give up.
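One way to see how three such factors can multiply into a learning speed is the sketch below. It rests on assumptions that are not taken from the paper: a squared-error loss 0.5 · ||r||² with residual r, a Jacobian J of the model outputs with respect to the parameters, the kernel K = J Jᵀ, and normalization by the largest eigenvalue of K. Under gradient flow the loss then falls at rate rᵀ K r = ||Jᵀ r||², which can be computed from a gradient alone, without building K. The names `chi_loss`, `chi_net`, and `chi_pos` mirror the list above, but the exact definitions here are hypothetical.

```python
import numpy as np


def lnp_factors(J, r):
    """Split the loss decrease rate r^T K r into three factors (assumed forms).

    J: Jacobian of outputs w.r.t. parameters, shape (n_outputs, n_params).
    r: residual vector, shape (n_outputs,).
    """
    grad = J.T @ r                               # gradient of 0.5 * ||r||^2
    rate = float(grad @ grad)                    # equals r^T (J J^T) r
    chi_loss = float(r @ r)                      # size of the current error
    chi_net = float(np.linalg.norm(J, 2) ** 2)   # largest eigenvalue of J J^T
    chi_pos = rate / (chi_loss * chi_net)        # normalized position in (0, 1]
    return chi_loss, chi_net, chi_pos, rate


rng = np.random.default_rng(0)
J = rng.standard_normal((8, 30))   # hypothetical Jacobian
r = rng.standard_normal(8)         # hypothetical residual

chi_loss, chi_net, chi_pos, rate = lnp_factors(J, r)
print("loss scale     :", chi_loss)
print("network scale  :", chi_net)
print("spectral pos.  :", chi_pos)
print("product vs rate:", chi_loss * chi_net * chi_pos, rate)
```

Read this way, a falling spectral position slows learning unless another factor grows. That matches the compensation described above, where a rising network scale offsets the faintness of tail signals.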

Summary of Findings

  • Learning is a journey: Models start with easy patterns and slowly move to hard, fine-grained details.
  • Size matters: Bigger models can go further into the "hard details" (the spectral tail) than smaller ones.
  • Adaptability is key: This ability isn't just about having more memory; it's about the model actively reshaping itself (feature learning) to amplify weak signals.
  • The Metric: The new "Spectral Position" tool allows scientists to watch this journey in real time, even for massive models, without needing supercomputers to do impossible math.

In short, bigger models win because they don't stop learning when the easy stuff is done; they have the "reach" to keep digging for the hidden gems that smaller models can't find.
