Feature Identification via the Empirical NTK

This paper shows that eigenanalysis of the empirical neural tangent kernel (eNTK) identifies ground-truth and interpretable features in trained neural networks, aligning with known structure better than PCA on synthetic arithmetic tasks and on a pretrained language model.

Original authors: Jennifer Lin

Published 2026-05-07


Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Idea: Finding the "Hidden Switches" in AI

Imagine a huge, complex machine (like a neural network) that has learned to perform a task, such as adding numbers or writing stories. You can watch the machine at work, but you cannot see how it thinks. It is like looking into a black box: you put a number in, and another number comes out, yet the gears inside remain hidden.

Scientists want to open the box and find the specific "switches" or "knobs" inside that the machine uses to understand concepts like "grammar," "addition," or "sentiment." This is called mechanistic interpretability.

The problem is that the machine has millions of knobs, all tangled together. Picking one at random is like trying to find a specific needle in a haystack by guessing.

Jennifer Lin's paper proposes a new, clever way to find these needles. Instead of guessing, the author uses a mathematical tool called the Empirical Neural Tangent Kernel (eNTK).

The Analogy: The "Echo Chamber" Test

Imagine the neural network as a giant echo chamber. If you shout a specific word (a feature like "noun" or "add 5"), the sound echoes around the room and hits the walls (the model's parameters) in a very specific pattern.

The eNTK is like a highly sensitive microphone that records how the entire room vibrates when you shout.

  • If you shout "noun," the room vibrates in a specific rhythm.
  • If you shout "verb," it vibrates in a different rhythm.

The author's hypothesis is that by analyzing the strongest vibrations (the "principal eigen-directions") in this echo chamber, we can pinpoint which words were shouted.

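In technical terms, the eNTK compares the gradients of the model's outputs with respect to its parameters: for each pair of inputs, it measures how much a small update that changes the output on one input would also change the output on the other. The paper claims that the strongest patterns in this kernel, its top eigenvectors, point to the directions the model uses to detect features.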
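To make the "strongest vibrations" concrete, here is a minimal Python sketch; it is not the paper's code. It assumes a toy one-hidden-layer tanh network with a single scalar output, and the function names (`grad_f`, `entk`, `top_eigenpairs`) are hypothetical. Each eNTK entry is the dot product of the parameter gradients at two inputs, and the "principal eigen-directions" are the top eigenvectors of that matrix, found here with plain power iteration.

```python
import math
import random


def grad_f(x, u, b, v):
    """Gradient of f(x) = sum_j v_j * tanh(u_j * x + b_j) w.r.t. all parameters (u, b, v)."""
    g_u, g_b, g_v = [], [], []
    for uj, bj, vj in zip(u, b, v):
        t = math.tanh(uj * x + bj)
        dt = 1.0 - t * t           # derivative of tanh at the pre-activation
        g_u.append(vj * dt * x)    # df/du_j
        g_b.append(vj * dt)        # df/db_j
        g_v.append(t)              # df/dv_j
    return g_u + g_b + g_v


def entk(xs, u, b, v):
    """Empirical NTK Gram matrix: K[i][k] = <grad f(x_i), grad f(x_k)>."""
    grads = [grad_f(x, u, b, v) for x in xs]
    return [[sum(p * q for p, q in zip(gi, gk)) for gk in grads] for gi in grads]


def top_eigenpairs(K, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(K)
    A = [row[:] for row in K]
    pairs = []
    for _ in range(k):
        vec = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            w = [sum(A[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:       # remaining matrix is (numerically) zero
                break
            vec = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector `vec`
        lam = sum(vec[i] * A[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((lam, vec))
        # Deflation: subtract the found component so the next pass finds the next one
        A = [[A[i][j] - lam * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return pairs


# Toy usage: random weights stand in for a "trained" network (hypothetical setup).
rng = random.Random(1)
hidden = 4
u = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
b = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
v = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
xs = [i / 4 for i in range(-4, 5)]
K = entk(xs, u, b, v)
for lam, vec in top_eigenpairs(K, 3):
    print(f"eigenvalue {lam:.4f}")
```

For real networks with many parameters and outputs, the gradients would come from an autodiff library rather than being written by hand; the hand-written gradient only keeps this sketch dependency-free.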

The Three Experiments: From Simple Math to Large Language Models

The author tested this "echo chamber" idea on three kinds of models, each more complex than the last.

1. The Simple Math Machine (MLP)

  • The Task: A simple machine learned to add numbers modulo a prime number p, computing (a + b) mod p, which wraps around like a clock.
  • The "Truth": We already knew the secret recipe the machine used: it turns each number into waves (Fourier features), for instance by converting a number a into a sine wave such as sin(2πka/p) for some frequency k.
  • The Result: The author used the eNTK to listen to the machine. The strongest vibrations found by the eNTK matched the "sine wave" recipe perfectly.
  • The "Grokking" Moment: There is a phenomenon called "grokking," where a model suddenly shifts from failing a test to solving it perfectly after a long period of mere memorization. The paper found that at the moment the machine "grokked" (understood the math), the alignment between the eNTK vibrations and the mathematical features increased sharply. It is as if, at the exact moment the machine finally "got it," the echo chamber suddenly began singing the right song.
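One way to make "matched the sine-wave recipe" measurable is to treat an eNTK eigenvector as a list of values over all input pairs (a, b) and ask how much of it lies in the span of the Fourier features. The sketch below assumes that setup; the score `alignment` and the helpers around it are hypothetical stand-ins, not necessarily the paper's metric. Computing such a score at each training checkpoint is one way a sharp rise at the grokking moment could be tracked.

```python
import math


def fourier_features(p, k):
    """Cos/sin waves of frequency k in (a + b), listed over all p*p input pairs (a, b)."""
    pairs = [(a, b) for a in range(p) for b in range(p)]
    cos_f = [math.cos(2 * math.pi * k * (a + b) / p) for a, b in pairs]
    sin_f = [math.sin(2 * math.pi * k * (a + b) / p) for a, b in pairs]
    return [cos_f, sin_f]


def orthonormalize(vectors):
    """Gram-Schmidt; drops vectors that are (numerically) zero or redundant."""
    basis = []
    for vec in vectors:
        w = list(vec)
        for q in basis:
            c = sum(x * y for x, y in zip(w, q))
            w = [x - c * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-10:
            basis.append([x / norm for x in w])
    return basis


def alignment(vec, features):
    """Fraction of ||vec||^2 inside span(features): 1.0 = fully aligned, 0.0 = orthogonal."""
    basis = orthonormalize(features)
    total = sum(x * x for x in vec)
    captured = sum(sum(x * q for x, q in zip(vec, qv)) ** 2 for qv in basis)
    return captured / total


# Toy check: a phase-shifted wave of frequency 3 in (a + b), with p = 7.
p = 7
wave = [math.cos(2 * math.pi * 3 * (a + b) / p + 0.4) for a in range(p) for b in range(p)]
print(alignment(wave, fourier_features(p, 3)))  # close to 1.0
print(alignment(wave, fourier_features(p, 2)))  # close to 0.0
```

Projecting onto the cos/sin pair, rather than comparing with a single sine wave, makes the score insensitive to the wave's phase, which an eigenvector has no reason to share with any fixed reference wave.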

2. The Slightly Smarter Math Machine (Transformer)

  • The Task: A somewhat more complex machine (a Transformer) learned the same math puzzle.
  • The Difference: This machine did not use every possible wave; it relied on only a few specific frequencies, seemingly picked at random during training.
  • The Result: Even though the machine chose random frequencies, the eNTK still found them. It successfully identified the specific "notes" the machine used for math.
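To read off which "notes" a direction uses, one can take its discrete Fourier transform over the p possible input numbers and look for the frequencies carrying the most power. This is a hypothetical helper written for illustration, assuming the direction is given as one value per input number; it is not taken from the paper.

```python
import cmath
import math


def frequency_power(vec):
    """DFT power of a real length-p vector at each frequency k = 0..p//2.

    For real input, frequency p - k mirrors frequency k, so the upper half is skipped.
    """
    p = len(vec)
    power = []
    for k in range(p // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * a / p) for a, x in enumerate(vec))
        power.append(abs(coeff) ** 2)
    return power


def key_frequencies(vec, top=2):
    """Frequencies with the most power, ignoring k = 0 (the constant offset)."""
    power = frequency_power(vec)
    ranked = sorted(range(1, len(power)), key=lambda k: power[k], reverse=True)
    return ranked[:top]


# Made-up vector over inputs a = 0..12, built from frequencies 4 and 5.
p = 13
vec = [2 * math.cos(2 * math.pi * 4 * a / p) + math.sin(2 * math.pi * 5 * a / p + 1.0)
       for a in range(p)]
print(key_frequencies(vec))  # [4, 5]
```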

3. The Large Language Model (Gemma-3-270M)

  • The Task: This is a real, pre-trained language model (like a mini-version of the AI you chat with) that reads stories.
  • The Challenge: Here, we do not know the "secret recipe." Instead, we can check whether the eNTK picks out grammatical properties the model is likely to track, such as nouns, verbs, or past tense.
  • The Test: The author took a small set of stories and asked: "Can the eNTK vibrations tell us which words are nouns?"
  • The Comparison: She compared the eNTK method with PCA (principal component analysis, a standard method that finds the directions along which the model's internal activations vary the most).
  • The Result: The eNTK method was better. Its directions lined up with grammatical categories such as verbs and past tense more closely than the PCA directions did.
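A simple way to score a comparison like this is to treat each candidate direction, whether from the eNTK or from PCA, as one number per token and check how strongly it correlates with a 0/1 label such as "is this token a verb?". The sketch below assumes that scoring rule; the function names are hypothetical, the paper's exact metric may differ, and the toy numbers are made up purely to exercise the code.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_alignment(directions, labels):
    """Highest |correlation| between any candidate direction and a 0/1 label vector."""
    return max(abs(pearson(d, labels)) for d in directions)


# Made-up inputs: 6 tokens, a label marking which are verbs, and two candidate
# directions (one score per token) from some decomposition method.
is_verb = [0, 1, 0, 0, 1, 1]
directions = [
    [0.1, 0.9, 0.2, 0.0, 0.8, 0.7],   # tracks the label closely
    [0.5, 0.4, 0.6, 0.5, 0.5, 0.4],   # mostly unrelated
]
print(round(best_alignment(directions, is_verb), 3))
```

Running the same scorer on the top directions from each method gives a like-for-like comparison. Taking the maximum over several directions credits a method if any one of its top directions isolates the feature; a stricter variant could score only the single top direction.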

The Main Takeaway

The paper claims that analyzing the "vibrations" of the model's learning process (via the eNTK) is a powerful new flashlight.

  • It works on simple mathematical models where we know the answer.
  • It works on complex language models where we do not know the answer, and it finds grammar features better than current standard tools.
  • It seems to shine brightest exactly when a model suddenly understands a concept (the "grokking" moment).

What the Paper Does Not Claim

It is important to stick to what the paper actually says:

  • It is not a cure-all: The paper admits these results are "correlative." Finding a direction that looks like "grammar" does not prove that the model actually relies on it, or that editing that direction would change the model's behavior in a predictable way. It is a discovery tool, not necessarily a control panel.
  • It is not yet a safety tool: The paper mentions that this could be useful for AI safety in the future, but it does not present safety applications. It is a method for understanding how today's models work.
  • It is not perfect: The experiment with the language model used a relatively small dataset and a specific model. The author suggests testing this on larger models and datasets to be sure.

Summary in One Sentence

This paper proposes that by listening to the "echoes" of how a neural network learns (using a tool called eNTK), we can successfully identify the hidden "switches" the model uses to understand math and grammar, often finding them more clearly than previous methods.
