A mathematical theory for understanding when abstract representations emerge in neural networks

This paper mathematically proves that abstract, disentangled representations of latent variables are guaranteed to emerge at all global minima in feedforward neural networks trained on tasks dependent on those variables, offering a unified explanation for such representations observed in both biological and artificial systems.

Original authors: Bin Wang, W. Jeffrey Johnston, Stefano Fusi

Published 2026-03-16

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: Why Do Brains (and AI) Think in "Abstract" Ways?

Imagine you are trying to teach a robot to recognize animals. You show it a picture of a big, striped tiger and a small, striped cat.

  • The "Messy" Way (Non-Abstract): The robot might memorize that "striped + big = tiger" and "striped + small = cat." If you show it a big, spotted leopard, it gets confused because it never saw "big + spotted" before. It's stuck on the specific details.
  • The "Abstract" Way (Disentangled): The robot learns two separate, independent ideas: "Size" and "Pattern." It understands that "Size" is one concept and "Pattern" is another. Because it has separated these ideas, it can instantly recognize a big, spotted leopard even though it's never seen one before. It knows "Big" + "Spots" = Leopard.

In neuroscience, scientists have noticed that real brains do this "Abstract" thing. When animals learn tasks, their brain cells organize themselves so that different variables (like size, color, or direction) are represented in separate, clean "lanes" or subspaces. This helps them learn new things quickly and generalize.

The Question: How does a neural network (whether a brain or an AI) naturally figure out how to separate these ideas? Is it magic, or is there a mathematical rule forcing it to happen?

The Answer: This paper says it's not magic. It's math. If you train a network on a task where the answer depends on specific hidden variables (like size and pattern), the network is guaranteed, at every global minimum of its training objective, to organize itself into these clean, abstract lanes.


The Analogy: The "Chef's Kitchen" vs. The "Blender"

To understand how the authors proved this, let's use a kitchen analogy.

1. The Old Way (The Blender)

Usually, when we study neural networks, we look at the "ingredients" (the weights connecting the neurons). It's like looking at a blender full of smoothie ingredients. You see the blueberries, the spinach, and the yogurt all mixed together. It's hard to tell how the blender decided to mix them.

2. The New Way (The Chef's Recipe Book)

The authors of this paper decided to stop looking at the ingredients and start looking at the output of the cooking process before the final dish is served.

They realized that instead of tracking every single weight in the network, they could track the "pre-activations."

  • Analogy: Imagine the hidden layer of the network is a room full of chefs. Before they cook the final meal, they all shout out what they are thinking about.
  • The authors created a new mathematical "recipe" (called a Mean-Field Optimization) that looks at the collective shouting of the chefs rather than the individual knives they are holding.

By looking at the "shouting patterns" (the distribution of neural activity), they turned a messy, impossible-to-solve problem into a clean, solvable one.
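Here is a minimal NumPy sketch (our illustration, with invented layer sizes; not the authors' code) of what "tracking pre-activations instead of weights" means for one hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer: W1 and b1 are the "ingredients" (the weights).
n_in, n_hidden = 4, 1000
W1 = rng.normal(0, 1 / np.sqrt(n_in), size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)

x = rng.normal(size=n_in)     # one input
z = W1 @ x + b1               # pre-activations: what each "chef shouts"
h = np.maximum(z, 0)          # activations after a ReLU nonlinearity

# With many neurons, any individual weight is uninformative, but the
# *distribution* of pre-activations is a clean summary of the layer:
print(z.mean(), z.std())      # ~0 and ~||x|| / sqrt(n_in) at initialization
```

The mean-field idea, roughly, is to treat this distribution itself as the object being optimized during training, rather than the millions of individual weights behind it.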


The Key Discovery: The "Perfect Geometry"

The paper proves that when you train a network to solve a task with clear, separate variables (like "Odd/Even" and "Big/Small"), the network naturally arranges its "shouting chefs" into a perfect geometric shape.

  • The Shape: Imagine a cube.
    • One axis of the cube represents "Size."
    • Another axis represents "Pattern."
    • Another represents "Color."
  • The Result: The network learns to represent "Size" strictly along the Size axis and "Pattern" strictly along the Pattern axis. The variables don't mix.

The authors quantify this with a Parallelism Score (PS).

  • PS = 0: The variables are all mixed up in a tangled ball (like a bowl of spaghetti).
  • PS = 1: The variables are perfectly separated, like the axes of a 3D graph.

The Surprise: The paper proves that for a wide variety of network types (even those with different "activation functions," which are like the different ways neurons react to input), every global minimum the network can reach is a solution where the PS is 1. It "wants" to be abstract because that is the most efficient way to minimize error.
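As a concrete illustration, here is one common way to compute a parallelism score for a 2×2 design: take the coding direction for variable A at each level of variable B, and measure how parallel the two directions are with cosine similarity. This simplified sketch is our own (the paper's definition may average over more conditions):

```python
import numpy as np

def parallelism_score(r00, r01, r10, r11):
    """Cosine similarity between the two coding directions for
    variable A (first index), one at each level of variable B
    (second index). 1 = perfectly parallel edges, as in a cube."""
    d_b0 = r10 - r00                     # A-direction when B = 0
    d_b1 = r11 - r01                     # A-direction when B = 1
    return d_b0 @ d_b1 / (np.linalg.norm(d_b0) * np.linalg.norm(d_b1))

# Perfect "cube": each variable gets its own orthogonal axis.
e_a, e_b = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
print(parallelism_score(0 * e_a, e_b, e_a, e_a + e_b))   # 1.0

# Tangled code: random condition positions, edges far from parallel.
rng = np.random.default_rng(1)
r = [rng.normal(size=3) for _ in range(4)]
print(parallelism_score(*r))                              # well below 1
```

In the cube case the two "Size" edges are exact copies of each other, so the score is 1; in a tangled code they point in unrelated directions.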


Why Does This Happen? (The "Competition" Analogy)

Think of the input data (the pictures) and the output labels (the answers) as two teams in a tug-of-war.

  1. The Input Team: The pictures might be messy, unstructured, or "whitened" (scrambled so that no direction in the data is statistically special).
  2. The Output Team: The answers are structured (e.g., "Is it odd? Yes/No").

The network is the rope. The paper shows that even if the Input Team is messy, the network will stretch itself out to perfectly match the structure of the Output Team.

  • If the input is "Target-Aligned": The input pictures already look a bit like the answers. The network easily snaps into an abstract shape.
  • If the input is "Whitened" (Random): The input is pure chaos. You might think the network would stay messy. But it doesn't. The math shows that the network actually expands its dimensions to create enough space to "move around" and eventually organize itself into the clean, abstract structure required by the answer.

It's like a chaotic dance floor. Even if everyone is dancing randomly at first, if the music (the task) demands a specific formation (like a square dance), the dancers will naturally rearrange themselves into that perfect square because it's the only way to satisfy the music.
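Concretely, whitening transforms inputs so their covariance is the identity. A generic PCA-whitening sketch (standard technique, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))  # correlated inputs

Xc = X - X.mean(axis=0)                      # center
cov = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(cov)
X_white = Xc @ eigvecs / np.sqrt(eigvals)    # decorrelate, equalize variance

print(np.round(X_white.T @ X_white / len(X_white), 2))   # ~identity matrix
```

After whitening, the data carries no directional hints, which is what makes the result striking: all of the abstract structure in the learned representation has to come from the task, not the inputs.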


What About Different Types of Neurons?

The paper also looked at whether the specific "personality" of the neurons matters.

  • ReLU Neurons: These are like "one-way valves." They only fire if the signal is positive.
  • Tanh/Linear Neurons: These are like "volume knobs" that can go up or down.

The Finding: It doesn't matter if the neurons are "one-way" or "volume knobs." As long as the task requires separating variables, the group of neurons will organize itself into an abstract, clean geometry.

  • However: The individual neurons might look different.
    • With "one-way" neurons, you might get a "modular" team where each neuron is an expert on just one thing (e.g., "I only care about Size").
    • With "volume knob" neurons, the neurons might be "mixed," where one neuron cares about a combination of things, but the group still forms the perfect abstract shape.

The Takeaway

This paper provides a "Theory of Everything" for why abstract thinking emerges.

  1. It's Inevitable: You don't need special tricks or extra rules to make AI or brains think abstractly. If you give them a task with clear, separate variables, they will naturally organize themselves to solve it that way.
  2. The Tool: The authors built a new mathematical "microscope" (the Mean-Field framework) that lets us see the hidden structure of neural networks without getting lost in the millions of tiny weights.
  3. The Bridge: This explains why we see the same "abstract geometry" in human brains, monkey brains, and artificial neural networks. They are all solving the same math problem: How do I organize my thoughts to answer this question most efficiently?

In short: Abstract representation isn't a bug; it's the most efficient solution to the math of learning.
