Factual recall in linear associative memories: sharp asymptotics and mechanistic insights

This paper employs statistical physics to precisely characterize the storage capacity of linear associative memories, demonstrating that a decoupled model equivalent to the original system can store up to $p_c \log p_c / d^2 = 1/2$ associations and revealing that optimal solutions achieve this by raising correct scores just above the extreme-value threshold of competing outputs rather than broadly boosting alignments.

Original authors: Alessio Giorlandino, Sebastian Goldt, Antoine Maillard

Published 2026-05-12

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: The "Fact-Checking" Problem

Imagine you are trying to teach a robot to memorize a phone book. You want the robot to look at a name (the input) and instantly recall the correct phone number (the output).

In the world of Large Language Models (like the ones that write essays or chat with you), this is called "factual recall." These models are amazing at it, but scientists didn't really know the hard limit: How many facts can a simple neural network actually store before it starts getting confused and mixing things up?

This paper tries to find that exact limit for a very simple type of neural network (a "linear associative memory").

The Challenge: The "Shared Waiting Room"

To understand the problem, imagine a waiting room with $p$ people (inputs) and a single line of $p$ possible destinations (outputs).

  • The Goal: Person A needs to go to Destination A, Person B to Destination B, and so on.
  • The Problem: Everyone is standing in the same room looking at the same list of destinations.
  • The Confusion: If the network tries to send Person A to Destination A, it has to make sure Person A doesn't accidentally look more like they belong at Destination B, C, or D. Because everyone shares the same list of destinations, the rules for Person A are tightly linked to the rules for Person B. It's like a crowded dance floor where everyone is trying to find their partner, but they are all bumping into each other.

The authors call this the Original Problem. It's very hard to solve mathematically because the constraints are "coupled" (tangled together).

The Solution: The "Private Waiting Rooms"

To make the math easier, the authors invented a clever trick. They imagined a Decoupled Problem.

Instead of one big waiting room, imagine $p$ separate, private waiting rooms.

  • In Room 1, Person A is trying to find Destination A, but they are only competing against a private list of fake destinations that only exist in Room 1.
  • In Room 2, Person B is doing the same thing, but with their own private list.

In this version, the rules for Person A have nothing to do with Person B. The math becomes much simpler because each room can be analyzed on its own, without the entanglement created by a shared list of destinations.

The Big Discovery: The authors found that even though these two scenarios look different, they have the exact same storage limit.

  • If the network can memorize the facts in the "Private Rooms" scenario, it can also memorize them in the "Shared Room" scenario.
  • This allows them to solve the easy version and apply the answer to the hard, real-world version.
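One way to write the two scenarios down, as a sketch only: the notation below is an assumption for illustration and is not taken from the summary above. It supposes each input has an embedding $x_i$, each destination an embedding $u_j$, and the memory is a matrix $W$ that scores destination $j$ for input $i$ as $u_j^\top W x_i$.

```latex
% Assumed notation (not from the summary):
%   x_i, u_j \in \mathbb{R}^d,  W \in \mathbb{R}^{d \times d}
\begin{align*}
\text{Shared room (coupled):}\quad
  & u_i^\top W x_i > u_j^\top W x_i
  && \text{for all } i \text{ and all } j \neq i, \\
\text{Private rooms (decoupled):}\quad
  & u_i^\top W x_i > \big(\tilde u_j^{(i)}\big)^\top W x_i
  && \text{for all } i \text{ and all } j \neq i.
\end{align*}
```

In the first line the same vectors $u_j$ appear in every person's constraints, which is what ties the rooms together. In the second line the competitors $\tilde u_j^{(i)}$ are a fresh, private set for each input $i$, so no competitor is shared between rooms.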

The Magic Number: How Much Can It Hold?

The paper calculates a specific "tipping point" where the network stops working. They define a "load" based on how many facts you are trying to store versus how big the network is.

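  • The Limit: The network can perfectly store facts as long as the load $p \log p / d^2$ stays below $1/2$, where $p$ is the number of facts and $d$ measures the network's size. In other words, the number of facts multiplied by its logarithm can reach roughly half the square of the network's size.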
  • What happens if you go over? If you try to store more facts than this limit, the network collapses. It can no longer distinguish the correct answer from the wrong ones, and accuracy drops to zero.
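To get a feel for the numbers, the threshold can be inverted numerically. The sketch below solves $p \log p = d^2/2$ for $p$ by bisection. The function name `critical_num_facts` is hypothetical, and the use of the natural logarithm is an assumption, since the summary does not state the base.

```python
import math

def critical_num_facts(d: int) -> float:
    """Solve p * log(p) = d**2 / 2 for p by bisection.

    Assumes the natural logarithm; the summary does not state the base.
    """
    target = d * d / 2.0
    # p * log(p) is increasing for p >= 1, equals 0 at p = 1,
    # and is at least `target` at p = max(e, target).
    lo, hi = 1.0, max(math.e, target)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for d in (64, 256, 1024):
    p_c = critical_num_facts(d)
    print(f"d = {d:5d}  ->  p_c ~ {p_c:,.0f}  (d^2 / 2 = {d * d // 2:,})")
```

Because of the logarithm, the critical number of facts comes out noticeably smaller than $d^2/2$ itself.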

How It Works: The "Just Enough" Strategy

The paper also explains how the network achieves this perfect memory, which is different from how we might guess it works.

The Naïve Way (Hebbian Learning):
Imagine a student trying to memorize facts by shouting the correct answer louder and louder. They boost the "correct" signal so high that it drowns out everything else. This works okay, but it's inefficient. The paper shows this method hits a much lower limit (only about 1/8th of the capacity).
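For concreteness, here is a minimal sketch of the textbook outer-product Hebbian rule on random data. The setup is an assumption, not the paper's exact model: inputs and outputs are random unit vectors, the memory is $W = \sum_i u_i x_i^\top$, and a fact counts as recalled when the correct output has the highest score $u_j^\top W x_i$. All function names are hypothetical.

```python
import math
import random

def unit_vector(d, rng):
    """Random direction in R^d, normalized to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def hebbian_weights(inputs, outputs):
    """W = sum_i outer(outputs[i], inputs[i]) (outer-product Hebbian rule)."""
    d = len(inputs[0])
    W = [[0.0] * d for _ in range(d)]
    for x, u in zip(inputs, outputs):
        for a in range(d):
            row, ua = W[a], u[a]
            for b in range(d):
                row[b] += ua * x[b]
    return W

def recall_accuracy(W, inputs, outputs):
    """Fraction of inputs whose top-scoring output is the correct one."""
    correct = 0
    for i, x in enumerate(inputs):
        Wx = [sum(w * xb for w, xb in zip(row, x)) for row in W]
        scores = [sum(ua * ya for ua, ya in zip(u, Wx)) for u in outputs]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += (best == i)
    return correct / len(inputs)

rng = random.Random(0)
d, p = 64, 8  # a very light load, far below any capacity limit
inputs = [unit_vector(d, rng) for _ in range(p)]
outputs = [unit_vector(d, rng) for _ in range(p)]
W = hebbian_weights(inputs, outputs)
accuracy = recall_accuracy(W, inputs, outputs)
print(f"load = {p * math.log(p) / d**2:.4f}, accuracy = {accuracy:.2f}")
```

Raising `p` while holding `d` fixed makes the cross-talk between stored pairs grow, which is the effect that eventually limits this rule.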

The Smart Way (Optimal Solution):
The optimal network is much more subtle. Instead of shouting, it acts like a judge at a competition.

  1. It knows that the "wrong" answers (the competitors) will naturally have some random noise or fluctuation.
  2. It calculates the highest score any "wrong" answer might accidentally get (the "extreme-value threshold").
  3. It then pushes the "correct" answer just barely above that threshold.
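The threshold in step 2 can be illustrated with a small simulation. As a simplifying assumption (not a statement about the paper's exact model), treat the $p$ competing scores as independent standard Gaussians; a standard extreme-value result then says their maximum concentrates near $\sqrt{2 \log p}$ as $p$ grows.

```python
import math
import random

def max_of_gaussians(p, rng):
    """Largest of p independent standard normal draws."""
    return max(rng.gauss(0.0, 1.0) for _ in range(p))

rng = random.Random(0)
p = 10_000
trials = 20

# Average the maximum over a few trials to smooth out fluctuations.
empirical = sum(max_of_gaussians(p, rng) for _ in range(trials)) / trials
threshold = math.sqrt(2.0 * math.log(p))

print(f"average max of {p} Gaussian scores: {empirical:.2f}")
print(f"sqrt(2 log p):                      {threshold:.2f}")
```

At finite $p$ the simulated maximum typically sits somewhat below $\sqrt{2 \log p}$, since the convergence is slow. The point is only that the best wrong score is predictable, so the correct score needs to clear that level and no more.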

The Analogy:
Think of a high-jump competition.

  • The Naïve jumper tries to jump 10 meters high to be sure they win. It's exhausting and unnecessary.
  • The Optimal jumper watches the other competitors. If the best competitor is likely to jump 2.0 meters, the optimal jumper only needs to jump 2.01 meters. They don't need to jump to the moon; they just need to be just enough better than the competition.

This "just enough" strategy allows the network to pack in substantially more facts than the naïve method.

The Two-Layer Twist

The authors also looked at what happens if the network is slightly more complex (two layers instead of one). They found that if you restrict the network's "width" (make it thinner), the storage limit drops. They provided a formula to calculate exactly how much capacity is lost based on how thin the network is.

Summary

  1. The Problem: The authors wanted to know the absolute limit of how many facts a simple neural network can store.
  2. The Trick: They replaced a messy, shared problem with a clean, private version that turns out to have the same answer.
  3. The Result: The limit is sharp and predictable. If you try to store too much, the system fails completely.
  4. The Insight: The best way to store facts isn't to make the correct answer huge; it's to make it just slightly better than the worst-case scenario of the wrong answers.

This work gives us a precise mathematical "speed limit" for factual memory in these types of networks.
