Imagine the internet as a giant, ever-growing library. For a long time, this library was filled with books written by humans. But now, AI models are writing new books, and humans are publishing them. Soon, these AI-written books will be used to train the next generation of AI.
This creates a loop: AI writes, humans publish, new AI learns from that, and the cycle repeats.
The paper asks a scary but fascinating question: What happens to the library if we keep recycling the same books over and over? Does the library get smarter, or does it start to lose its mind?
The author, Søren Riis, uses a mathematical model to show that two main forces are at play in this loop: Drift and Selection.
Here is the story of the library, explained simply.
1. The Force of Drift: The "Fading Echo"
The Analogy: Imagine a game of "Telephone" (or "Broken Telephone") played in a very large room. One person whispers a story to the next, who whispers it to the next, and so on.
In the real world, if you whisper a story, you might forget a rare word or a specific detail. If you pass that story on, the next person forgets a little more. Eventually, the story becomes very generic. The weird, unique, and rare details disappear first.
In the AI Library:
- The Problem: AI models are trained on a finite amount of text. When they generate new text, they are essentially "whispering" what they learned.
- The Result: Rare words and complex, unique phrases are the first to vanish. They are like rare coins in a jar: if you repeatedly draw a handful at random and refill the jar with copies of only what you drew, the rare coins will eventually vanish by chance, leaving only the common ones behind.
- The Outcome: The library becomes "shallow." It still has words, but it loses the deep, complex, and rare structures that make human language rich. The AI starts repeating the same safe, common patterns because the "rare" options have drifted away into nothingness.
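The coin-jar intuition can be sketched in a few lines of code. This is my own Wright-Fisher-style toy simulation, not the paper's exact model: each "generation" of AI text is a finite resample of the previous corpus, and rare tokens that drop out never come back.

```python
import random

# Toy drift simulation (illustrative, not the paper's construction).
# 1 marks a rare token, 0 a common one. Each generation "retrains" on
# a finite sample of the previous corpus.
random.seed(42)

def rare_survives(corpus_size=200, rare_copies=2, generations=30):
    corpus = [1] * rare_copies + [0] * (corpus_size - rare_copies)
    for _ in range(generations):
        # Resample with replacement: the next corpus is a finite
        # sample of the current one.
        corpus = random.choices(corpus, k=corpus_size)
    return sum(corpus) > 0  # did any rare token survive?

trials = 200
extinct = sum(not rare_survives() for _ in range(trials))
print(f"rare tokens extinct in {extinct}/{trials} runs")
```

In most runs the rare tokens die out within a few dozen generations, even though nothing ever selects against them: pure sampling noise is enough.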
2. The Force of Selection: The "Editor's Filter"
The Analogy: Now, imagine the library has a strict librarian (the "Editor").
- Scenario A (Descriptive Selection): The librarian just copies whatever is written and puts it on the shelf, regardless of quality. This is like the "Drift" scenario above—the library slowly becomes boring and repetitive.
- Scenario B (Normative Selection): The librarian is picky. They only put books on the shelf if they are correct, novel, or high-quality. They throw away the boring, repetitive, or wrong answers.
In the AI Library:
- The Good News: If the AI is forced to pass a "test" (like a math check or a code verification) before its text is published, it keeps the deep structure alive. The "rare" and "complex" ideas survive because they are the only ones that pass the test.
- The Bad News: If the librarian just copies what is popular (the "status quo"), the library collapses into a shallow state where no amount of "thinking ahead" helps.
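The two librarians can be contrasted with a small extension of the same toy model. This is my own sketch: the "quality check" below is a stand-in for the paper's normative verifier (a math check, a code test), simplified here to "the published corpus must still contain at least one rare token."

```python
import random

# Descriptive vs. normative selection (illustrative toy model).
# 1 marks a rare token, 0 a common one.
random.seed(0)

def next_generation(corpus, normative):
    sample = corpus
    for _ in range(100):  # the picky librarian's retry budget
        sample = random.choices(corpus, k=len(corpus))
        if not normative or 1 in sample:  # the quality check
            return sample
    return sample  # publish the last draft if nothing passed

def run(normative, corpus_size=200, rare_copies=2, generations=30):
    corpus = [1] * rare_copies + [0] * (corpus_size - rare_copies)
    for _ in range(generations):
        corpus = next_generation(corpus, normative)
    return sum(corpus)  # rare copies surviving at the end

desc_left = run(normative=False)
norm_left = run(normative=True)
print("descriptive librarian, rare tokens left:", desc_left)
print("normative librarian,   rare tokens left:", norm_left)
```

The descriptive librarian usually ends up with an empty rare shelf; the normative librarian keeps at least one rare token alive indefinitely, because drafts that lose it are simply never published.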
The Big Discovery: Two Different Futures
The paper proves mathematically that the future of AI text depends entirely on how we filter what gets published.
Future 1: The "Model Collapse" (The Shallow Pool)
If we just let AI generate text and feed it back to itself without strict quality checks, the library becomes a shallow pool.
- What it looks like: The AI can still write sentences, but they lack depth. It's like a song that only has a simple, repetitive beat.
- Why it happens: The "Drift" force wins. Rare ideas die out, and the AI gets stuck in a loop of generating the most statistically probable (and boring) words.
- The Catch: Even if you give the AI deeper lookahead (a bigger brain), it can't help. The information it would need to exploit that lookahead has already been deleted from the library.
Future 2: The "Deep Structure" (The Rich Garden)
If we use Normative Selection (checking for truth, logic, or creativity), the library remains a rich garden.
- What it looks like: The AI continues to produce complex, deep, and surprising text.
- Why it happens: The "Selection" force acts like a gardener, pruning the weeds (bad text) and keeping the rare flowers (complex ideas).
- The Result: The AI keeps getting better at "thinking ahead" because the deep structures it needs to learn are preserved in the library.
The "Lookahead" Metaphor
The paper uses a concept called "Lookahead." Imagine you are walking through a maze.
- Shallow AI: Takes one step at a time, looking only at the tile right in front of it. It often walks into dead ends.
- Deep AI: Looks 5 steps ahead. It sees the dead end and chooses a different path.
The paper shows that if the library is "shallow" (due to Drift), looking 5 steps ahead is useless because the map is broken. But if the library is "deep" (due to good Selection), looking ahead is powerful and keeps the system stable.
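The maze comparison can be made concrete with a toy fork-in-the-road example (my own illustration, not the paper's formal definition). One branch pays off immediately but dead-ends; the other looks worse at first but is richer overall.

```python
# Lookahead toy example (illustrative). Each list is the reward the
# AI would see on the next five tiles of a branch.
branches = {
    "A": [5, 0, 0, 0, 0],  # tempting first step, then a dead end
    "B": [1, 4, 4, 4, 4],  # modest first step, deep payoff later
}

def choose(lookahead):
    # Pick the branch with the most total reward visible within
    # `lookahead` steps.
    return max(branches, key=lambda b: sum(branches[b][:lookahead]))

shallow = choose(1)  # sees only the tile in front of it
deep = choose(5)     # sees five tiles ahead
print("shallow AI picks:", shallow)  # -> A (walks into the dead end)
print("deep AI picks:   ", deep)     # -> B

# If drift has already erased B's deep rewards from the map, the same
# 5-step lookahead has nothing left to find:
branches["B"] = [1, 0, 0, 0, 0]
collapsed = choose(5)
print("deep AI on a collapsed map picks:", collapsed)  # -> A
```

The last line is the paper's point in miniature: lookahead only pays off if the deep structure it is looking for still exists in the data.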
Why This Matters for You
This isn't just about math; it's about the future of the internet and AI.
- If we are careless: If we just let AI write everything and feed it back to itself without checking for quality, we risk creating a "Model Collapse." The internet could become a hall of mirrors, reflecting only the most generic, repetitive, and shallow ideas. We might lose the ability to learn from complex human thought.
- If we are careful: If we build systems that verify facts, check for logic, and reward creativity (Normative Selection), we can sustain a rich, deep, and evolving digital culture. The AI can continue to learn from the best parts of human knowledge.
The Bottom Line
The paper is a warning and a guide. It tells us that recycling AI text is dangerous unless we filter it strictly.
- Drift is the natural tendency for things to become simple and boring over time.
- Selection is the human (or AI) effort to keep things complex, true, and interesting.
To keep our digital future rich, we must ensure that the "Editor" in the loop is strict enough to stop the library from becoming a shallow pool of repetitive noise.