The Rise and Fall of G in AGI

This paper applies psychometric principal component analysis to LLM benchmark data from 2019–2025. It finds that while a strong "general intelligence" factor initially dominated model performance, its explanatory power is declining as models increasingly specialize in reasoning and outsource tasks to tools, producing a complex hierarchy of capabilities that inverts the ideal of parsimonious mechanisms.

David C. Krakauer

Published 2026-04-14

This is an AI-generated explanation of a preprint that has not been peer-reviewed.

The Big Idea: Is AI Getting "Smarter" or Just "Different"?

Imagine you are a teacher trying to measure the "general intelligence" of a class of students. You give them a test with math problems, history questions, and coding puzzles.

Over time, you notice a pattern: if a student is good at math, they are usually good at history and coding too. In psychology, this is called the "Positive Manifold." It suggests there is one big, underlying engine of intelligence (called g) that powers all these skills. If you boost that engine, the student gets better at everything at once.
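To make the positive manifold concrete, here is a minimal Python sketch (with made-up student scores, not data from the paper): when every test score is one shared ability plus a little test-specific noise, all the pairwise correlations come out positive, which is exactly the pattern a single g factor produces.

```python
# Toy illustration of a positive manifold (made-up numbers, not the paper's data).
# Each row is a student; the columns are math, history, and coding scores.
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=200)                           # one hidden "general ability" per student
noise = rng.normal(scale=0.5, size=(200, 3))       # test-specific quirks
scores = g[:, None] + noise                        # every test = g + noise

print(np.corrcoef(scores, rowvar=False).round(2))  # all pairwise correlations are positive
```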

This paper asks: Does this same pattern exist in Artificial Intelligence (AI)? And if so, is it still true today?

The author, David Krakauer, says: "Yes, it used to. But now, it's changing."


Part 1: The "Rise" – The Era of the Giant Engine (2019–2024)

The Analogy: The "Super-Student" who just studies harder.

In the early days of Large Language Models (LLMs), the recipe for success was simple: Make the model bigger and feed it more data.

  • What happened: Every time a new AI was released, it got better at everything. If it got 10% better at math, it also got 10% better at writing poetry and solving logic puzzles.
  • The Result: The AI behaved like a perfect "Super-Student." There was a single "General Intelligence" score (let's call it G) that predicted how well the AI would do on any test.
  • The Metaphor: Imagine a car engine. In this era, everyone just kept adding more cylinders to the engine. The car got faster, and it got faster at every task (driving, drifting, towing) simultaneously. The "General Intelligence" was very strong and dominant.

The Paper's Finding: During this time, the "General Intelligence" factor explained about 90% of the differences between models. It was a one-dimensional world: Bigger = Better at Everything.
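Numbers like "explained about 90% of the variance" come from a principal component analysis: stack models as rows and benchmark scores as columns, then look at how much of the total spread the first component carries. Below is a minimal sketch with simulated scores; the model count, benchmark count, and noise level are all invented for illustration, not taken from the paper.

```python
# One-dimensional era, simulated: every benchmark tracks a single "bigger = better" axis.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
scale = rng.normal(size=50)                       # one overall-capability value per model
scores = np.tile(scale[:, None], (1, 6))          # six benchmarks, all driven by that axis
scores += rng.normal(scale=0.3, size=(50, 6))     # plus modest benchmark-specific noise

pca = PCA().fit(scores)
print(pca.explained_variance_ratio_[0].round(2))  # first component ~0.9 by construction
```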


Part 2: The "Fall" – The Era of Specialization (Late 2024–Present)

The Analogy: The "Tool-Using" Specialist.

Recently, AI labs stopped just making "bigger" models. They started building models that use tools (like calculators, web search, and code interpreters) and models that are specifically trained to reason step-by-step.

  • What happened: The AI landscape got messy. Some models became amazing at deep reasoning (solving hard physics problems) but stopped being as good at simple tasks. Others became great at writing code but less good at general knowledge.
  • The Result: The single "Super-Student" engine cracked. The "General Intelligence" score (G) dropped from explaining 90% of the variance down to about 77% (a toy version of this drop is sketched just after this list).
  • The Metaphor: The car engine didn't just get bigger; it got re-engineered. Now, you have:
    • Race Cars: Incredible at speed (reasoning), but bad at hauling cargo.
    • Trucks: Incredible at hauling (execution), but slow on the track.
    • Hybrids: They use external tools (like a tow truck) to do the heavy lifting.
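A toy version of the fall, extending the simulated setup above (numbers chosen for illustration, not fitted to the paper's data): once models also differ along a second, independent reasoning-versus-execution axis, the first component no longer carries almost everything.

```python
# Specialization era, simulated: a general axis plus an independent "style" axis that
# boosts reasoning-style benchmarks while hurting execution-style ones.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
scale = rng.normal(size=50)                # general capability per model
style = rng.normal(size=50)                # reasoning-vs-execution specialization per model
reasoning = scale[:, None] + 0.5 * style[:, None] + rng.normal(scale=0.3, size=(50, 3))
execution = scale[:, None] - 0.5 * style[:, None] + rng.normal(scale=0.3, size=(50, 3))
scores = np.hstack([reasoning, execution])

pca = PCA().fit(scores)
print(pca.explained_variance_ratio_[:2].round(2))  # first share drops well below 0.9;
                                                   # a second component now matters
```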

The paper calls this the "Rise and Fall of G." The "Fall" isn't that AI is getting dumber; it's that AI is becoming more diverse. The single "General Intelligence" factor is no longer the whole story.


Part 3: The "Hedgehog" vs. The "Fox"

The paper uses a famous metaphor from the philosopher Isaiah Berlin to describe what's happening inside the AI:

  • The Hedgehog (The Old AI): "Knows one big thing." In the early days, the AI was a Hedgehog. It had one giant brain (General Intelligence) that did everything.
  • The Fox (The New AI): "Knows many things." The new AI is a Fox. It is actually a collection of many different specialized skills working together.

The Twist: The paper argues that the "General Intelligence" we see today is actually a mask. It hides the fact that the AI is becoming a "Society of Minds."

  • When you remove the "General" part of the score, you see that Reasoning and Execution (doing things) are actually fighting each other.
  • If an AI gets really good at deep reasoning, it often gets slightly worse at simple execution, and vice versa. They are trading off against each other, as the sketch below illustrates.
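That tradeoff is visible in the second component's loadings. Reusing the same simulated setup (illustrative only, not the paper's factor loadings): after the general component, the benchmarks split into two groups with opposite signs, so moving along this axis helps one group and hurts the other.

```python
# The residual structure, simulated: the second principal component is bipolar,
# loading with one sign on the reasoning benchmarks and the opposite sign on the
# execution benchmarks (the overall sign of a component is arbitrary in PCA).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
scale, style = rng.normal(size=50), rng.normal(size=50)
reasoning = scale[:, None] + 0.5 * style[:, None] + rng.normal(scale=0.3, size=(50, 3))
execution = scale[:, None] - 0.5 * style[:, None] + rng.normal(scale=0.3, size=(50, 3))
scores = np.hstack([reasoning, execution])

pc2 = PCA(n_components=2).fit(scores).components_[1]
print(pc2.round(2))   # roughly [+, +, +, -, -, -]: reasoning and execution trade off
```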

Part 4: The "Ptolemaic Succession" – Why We Keep Adding More Tests

The Analogy: The Geocentric Universe.

In ancient astronomy, people thought the Earth was the center of the universe. When planets moved in weird ways, they didn't change the theory; they just added a new circle (an "epicycle") to explain the movement. The model got more and more complicated, but it never found a simple law.

The paper argues AI benchmarking is doing the same thing:

  1. AI learns to use a tool (like a calculator).
  2. We create a new test to measure that.
  3. AI learns to use a web browser.
  4. We create another test.

We are just adding more "epicycles" (more tests) to keep up with the AI. The paper suggests we need a Newtonian approach: Find the simple, underlying laws of how these tools and models work together, rather than just testing them on isolated tasks.

The Final Takeaway: Intelligence is a Team Sport

The most important conclusion of the paper is this:

Intelligence is no longer just about the "brain" (the model); it's about the "brain + the tools."

  • Old View: Intelligence is a property of the AI itself.
  • New View: Intelligence is a property of the AI + its tools (calculators, search engines, code).

Just as a human isn't "smart" because of their brain alone, but because they have access to libraries, the internet, and writing, AI is becoming smart because it can use external tools.

In short: The era of the "One-Size-Fits-All" Super-Intelligence is fading. We are entering an era of specialized, tool-using intelligences that are more complex, more diverse, and perhaps more "human" in their ability to outsource work to tools. The "General Intelligence" score is dropping not because AI is failing, but because it is finally learning to be a Fox instead of a Hedgehog.
