Imagine a room full of people trying to agree on a name for a mysterious object they've never seen before. There is no "right" answer, and no one has a favorite name to begin with. They just start guessing.
This paper asks a scary but fascinating question: If everyone in the room eventually agrees on one name, does that mean they figured out the truth together? Or did they just get lucky?
The authors, led by Hidenori Tanaka, discovered that in groups powered by Large Language Models (LLMs), the answer is often: It's a lottery.
Here is the story of their discovery, broken down into simple concepts and analogies.
1. The "Echo Chamber" Effect (Mutual In-Context Learning)
Usually, when an AI model learns, it trains on a massive, fixed library of text (its training data). But in a group of AI agents talking to each other, the "library" is made up of whatever the other agents just said — and it changes with every message.
- The Analogy: Imagine a game of "Telephone," but everyone is writing down what they hear and then reading it back to the group.
- The Mechanism: If Agent A randomly guesses "Blue" for the object, Agent B hears "Blue," thinks, "Oh, maybe it is Blue," and starts leaning that way. Then Agent C hears Agent B say "Blue" and leans even harder.
- The Result: A tiny random guess by one agent can snowball into a group-wide belief, even if the object has nothing to do with the color blue. The group isn't reasoning; it's just amplifying random noise.
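The snowball effect above can be sketched as a tiny "voter model" simulation. To be clear, this is my own illustrative toy, not the paper's actual setup: agents start with random labels, nobody has any real evidence, and each step one agent simply adopts the label of a randomly chosen peer.

```python
import random

def echo_chamber(n_agents=8, labels=("Blue", "Red"), seed=0, max_steps=100_000):
    """Toy voter model: each step, one listener copies one speaker.
    No label is intrinsically better, yet the group still converges."""
    rng = random.Random(seed)
    state = [rng.choice(labels) for _ in range(n_agents)]
    for step in range(max_steps):
        if len(set(state)) == 1:          # everyone agrees: consensus reached
            return state[0], step
        listener, speaker = rng.sample(range(n_agents), 2)
        state[listener] = state[speaker]  # "Oh, maybe it *is* Blue..."
    return None, max_steps                # practically never happens at this size

winner, steps = echo_chamber()
```

Run it with different seeds and you'll see the "winner" flip between Blue and Red — the consensus is real, but which label wins is pure chance.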
2. The "Drift" vs. The "Selection"
The authors use a concept from biology called Genetic Drift to explain this.
- Drift (The Lottery): In a small group, random chance rules. Flip a coin 10 times and you might get 8 heads just by luck. In a small AI group, a random "heads" (an arbitrary label) can take over the whole population for no reason other than luck.
- Selection (The Meritocracy): In a huge group, random noise gets washed out. If you flip a coin 10,000 times, you'll get close to 50/50. If there is a real reason to prefer one label (a "bias"), a large group will eventually find it.
The paper shows that AI groups often get stuck in the Drift phase. They reach a consensus, but it's a "consensus of chance," not a "consensus of truth."
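Drift is easy to see in a classic population-genetics toy (a standard Wright-Fisher-style resampling model — my own sketch, not code from the paper): a label held by just 1 agent out of 10 still ends up taking over the entire group in roughly 10% of runs, purely by luck.

```python
import random

def neutral_fixation(n_agents, initial_count, n_runs, seed=0):
    """Neutral drift: each generation, every agent copies a uniformly random
    agent from the previous generation.  Returns the fraction of runs in
    which the focal label ends up taking over the whole group."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(n_runs):
        k = initial_count                  # agents currently holding the label
        while 0 < k < n_agents:
            # next generation: each agent copies the label with prob k/n
            k = sum(rng.random() < k / n_agents for _ in range(n_agents))
        fixed += (k == n_agents)
    return fixed / n_runs

# One lone guesser among 10 agents "wins" in roughly 10% of runs.
rate = neutral_fixation(n_agents=10, initial_count=1, n_runs=500)
```

Nothing about the label is better — the "consensus of chance" is just the lone guesser getting lucky at the resampling lottery.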
3. The Three Levers of Control
The authors built a simple math model (called Quantized Simplex Gossip, or QSG) to predict when the group will act like a lottery and when it will act like a smart team. They found three knobs you can turn:
Group Size (N):
- Small Group: Like a small town. One person's weird idea can take over the whole town quickly. High risk of a "lottery win."
- Large Group: Like a big city. It's harder for one random idea to spread everywhere. The group is more stable.
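The town-versus-city intuition can be checked numerically. The sketch below (mine, not the paper's) measures the typical one-step swing in the group's opinion at a 50/50 split; theory says it shrinks like the square root of 1/N, so a group 100x bigger drifts about 10x less.

```python
import random

def one_generation(freq, n_agents, rng):
    """One neutral step: each agent copies a randomly chosen predecessor."""
    k = sum(rng.random() < freq for _ in range(n_agents))
    return k / n_agents

def drift_strength(n_agents, n_samples=2000, seed=0):
    """Typical one-step swing in the group's label frequency, starting from
    a 50/50 split (theory: roughly sqrt(0.25 / n_agents))."""
    rng = random.Random(seed)
    sq = [(one_generation(0.5, n_agents, rng) - 0.5) ** 2 for _ in range(n_samples)]
    return (sum(sq) / n_samples) ** 0.5

town = drift_strength(n_agents=10)    # small town: one step can move opinion a lot
city = drift_strength(n_agents=1000)  # big city: each step barely moves the needle
```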
Communication Bandwidth (m):
- Low Bandwidth (Hard): Imagine agents can only blurt out a single label at a time. "Cat" or "Dog"? Each message throws away the speaker's uncertainty, so every exchange injects sampling noise. This creates high drift.
- High Bandwidth (Soft/Top-m): Imagine agents can share a ranked list of their top m options, or even their full confidence levels. Messages carry more of what the speaker actually believes, the noise is reduced, and the group is less likely to drift randomly.
Adaptation Speed (α):
- Fast Learners: If agents change their minds instantly after hearing one person, the group swings wildly like a pendulum.
- Slow Learners: If they take time to think, the random noise averages out, and the group moves more steadily.
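Here is a rough sketch of how the three knobs might fit together in one gossip loop. To be clear: the function names (`gossip_round`, `simulate`) and the mixing update below are my own illustration in the spirit of the model, not the paper's actual QSG equations. Agents hold probability vectors over labels, quantize their messages down to their top m options (bandwidth), and blend incoming messages into their own beliefs at speed alpha (adaptation).

```python
import random

def gossip_round(beliefs, m, alpha, rng):
    """One gossip step: a listener mixes a peer's quantized message
    into its own belief vector.  Illustrative update, not the paper's."""
    n, k = len(beliefs), len(beliefs[0])
    listener, speaker = rng.sample(range(n), 2)
    # Bandwidth m: the speaker can only transmit its top-m labels.
    top = sorted(range(k), key=lambda j: beliefs[speaker][j], reverse=True)[:m]
    total = sum(beliefs[speaker][j] for j in top)
    msg = [beliefs[speaker][j] / total if j in top else 0.0 for j in range(k)]
    # Adaptation speed alpha: how far the listener moves toward the message.
    b = beliefs[listener]
    beliefs[listener] = [(1 - alpha) * b[j] + alpha * msg[j] for j in range(k)]

def simulate(n_agents=20, n_labels=4, m=1, alpha=0.5, rounds=2000, seed=0):
    """Random initial beliefs, then repeated gossip; returns the majority label."""
    rng = random.Random(seed)
    beliefs = []
    for _ in range(n_agents):
        w = [rng.random() for _ in range(n_labels)]
        s = sum(w)
        beliefs.append([x / s for x in w])
    for _ in range(rounds):
        gossip_round(beliefs, m, alpha, rng)
    votes = [max(range(n_labels), key=lambda j: b[j]) for b in beliefs]
    return max(set(votes), key=votes.count)

# Low bandwidth, fast learners vs. high bandwidth, slow learners:
hard = simulate(m=1, alpha=0.5)
soft = simulate(m=4, alpha=0.2, seed=1)
```

Turning m down to 1 and alpha up toward 1 makes each conversation a one-word shouting match; turning m up and alpha down makes it a careful exchange of full opinions.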
4. The "Tipping Point"
The most important finding is the Crossover.
- If your group is small, or they talk in short, noisy bursts, the outcome is a lottery. The winner is just the label that got lucky first.
- If you make the group bigger, or let them communicate more clearly, the "lottery" stops. Then, if there is a slight bias (e.g., one label is slightly easier to say), the group will actually find that bias and agree on it.
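The crossover can be demonstrated with one more toy (again my own sketch, not the paper's experiment): give one label a slight built-in edge, and compare how reliably a small group versus a large group actually finds it.

```python
import random

def biased_fixation(n_agents, bias, n_runs, seed=0):
    """Each generation, every agent copies a random predecessor, but the
    favored label is `bias` times easier to copy.  Returns how often the
    favored label wins the group-wide consensus."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        k = n_agents // 2                  # start from an even 50/50 split
        while 0 < k < n_agents:
            p = bias * k / (bias * k + (n_agents - k))
            k = sum(rng.random() < p for _ in range(n_agents))
        wins += (k == n_agents)
    return wins / n_runs

small = biased_fixation(n_agents=10,  bias=1.2, n_runs=200)   # lottery-ish
large = biased_fixation(n_agents=100, bias=1.2, n_runs=100)   # finds the bias
```

In the small group, the favored label still loses a noticeable fraction of the time — drift drowns out the signal. In the large group, the same slight bias wins almost every run: selection has taken over.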
Why Should You Care?
We are starting to use AI groups to make big decisions in law, finance, and science. We assume that if 100 AIs agree on a stock price or a medical diagnosis, they have "reasoned" their way there.
This paper warns us: They might just be playing a game of chance. If the group is too small or the communication is too noisy, the "collective intelligence" is actually just amplified randomness.
The Takeaway
Think of an AI group like a crowd of people trying to pick a song for a party.
- If the room is small and everyone is shouting one word at a time, the song they pick might just be the first one someone guessed.
- If the room is huge and everyone can send detailed playlists, they are more likely to find a song everyone actually likes.
Until we understand these "lottery" mechanics, we can't be sure if our AI societies are wise, or just lucky.