Imagine you are trying to solve a mystery, but you don't have a detective to check the clues. Instead, you decide to ask 100 different people for their opinions. You figure that if 90 of them say the same thing, they must be right. This is the idea behind "Crowd Wisdom": the belief that a group of imperfect people, when combined, can cancel out individual mistakes and reveal the truth.
This paper argues that this strategy breaks down when applied to Large Language Models (LLMs), like the AI assistants you might use today.
Here is the breakdown of why, using some simple analogies:
1. The "Echo Chamber" Problem
In a real crowd of humans, people have different life experiences. If you ask 100 people about the capital of France, some might guess, but they won't all guess the same wrong answer by accident. Their errors are random, so the right answer usually wins out.
But AI models are different. They are all trained on the same massive piles of internet data (like Wikipedia, Reddit, and news sites). They are taught by similar methods and optimized to do similar things.
- The Analogy: Imagine asking 100 students who all studied from the exact same textbook, which happened to have a typo on page 50. If you ask them a question based on that page, they won't give you a variety of wrong answers. They will all confidently give you the same wrong answer.
- The Result: When you ask an AI for 100 different answers, you aren't getting 100 different opinions. You are getting the same opinion repeated 100 times, just with slightly different wording. The "crowd" is just an echo chamber.
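The difference between scattered errors and shared errors can be sketched in a few lines of Python. (The voter functions and probabilities here are illustrative assumptions, not figures from the paper.)

```python
import random
from collections import Counter

random.seed(42)

def majority(votes):
    """Return the most common answer in a list of votes."""
    return Counter(votes).most_common(1)[0][0]

def independent_crowd(n=100, p_correct=0.4):
    # Human-like crowd: each wrong voter picks B, C, or D at random.
    # Errors are scattered, so "A" (correct) is still the plurality winner
    # even though only ~40% of voters are right.
    return ["A" if random.random() < p_correct
            else random.choice("BCD") for _ in range(n)]

def echo_chamber_crowd(n=100, p_correct=0.4):
    # LLM-like crowd: every wrong voter repeats the SAME answer "B"
    # (the shared misconception), which now outvotes the truth.
    return ["A" if random.random() < p_correct else "B" for _ in range(n)]

print(majority(independent_crowd()))   # usually "A" (scattered errors cancel)
print(majority(echo_chamber_crowd()))  # usually "B" (shared errors pile up)
```

Both crowds are right only 40% of the time per voter; the only thing that changes is whether the mistakes are independent or shared.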
2. The "Confident Fool"
The paper tested a common trick: asking the AI, "How sure are you?" The hope was that if the AI is very confident, it's probably right.
- The Analogy: Think of a student who memorized the wrong answer but is very loud and confident about it. In a classroom, their confidence might convince the teacher they are right.
- The Result: The paper found that AI models are often very confident when they are wrong. Because they are trained to sound helpful and agreeable, they will confidently repeat a shared misconception. Asking for "confidence" doesn't help filter out the truth; it just amplifies the loudest (and potentially wrong) voice.
3. The "Random String" Test
To prove that the models were just "thinking alike" because of their training, the researchers did a crazy experiment. They gave the models a string of random, nonsense characters (like "gP%!mdq4k'") and asked them to pick an answer (A, B, C, or D).
- The Logic: There is no "truth" here. It's pure nonsense. If the models were truly independent, their answers should be random and scattered.
- The Result: Even with nonsense input, the models still agreed with each other more often than chance would predict. This shows that the models carry shared "biases" baked into how they were built (their architecture and weights). They have a shared "gut feeling" that isn't based on facts, but on their common training.
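Here is a rough simulation of what "agreeing more than chance" means, under the assumption that every model leans toward option "A" on meaningless input. (The 55% bias figure is made up for illustration; with 4 options, pure chance agreement between two guessers is 25%.)

```python
import random
from itertools import combinations

random.seed(7)

def biased_answer(p_bias=0.55):
    # Hypothetical shared bias: every model leans toward "A" on nonsense
    # input instead of guessing uniformly over A/B/C/D.
    return "A" if random.random() < p_bias else random.choice("BCD")

def agreement_rate(n_models=10, n_prompts=500):
    # Fraction of model pairs giving the same answer to the same
    # nonsense prompt, averaged over many prompts.
    agree = total = 0
    for _ in range(n_prompts):
        answers = [biased_answer() for _ in range(n_models)]
        for a, b in combinations(answers, 2):
            total += 1
            agree += (a == b)
    return agree / total

rate = agreement_rate()
print(f"observed agreement: {rate:.2f} (chance for 4 options: 0.25)")
```

If the agreement rate lands well above 0.25, the "voters" are not independent, which is the signature the researchers were looking for.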
4. The "Math vs. Opinion" Difference
The paper acknowledges that "asking the crowd" does work in math or coding.
- The Analogy: If you ask 100 people to solve 2 + 2, and you have a calculator to check the answers, you can easily throw out the 99 people who said "5" and keep the one who said "4". The calculator is the Verifier.
- The Problem: In real life, many questions (like "What will the economy look like in 2030?" or "Is this news story true?") don't come with a calculator. You can't run a code check to see whether an opinion is right. Without that external check, the AI's "crowd" just reinforces its own mistakes.
The Big Takeaway
The authors conclude that more computing power does not equal more truth if you don't have a way to verify the answer.
- If you have a verifier (like a math checker): Asking the AI to try 1,000 times is great. It gives you 1,000 chances to find the one right answer.
- If you have NO verifier (like asking about facts or opinions): Asking the AI to try 1,000 times is useless. It just gives you the same wrong answer 1,000 times, but with 1,000 times more confidence.
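The two regimes above can be sketched with a toy solver that is right only 5% of the time and otherwise repeats the same shared wrong answer. (All numbers and function names here are illustrative, not from the paper.)

```python
import random
from collections import Counter

random.seed(1)

def noisy_solver():
    # Toy model: right 5% of the time on a hard problem; the other 95%
    # it repeats the crowd's shared wrong answer. Pretend task: 2 + 2.
    return 4 if random.random() < 0.05 else 5

def verifier(answer):
    # External check, e.g. a calculator or a unit test.
    return answer == 4

samples = [noisy_solver() for _ in range(1000)]

# WITH a verifier: keep the first sample that passes the check.
# 1,000 tries give 1,000 chances to hit the one right answer.
with_verifier = next((a for a in samples if verifier(a)), None)

# WITHOUT a verifier: majority vote just amplifies the shared mistake.
without_verifier = Counter(samples).most_common(1)[0][0]

print(with_verifier)     # 4: one verified sample is enough
print(without_verifier)  # 5: the crowd confidently agrees on the wrong answer
```

Same model, same 1,000 samples; the only difference between finding the truth and amplifying the error is the external check.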
In short: You cannot fix a broken compass by asking 100 broken compasses to point in the same direction. If they are all broken in the same way, they will all point to the wrong North, and the group will be wrong together. To find the truth, you need an external map (a verifier), not just a bigger crowd.