The Illusion of Collusion

This paper demonstrates that competing algorithmic agents using multi-armed bandit learning can spontaneously develop "naive collusion" through action synchronicity, even without knowledge of competitors, with the likelihood of such outcomes depending critically on whether the agents employ deterministic, greedy-in-the-limit, or persistently random policies.

Connor Douglas, Foster Provost, Arun Sundararajan

Published Tue, 10 Ma

Here is an explanation of the paper "The Illusion of Collusion," translated into simple, everyday language with some creative analogies.

The Big Idea: When Robots "Accidentally" Cheat

Imagine two lemonade stands on the same street. They are owned by two different people, and they have never spoken to each other. They don't know the other exists. They are just two computers running software that says, "Try to make the most money possible."

The scary question this paper asks is: Can these two computers, acting completely independently, accidentally figure out that they should both raise their prices to $10 a cup, even though they could sell for $5?

The answer is yes. And the scary part is that they don't need to talk, plot, or even know they are competitors. They just need to use the right (or wrong) kind of learning software.

The authors call this "Naive Collusion." It's like two strangers walking into a room and, without ever saying a word, ending up clapping in perfect unison.


The Game: The Prisoner's Dilemma (A.K.A. The "Price War")

To study this, the researchers put these AI agents into a classic game called the Prisoner's Dilemma. Think of it as a game of "Cooperate or Betray."

  • Option A (Cooperate/High Price): Both charge high prices. Everyone makes a lot of money.
  • Option B (Betray/Low Price): One charges low, the other high. The low-price seller steals all the customers and makes a fortune; the high-price seller makes nothing.
  • The Trap: If both try to be "smart" and charge low prices to steal customers, they end up in a price war where both make very little money.
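The payoff structure above is a textbook prisoner's dilemma. As a minimal sketch (the dollar figures are illustrative, not taken from the paper), it can be written as a lookup table:

```python
# Illustrative prisoner's-dilemma payoffs (numbers are made up for this sketch).
# Each entry maps (my_price, rival_price) -> my_profit per round.
PAYOFF = {
    ("high", "high"): 5,  # both cooperate: comfortable profits for everyone
    ("high", "low"):  0,  # I hold high, rival undercuts: I sell nothing
    ("low",  "high"): 8,  # I undercut: I steal the whole street
    ("low",  "low"):  2,  # price war: both scrape by
}
```

The ordering is what makes it a dilemma: undercutting pays best against a high-price rival, but if both undercut, both end up worse off than if both had stayed high.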

In a normal human scenario, if two people play this game forever, they might figure out, "Hey, if we both charge high, we're both rich. If we fight, we're both poor." But that usually requires trust or communication.

The Twist: In this paper, the AI agents are "naive." They are blind. They don't see the other player. They only see their own receipt at the end of the day. They have no idea they are playing a game against someone else. They just think, "I tried a high price, I made $5. I tried a low price, I made $2. I'll stick with high."
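The paper's exact algorithms aren't reproduced here, but a "blind" learner of this flavor can be sketched as a simple epsilon-greedy bandit (class and parameter names are my own): it tracks only its own average payoff per price and never observes the rival.

```python
import random

class NaiveBandit:
    """A naive learner: sees only its own action and reward, never the rival."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon               # probability of exploring at random
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)                  # explore
        return max(self.arms, key=lambda a: self.means[a])     # exploit best average

    def update(self, arm, reward):
        # Incremental running average of this arm's observed payoffs
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

Note what is missing: there is no model of the opponent at all. The agent just keeps a running average per price and leans toward whichever looks best so far, exactly the "I tried high, I made $5; I'll stick with high" reasoning above.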


The Secret Ingredient: Randomness vs. Determinism

The paper discovers that whether these blind robots end up colluding (charging high prices) or competing (charging low prices) depends entirely on how much randomness is built into their software.

Think of the algorithms as three types of test-takers:

1. The "Random Explorer" (Persistently Random)

  • The Analogy: Imagine a student who is so curious they randomly guess answers just to see what happens, even after they think they know the right answer. They never stop guessing.
  • The Result: These agents never collude. Because they keep randomly trying the "low price" option just to be sure, they constantly disrupt any pattern. They end up in a price war, which is actually good for the consumer (low prices).
  • The Catch: While this is good for consumers, it's bad for the companies. These algorithms are "sub-optimal" because they keep making mistakes just to learn. A smart business wouldn't want a robot that randomly lowers prices just to "explore."

2. The "Greedy Learner" (Greedy-in-the-Limit)

  • The Analogy: Imagine a student who tries a few answers at the start, sees what works, and then never deviates from the best answer they found. Once they think they have the winning formula, they stick to it rigidly.
  • The Result: This is where things get tricky. Sometimes they compete, and sometimes they accidentally collude.
    • If the "exploration" phase is short, they might get lucky and both stumble onto the "High Price" strategy at the same time. Once they both lock onto it, they stay there forever.
    • It's like two people walking into a dark room. If they both happen to step on the same "High Price" button at the same time, they might both decide, "Oh, this is the best spot!" and never move again.

3. The "Perfect Robot" (Deterministic)

  • The Analogy: Imagine a robot that follows a strict, mathematical formula with zero randomness. If the input is the same, the output is always the same.
  • The Result: They always collude.
    • Because they are identical and follow the exact same math, they will always make the exact same moves at the exact same time.
    • If they both try "High Price" and it works, they both lock onto it. If they both try "Low Price" and it fails, they both switch to "High Price" together.
    • It's like two dancers who have memorized the exact same choreography. They will never miss a beat, and they will inevitably end up dancing in perfect, synchronized harmony (charging high prices).
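The dancer analogy can be made concrete. Below is a toy simulation (my own construction, not the paper's code, with illustrative payoffs): two identical, fully deterministic agents try each price once in the same fixed order, then always exploit their best average. Because their inputs and tie-breaks are identical, their moves never diverge.

```python
# Toy deterministic agents: same rules, same order, zero randomness.
PAYOFF = {("high", "high"): 5, ("high", "low"): 0,
          ("low", "high"): 8, ("low", "low"): 2}
ARMS = ["high", "low"]

class DeterministicAgent:
    def __init__(self):
        self.counts = {a: 0 for a in ARMS}
        self.means = {a: 0.0 for a in ARMS}

    def choose(self):
        for a in ARMS:              # deterministic initial sweep: try each arm once
            if self.counts[a] == 0:
                return a
        # Purely greedy afterwards; same inputs always give the same output.
        return max(ARMS, key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

a, b = DeterministicAgent(), DeterministicAgent()
history = []
for _ in range(50):
    pa, pb = a.choose(), b.choose()
    a.update(pa, PAYOFF[(pa, pb)])   # each agent sees only its own profit
    b.update(pb, PAYOFF[(pb, pa)])
    history.append((pa, pb))
```

In this run, both agents try "high" together (earning 5 each), try "low" together (earning only 2 each), and then lock onto "high" forever: the mirror-image moves mean neither ever experiences being undercut, so the price war never starts.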

The "Synchronicity" Problem

The paper introduces a new concept called Synchronicity.

Imagine two people flipping coins.

  • If they flip randomly, sometimes they match, sometimes they don't.
  • If they are "synchronized," they flip Heads at the exact same time, over and over again.

The researchers found that collusion happens when the robots get "synchronized."

  • If the robots are too random, they never sync up.
  • If the robots are too rigid (deterministic), they sync up too perfectly, locking into a high-price agreement.
  • The "sweet spot" for collusion is when they are mostly rigid but had a little bit of randomness early on that accidentally aligned them.

Why This Matters for You (The Consumer)

This has huge implications for antitrust laws (the rules against companies cheating on prices).

  1. You can't just ban "talking": Regulators often look for evidence that companies are talking to each other to fix prices. This paper says, "Bad news. They don't need to talk." Two companies can buy the same "off-the-shelf" pricing software from a vendor, and that software might naturally teach them to charge high prices without them ever exchanging a single email.
  2. The "Naive" Defense: Companies might say, "We didn't mean to collude! Our robots just learned it on their own!" The paper suggests that for certain types of algorithms, this isn't just an excuse; it's a mathematical certainty.
  3. Symmetry is Dangerous: If two competitors use the exact same algorithm (symmetry), they are much more likely to end up in a price-fixing trap.

The Bottom Line

The paper warns us that AI is not just a tool; it's a player.

If we let AI agents learn how to price things on their own, using standard "textbook" algorithms, we might accidentally create a market where prices stay high, not because of a conspiracy, but because the math of the software forces them to dance in perfect, expensive unison.

The takeaway: To prevent this, we might need to force companies to use "messy," random algorithms that prevent perfect synchronization, even if those algorithms are slightly less efficient for the companies. It's a trade-off between corporate efficiency and fair market prices.