Evolution of cooperation with Q-learning: the impact of information perception

This study employs Q-learning in a Prisoner's Dilemma framework to demonstrate that varying information perception structures, particularly asymmetric information, critically shape the complex evolutionary dynamics and emergence of cooperation, offering new insights into human cooperative behavior.

Original authors: Guozhong Zheng, Zhenwei Ding, Jiqiang Zhang, Shengfeng Deng, Weiran Cai, Li Chen

Published 2026-02-04

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of a preprint that has not been peer-reviewed.

Imagine you and a friend are playing a game in which each of you must decide whether to be nice (Cooperate) or to look out for yourself at the other's expense (Defect). This is the classic "Prisoner's Dilemma." If you are both nice, you both win a little. If you both look out for yourselves, you both lose a bit. But if one is nice and the other isn't, the "nice" one gets crushed, and the "selfish" one gets a huge reward.
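To make the four outcomes concrete, here is a minimal sketch of the game as a payoff table. The numbers are assumptions chosen only to respect the usual ordering (temptation > reward > punishment > sucker's payoff); this summary does not give the values used in the paper, and the names `PAYOFF` and `play` are hypothetical.

```python
# Hypothetical payoff values; the summary above gives no numbers.
# The only requirement for a Prisoner's Dilemma is T > R > P > S.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker's payoff

C, D = 0, 1  # Cooperate ("be nice"), Defect ("look out for yourself")

# PAYOFF[(my_action, your_action)] -> (my_payoff, your_payoff)
PAYOFF = {
    (C, C): (R, R),  # both nice: both do moderately well
    (C, D): (S, T),  # I am nice, you are not: I get the worst outcome
    (D, C): (T, S),  # I exploit you: I get the biggest reward
    (D, D): (P, P),  # both selfish: both do poorly
}


def play(my_action, your_action):
    """Return the pair of payoffs for one round of the game."""
    return PAYOFF[(my_action, your_action)]
```

For example, `play(C, D)` looks up the case where the first player is nice and the second is not.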

Usually, scientists studying this game assume both players see the world exactly the same way. They both know what the other person did last time, or they both only know what they themselves did.

This paper asks a different question: What happens if the two players see the game differently? What if one player is watching their friend's moves, while the other player is only watching their own?

The researchers used a computer algorithm called "Q-learning" (think of it as a digital student that learns by trial and error, keeping a mental scorecard of what works and what doesn't) to simulate this. They tested three different "vision" setups:

  1. The "You and You" Team: Both players only watch what the other person does.
  2. The "Me and Me" Team: Both players only watch what they themselves do.
  3. The "You and Me" Team (Asymmetric): One player watches the other, while the other player only watches themselves.
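The three setups can be sketched with a small tabular Q-learning agent whose "state" is simply the one action it is allowed to see from the previous round. This is an illustration under stated assumptions, not the authors' implementation: the payoff values, the learning rate `alpha`, discount `gamma`, exploration rate `epsilon`, and the names `QAgent` and `simulate` are all hypothetical choices made here.

```python
import random

C, D = 0, 1  # Cooperate, Defect
# Hypothetical payoffs (T > R > P > S); not taken from the paper.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}


class QAgent:
    """Tabular Q-learner whose state is one observed previous action."""

    def __init__(self, watches, alpha=0.1, gamma=0.9, epsilon=0.01, rng=None):
        self.watches = watches  # "other" or "self" (assumed labels)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        self.q = [[0.0, 0.0], [0.0, 0.0]]  # the "scorecard": q[state][action]

    def perceive(self, own_action, other_action):
        # The perception structure decides which past action becomes the state.
        return other_action if self.watches == "other" else own_action

    def act(self, state):
        # Epsilon-greedy: mostly pick the best-scoring action, sometimes explore.
        row = self.q[state]
        if self.rng.random() < self.epsilon or row[C] == row[D]:
            return self.rng.choice((C, D))
        return C if row[C] > row[D] else D

    def learn(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + discounted best next value.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def simulate(watch_1, watch_2, rounds=10_000, seed=0):
    """Play repeated rounds; return the fraction of cooperative moves."""
    rng = random.Random(seed)
    a1, a2 = QAgent(watch_1, rng=rng), QAgent(watch_2, rng=rng)
    prev = (rng.choice((C, D)), rng.choice((C, D)))  # random first memory
    coop = 0
    for _ in range(rounds):
        s1 = a1.perceive(prev[0], prev[1])
        s2 = a2.perceive(prev[1], prev[0])
        x1, x2 = a1.act(s1), a2.act(s2)
        r1, r2 = PAYOFF[(x1, x2)]
        a1.learn(s1, x1, r1, a1.perceive(x1, x2))
        a2.learn(s2, x2, r2, a2.perceive(x2, x1))
        coop += (x1 == C) + (x2 == C)
        prev = (x1, x2)
    return coop / (2 * rounds)
```

Calling `simulate("other", "other")`, `simulate("self", "self")`, and `simulate("other", "self")` runs the "You and You", "Me and Me", and "You and Me" teams respectively. The numbers this toy version produces depend on the assumed parameters and should not be read as reproducing the paper's results.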

Here is what they found, explained simply:

1. The "You and You" Team (Watching the Other)

When both players are only focused on what the other person is doing, the game is a mess. It's like two people trying to dance while staring only at each other's feet; they can't find a rhythm. They keep switching between being nice and being mean, but they never settle into a stable pattern of cooperation. Eventually, they usually give up and just look out for themselves.

2. The "Me and Me" Team (Watching Themselves)

When both players only focus on their own past actions, things are more stable, but they get stuck easily.

  • The Good: If the temptation to be mean is low, they can get stuck in a "happy loop" where they are both nice forever.
  • The Bad: If the temptation to be mean is high, they get stuck in a "sad loop" where they are both mean forever.
  • The Catch: Once they pick a loop (happy or sad), it's very hard to switch. It's like a train that has left the station; it's either going to the destination of "Friendship" or "Betrayal," and it rarely changes tracks once it starts.

3. The "You and Me" Team (The Mixed Vision)

This is where the magic happens. When one player watches the other, and the other watches themselves, the game becomes dynamic and surprisingly effective.

The researchers discovered a complex, three-part story that plays out over time:

  • Phase 1: The Honeymoon. The two players figure out that being nice works. They start cooperating.
  • Phase 2: The Breakup. One player (the one watching the other) starts to get greedy. They realize they can get a bigger reward by being mean while the other person is still being nice. They exploit their partner. The nice partner, confused but trying to be good, keeps being nice for a while (tolerance), but eventually gets hurt.
  • Phase 3: The Rebuild. The nice partner finally snaps. They decide to be mean too, just to teach the greedy partner a lesson. This "punishment" hurts the greedy player, who then realizes, "Hey, being mean isn't working anymore." The greedy player switches back to being nice. The cycle resets, and they build a stronger, more resilient cooperation than before.

The Big Takeaway

The most surprising finding is that this mixed vision (Asymmetric) setup actually leads to faster and stronger cooperation than the setups where everyone sees the world the same way.

Think of it like a relationship:

  • If you and your partner both only look at your own feelings, you might get stuck in a rut.
  • If you both only stare at each other, you might get anxious and unstable.
  • But if one of you is focused on the relationship (watching the other) and the other is focused on their own growth (watching themselves), you create a dynamic where you can forgive mistakes, learn from them, and build a stronger bond.

The paper concludes that how we perceive information matters more than we thought. The structure of what we know, and who knows what, determines whether we end up in a cycle of betrayal or a stable cycle of cooperation. The "mixed vision" creates a natural rhythm of trust, betrayal, punishment, and forgiveness that mirrors real human behavior, allowing cooperation to survive even when it's difficult.
