Modeling the dynamics of social exchange in groups with reinforcement learning and Theory of Mind

This study demonstrates that human resource sharing in groups is driven by a strategic balance between maintaining stable partnerships and exploring alternatives, a dynamic process best explained by a reinforcement learning model incorporating Theory of Mind rather than simple reciprocity or fairness concerns.

Zhang, S., Wang, H., Mendoza, R. B.

Published 2026-03-27

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: The "Token Party" Game

Imagine you are at a party with two friends. Every few minutes, the host hands you a single cookie (a "token"). Your job is to give that cookie to one of your two friends. You can't keep it for yourself.

The goal of the study was to figure out how people decide who gets the cookie when they are doing this over and over again for 90 rounds. Do you just give it back to the person who gave you a cookie last time? Do you try to be perfectly fair? Or do you have a secret strategy?

The researchers found that humans are actually quite clever strategists. They don't just follow simple rules; they use a kind of "mental simulation" to guess what others are thinking.


The Three Experiments: Testing the Rules

The researchers ran three different versions of this cookie game to see what drives our decisions.

Experiment 1: The Basic Cookie Game

The Setup: You get a cookie and give it to either Friend A or Friend B.
The Result: People didn't just stick to one friend. They tended to alternate. If you gave a cookie to Friend A last time, you were likely to give it to Friend B this time.
The Twist: However, if Friend A gave you a cookie back immediately after you gave them one, you were less likely to switch. You stayed loyal to that friend.
The Lesson: We like to spread things out (alternating), but we also like to reward people who are nice to us (reciprocity).
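The two tendencies from Experiment 1 can be caricatured as a tiny decision rule. This is only an illustrative sketch, not the researchers' model: the study describes what people were more or less likely to do, while the code below hard-codes those tendencies as a deterministic rule. The function names and the sequence of reciprocation events are hypothetical.

```python
def other_partner(partner, partners=("A", "B")):
    """Return whichever of the two partners is not `partner`."""
    first, second = partners
    return second if partner == first else first


def next_recipient(last_recipient, reciprocated, partners=("A", "B")):
    """Toy rule: alternate by default, but stay with a partner who gave back.

    Assumption: real participants only *tended* to behave this way;
    this rule makes the tendency absolute for the sake of clarity.
    """
    if last_recipient is None:
        return partners[0]  # arbitrary first choice
    if reciprocated:
        return last_recipient  # reward the friend who reciprocated
    return other_partner(last_recipient, partners)  # otherwise switch


# Hypothetical input: did the previous recipient give a token back right away?
# (The first entry is ignored because there is no previous recipient yet.)
reciprocation_by_round = [False, False, True, False]

recipient = None
choices = []
for reciprocated in reciprocation_by_round:
    recipient = next_recipient(recipient, reciprocated)
    choices.append(recipient)

print(choices)
```

With this made-up sequence the rule prints ['A', 'B', 'B', 'A']: it alternates from A to B, stays with B after B reciprocates, then switches back to A.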

Experiment 2: The "Bad Luck" Cookie Game

The Setup: Same as before, but with a catch. After you give a cookie, the host randomly takes a cookie away from one of your friends. It's pure bad luck; it has nothing to do with you.
The Question: Do people share out of fairness? A "fair" person might think, "Oh, Friend A just lost a cookie by bad luck. I should give them my next cookie to balance things out."
The Result: Nope. People did the opposite. If Friend A lost a cookie, you were more likely to give your next cookie to Friend B.
The Lesson: This suggests that people aren't trying to be "fair" in the sense of making up for someone's losses. They appear to be strategic: giving to the person who just lost a cookie may not be the best move for their own long-term relationships. They are playing the long game, not just balancing the books.

Experiment 3: The "Mind Reader" Game

The Setup: This was the big test. While you are deciding who to give your cookie to, your two friends have to guess who you are going to pick.
The Question: Can people actually predict what others will do?
The Result: Yes! People were surprisingly good at guessing. They could predict that their friends would likely switch targets.
The Big Discovery: The computer model that best explained why people were so good at guessing was one that assumed everyone is mentally simulating everyone else.

  • Simple Model: "I give to whoever gave me a cookie last time." (This failed to explain the guessing.)
  • The "Mind Reader" Model: "I think Friend A is thinking, 'Friend B is going to switch targets, so I should switch too.'" (This one accounted for the guessing.)
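The difference between guessing from habit and guessing by simulation can be shown with a toy comparison. This is a sketch under strong assumptions, not the paper's analysis: the giver's policy, both guessers, and the four-round history are hypothetical, and the history was generated from the toy policy itself, so the simulating guesser is perfect by construction.

```python
def other(partner):
    """Return the other of the two partners, "A" or "B"."""
    return "B" if partner == "A" else "A"


def giver_policy(last_recipient, reciprocated):
    """Assumed giver behavior: stay if the friend gave back, else switch."""
    return last_recipient if reciprocated else other(last_recipient)


def guess_by_repetition(last_recipient, reciprocated):
    """Naive guesser: assume the giver repeats the previous choice."""
    return last_recipient


def guess_by_simulation(last_recipient, reciprocated):
    """Mentalizing guesser: run the giver's assumed policy in your head."""
    return giver_policy(last_recipient, reciprocated)


def accuracy(guesser, history):
    """Fraction of rounds where the guess matches the actual choice."""
    hits = sum(
        guesser(last_recipient, reciprocated) == actual
        for last_recipient, reciprocated, actual in history
    )
    return hits / len(history)


# Hypothetical rounds: (previous recipient, did they reciprocate, actual choice)
history = [
    ("A", False, "B"),
    ("B", True, "B"),
    ("B", False, "A"),
    ("A", False, "B"),
]

print(accuracy(guess_by_repetition, history))
print(accuracy(guess_by_simulation, history))
```

On this made-up history the repetition guesser is right in one round out of four, while the simulating guesser is right in all four. The point is only that a guesser who models the giver's reasoning can anticipate switches that a habit-based guesser misses.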

The Core Concept: "Theory of Mind" (ToM)

The paper's main conclusion is about Theory of Mind. This is a fancy psychological term for the ability to imagine what is going on inside someone else's head.

Think of it like a game of chess:

  • Level 1 (Simple): "I move my pawn here."
  • Level 2 (Smart): "If I move my pawn here, my opponent will move their knight there."
  • Level 3 (The Winner of this Study): "If I move my pawn here, my opponent will think I am trying to trap them, so they will move their knight to the side to avoid the trap."

The researchers found that when we share resources (like cookies, money, or favors) in a group, we aren't just reacting to the past. We are constantly running a mental simulation of how our actions will change how others think and act in the future.
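For readers curious about the "reinforcement learning" half of the model, here is a minimal sketch of two generic ingredients found in many such models: a value for each partner that is nudged toward what was just received, and a softmax rule that turns values into choice probabilities. This is not the paper's model. The Theory of Mind component is left out entirely, and the parameter names (learning_rate, inverse_temperature) and the example numbers are assumptions made for illustration.

```python
import math


def update_value(values, partner, reward, learning_rate=0.1):
    """Delta rule: move the chosen partner's value toward the observed reward."""
    updated = dict(values)  # copy so the caller's dict is not modified
    updated[partner] += learning_rate * (reward - values[partner])
    return updated


def choice_probabilities(values, inverse_temperature=1.0):
    """Softmax: higher-valued partners are chosen more often, but not always.

    A low inverse_temperature means more exploration of alternatives;
    a high one means sticking with the currently best partner.
    """
    highest = max(values.values())  # subtract the max for numerical stability
    weights = {
        partner: math.exp(inverse_temperature * (value - highest))
        for partner, value in values.items()
    }
    total = sum(weights.values())
    return {partner: weight / total for partner, weight in weights.items()}


# Hypothetical example: both friends start out equally valued.
values = {"A": 0.0, "B": 0.0}
before = choice_probabilities(values)

# Friend A gives a token back (reward of 1), so A's value goes up.
values = update_value(values, "A", reward=1.0, learning_rate=0.5)
after = choice_probabilities(values)

print(before)
print(after)
```

Before the update both friends are equally likely to be chosen; afterwards Friend A is somewhat more likely but Friend B still has a real chance. That leftover chance is one simple way to express the balance between keeping a stable partnership and exploring alternatives.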

Why This Matters

  1. We aren't robots: We don't just follow a "tit-for-tat" rule (I scratch your back, you scratch mine). We are constantly exploring new options and balancing stability with change.
  2. We are strategic, not just "fair": We don't share to make everyone equal; we share to build the best possible network of relationships.
  3. Mentalizing is key: The most successful way to navigate social groups is to understand that everyone else is also trying to figure out what you are going to do.

In a nutshell: Social exchange is like a complex dance where everyone is trying to predict the other person's next step. The people who win the dance aren't the ones who just follow the music; they are the ones who can imagine what the other dancers are thinking.
