Asymmetric Reinforcement Learning Explains Human Choice Patterns in Decision-making Under Risk

This study reports that an asymmetric, risk-sensitive reinforcement learning model, which weights rewards and losses differently, explains human choice patterns and response times in decision-making under risk better than symmetric learning approaches.

Original authors: Shahdoust, N., Cowan, R. L., Price, T. A., Davis, T. S., Liu, A., Rabinovich, R., Zarr, V., Libowitz, M. R., Shofty, B., Rahimpour, S., Borisyuk, A., Smith, E. H.

Published 2026-03-11

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: How We Learn from Wins and Losses

Imagine you are playing a game where you have to guess if a hidden card is higher or lower than the one you are holding. Sometimes you win a dollar, and sometimes you lose a dollar.

Scientists have long debated a simple question: When we learn from these games, do we treat winning and losing exactly the same way?

  • The Old Theory (Symmetric Learning): This theory says our brains are like a perfectly balanced scale. If you win $1, your brain says, "Great, do that again!" If you lose $1, your brain says, "Bad, don't do that!" The weight of the win and the loss is identical.
  • The New Theory (Asymmetric Learning): This paper suggests our brains are more like a biased scale. We might learn much faster from a win than from a loss, or vice versa. We might ignore small losses but get super excited about big wins.

This study set out to find out which theory better matches how people actually behave.
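
To make the two theories concrete, here is a minimal sketch of a single learning step. It assumes the common reinforcement-learning form in which an expected value is nudged toward each outcome, with the learning speed chosen by whether the outcome was better or worse than expected. The function name, parameter names, and numbers are illustrative assumptions, not the paper's actual equations.

```python
def update_value(value, reward, alpha_gain, alpha_loss):
    """One learning step: nudge the expected value toward the outcome.

    alpha_gain and alpha_loss are hypothetical names for the two
    "learning speeds"; setting them equal gives the symmetric learner.
    """
    prediction_error = reward - value
    # Choose the learning speed by the sign of the surprise.
    alpha = alpha_gain if prediction_error >= 0 else alpha_loss
    return value + alpha * prediction_error


# Symmetric learner: a $1 win and a $1 loss move the estimate equally.
sym_after_win = update_value(0.0, +1.0, alpha_gain=0.3, alpha_loss=0.3)
sym_after_loss = update_value(0.0, -1.0, alpha_gain=0.3, alpha_loss=0.3)

# Asymmetric learner (illustrative values): losses move it less than wins.
asym_after_win = update_value(0.0, +1.0, alpha_gain=0.3, alpha_loss=0.1)
asym_after_loss = update_value(0.0, -1.0, alpha_gain=0.3, alpha_loss=0.1)
```

With these made-up settings the symmetric learner ends up equally far from zero after a win or a loss, while the asymmetric learner moves three times as far after a win as after a loss.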


The Experiment: The "Starling" Card Game

The researchers created a new game called the Starling Task. Here's how it worked:

  1. The Setup: You see a card (say, a 4). You have to guess if a hidden opponent's card is higher or lower.
  2. The Twist: The deck of cards isn't always fair.
    • Uniform Deck: All numbers (1–9) are equally likely.
    • Low Deck: Mostly low numbers (1, 2, 3).
    • High Deck: Mostly high numbers (7, 8, 9).
  3. The Challenge: Sometimes the deck stays the same for a long time (so you learn the pattern). Other times, the deck changes every single turn, and you have to pay attention to a color clue to know which deck you are in.
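
The setup above can be sketched as a small simulation. The deck weights below are made up for illustration (the summary only says "mostly" low or high numbers), and treating a tie as a loss is also an assumption.

```python
import random

CARDS = list(range(1, 10))  # cards 1 through 9

# Hypothetical weights: the real deck probabilities are not given here.
DECK_WEIGHTS = {
    "uniform": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "low":     [4, 4, 4, 1, 1, 1, 1, 1, 1],
    "high":    [1, 1, 1, 1, 1, 1, 4, 4, 4],
}


def draw_card(deck, rng):
    """Draw the hidden card from the named deck."""
    return rng.choices(CARDS, weights=DECK_WEIGHTS[deck], k=1)[0]


def play_round(player_card, guess, deck, rng):
    """Return +1 for a correct guess, -1 otherwise (ties count as losses here)."""
    hidden = draw_card(deck, rng)
    if guess == "higher":
        correct = hidden > player_card
    else:
        correct = hidden < player_card
    return 1 if correct else -1


# Holding a 4 and always guessing "lower" while the low deck is in play.
rng = random.Random(0)
outcomes = [play_round(4, "lower", "low", rng) for _ in range(1000)]
win_rate = outcomes.count(1) / len(outcomes)
```

Under these assumed weights, guessing "lower" on a 4 wins about two times out of three in the low deck, which is the kind of pattern a player can pick up when the deck stays the same for a while.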

Forty-seven people played this game: some were healthy volunteers, and some were patients with epilepsy who were already in the hospital for brain monitoring. They played hundreds of rounds, earning or losing fake money.


The Detective Work: Testing the Models

The researchers didn't just watch the people; they built five different computer "brains" (mathematical models) to see which one could best predict what the humans would do next.

  1. The "Win-Stay, Lose-Shift" Robot: A simple robot that repeats a move if it wins and switches if it loses, like a toddler learning to walk.
  2. The "Greedy" Robot: Almost always picks the option it thinks is best, but occasionally tries something random in case it is missing a better choice.
  3. The "Smooth" Robot: Usually picks the best option, but with graded randomness: the bigger the gap between the options, the more reliably it picks the better one.
  4. The "Double-Tracker" Robot: Keeps two separate scorecards: one for "How much money did I make?" and one for "How risky was this?"
  5. The "Risk-Sensitive" (RS) Robot: This is the star of the show. It learns asymmetrically. It has two different "learning speeds": one speed for when it wins, and a different speed for when it loses.
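
As a rough illustration of how the first three robots choose, here is a sketch of three standard choice rules. Mapping the "Greedy" robot to epsilon-greedy and the "Smooth" robot to a softmax rule is an assumption based on the descriptions above; the function names, parameters, and numbers are hypothetical and not taken from the paper.

```python
import math
import random


def win_stay_lose_shift(last_choice, last_reward, options):
    """Repeat the last choice after a win, switch to the other option after a loss."""
    if last_reward > 0:
        return last_choice
    return next(o for o in options if o != last_choice)


def epsilon_greedy(values, epsilon, rng):
    """Pick the best-valued option, except with probability epsilon pick at random."""
    if rng.random() < epsilon:
        return rng.choice(list(values))
    return max(values, key=values.get)


def softmax_probs(values, temperature):
    """Turn option values into choice probabilities; lower temperature is less random."""
    highest = max(values.values())  # subtracted for numerical stability
    weights = {o: math.exp((v - highest) / temperature) for o, v in values.items()}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


# Illustrative learned values for the two possible guesses.
values = {"higher": 0.4, "lower": -0.2}
probs = softmax_probs(values, temperature=0.5)
```

The Risk-Sensitive robot differs from these mainly in how the values are learned, not in how a choice is made from them.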

The Results: The "Risk-Sensitive" Robot Wins

After running the numbers, the Risk-Sensitive (RS) Robot was the clear winner. It predicted human behavior better than any other model.
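
One common way to score how well a model "predicts" behavior is to ask how much probability it assigned to the choices people actually made. The sketch below shows that idea with made-up numbers. This summary does not say which comparison metric the authors used, so treat it as a generic illustration; real comparisons usually also penalize models for having more adjustable parameters.

```python
import math


def negative_log_likelihood(choice_probs):
    """Sum of -log(p) over the probabilities a model gave to the observed choices.

    Lower is better: the model was less "surprised" by what people did.
    """
    return -sum(math.log(p) for p in choice_probs)


# Made-up probabilities that two hypothetical models assigned to four real choices.
model_a = [0.7, 0.6, 0.8, 0.65]
model_b = [0.5, 0.5, 0.5, 0.5]  # a model that is always guessing 50/50

nll_a = negative_log_likelihood(model_a)
nll_b = negative_log_likelihood(model_b)
better = "A" if nll_a < nll_b else "B"
```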

What does this mean?
It means that when humans make decisions under risk, we do not treat wins and losses equally. We update our expectations differently depending on whether the outcome was good or bad.

  • The Analogy: Imagine you are learning to cook.
    • If you burn a steak (a loss), you might think, "Okay, I'll lower the heat next time," but you might not remember the exact temperature perfectly.
    • If you cook a perfect steak (a win), you might think, "I'm a genius! I'll definitely do this again!" and remember the exact temperature very clearly.
    • The study suggests our brains work in a similar way: we are asymmetric learners. We don't just add and subtract points on a scoreboard; a win and a loss of the same size change our expectations by different amounts.

Why Does This Matter?

1. It explains why we make "weird" choices.
Sometimes people take huge risks because they remember the big wins vividly but forget the small losses. An asymmetric learning model offers one explanation for that behavior.

2. It helps us understand mental health.
The paper mentions that people with gambling disorders or addiction often have "broken" learning systems. One possibility is that their "loss learning speed" is too slow, so they keep playing even after losing money because their expectations are not updating fast enough to register that it is a bad bet. A model with separate learning speeds could give researchers and clinicians a more precise way to describe these conditions, and possibly to treat them.

3. It holds across both groups in the study.
Interestingly, the study found that people with epilepsy played the game just as well as healthy people. The only difference was that the epilepsy patients were slightly slower to press the buttons. This suggests that the logic of how we learn (the "software") was similar in both groups, even if the speed of our reactions (the "hardware") varies.
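
To illustrate the idea in point 2 of a "loss learning speed" that is too slow, the sketch below simulates two learners facing the same losing bet (a win only 30% of the time). All names and numbers are illustrative assumptions, not values from the paper.

```python
import random


def learned_value(alpha_gain, alpha_loss, p_win, n_trials, seed):
    """Track the expected value of one bet that pays +1 or -1 on each trial."""
    rng = random.Random(seed)
    value = 0.0
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_win else -1.0
        error = reward - value
        # Separate learning speeds for good and bad surprises.
        alpha = alpha_gain if error >= 0 else alpha_loss
        value += alpha * error
    return value


# Same seed, so both learners experience the identical run of wins and losses.
balanced = learned_value(0.2, 0.2, p_win=0.3, n_trials=500, seed=1)
slow_loss = learned_value(0.2, 0.02, p_win=0.3, n_trials=500, seed=1)
```

The bet loses money on average, and the balanced learner's estimate hovers around that negative average. The learner with the slow loss speed ends up believing the same bet is worth playing, because each rare win moves its estimate far more than each frequent loss.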

The Takeaway

Human decision-making isn't a cold, mathematical calculation where +1 and -1 cancel each other out. Instead, it's a dynamic process where wins and losses hit us with different weights.

We are not perfect calculators; we are Risk-Sensitive Learners. We learn faster from some outcomes than others, and that asymmetry is actually the key to understanding how we navigate a risky world.
