This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.
Imagine you are playing a video game where you have to guess whether a hidden object is a Red Ball or a Blue Ball. You can't see the object clearly; you only get a fuzzy hint. Your goal is to guess correctly to win a coin.
This paper is about how rats (our "players") learn to make these guesses when the game rules change. Specifically, the researchers wanted to know: Does it matter more if the game gives you Red Balls more often, or if it gives you more coins for guessing Red?
Here is the breakdown of their findings, using simple analogies.
The Setup: The Fuzzy Guessing Game
The researchers put rats in a box with two holes. They played two different sounds (like a high beep and a low beep).
- Sound A meant the rat should poke the Left hole to get a water reward.
- Sound B meant the rat should poke the Right hole.
The rats had to learn which sound meant which hole. But the researchers didn't just keep the rules static; they changed them to see how the rats adapted.
Experiments 1 & 2: The "Frequency" vs. The "Bonus"
The researchers ran two main tests:
- The Frequency Test (Stimulus Prior): They made Sound A happen 80% of the time and Sound B only 20% of the time. The reward for a correct guess was the same for both.
- The Rat's Logic: "Wow, Sound A happens all the time! I should just guess Left almost every time, even if I'm not sure."
- The Bonus Test (Reward Probability): They made Sound A and Sound B happen 50/50, but if you guessed correctly for Sound A, you got a big water reward. If you guessed Sound B correctly, you only got a tiny reward.
- The Rat's Logic: "Both sounds happen equally, but Sound A pays the bills! I should guess Left almost every time."
The Big Surprise:
You might think these two situations would make the rats behave the same way. After all, mathematically, the "best" strategy is the same in both cases. But it wasn't.
- When the researchers changed the Frequency (how often the sound happened), the rats adjusted their guesses slowly and moderately.
- When they changed the Bonus (how much they got paid), the rats went crazy. They adjusted their guesses much faster and became much more extreme in their bias.
The Analogy:
Imagine you are a taxi driver.
- Scenario A (Frequency): You notice that 80% of your passengers want to go to the Airport. You start driving toward the airport more often, but you still check your GPS carefully.
- Scenario B (Bonus): You notice that 50% of passengers go to the Airport, but the Airport passengers tip $100, while the others tip $1. You immediately stop checking the GPS and just drive to the airport 99% of the time, ignoring the other passengers entirely.
The rats treated the Bonus as a much louder, more urgent signal than the Frequency.
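For readers who want to check the "mathematically the same" claim, here is a minimal ideal-observer sketch. The 4:1 reward ratio in the Bonus test is an assumption for illustration (the summary above only says "big" vs. "tiny" reward):

```python
def likelihood_ratio_threshold(p_a, p_b, r_a, r_b):
    """Ideal observer: choose Left (Sound A) whenever the sensory
    likelihood ratio P(evidence|A)/P(evidence|B) exceeds this value."""
    return (p_b * r_b) / (p_a * r_a)

# Frequency test: Sound A on 80% of trials, equal rewards
freq = likelihood_ratio_threshold(p_a=0.8, p_b=0.2, r_a=1.0, r_b=1.0)

# Bonus test: 50/50 sounds, but Sound A assumed to pay 4x more
bonus = likelihood_ratio_threshold(p_a=0.5, p_b=0.5, r_a=4.0, r_b=1.0)

print(freq, bonus)  # both 0.25: the optimal decision rule is identical
```

Because the threshold only depends on the product of prior and reward, an 80/20 prior with equal rewards and a 50/50 prior with a 4:1 reward ratio demand exactly the same behavior from an ideal player, which is why the rats' different reactions are surprising.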
Experiment 3: The Tug-of-War
In this experiment, the researchers pitted the two factors against each other.
- They made Sound A happen 80% of the time (Frequency says: "Guess Left!").
- BUT, they made the reward for Sound B 4x higher (Bonus says: "Guess Right!").
The Result: The rats ignored the frequency and followed the money. They guessed Right even though Sound A was happening far more often. This suggests that reward, not frequency, dominates the rat's decision-making.
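A back-of-the-envelope check makes the result sharper. Assuming a baseline reward of 1 unit (the summary only says "4x higher") and ignoring the noisy sensory evidence, these particular numbers make a blind guesser exactly indifferent between the two sides, so the rats' strong rightward bias is a real preference for reward, not just good arithmetic:

```python
# Illustrative numbers only; the baseline reward of 1 unit is an assumption.
p_a, p_b = 0.8, 0.2          # Sound A appears 80% of the time
r_a, r_b = 1.0, 4.0          # but Sound B pays 4x more when correct

ev_left = p_a * r_a   # always guess Left: paid only on Sound A trials
ev_right = p_b * r_b  # always guess Right: paid only on Sound B trials

print(ev_left, ev_right)  # 0.8 vs 0.8: the two pressures exactly cancel
```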
The "Black Box" Problem: Why did the models fail?
The researchers tried to use three different computer models (mathematical formulas) to predict how the rats would behave.
- Models 1 & 2 (The "Old School" Models): These assumed the rats just keep a simple scorecard of "How often did I get a reward?" They failed completely in Experiment 3; they couldn't explain why the rats ignored the frequency.
- Model 3 (The "Learning" Model): This model tried to learn the value of every action. It worked okay for the Bonus test, but failed miserably when the Frequency changed.
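To see why a reward-only "scorecard" is blind to stimulus frequency, here is a minimal delta-rule sketch (the function name and learning rate are illustrative, not taken from the paper's models):

```python
def delta_rule_update(q, reward, alpha=0.1):
    """One 'scorecard' update: nudge the stored value estimate
    toward the reward just received (alpha is an assumed learning rate)."""
    return q + alpha * (reward - q)

# The estimate depends only on the rewards received for that action.
# Doubling how often Sound A occurs changes nothing here, because each
# update still sees the same per-trial reward.
q = 0.0
for _ in range(50):
    q = delta_rule_update(q, reward=1.0)
print(round(q, 3))  # converges toward 1.0 regardless of stimulus frequency
```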
The Conclusion:
The computer models failed because they assumed the rats were "dumb" calculators that only looked at the immediate reward. The researchers realized the rats are actually smart statisticians.
- The rats aren't just counting rewards; they are keeping track of the whole game. They know, "Oh, the game is rigged to give me Sound A often," OR "The game is rigged to pay me more for Sound A."
- The current computer models don't have a "memory" for the game's setup (the prior probabilities). They need to be upgraded to include a "mental map" of how the world works, not just how much money they just made.
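One hypothetical way to give a model that "mental map" is to estimate the stimulus prior from trial counts and let it feed into the decision. This is an illustration of the idea, not the authors' proposal:

```python
from collections import Counter

# Start with a flat prior (Laplace smoothing: one phantom count per sound).
counts = Counter({"A": 1, "B": 1})

def observe(stimulus):
    """Update the 'mental map' after each trial."""
    counts[stimulus] += 1

def prior(stimulus):
    """Current estimate of how often this sound occurs."""
    return counts[stimulus] / sum(counts.values())

# A short run of trials with the 80/20 frequencies from Experiment 1
for s in ["A"] * 8 + ["B"] * 2:
    observe(s)

print(round(prior("A"), 2))  # the estimated prior shifts toward Sound A
```

A model like this can then weight its choices by prior times reward, tracking the game's setup instead of only the last coin it earned.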
Experiments 4 & 5: Does "How Often I Get Paid" Matter?
Finally, the researchers asked: "Does it matter if I get paid a lot of times in a row, or just rarely?" (This is called "Reward Density").
- They tested if the rats learned faster when rewards were frequent vs. rare.
- The Result: No. The rats learned at the same speed regardless of how often they got a reward. Whether the game was "high paying" or "low paying," the rats didn't speed up or slow down their learning.
The Takeaway
- Money talks louder than frequency: If you want to change someone's mind, offering a bigger reward works much faster than just showing them something more often.
- Rats are smarter than we thought: They don't just react to the last coin they got; they understand the "rules of the game" (the probability of events).
- AI needs an upgrade: Our current computer models for decision-making are too simple. They need to be taught to understand the "context" or "background" of a situation, not just the immediate reward.
In short, the rats aren't just reacting to the present moment; they are building a mental model of the world, and rewards are the most powerful tool to update that model.