Beyond model-free Pavlovian responding: a two-stage… — Plain-Language Explanation


This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your brain is a busy restaurant kitchen. For a long time, scientists thought the kitchen had only one type of chef: the Habit Chef. This chef works on autopilot. If they hear a bell ring (a cue), they immediately start chopping vegetables because "Bell = Chopping" is a hard-wired rule. They don't think about why the bell rang or what might happen next; they just react. This is called Model-Free learning.

However, this new study suggests there's actually a second chef in the kitchen: the Strategist Chef. This chef builds a mental map of the whole restaurant. They know that "Bell A" usually leads to the "Grill," which usually leads to "Steak," but sometimes leads to "Salad." If the bell rings, the Strategist doesn't just react; they calculate, "Ah, the bell rang, but the Grill is broken today, so I should probably order the Salad instead." This is Model-Based learning.

The Big Question

For years, researchers believed that when we react to cues we can't control (like the smell of food making us hungry, or a notification sound making us check our phones), we are purely the Habit Chef. They thought these "Pavlovian" reactions were simple, automatic reflexes that couldn't be influenced by complex thinking.

This paper asks: Is that true? Can the Strategist Chef take over even during these automatic reactions?

The Experiment: The Casino Game

To find out, the researchers built a clever video game that acts like a "two-stage casino."

The Setup: You meet two different "Casino Workers" (let's call them Worker A and Worker B).
The Rules:
- Worker A usually (80% of the time) takes you to Slot Machine 1, but rarely (20% of the time) takes you to Slot Machine 2.
- Worker B does the opposite.
- The Slot Machines give you money or take it away, but the odds change slowly over time.
The Twist: After you learn these rules, the game pauses. You are asked to play a simple game where you press a button to "collect" or "avoid" cards. But while you play, one of the Casino Workers is shown in the background.

The Test:
If you are the Habit Chef, you only care about the worker you just saw. If Worker A just gave you money, you press the button more. If Worker A took money, you stop pressing. You don't care about the other worker or the weird transitions.

If you are the Strategist Chef, you think deeper. You realize, "Wait, Worker A rarely takes me to Slot Machine 2. If I just saw Worker A but ended up at Slot Machine 2, that was a fluke! Whatever Slot Machine 2 paid out tells me more about Worker B, who usually goes there, than about Worker A, who usually goes to Slot Machine 1." You update your expectations based on the structure of the game, not just the immediate result.

What They Found

The results were surprising.

The Strategist Won: Most participants weren't just reacting on autopilot. They were using the Strategist Chef. They understood the complex rules of the casino and adjusted their button-pressing based on the hidden structure of the game, not just the last win or loss.
The "Mind Wandering" Effect: The researchers also asked participants, "Are you daydreaming right now?"
- When people were focused, the Strategist Chef was in charge. They made smart, calculated decisions.
- When people were daydreaming (mind wandering), the Strategist Chef was less active. Participants relied less on the structure of the game, that is, on which worker usually leads to which slot machine.
- Crucially, the "Habit Chef" didn't care if people were daydreaming or not; they just kept doing their simple routine.

Why This Matters

Think of your daily life. When you see a "Sale" sign, do you buy things because you're a mindless robot (Habit), or because you've calculated that you actually need the item and the price is right (Strategy)?

This study suggests that even seemingly "automatic" reactions, like craving a snack when we see a candy wrapper, can be flexible and smart, as long as we are paying attention.

The Takeaway:

We are smarter than we think: Even when we are reacting to cues we can't control, our brains are often running complex simulations to figure out the best move.
Attention is the switch: If we are distracted or daydreaming, our "Strategist" brain has less influence, while the simple "Habit" routine carries on unchanged.
Hope for mental health: Many addictions and disorders are linked to getting stuck in "Habit" mode. This research suggests that if we can help people stay focused and engaged, they might be able to switch back to their "Strategist" mode and break those bad cycles.

In short: Your brain isn't just a reflex machine; it's a brilliant detective. But if you stop paying attention, the detective goes home, and the autopilot takes over.

1. Problem Statement

Pavlovian-instrumental transfer (PIT) describes how Pavlovian cues (stimuli predicting rewards) bias instrumental actions. Traditionally, single-lever PIT paradigms (where participants choose between acting or not acting) have been interpreted as relying primarily on model-free reinforcement learning (RL)—associating cues directly with outcomes based on reinforcement history without an internal model of the environment.

However, this interpretation faces two critical limitations:

Lack of Dissociation: Standard single-lever PIT tasks cannot computationally distinguish between model-free and model-based strategies (which utilize an internal model of state transitions to infer value). While specific PIT (cue-specific outcome) is theorized to be model-based, single-lever PIT is often assumed to be purely model-free.
Cognitive State Influence: It is theoretically expected that model-based learning, being cognitively demanding (requiring working memory and executive function), should be impaired by internal states like mind wandering, whereas model-free learning should remain robust. This specific dissociation has not been empirically validated in the Pavlovian domain.

The study aims to develop a paradigm that computationally dissociates these mechanisms in a single-lever PIT setting, and to test whether model-based Pavlovian responding exists and is sensitive to attentional states.

2. Methodology

Experimental Design: Two-Stage PIT Paradigm

The authors developed a novel trial-by-trial two-stage PIT task involving 71 healthy university students. The task consisted of:

Instrumental Training: Participants learned to collect "good" cards (approach) or avoid "bad" cards (non-approach) via probabilistic monetary feedback.
Pavlovian Training & Transfer (Interleaved):
- Stage 1 (CS): A conditioned stimulus (CS) (e.g., a casino worker) was presented.
- Stage 2 (Transition): The CS probabilistically transitioned to one of two slot machines (2nd-stage states). Transitions were either common (80%) or rare (20%).
- Outcome (US): The slot machine yielded a win or loss of €1. Reward probabilities drifted over time via Gaussian random walks.
- PIT Trial: Immediately following the Pavlovian trial, participants performed the instrumental task (collecting/avoiding cards) under nominal extinction (no feedback) while the CS was displayed in the background.
- Value Query: Participants explicitly judged whether the CS was currently associated with a win or a loss.
- Mind Wandering Probes: Every 60 trials, participants rated their attentional focus (on-task vs. off-task).
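The Pavlovian part of the design above (CS, probabilistic transition, drifting win probabilities) can be sketched as a small simulation. This is a minimal illustration, not the authors' task code: the function name `simulate_trials`, the starting win probabilities, the drift standard deviation, and the clipping bounds are all assumptions chosen for the example; only the 80%/20% transition split and the win/loss of €1 come from the description.

```python
import random

COMMON_P = 0.8  # probability of the common transition, as stated in the task description


def clip(x, lo, hi):
    """Keep x inside [lo, hi]."""
    return max(lo, min(hi, x))


def simulate_trials(n_trials, drift_sd=0.025, seed=0):
    """Generate (cs, state, transition, outcome) tuples for the Pavlovian trials.

    drift_sd and the bounds used below are assumed values, not taken from the paper.
    """
    rng = random.Random(seed)
    p_win = [0.7, 0.3]  # assumed starting win probability of each slot machine
    trials = []
    for _ in range(n_trials):
        cs = rng.randrange(2)                 # 0 = worker A, 1 = worker B
        common = rng.random() < COMMON_P
        # Worker A commonly leads to machine 0, worker B commonly leads to machine 1.
        state = cs if common else 1 - cs
        outcome = 1 if rng.random() < p_win[state] else -1   # win or loss of 1 euro
        trials.append((cs, state, "common" if common else "rare", outcome))
        # Gaussian random walk on both win probabilities, clipped to assumed bounds.
        p_win = [clip(p + rng.gauss(0.0, drift_sd), 0.2, 0.8) for p in p_win]
    return trials
```

Calling `simulate_trials(200)` returns a list of 200 tuples; roughly four in five of them should be labelled `"common"`.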

Computational Modeling

Three RL models were fitted to the trial-by-trial PIT response data using Expectation-Maximization (EM):

Model-Free (MF): Updates values based solely on the direct reinforcement history of the presented CS, ignoring transition structure.
Model-Based (MB): Updates values by incorporating the transition probabilities (the internal model), allowing inference about the value of non-presented states.
Hybrid: A weighted combination of MF and MB strategies.
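To make the difference between the three models concrete, here is a minimal sketch of how each could assign value to a CS. It uses textbook delta-rule updates and is not the authors' fitted model: the class name `PavlovianLearner`, the learning rate `alpha`, the hybrid weight `w`, and the assumption that the learner knows the true 80%/20% transition probabilities are all illustrative choices.

```python
# P(slot machine | worker), taken from the 80%/20% task description.
TRANSITIONS = {0: (0.8, 0.2), 1: (0.2, 0.8)}


class PavlovianLearner:
    def __init__(self, alpha=0.5, w=0.5):
        self.alpha = alpha          # learning rate (assumed value)
        self.w = w                  # hybrid weight on the model-based value (assumed value)
        self.v_mf = [0.0, 0.0]      # cached value per CS (worker)
        self.q_state = [0.0, 0.0]   # learned value per slot machine

    def update(self, cs, state, outcome):
        # Model-free: credit the outcome to the CS that was shown, ignoring the transition.
        self.v_mf[cs] += self.alpha * (outcome - self.v_mf[cs])
        # Model-based: credit the outcome to the slot machine that was actually reached.
        self.q_state[state] += self.alpha * (outcome - self.q_state[state])

    def value_mb(self, cs):
        # CS value is the transition-weighted expectation over slot-machine values.
        return sum(p * q for p, q in zip(TRANSITIONS[cs], self.q_state))

    def value_hybrid(self, cs):
        # Weighted mixture of the model-based and model-free CS values.
        return self.w * self.value_mb(cs) + (1 - self.w) * self.v_mf[cs]


# A rare transition: worker A (0) leads to machine 1, which pays out.
learner = PavlovianLearner(alpha=0.5)
learner.update(cs=0, state=1, outcome=1)
print(learner.v_mf)                            # model-free: only worker A gains value
print(learner.value_mb(0), learner.value_mb(1))  # model-based: worker B gains more than worker A
```

After this single rare-transition win, the model-free values are 0.5 for worker A and 0 for worker B, while the model-based values are about 0.1 for worker A and 0.4 for worker B. That reversal is what lets the two strategies be told apart.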

Model selection was performed using Bayesian Information Criterion (BIC).
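For readers unfamiliar with BIC, the sketch below shows the standard formula, $k \ln n - 2 \ln L$, where lower values indicate a better trade-off between fit and complexity. The log-likelihoods, parameter counts, and number of observations are made-up numbers for illustration only, and the summary does not say which BIC variant (for example, a group-level version) the authors used.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits for one participant: (log-likelihood, number of free parameters).
fits = {"mf": (-130.0, 2), "mb": (-120.0, 2), "hybrid": (-119.0, 3)}
n_obs = 200  # hypothetical number of PIT responses

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)
print(best)
```

In this made-up case the hybrid model has the highest likelihood, but its extra parameter costs more than the small gain in fit, so the model-based model has the lowest BIC.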

Statistical Analysis

Behavioral Markers:
- Model-Free Index: Main effect of CS-match (response changes only when the same CS is presented).
- Model-Based Index: Interaction between CS-match and transition type (response changes based on the inferred value of the other CS following a rare transition).
Bayesian Sequential Testing: Sample size was determined dynamically. Data collection stopped at $N=71$, once the evidence for model-based PIT passed the stopping threshold ($BF_{10} > 6$).
Mind Wandering Analysis: Correlations between state (probes) and trait (questionnaire) mind wandering and the derived model-based/behavioral indices.
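The two behavioral markers can be illustrated with simple cell means. The sketch below is a simplification under stated assumptions, not the authors' statistical model: it assumes each PIT trial is coded as `(match, transition, won, approached)`, defines an "outcome effect" as the approach rate after a win minus the approach rate after a loss, and then takes the main effect of CS-match and the CS-match by transition interaction on that effect. The function names and the data layout are hypothetical.

```python
from collections import defaultdict


def outcome_effect(rows):
    """Win-minus-loss difference in approach rate for each (match, transition) cell.

    Assumes every cell contains at least one win trial and one loss trial.
    """
    counts = defaultdict(lambda: [0, 0])   # key -> [n_approach, n_trials]
    for match, transition, won, approached in rows:
        cell = counts[(match, transition, won)]
        cell[0] += int(approached)
        cell[1] += 1
    effect = {}
    for match in (True, False):
        for transition in ("common", "rare"):
            win_a, win_n = counts[(match, transition, True)]
            loss_a, loss_n = counts[(match, transition, False)]
            effect[(match, transition)] = win_a / win_n - loss_a / loss_n
    return effect


def indices(effect):
    """Return (model_free_index, model_based_index) from the four cell effects."""
    # Main effect of CS-match, averaged over transition type.
    mf = (0.5 * (effect[(True, "common")] + effect[(True, "rare")])
          - 0.5 * (effect[(False, "common")] + effect[(False, "rare")]))
    # Interaction: the match effect after common transitions minus the match effect after rare ones.
    mb = ((effect[(True, "common")] - effect[(False, "common")])
          - (effect[(True, "rare")] - effect[(False, "rare")]))
    return mf, mb


# Toy data from an idealised model-based responder (hypothetical, two trials per cell).
rows = [
    (True, "common", True, True), (True, "common", False, False),
    (False, "common", True, True), (False, "common", False, True),
    (True, "rare", True, True), (True, "rare", False, True),
    (False, "rare", True, True), (False, "rare", False, False),
]
mf_index, mb_index = indices(outcome_effect(rows))
```

For this toy responder the outcome only matters for the matching CS after common transitions and for the other CS after rare transitions, so the model-free index is 0 and the model-based index is 2.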

3. Key Contributions

Novel Paradigm: Introduces a single-lever PIT task that computationally dissociates model-free and model-based learning using a two-stage probabilistic structure, a capability previously limited to multi-lever or instrumental two-step tasks.
Theoretical Challenge: Demonstrates that single-lever PIT is not exclusively model-free; under conditions of detailed instruction and high model certainty, it can be driven by flexible, model-based inference.
Cognitive Modulation: Provides the first evidence that internal attentional states (mind wandering) selectively impair model-based Pavlovian control while leaving model-free indices intact, validating the cognitive resource demands of model-based Pavlovian learning.

4. Results

Behavioral Evidence

Query Trials: Participants showed a robust model-based pattern in their explicit value judgments. Their responses were influenced by the interaction of CS-match and transition type ($BF_{10} = 4.89 \times 10^{10}$), indicating they inferred values based on the task structure rather than just direct reinforcement. There was no evidence for model-free updating in queries.
PIT Trials: The instrumental transfer effect also exhibited a model-based signature (interaction of CS-match $\times$ transition type, $BF_{10} = 6.42$), though with far weaker evidence than in the query trials. The evidence for a pure model-free main effect was inconclusive.

Computational Modeling

Model Fit: The Model-Based RL model provided the best fit for the majority of participants (52 out of 71) and significantly outperformed both the Hybrid and Model-Free models based on BIC scores.
Parameters: The model-based weighting parameter ($\beta_{MB}$) was significantly greater than zero, confirming systematic reliance on transition-aware strategies.

Mind Wandering Effects

State Mind Wandering: Higher self-reported mind wandering during task blocks significantly predicted lower model-based behavioral estimates ($p = .024$).
Trait Mind Wandering: Deliberate mind-wandering traits were negatively correlated with model-based control ($\rho = -0.27$, $p = .011$).
Specificity: No measure of mind wandering was associated with model-free indices, consistent with a selective vulnerability of model-based processes to attentional lapses.

5. Significance

Refining Pavlovian Theory: The study challenges the long-held assumption that single-lever PIT is a purely automatic, model-free process. It suggests that Pavlovian responding can be a flexible, goal-directed system capable of complex inference when the environment is structured and instructions are clear.
Clinical Implications: Given the link between PIT and psychiatric conditions (addiction, OCD, anxiety), the finding that model-based Pavlovian control is sensitive to attentional states offers new avenues for understanding maladaptive behaviors. It suggests that interventions targeting attention or cognitive load could modulate the balance between habitual (model-free) and flexible (model-based) Pavlovian responses.
Methodological Advancement: The paradigm offers a scalable, single-lever tool for future research to investigate the neural and computational underpinnings of Pavlovian learning without the complexity of multi-choice instrumental tasks.

In conclusion, the authors demonstrate that Pavlovian responding is not a monolithic, reflexive system but a complex interplay of learning strategies that can be model-based, computationally dissociable, and critically dependent on the agent's internal attentional state.

Beyond model-free Pavlovian responding: a two-stage Pavlovian-instrumental transfer paradigm
