Adaptive integration of model-based and model-free strategies in human reinforcement learning of reachable space

Using a novel robotic maze task, this study demonstrates that humans adaptively integrate model-based and model-free reinforcement learning strategies to navigate reachable space, shifting toward more efficient model-free control as familiarity increases and revealing that spatial learning architectures are shared across scales but calibrated to the specific constraints of the effector system.

Original authors: Zhu, T., Syan, R., Vejandla, S., Gallivan, J. P., Wolpert, D. M., Flanagan, J. R.

Published 2026-03-04

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: How We Learn to Move Our Hands

Imagine you are trying to get a coffee cup from a crowded table. You have to reach out, dodge a stack of plates, and avoid knocking over a vase. We do this every day, yet how our brains learn to navigate the cluttered space within arm's reach has received little study.

This paper asks a simple question: When we learn to navigate a tricky space with our hands, does our brain use a "GPS map" (planning ahead) or does it just rely on "muscle memory" (trial and error)?

The answer is: We use both, and we switch between them like a hybrid car.


The Experiment: The Robot Maze

To figure this out, the researchers built a special video game.

  • The Setup: Participants sat in front of a robotic arm. They held a handle that controlled a virtual ball on a screen.
  • The Goal: Move the ball from a starting point to a red target square.
  • The Obstacle: The maze was full of grey blocks. If you hit a block, the robot would push back against your hand (like hitting a wall).
  • The Twist: They tested two groups:
    1. The "Eyes-On" Group: They could see the whole maze and the blocks.
    2. The "Blind" Group: They couldn't see the maze or their hand. They had to feel their way around using only the robot's push-back feedback.

The Two Brain Modes: The Architect vs. The Habit-Former

The researchers fit computational models to participants' movements to infer which learning strategy best explained their behavior. Two distinct strategies emerged:

  1. Model-Based (The Architect):

    • What it is: This is your brain building a mental map. It's like a GPS. You look at the maze, plan a route, and think, "If I go left, then up, I'll hit the wall, so I'll go right instead."
    • Pros: It's flexible. If the maze changes, you can instantly recalculate.
    • Cons: It's slow and tiring. It takes a lot of mental energy to plan every step.
  2. Model-Free (The Habit-Former):

    • What it is: This is your brain caching successful moves. It's like a squirrel remembering where it buried nuts. "I went left here before, and I got the reward, so I'll go left again." It doesn't know why it works; it just knows that it works.
    • Pros: It's super fast and automatic.
    • Cons: It's rigid. If the maze changes, the squirrel keeps digging in the wrong spot.
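
To make the contrast concrete, here is a minimal sketch in Python. It is an illustration only, not the models fit in the paper: the 4x4 grid, the function names, and the learning parameters are all hypothetical. The Architect searches a known map for a route, while the Habit-Former only nudges a cached value for the move it just made.

```python
from collections import deque

# Hypothetical 4x4 maze (not from the paper): '#' is a block, S start, G goal.
MAZE = ["S..#",
        ".#..",
        "...#",
        "#..G"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Return the next cell; hitting a block or the border leaves you in place."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
        return (nr, nc)
    return state

def plan_model_based(start, goal):
    """Architect: breadth-first search over the known map for a shortest route.

    Assumes the goal is reachable from the start.
    """
    queue, parent = deque([start]), {start: None}
    while queue:
        s = queue.popleft()
        if s == goal:
            break
        for a in MOVES:
            nxt = step(s, a)
            if nxt not in parent:
                parent[nxt] = (s, a)
                queue.append(nxt)
    # Walk back from the goal to the start, collecting the moves taken.
    path, s = [], goal
    while parent[s] is not None:
        prev, a = parent[s]
        path.append(a)
        s = prev
    return path[::-1]

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Habit-Former: one Q-learning update of a cached action value.

    No map is consulted; only the value of the move just made changes.
    """
    best_next = max(Q.get((next_state, a), 0.0) for a in MOVES)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

route = plan_model_based((0, 0), (3, 3))   # planned in one shot from the map
Q = {}
q_update(Q, (3, 2), "right", 1.0, (3, 3))  # learned only after trying the move
```

If a block were added to the grid, calling plan_model_based again would immediately return a new route, whereas the cached values in Q would stay wrong until new experience overwrote them. That is the flexibility-versus-rigidity trade-off described above.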

The Surprising Findings

1. We Start as Architects, Then Become Habit-Formers

At the very beginning of the game, participants leaned mainly on the Architect strategy and planned carefully. But as they played more and more rounds, they gradually shifted toward the Habit-Former.

  • The Analogy: Think of learning to drive a new route. At first, you stare at the GPS (planning). After a week, you drive on autopilot without thinking (habit). The study shows our brains do this automatically to save energy.
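
One common way to express this kind of shift is a weighted mix of the two systems, with the weight on planning shrinking as trials accumulate. The sketch below assumes an exponential decay and made-up parameter values (w0, decay); the paper's actual arbitration rule and fitted values are not given in this summary.

```python
import math

def mb_weight(trial, w0=0.9, decay=0.05):
    """Weight on the model-based (Architect) system after `trial` rounds.

    The exponential form and both parameters are illustrative assumptions.
    """
    return w0 * math.exp(-decay * trial)

def hybrid_values(q_mb, q_mf, trial):
    """Blend planned and cached action values using the current weight."""
    w = mb_weight(trial)
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}

# Hypothetical case where the two systems disagree about the best move.
q_mb = {"left": 1.0, "right": 0.0}   # the plan says go left
q_mf = {"left": 0.0, "right": 1.0}   # the habit says go right

early = hybrid_values(q_mb, q_mf, trial=0)
late = hybrid_values(q_mb, q_mf, trial=50)
print(max(early, key=early.get))  # the plan dominates early on
print(max(late, key=late.get))    # the habit dominates after practice
```

With these assumed numbers, the blended choice follows the plan on the first trial and follows the habit by trial 50.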

2. The "Blind" Group Relied More on Habits

The group that couldn't see the maze relied on the Habit-Former strategy much more than the group that could see.

  • Why? When you can't see, building a perfect mental map is hard and uncertain. So, your brain says, "I'll just trust my muscle memory for this one."
  • The Twist: Even the group that could see the maze eventually switched to habits! This suggests that our brains switch to habits not just because we are confused, but because planning is expensive. Once we know the way, we stop planning to save mental energy.

3. Speed vs. Safety

Here is the coolest part: The people who relied more on Habits moved faster.

  • The Analogy: The "Architect" is like a chess player thinking for 10 minutes before moving a piece. The "Habit-Former" is like a reflex.
  • The study found that when people moved faster, they were actually using their "habit" brain more. Interestingly, these fast movers also bumped into fewer blocks. Why? Because habits repeat what worked before, avoiding the risky, unexplored paths that the "Architect" might try to calculate.

The Big Comparison: Hands vs. Feet

The researchers compared their hand-maze game to a similar game where people "walked" through a virtual maze using a chair.

  • The Result: People relied much more on habits when using their hands than when using their feet.
  • The Reason: Walking is slow and tiring. If you take a wrong turn while walking, it costs you a lot of time. So, your brain forces you to plan carefully (Architect mode).
  • The Hand Difference: Moving your hand is fast and cheap. If you take a slightly wrong turn with your hand, it only costs a split second. Because the "penalty" for a mistake is so low, your brain feels safe enough to just use fast, automatic habits.
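
The reasoning above can be written as a simple cost-benefit rule: plan only when the time you expect to save by avoiding mistakes is larger than the time planning takes. The function and every number below are hypothetical, chosen only to illustrate the argument; they are not measurements from the study.

```python
def prefers_planning(error_cost, planning_cost, p_error_habit, p_error_plan):
    """Return True if planning is worth its cost.

    Planning pays off when the expected error cost it saves
    (drop in error probability times cost per error) exceeds its own cost.
    """
    expected_saving = (p_error_habit - p_error_plan) * error_cost
    return expected_saving > planning_cost

# Assumed values, in seconds: a wrong turn is costly when walking, cheap when reaching.
walking = prefers_planning(error_cost=10.0, planning_cost=1.0,
                           p_error_habit=0.3, p_error_plan=0.1)
reaching = prefers_planning(error_cost=0.5, planning_cost=1.0,
                            p_error_habit=0.3, p_error_plan=0.1)
print(walking, reaching)
```

Under these assumptions the same rule favors planning for walking and habits for reaching, simply because the cost of an error differs.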

The Takeaway

Our brains are incredibly smart engineers. They don't just stick to one way of learning.

  • When we are learning something new, we plan (Architect).
  • Once we get the hang of it, we switch to habits (Habit-Former) to save energy and move faster.
  • We lean on habits more heavily when movements are quick and cheap (like moving a hand) than when they are slow and costly (like walking).

In short: We are all hybrid learners. We build maps when we need to, but as soon as we can, we let our habits take the wheel so we can move faster and free up our brains for other things.
