Neural network-based encoding in free-viewing fMRI with gaze-aware models

This paper introduces a gaze-aware encoding model that combines eye-tracking data with CNN features to predict brain activity during naturalistic, free-viewing fMRI. The model matches the performance of conventional approaches with far fewer parameters, enabling more ecologically valid neuroscience research.

Original authors: Gozukara, D., Ahmad, N., Seeliger, K., Oetringer, D., Geerligs, L.

Published 2026-03-11

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: Watching a Movie vs. Staring at a Dot

Imagine you are sitting in a movie theater. In real life, you are free to look around. You might look at the hero's face, then glance at the background scenery, then track a car speeding by. Your eyes are constantly dancing, picking up the most interesting parts of the story.

However, for decades, scientists studying the brain with MRI machines have forced people to do the opposite. They tell participants: "Don't move your eyes. Stare at a tiny dot in the center of the screen for two hours."

This is like watching a movie while wearing a blindfold that only has a tiny hole in the center. You can see the dot, but you miss everything else. While this makes the data easier to analyze, it doesn't reflect how our brains actually work in the real world. It's also mentally exhausting to stare at a dot while a chaotic movie plays around you.

The Problem: The "Blind" Computer Model

Scientists use powerful computer programs (called Convolutional Neural Networks, or CNNs) to predict how a person's brain will respond to what they are seeing, moment by moment.

  • The Old Way: The computer looks at the entire movie frame, pixel by pixel, from top to bottom, left to right. It tries to guess which part of the image made a specific brain cell light up.
  • The Flaw: This is like trying to find a needle in a haystack by scanning the entire hayfield at once. The computer has to learn millions of numbers (parameters) to make a guess. It's computationally expensive, requires huge amounts of data, and ignores the fact that you were only looking at a tiny corner of the screen (a minimal sketch of this setup follows below).
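To make the old way concrete, here is a minimal, hedged sketch of a whole-frame encoding model: extract CNN features from each full movie frame, then fit one linear readout per brain voxel. It assumes torchvision's VGG-19 and scikit-learn's Ridge regression; all sizes and names are illustrative, not taken from the paper.

```python
# A hedged sketch of the conventional whole-frame encoding model.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.linear_model import Ridge

# Pre-trained VGG-19, keeping only the convolutional feature extractor.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def whole_frame_features(frame):
    """Extract convolutional features for one full movie frame (a PIL image)."""
    with torch.no_grad():
        x = preprocess(frame).unsqueeze(0)  # (1, 3, 224, 224)
        fmap = vgg(x)                       # (1, 512, 7, 7) after all conv blocks
    return fmap.flatten().numpy()           # 25,088 numbers per frame

# One linear readout per voxel: every spatial position of every channel gets
# its own weight, which is what makes this approach so parameter-heavy.
# X: (n_timepoints, n_features) stacked frame features
# y: (n_timepoints,) one voxel's measured BOLD signal
# voxel_model = Ridge(alpha=1.0).fit(X, y)
```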

The Solution: The "Gaze-Aware" Spotlight

The authors of this paper proposed a smarter way. They asked: "What if we only feed the computer the parts of the movie that the person actually looked at?"

They used eye-tracking technology to see exactly where the participants' eyes landed. Then, they built a new model called a "Gaze-Aware Encoding Model."

Here is the analogy:

  • The Old Model is like a security guard trying to describe a room by memorizing every single brick, dust mote, and shadow in the entire building, even the parts no one looked at.
  • The New Model is like a spotlight. It only shines on the specific spot where the person's eyes are looking and ignores the rest of the room (see the crop sketch just below).
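In code, the "spotlight" is simply a crop centered on the gaze position. Here is a minimal sketch; the patch size and edge-padding strategy are illustrative assumptions, not the paper's exact choices.

```python
# A minimal sketch of the "spotlight": crop a fixed-size patch around the
# current gaze position. Patch size and padding mode are assumptions.
import numpy as np

def gaze_crop(frame, gaze_x, gaze_y, patch=112):
    """Return a (patch, patch, 3) crop of `frame` centered on the gaze point."""
    half = patch // 2
    # Pad the frame so crops near the screen edge keep the same size.
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)), mode="edge")
    y, x = int(gaze_y) + half, int(gaze_x) + half
    return padded[y - half:y + half, x - half:x + half]
```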

How They Did It (The Recipe)

  1. The Dataset: They used a public dataset called "StudyForrest," where people watched the movie Forrest Gump (in German) without being told to stare at a dot. Their eye movements were recorded the whole time.
  2. The Computer Brain: They used a pre-trained AI (VGG-19) that is really good at recognizing images.
  3. The Trick: Instead of feeding the AI the whole image, they used the eye-tracking data to "crop" the image. If a person looked at a tree on the left, the AI only analyzed the tree. If they looked at a car on the right, the AI only analyzed the car.
  4. The Result: They created a "feature time series" that only contained the visual information relevant to where the eyes were looking at that exact moment (a sketch of this pipeline follows the list).
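Putting the recipe together, a hedged sketch of the pipeline might look like the following. It reuses the gaze_crop helper from above; frames, gaze, extractor, and preprocess are assumed, illustrative inputs (time-aligned movie frames, gaze coordinates, a VGG-19 feature extractor, and its preprocessing), not names from the paper.

```python
# A hedged sketch of the gaze-aware feature pipeline.
import numpy as np
import torch
from PIL import Image

def gaze_feature_timeseries(frames, gaze, extractor, preprocess):
    """Build one feature vector per frame from gaze-centered crops.

    frames : iterable of (H, W, 3) uint8 movie frames
    gaze   : iterable of (x, y) gaze coordinates, one pair per frame
    """
    feats = []
    for frame, (gx, gy) in zip(frames, gaze):
        patch = gaze_crop(frame, gx, gy)               # the "spotlight" crop
        x = preprocess(Image.fromarray(patch)).unsqueeze(0)
        with torch.no_grad():
            feats.append(extractor(x).flatten().numpy())
    # In practice, these per-frame features would still be downsampled to the
    # fMRI sampling rate and convolved with a hemodynamic response function
    # before fitting the voxel-wise regression.
    return np.stack(feats)                             # (n_frames, n_features)
```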

The Surprising Results

The researchers compared their "Spotlight" model against the "Whole Room" model. Here is what they found:

  1. Same Accuracy, Less Work: The new model predicted brain activity just as well as the old, heavy model.
  2. Massive Efficiency: The new model used 112 times fewer parameters (see the rough arithmetic after this list).
    • Analogy: Imagine the old model was a 10,000-page encyclopedia trying to describe a single sentence. The new model was a 90-page book that described the same sentence perfectly.
  3. Memory Savings: Because the model was so much smaller, it could run on a standard laptop. The old model needed a massive machine just to fit everything in memory.
  4. Dynamic Viewers Win: The new model worked especially well for people who moved their eyes a lot. The more active the viewer, the better the model performed, strong evidence that tracking eye movements is crucial for understanding how we process dynamic scenes.
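Why does shrinking the input shrink the model? A linear readout assigns each voxel one weight per spatial position of every feature channel, so the parameter count scales with the size of the feature map. The numbers below are toy assumptions chosen only to show the mechanism; the paper's reported ratio is roughly 112x.

```python
# Toy arithmetic: weights per voxel for a linear readout over CNN features.
# A whole-frame model typically runs at higher resolution (a bigger feature
# grid) than a small gaze-centered crop; the exact sizes here are assumed.
C = 512                    # channels in a late VGG-19 conv layer
full_h, full_w = 28, 28    # feature grid for the full frame (assumed)
crop_h, crop_w = 7, 7      # feature grid for a gaze crop (assumed)

full_params = C * full_h * full_w   # 401,408 weights per voxel
crop_params = C * crop_h * crop_w   # 25,088 weights per voxel
print(full_params // crop_params)   # 16x smaller under these toy numbers
```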

Why This Matters

This paper is a game-changer for two main reasons:

  • Realism: It allows scientists to study the brain in a way that feels like real life. We don't stare at dots in the real world; we explore. This model respects that natural behavior.
  • Accessibility: Because the model is so efficient, smaller labs with less money and less computing power can now run these complex brain studies. You don't need a supercomputer to understand how the brain sees the world anymore.

The Bottom Line

The authors showed that simply paying attention to where people look lets us build brain models that are smarter, faster, and cheaper, while remaining just as accurate. It's a shift from forcing the brain to behave unnaturally to letting people view the world naturally, and building models that respect that behavior.
