fMRI-Based Prediction of Eye Gaze During Naturalistic Movie Viewing Reveals Eye-Movement-Related Brain Activity

This study demonstrates that while a zero-shot DeepMReye model has limited accuracy for individual-level gaze prediction from fMRI data, group-averaged estimates effectively capture shared viewing behaviors and successfully reveal brain activation patterns associated with oculomotor control during naturalistic movie viewing.

Gao, L., Wei, Z., Biswal, B. B., Di, X.

Published 2026-04-12

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to understand how people's brains work while they watch a movie. Usually, scientists put a camera on the person's face to track exactly where their eyes are looking. This is like having a spotlight that shows you exactly what the person is paying attention to.

But here's the problem: most older movie-watching brain scans don't have these eye cameras. They were collected years ago, or on scanners where the cameras didn't fit. It's like having a video recording of a concert with the singer's vocals missing from the audio: the performance is all there, but a key channel of information is gone.

This paper asks a big question: Can we use Artificial Intelligence (AI) to "guess" where people were looking, just by looking at their brain scan data?

Here is the simple breakdown of what the researchers did and what they found, using some everyday analogies.

1. The Magic Trick: Reading Minds from Brain Scans

The researchers used a pre-trained AI model called DeepMReye. Think of this AI as a "mind-reading detective" that has been trained on a huge library of brain scans and eye-tracking data.

  • The Job: The AI looks at the tiny signals coming from the eyeball area inside the brain scan. Even though the brain scan is blurry and slow (like a low-frame-rate video), the movement of the eye creates tiny ripples in the signal.
  • The Challenge: The researchers wanted to see if this AI could work on new movies and new people without being re-trained first. This is called a "zero-shot" approach. It's like hiring a translator who speaks 10 languages and asking them to make sense of an eleventh language they've never studied, just by guessing from context. (A code sketch of this idea follows the list.)
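To make the "zero-shot" idea concrete, here is a minimal conceptual sketch in Python. This is not DeepMReye's actual API: the fMRI data and decoder weights are random stand-ins, and a frozen linear map takes the place of the pretrained convolutional network. Only the shape of the workflow reflects the study: frozen weights, applied as-is to eye-region voxels, producing one gaze estimate per scan volume.

```python
# Conceptual sketch of zero-shot gaze decoding from fMRI.
# NOT DeepMReye's real API: all data and weights below are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)

n_trs, n_eye_voxels = 300, 500  # ~300 scan volumes, ~500 voxels around the eyes
eye_voxels = rng.standard_normal((n_trs, n_eye_voxels))  # stand-in for real fMRI data

# Stand-in for a decoder trained on OTHER datasets. The "zero-shot" part:
# these weights stay frozen and have never seen these movies or these subjects.
pretrained_weights = rng.standard_normal((n_eye_voxels, 2)) / np.sqrt(n_eye_voxels)

def predict_gaze(voxels: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map each volume's eye-voxel pattern to an (x, y) gaze estimate."""
    return voxels @ weights  # DeepMReye uses a CNN, not a linear map

gaze_pred = predict_gaze(eye_voxels, pretrained_weights)
print(gaze_pred.shape)  # (300, 2): one (x, y) gaze guess per scan volume
```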

2. The Results: The "Crowd" vs. The "Individual"

The results were a mix of "Wow!" and "Not so fast."

The Group Level: The "Chorus" Effect

When the researchers averaged the eye-gaze predictions across all the viewers, the AI was impressive.

  • The Analogy: Imagine a choir singing. If you listen to just one singer, they might be slightly off-key or out of rhythm. But if you listen to the whole choir, the sound is perfect and powerful.
  • The Finding: When they looked at the average gaze of the group, the AI's estimate matched the real eye-tracking data remarkably well (about 80% agreement). It correctly predicted, for example, that when a character in the movie moved left, the whole group looked left.

The Individual Level: The "Soloist" Problem

When they tried to guess the eye movements of just one person, the AI struggled.

  • The Analogy: Trying to hear one specific singer in a noisy room is hard. Everyone's eyes are shaped slightly differently, they move their heads differently, and the brain scanner creates different amounts of "static" for each person.
  • The Finding: For a single person, the AI was only about 25-35% accurate. It was too noisy to tell exactly where one specific person was looking at any given second. (The sketch after this list contrasts the individual-level and group-level agreement.)
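The choir effect is easy to demonstrate with a toy simulation. Everything below is synthetic and only illustrates the statistics: each simulated viewer shares one stimulus-driven gaze signal plus private noise, and the "predictions" are noisier copies of the same signal. Averaging across viewers before correlating washes out the private noise, which is why the group-level agreement comes out far higher than the individual-level agreement.

```python
# Toy demonstration: noisy individual gaze predictions vs. the group average.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_trs = 20, 300

shared = rng.standard_normal(n_trs)  # stimulus-driven gaze everyone shares
true_gaze = shared + 0.5 * rng.standard_normal((n_subjects, n_trs))  # eye tracker
pred_gaze = shared + 2.0 * rng.standard_normal((n_subjects, n_trs))  # noisy fMRI guesses

def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])

# Individual level: correlate each viewer's prediction with their own eye tracker.
indiv_r = np.array([pearson_r(pred_gaze[s], true_gaze[s]) for s in range(n_subjects)])

# Group level: average across viewers first, then correlate. Private noise cancels.
group_r = pearson_r(pred_gaze.mean(axis=0), true_gaze.mean(axis=0))

print(f"mean individual r = {indiv_r.mean():.2f}")  # low, like the soloist
print(f"group-averaged r  = {group_r:.2f}")         # high, like the choir
```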

3. What Did They Learn About the Brain?

Even though the AI wasn't perfect for individuals, the "group average" was good enough to map the brain.

  • The Map: They used the AI's "group guess" of where people were looking to see which parts of the brain lit up.
  • The Discovery: They found the brain's "Eye Control Center." This included the Frontal Eye Fields (the boss that tells the eyes where to go) and the Visual Cortex (the screen where the movie plays).
  • The Takeaway: Even without a real eye camera, the AI could reconstruct the brain's "eye-movement map" well enough to show us how our brains control where we look. (A toy regression sketch of this mapping step follows the list.)
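Here is a hedged sketch of that mapping step, reduced to the simplest possible analysis: turn the group-averaged predicted gaze into an eye-movement regressor, convolve it with a crude hemodynamic response function, and fit an ordinary least-squares GLM at every voxel. The paper's actual pipeline is more elaborate, and every array below is a random stand-in.

```python
# Sketch: map eye-movement-related brain activity with a simple voxel-wise GLM.
# All data are random stand-ins; the gamma-shaped HRF is a crude approximation.
import numpy as np

rng = np.random.default_rng(2)
tr, n_trs, n_voxels = 1.0, 300, 1000

gaze_xy = rng.standard_normal((n_trs, 2))  # stand-in for group-averaged predicted gaze
# Eye-movement magnitude: how far the gaze jumps from one volume to the next.
eye_movement = np.r_[0.0, np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1)]

t = np.arange(0, 30, tr)     # crude gamma-shaped hemodynamic response
hrf = (t ** 5) * np.exp(-t)
hrf /= hrf.sum()
regressor = np.convolve(eye_movement, hrf)[:n_trs]

X = np.column_stack([regressor, np.ones(n_trs)])  # design matrix: regressor + intercept
bold = rng.standard_normal((n_trs, n_voxels))     # stand-in for real BOLD time series

beta, *_ = np.linalg.lstsq(X, bold, rcond=None)   # OLS fit at every voxel at once
eye_map = beta[0]  # one beta per voxel: large values mark "eye control" regions
print(eye_map.shape)  # (1000,)
```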

4. The Age Factor: Growing Up Changes How We Look

The researchers also looked at how age changes things, comparing children to adults.

  • The Finding: As people get older, their eye movements become more synchronized with the group.
  • The Analogy: Think of a toddler watching a cartoon. They might look at the dog, then the cat, then the ceiling, then the dog again—very randomly. An adult, however, tends to look at the main character, just like everyone else.
  • The Twist: This "growing up" of eye habits wasn't a straight line. It was more like a hill: kids get better at following the action along with everyone else as they enter their teenage years, but then the pattern shifts again as they become young adults. (A sketch of how to test for such a curved trend follows the list.)
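One simple way to test for a hill rather than a straight line is to fit both a linear and a quadratic model of age against a per-viewer gaze-synchronization score and compare their errors. The sketch below uses simulated data and plain numpy polynomial fits; it illustrates the logic of the comparison, not the paper's actual statistical model.

```python
# Toy comparison: is gaze synchronization a straight line or a hill across age?
import numpy as np

rng = np.random.default_rng(3)
age = rng.uniform(6, 25, size=100)  # hypothetical ages in years

# Simulated inverted-U trend: synchronization rises into the teens, then shifts.
sync = -0.01 * (age - 17) ** 2 + 0.6 + 0.05 * rng.standard_normal(age.size)

linear = np.polyfit(age, sync, deg=1)
quadratic = np.polyfit(age, sync, deg=2)

# A markedly lower quadratic error supports a hill, not a straight line.
for name, coefs in [("linear", linear), ("quadratic", quadratic)]:
    resid = sync - np.polyval(coefs, age)
    print(f"{name}: RMSE = {np.sqrt(np.mean(resid ** 2)):.3f}")
```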

The Bottom Line

Can we use AI to guess where people looked in old brain scans?

  • For a single person? Not really yet. It's too fuzzy. You still need a real camera for that.
  • For a group of people? Yes! It works surprisingly well.

Why does this matter?
There are thousands of old brain scans sitting in databases that we can't fully use because we don't know where the people were looking. This study shows that we can use AI to "fill in the blanks" for the group. It allows scientists to study how our brains handle attention and eye movements in movies, even for studies that were done years ago without eye-tracking cameras.

It's like finding a way to hear the melody of a song even if the original recording was missing the vocals, as long as you listen to the whole orchestra together.
