SPGen: Stochastic scanpath generation for paintings using unsupervised domain adaptation

The paper introduces SPGen, a novel deep learning model that utilizes unsupervised domain adaptation and stochastic sampling to accurately predict human eye movement scanpaths on paintings, thereby advancing the analysis and preservation of cultural heritage.

Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani, Alessandro Bruno

Published 2026-02-26

Imagine you are standing in front of a famous painting in a museum. You don't just stare at it like a statue; your eyes dance around. You might look at the face of the subject first, then jump to the bright red dress, then drift to the dark background, and finally land on a tiny detail in the corner. This journey your eyes take is called a scanpath.

The paper behind SPGen is about teaching a computer to predict how a human's eyes will dance across a painting. But here's the tricky part: computers are usually trained on photos of real life (cats, cars, trees), while paintings have different colors, styles, and rules.

Here is a simple breakdown of how the authors solved this puzzle, using some everyday analogies.

1. The Problem: The "Real World" vs. The "Art World"

Think of a computer model as a tour guide who has spent their whole life giving tours of a bustling city (natural photos). They know exactly where people look: at traffic lights, storefronts, and faces.

Now, you ask this same tour guide to lead a group through a fantasy art gallery (paintings). The guide gets confused! In the city, people look at the center of the street. In a painting, the "center" might be a quiet corner, or the most important part might be in the top left. The guide keeps trying to apply city rules to the art gallery, and the tour goes wrong.

The researchers needed a way to teach their "city guide" how to navigate the "art gallery" without having to hire a new guide who only knows art.

2. The Solution: SPGen (The Smart Eye-Tracker)

The authors built a new AI model called SPGen. Think of it as a super-smart robot eye that learns to mimic human curiosity. Here are its three main superpowers:

A. The "Bias Map" (The Internal Compass)

Humans have a natural habit of looking at the center of an image first (like looking at the middle of a menu before reading the sides). The model learns this habit using something called Gaussian Priors.

  • Analogy: Imagine the model has a magnetic compass that naturally pulls its attention toward the center of the room. But, unlike a real compass that always points North, this one is learnable. It can adjust its magnetism to fit the specific style of the painting it's looking at.
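To make the "learnable compass" idea concrete, here is a minimal sketch of a 2D Gaussian center-bias map. In a model like SPGen the mean and spread would be learnable parameters updated during training; the function and parameter names below are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_prior(height, width, mu_x=0.5, mu_y=0.5, sigma_x=0.3, sigma_y=0.3):
    """Build a 2D Gaussian 'center bias' map over an image grid.

    mu/sigma are in normalized [0, 1] coordinates. In a trained model
    these would be learnable, letting the 'compass' shift its pull
    away from the exact center to fit a painting's style.
    """
    ys = np.linspace(0.0, 1.0, height)[:, None]   # column of row coordinates
    xs = np.linspace(0.0, 1.0, width)[None, :]    # row of column coordinates
    g = np.exp(-(((xs - mu_x) ** 2) / (2 * sigma_x ** 2)
                 + ((ys - mu_y) ** 2) / (2 * sigma_y ** 2)))
    return g / g.sum()  # normalize so the map sums to 1, like a probability

prior = gaussian_prior(33, 33)
# With mu_x = mu_y = 0.5 the peak sits at the central cell of the grid.
```

Shifting `mu_x`/`mu_y` off 0.5, or widening the sigmas, is exactly the kind of adjustment a learnable prior can make for a painting whose "center of interest" is not the geometric center.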

B. The "Randomness Switch" (The Temperature Control)

This is the most unique part. If you ask two different people to look at the same painting, they will look at different things. If you ask the same person to look at it twice, they might still look at different spots. Human attention is stochastic (random).

  • Analogy: Most AI models are like a robot that always takes the exact same route every time. SPGen has a "Temperature Knob."
    • Low Temperature: The robot is very focused and predictable (like a strict tour guide).
    • High Temperature: The robot gets a little "tipsy" or playful. It adds a bit of random noise, allowing it to generate different eye paths for the same painting. This mimics how real humans have different moods and attention spans.
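The "Temperature Knob" is the standard temperature-scaled softmax sampling trick. Here is a generic sketch (not SPGen's exact sampler): each candidate image location gets a saliency score, and the temperature controls how sharply the model commits to the top score when picking the next fixation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fixation(saliency_logits, temperature=1.0):
    """Sample the next fixation index from a vector of saliency scores.

    temperature < 1 -> sharper, more deterministic scanpaths;
    temperature > 1 -> flatter, more varied ('tipsy') scanpaths.
    """
    z = saliency_logits / temperature
    z = z - z.max()                        # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()        # softmax over candidate locations
    return rng.choice(len(p), p=p)

logits = np.array([2.0, 1.0, 0.5, 0.1])
cold = [sample_fixation(logits, temperature=0.05) for _ in range(20)]
hot = [sample_fixation(logits, temperature=10.0) for _ in range(20)]
# At near-zero temperature almost every sample lands on the top-scoring
# location; at high temperature the samples spread across locations.
```

Running the sampler several times at high temperature yields a different scanpath each run, which is how a stochastic model can imitate two people (or two moods) looking at the same painting differently.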

C. The "Translator" (Unsupervised Domain Adaptation)

This is how they solved the "City vs. Art Gallery" problem. They didn't have enough data on how humans look at paintings to train the model from scratch. So, they used a trick called Unsupervised Domain Adaptation.

  • Analogy: Imagine the model is a student who studied hard for a Math test (Natural Photos) but is now taking an Art History test (Paintings).
    • The researchers added a "Domain Classifier" (a strict teacher) that tries to guess: "Is this a Math problem or an Art problem?"
    • They added a Gradient Reversal Layer. This is like a "reverse psychology" trick. When the teacher tries to tell the student "This is Art!", the student's brain flips the signal and says, "No, I will ignore the Art clues and focus only on the Math clues that are the same for both!"
    • Result: The model learns the universal rules of what catches the eye (like faces or bright colors) that apply to both cities and paintings, ignoring the specific "noise" that makes them different.
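The "reverse psychology" trick is the DANN-style gradient reversal layer: an identity function on the way forward, a sign flip on the way back. A minimal hand-rolled sketch (real implementations hook into an autograd framework; `lam`, the reversal strength, is an illustrative name):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in backward.

    The domain classifier above this layer learns to tell photos from
    paintings, but the reversed gradient pushes the feature extractor
    below it to produce features the classifier CANNOT tell apart.
    """
    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (often scheduled during training)

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_from_classifier):
        return -self.lam * grad_from_classifier  # flipped sign: 'ignore domain clues'

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0])
out = grl.forward(x)                          # identical to x
grad = grl.backward(np.array([0.3, -0.4]))    # sign-flipped, scaled by 0.5
```

Because the feature extractor receives the negated gradient, minimizing the classifier's loss from its side means *maximizing* domain confusion, which is what drives the features toward the domain-invariant "universal rules" described above.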

3. How They Tested It

They tested their robot guide on two types of maps:

  1. Natural Scenes (The City): They used the SALICON and MIT1003 datasets (photos of real life). The model did incredibly well, beating other top models.
  2. Paintings (The Art Gallery): They used datasets of famous paintings (like the Le Meur and AVAtt datasets).
    • Before the "Translator" trick: The model looked at paintings like it was looking at photos of cats. It got lost.
    • After the "Translator" trick: The model suddenly understood the art. It started looking at the important parts of the paintings, just like a human would.

4. Why Does This Matter?

Why do we care if a computer knows where our eyes go?

  • Preserving Culture: It helps us understand how people interact with art. We can analyze which parts of a masterpiece are most engaging to viewers.
  • Virtual Museums: Imagine a VR museum where the exhibit changes based on where you are looking. This technology could power those experiences.
  • Restoration: It can help restorers understand what details are most important to the human eye, ensuring they don't accidentally paint over the "soul" of the artwork.

The Bottom Line

SPGen is a clever AI that learns to "see" like a human. It uses a special trick to translate its knowledge from everyday photos to complex paintings, and it includes a "randomness" feature to mimic the unpredictable nature of human curiosity. It's a big step forward in helping computers understand not just what we see, but how we look.
