From Latent to Observable Position-Based Click Models in Carousel Interfaces

This paper introduces new position-based click models for carousel interfaces, including a latent-variable-free model that uses eye-tracking data. Its experiments show that gradient-based optimization improves prediction accuracy, but that click-only models fail to capture realistic user examination patterns, which points to the need for additional behavioral signals in complex recommender systems.

Original authors: Santiago de Leon-Martinez, Robert Moro, Branislav Kveton, Maria Bielikova

Published 2026-06-17

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are walking through a massive, digital library. Instead of a single long list of books on a shelf, the library is organized into carousels—rows of books that you can swipe left and right, like turning pages in a magazine. This is how modern apps like Netflix or Spotify show you movies and songs.

The paper tries to solve a mystery: how do we tell what a user actually looks at apart from what they click on?

In the world of computer science, this is called a "Click Model." For a long time, these models were like old, single-lane roads. They assumed users just looked at a list from top to bottom. But carousels are complex; users might look at the top row, swipe to the second, look at the third, and then go back to the first. The old models didn't understand this dance.

Here is what the researchers did, broken down simply:

1. The Problem: The "Guessing Game"

In the past, when a computer tried to learn what a user liked, it only had one clue: The Click.

  • If you clicked a movie, the computer knew you liked it.
  • If you didn't click, the computer had to guess: "Did they not see it? Did they see it and hate it? Or did they see it and just forget to click?"

Because the computer couldn't see the user's eyes, it had to guess (or "infer") where the user looked. This is like trying to guess what a person is reading in a crowded room just by seeing which book they pick up, without ever seeing their face.
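In the standard position-based click model, this ambiguity is usually written as a product of two probabilities. The symbols below are notation chosen here for illustration and may differ from the paper's: C is the click, E is whether position k was examined, and A is whether item i is attractive to the user.

```latex
P(C = 1 \mid i, k) = P(E = 1 \mid k) \cdot P(A = 1 \mid i)
```

A missing click (C = 0) can therefore come from E = 0 (the item was never seen) or from A = 0 (it was seen but not wanted), and clicks alone cannot tell the two apart.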

2. The New Idea: "The Eye-Tracker Glasses"

The researchers decided to stop guessing. They used a dataset where they actually had eye-tracking data. Imagine giving the users special glasses that record exactly where their eyes stop and stare (called "fixations").

They built three new types of "Click Models" (mathematical recipes) to understand carousel behavior:

  • The Standard Model (CPBM): A smart guesser that learns where people look based on clicks.
  • The Row-Column Model (RCPBM): A model that understands that people look at the top of the screen more than the bottom, and the left more than the right, treating rows and columns separately.
  • The "Eye-Open" Model (OEPBM): This is the star of the show. It's the first model that doesn't have to guess. It uses the actual eye-tracking data as a direct signal. It knows, "Ah, the user's eyes were here for 2 seconds, so they definitely examined this item," regardless of whether they clicked it.
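The difference between the three models can be sketched by how each one obtains the examination probability of a carousel cell. This is a minimal illustration, not the paper's definitions: the function names, the per-cell table for CPBM, the row-times-column product for RCPBM, and the 0/1 fixation rule for OEPBM are all assumptions made here.

```python
# Illustrative sketch only: the parameter layouts below are assumptions,
# not the paper's exact model definitions.

def cpbm_examination(theta, row, col):
    """Latent: one learned examination probability per carousel cell."""
    return theta[row][col]


def rcpbm_examination(row_theta, col_theta, row, col):
    """Latent: rows and columns are learned separately.
    Combining them with a product is an assumption of this sketch."""
    return row_theta[row] * col_theta[col]


def oepbm_examination(fixated_cells, row, col):
    """Observed: 1.0 if the eye tracker recorded a fixation on the cell."""
    return 1.0 if (row, col) in fixated_cells else 0.0


def click_probability(examination, attraction):
    """Position-based click probability: examined AND attractive."""
    return examination * attraction


# Hypothetical numbers for a 2-row, 2-column carousel.
theta = [[0.9, 0.6], [0.5, 0.2]]
row_theta, col_theta = [0.9, 0.5], [0.8, 0.4]
fixated_cells = {(0, 0), (1, 0)}

exam_cpbm = cpbm_examination(theta, 0, 1)
exam_rcpbm = rcpbm_examination(row_theta, col_theta, 0, 1)
exam_oepbm = oepbm_examination(fixated_cells, 0, 1)
print(exam_cpbm, exam_rcpbm, exam_oepbm)
```

In the first two functions the examination values have to be learned from clicks. In the third they are read directly from the recorded fixations, which is what "latent-variable-free" means in this context.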

3. The Race: Old Math vs. New Math

The researchers also tested how to teach these models.

  • The Old Way (EM/MLE): Like a student slowly working through a textbook, checking one answer at a time to get it right.
  • The New Way (Gradient Ascent): Like a hiker feeling the slope of a mountain and taking steps uphill toward the peak (the parameter values that make the observed clicks most likely).

The Result: The "hiker" (Gradient Ascent) consistently found better answers than the "textbook student" (the old methods).
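To make the "hiker" concrete, here is a minimal sketch of gradient ascent on the click log-likelihood of a simple position-based model. It is not the paper's training code: the sigmoid parameterization, the synthetic data, the step size, and every name below are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(data, pos_logit, item_logit):
    """Sum of log P(click outcome) under P(click) = examination * attraction."""
    total = 0.0
    for pos, item, clicked in data:
        p = sigmoid(pos_logit[pos]) * sigmoid(item_logit[item])
        total += math.log(p) if clicked else math.log(1.0 - p)
    return total


def gradient_ascent(data, n_pos, n_items, steps=200, lr=0.5):
    """Full-batch gradient ascent: step uphill on the mean log-likelihood."""
    pos_logit = [0.0] * n_pos
    item_logit = [0.0] * n_items
    for _ in range(steps):
        g_pos = [0.0] * n_pos
        g_item = [0.0] * n_items
        for pos, item, clicked in data:
            e = sigmoid(pos_logit[pos])    # examination probability
            a = sigmoid(item_logit[item])  # attraction probability
            p = e * a
            # d(log-likelihood)/d(logit) shares the factor (c - p) / (1 - p).
            common = ((1.0 if clicked else 0.0) - p) / (1.0 - p)
            g_pos[pos] += common * (1.0 - e)
            g_item[item] += common * (1.0 - a)
        for k in range(n_pos):
            pos_logit[k] += lr * g_pos[k] / len(data)
        for i in range(n_items):
            item_logit[i] += lr * g_item[i] / len(data)
    return pos_logit, item_logit


# Synthetic clicks drawn from made-up "true" parameters (assumed values).
random.seed(0)
true_exam = [0.9, 0.6, 0.3]
true_attr = [0.8, 0.5, 0.2, 0.6]
data = []
for _ in range(1000):
    pos, item = random.randrange(3), random.randrange(4)
    clicked = random.random() < true_exam[pos] * true_attr[item]
    data.append((pos, item, clicked))

ll_start = log_likelihood(data, [0.0] * 3, [0.0] * 4)
pos_logit, item_logit = gradient_ascent(data, n_pos=3, n_items=4)
ll_end = log_likelihood(data, pos_logit, item_logit)
print(ll_start, ll_end)
```

Each step moves every parameter in the direction that raises the likelihood of the observed clicks, so the log-likelihood after training should be higher than at the starting point.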

4. The Big Discovery: Clicks Lie

Here is the most important lesson from the paper, explained with a metaphor:

Imagine a teacher grading a student.

  • The Click Score: The student gets an 'A' if they get the right answer.
  • The Behavior Score: The teacher also checks if the student actually read the question before answering.

The researchers found that if you only grade the Click Score, the student might get an 'A' by guessing or luck, even if they didn't actually read the question. The model might look perfect at predicting clicks, but it has a terrible understanding of how humans actually browse.

The OEPBM (the Eye-Open model) was the only one that got both scores right. It predicted clicks well and it closely matched the real eye-tracking patterns (like the "F-pattern," where people scan the top and left side first).
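The gap between the two scores can be shown with a toy example. The sketch below scores two hypothetical models that assign identical click probabilities but disagree about where users look. The data, the parameter values, and the "examination error" measure (mean absolute difference from observed fixation rates) are assumptions made for illustration, not the paper's evaluation protocol.

```python
import math


def click_log_likelihood(sessions, examination, attraction):
    """Click score: log-likelihood of the observed clicks."""
    total = 0.0
    for pos, item, clicked, _fixated in sessions:
        p = examination[pos] * attraction[item]
        total += math.log(p) if clicked else math.log(1.0 - p)
    return total


def examination_error(sessions, examination):
    """Behavior score: mean absolute gap between the model's examination
    probabilities and the fixation rate observed at each position."""
    errors = []
    for pos in range(len(examination)):
        fixations = [fix for p, _, _, fix in sessions if p == pos]
        observed_rate = sum(fixations) / len(fixations)
        errors.append(abs(examination[pos] - observed_rate))
    return sum(errors) / len(errors)


# Toy sessions: (position, item, clicked, fixated). Item 0 is always shown
# at position 0 and item 1 at position 1, so clicks cannot separate
# examination from attraction.
sessions = []
for n in range(10):
    sessions.append((0, 0, n < 2, n < 9))  # 9 of 10 fixated, 2 clicked
for n in range(10):
    sessions.append((1, 1, n < 2, n < 3))  # 3 of 10 fixated, 2 clicked

# Model A matches the fixation rates; Model B swaps the two roles.
exam_a, attr_a = [0.9, 0.3], [0.2, 0.6]
exam_b, attr_b = [0.2, 0.6], [0.9, 0.3]

ll_a = click_log_likelihood(sessions, exam_a, attr_a)
ll_b = click_log_likelihood(sessions, exam_b, attr_b)
err_a = examination_error(sessions, exam_a)
err_b = examination_error(sessions, exam_b)
print(ll_a, ll_b, err_a, err_b)
```

Both models earn the same click score because their products of examination and attraction are equal, yet only Model A agrees with the recorded fixations. A click-only evaluation has no way to prefer one over the other.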

5. The Conclusion

The paper concludes with a simple truth: In complex interfaces like carousels, clicks alone are not enough.

If you want to truly understand how people browse, you can't just watch their fingers (clicks); you need to watch their eyes (or use other signals). The best model they built (OEPBM) uses this extra information to create a realistic picture of user behavior, proving that sometimes, you need more than just a "click" to understand a human.

In short: They built new click models for swipeable carousels, showed that gradient-based training beat the older estimation methods in their experiments, and showed that without eye-tracking data or other behavioral signals, click-only models are left guessing about what users actually see.
