Quantifying User Coherence: A Unified Framework for Analyzing Recommender Systems Across Domains

This paper introduces a unified framework built on two novel information-theoretic measures, Mean Surprise and Mean Conditional Surprise, that quantify user profile coherence. The authors show that recommendation performance is strongly predicted by user coherence, enabling more robust evaluation and targeted system design.

Michaël Soumm, Alexandre Fournier-Montgieux, Adrian Popescu, Bertrand Delezoide

Published 2026-03-04

Imagine you are a chef running a massive, high-tech restaurant. Your goal is to guess what dish each customer wants to order next based on what they've eaten before.

For some customers, this is easy. They always order a burger, then fries, then a milkshake. They are predictable. For others, it's a nightmare. One day they order sushi, the next day they ask for a pizza, and then they suddenly want a bowl of oatmeal. They are unpredictable.

This paper is about a new way for the chef (the Recommender System) to understand why they are good at guessing for some people and terrible at guessing for others.

The Problem: The "Average" Lie

Traditionally, chefs look at the "average" success rate. They say, "Hey, our guessing algorithm is 80% accurate!" But this hides a secret: the algorithm might be 99% accurate for the predictable customers but only 10% accurate for the unpredictable ones. The "average" makes the system look good, but it fails the people who need help the most.
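To see how an overall average can mask this split, here is a quick back-of-the-envelope calculation. The 99%/10% accuracy figures are the illustrative numbers from the analogy above, not results from the paper:

```python
# Suppose "easy" users are predicted at 99% accuracy and "hard" users at 10%.
acc_easy, acc_hard = 0.99, 0.10

# What share of easy users makes the headline average come out to 80%?
# overall = p * acc_easy + (1 - p) * acc_hard  =>  solve for p
p_easy = (0.80 - acc_hard) / (acc_easy - acc_hard)
overall = p_easy * acc_easy + (1 - p_easy) * acc_hard

print(f"{p_easy:.1%} easy users -> overall accuracy {overall:.0%}")
# With roughly 79% easy users the headline metric reads 80%,
# while one user in five is served at near-random quality.
```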

The authors of this paper wanted to stop guessing blindly and start measuring the personality of the customer's taste.

The Two New "Taste Meters"

The researchers invented two simple tools (mathematical formulas, but think of them as meters) to measure every customer:

  1. The "Unusualness" Meter (Mean Surprise):

    • What it measures: Does this person eat what everyone else eats?
    • Analogy: If everyone orders the "Chef's Special," but this person only orders obscure, weird dishes from a tiny village in Peru, their "Unusualness" score is high. If they only eat the popular stuff, the score is low.
    • The Insight: It turns out, being "unusual" isn't the main problem. You can be a weirdo who always eats weird things, and the system can still learn your pattern.
  2. The "Consistency" Meter (Mean Conditional Surprise):

    • What it measures: Do this person's choices make sense together?
    • Analogy:
      • Consistent (Low Score): A person who loves horror movies. They watch The Conjuring, then Scream, then Halloween. Even if these movies are rare, they fit together perfectly. The system can easily guess the next one.
      • Inconsistent (High Score): A person who watches a horror movie, then a romantic comedy, then a documentary about bees, then a heavy metal concert, then a cooking show. There is no pattern. It's like trying to predict the next word in a sentence that is just random gibberish.
    • The Big Discovery: This is the most important finding. The system fails miserably on "Inconsistent" people. No matter how smart the AI is (Deep Learning, fancy math), it cannot predict what a chaotic, random person will want next. The system only gets better at predicting for people who have a consistent "story" in their taste.
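The two meters can be sketched in a few lines. This is a simplified reading of the paper's idea, assuming Mean Surprise averages -log p(item) under the global popularity distribution and Mean Conditional Surprise averages -log p(item | previous item); the variable names, the toy data, and the simple bigram conditioning are illustrative choices, not the paper's exact formulation:

```python
import math

def mean_surprise(history, item_probs):
    """Mean Surprise: average -log2 p(item) under global item popularity.
    High = the user consumes rare items; low = they follow the crowd."""
    return sum(-math.log2(item_probs[item]) for item in history) / len(history)

def mean_conditional_surprise(history, cond_probs):
    """Mean Conditional Surprise: average -log2 p(item | previous item).
    High = choices don't follow from each other; low = a coherent 'story'."""
    pairs = list(zip(history, history[1:]))
    return sum(-math.log2(cond_probs[prev][item])
               for prev, item in pairs) / len(pairs)

# Toy catalog: half the population orders burgers, the rest split evenly.
popularity = {"burger": 0.5, "fries": 0.25, "shake": 0.25}
# Toy transition model: what tends to follow each item.
transitions = {"burger": {"fries": 0.5, "shake": 0.5},
               "fries": {"shake": 0.9, "burger": 0.1}}

history = ["burger", "fries", "shake"]
print(mean_surprise(history, popularity))  # (1 + 2 + 2) / 3 bits ≈ 1.67
print(mean_conditional_surprise(history, transitions))
```

Note that a user can score high on the first meter but low on the second: rare items, consumed in a self-consistent order, stay predictable.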

The "Magic" of the Findings

The authors tested this on 9 different types of data (movies, music, shopping, tourism) and 7 different AI models. Here is what they found:

  • The "Easy" vs. "Hard" Customers: The AI models are actually very good at learning the "Consistent" customers. But for the "Inconsistent" ones, even the most advanced AI performs no better than a random guess.
  • The Illusion of Progress: When we say "AI is getting better," we are mostly just getting better at serving the "Easy" (consistent) customers. We aren't actually solving the problem for the "Hard" (inconsistent) ones.
  • The "Noise" Problem: The paper suggests that the "Inconsistent" customers effectively act as noise in the data, confusing the AI during training.

Practical Applications: What Can We Do With This?

The authors suggest three ways to use this new understanding:

  1. Stop Lying with Averages: Instead of saying "Our system is 80% accurate," we should say, "We are 95% accurate for consistent users, but only 10% for inconsistent ones." This helps developers know where to focus their energy.
  2. The "Chameleon" Strategy: Imagine a smart waiter who changes their approach based on the customer.
    • For the Consistent customer: "I know you love horror movies. Here is a new one you haven't seen." (Deep personalization).
    • For the Inconsistent customer: "I can't guess what you want, so let's just show you the most popular, safe items today." (Safe, broad recommendations).
  3. Training Smarter, Not Harder: The researchers showed that if you take a huge dataset and train the AI only on the "Consistent" customers, the AI actually gets better at predicting for that group, even though it has less data. It's like studying only the clearest examples to learn a language, rather than trying to learn from a dictionary full of typos and nonsense.
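All three suggestions hinge on splitting users by a coherence score. Here is a minimal sketch, assuming we already have a per-user Mean Conditional Surprise value and a hit/miss test outcome per user; the median split, the threshold, and the data layout are illustrative choices, not the paper's protocol:

```python
import statistics

def stratified_report(users):
    """users: list of (conditional_surprise, hit) pairs, one per test user.
    Replaces a single headline average with per-group accuracies."""
    median = statistics.median(s for s, _ in users)
    coherent = [hit for s, hit in users if s <= median]
    chaotic = [hit for s, hit in users if s > median]
    return {
        "overall": sum(hit for _, hit in users) / len(users),
        "coherent": sum(coherent) / len(coherent),
        "chaotic": sum(chaotic) / len(chaotic),
    }

def coherent_training_set(interactions, scores, threshold):
    """Keep only interactions from users whose conditional surprise is
    below the threshold, i.e. train on the 'consistent' customers."""
    return [x for x in interactions if scores[x["user"]] <= threshold]

report = stratified_report([(0.5, 1), (0.7, 1), (2.1, 0), (2.4, 1)])
print(report)  # overall 0.75, but coherent 1.0 vs. chaotic 0.5
```

The same split also drives the "Chameleon" strategy at serving time: route low-surprise users to the personalized model and high-surprise users to a popularity-based fallback.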

The Bottom Line

This paper tells us that not all users are created equal. Some have a clear, consistent story in their choices, and some are just a chaotic mix.

The future of recommendation systems isn't just about building bigger, smarter AI. It's about understanding the user first. If a user is chaotic, we shouldn't try to force a prediction; we should switch strategies. By measuring "coherence," we can build systems that are more honest, more efficient, and actually helpful to everyone, not just the easy-to-please ones.
