Detecting Abnormal User Feedback Patterns through Temporal Sentiment Aggregation

This paper proposes a temporal sentiment aggregation framework that uses RoBERTa to score the sentiment of individual comments, aggregates those scores into time-window signals, and detects anomalous user feedback patterns as significant downward shifts in the window-level scores.

Yalun Qi, Sichen Zhao, Zhiming Xue, Xianling Zeng, Zihan Yu

Published 2026-04-03

Imagine you are the manager of a busy airline. Every day, thousands of passengers tweet, post on forums, or leave reviews about their flights. Some are happy, some are angry, and most are just complaining about a delayed flight or a lost suitcase.

If you tried to read every single comment one by one, you'd go crazy. Plus, if you just looked at one angry tweet, you might think, "Oh no, is the airline failing?" But maybe that person just had a bad day. You need a way to see the big picture without getting lost in the noise.

This paper is about building a "smart weather radar" for customer feelings. Here is how they did it, explained simply:

1. The Problem: Too Much Noise, Too Little Signal

Think of individual customer comments like raindrops hitting a tin roof.

  • One raindrop (one angry tweet) makes a loud ping.
  • But if you only listen to single pings, you can't tell if it's a light drizzle or a massive storm.
  • Traditional methods try to classify every single raindrop as "good" or "bad." But short comments are messy. A sarcastic "Great job, we're late again!" might be misread as "Great!" by a computer.

The authors realized that looking at individual drops isn't enough. You need to measure the flood level over time.

2. The Solution: The "Bucket" Method (Temporal Aggregation)

Instead of counting raindrops one by one, the authors propose a simple trick: The Bucket Strategy.

  • The Bucket: They group comments into time windows (like buckets that fill up every hour or every 100 comments).
  • The Aggregation: Inside each bucket, they mix all the feelings together. They take the "mood" of 100 people and average it out.
  • The Result: This smooths out the weird, noisy outliers. If one person screams about a lost sandwich, it doesn't crash the whole bucket's score. But if everyone in the bucket is suddenly angry because the flight was cancelled, the bucket's "mood score" drops dramatically.
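The bucket strategy above can be sketched in a few lines of Python. The one-hour window length and the [-1, +1] score range are illustrative assumptions, not values taken from the paper:

```python
from statistics import mean

def aggregate_windows(comments, window_seconds=3600):
    """Group (timestamp, sentiment) pairs into fixed time windows
    and average the sentiment inside each window."""
    buckets = {}
    for timestamp, score in comments:
        window = timestamp // window_seconds  # which bucket this comment falls into
        buckets.setdefault(window, []).append(score)
    # One smoothed "mood score" per window.
    return {w: mean(scores) for w, scores in sorted(buckets.items())}

# Toy data: hour 0 is mixed, hour 1 is uniformly angry.
data = [(0, 1), (600, -1), (1200, 1),
        (3700, -1), (4200, -1), (4800, -1)]
moods = aggregate_windows(data)
```

One loud outlier barely moves a bucket's average, but a bucket where everyone is angry drops to -1, which is exactly the smoothing effect described above.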

3. The Engine: A Smart Translator (RoBERTa)

To understand what people are saying, they used a super-smart AI called RoBERTa.

  • Think of RoBERTa as a highly experienced translator who knows slang, emojis, and sarcasm better than a dictionary.
  • It reads every comment and gives it a score: +1 for happy, 0 for neutral, and -1 for angry.
  • The AI doesn't try to be perfect on every single sentence; it just gives a "best guess" score.
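The "best guess" mapping can be illustrated with a tiny helper. The probability dictionary would come from a three-class RoBERTa sentiment classifier in practice; the hard {-1, 0, +1} mapping follows the description above, and the label names are an assumption, not the paper's exact output format:

```python
LABEL_TO_SCORE = {"negative": -1, "neutral": 0, "positive": 1}

def sentiment_score(probs):
    """Collapse a classifier's class probabilities into a single
    per-comment score: pick the most likely label and map it to
    -1, 0, or +1. `probs` is assumed to look like
    {"negative": 0.7, "neutral": 0.2, "positive": 0.1}."""
    best_label = max(probs, key=probs.get)  # best-guess class, not a certainty
    return LABEL_TO_SCORE[best_label]
```

The key design point is that the score is deliberately coarse: individual comments may be misread, but the aggregation step tolerates that noise.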

4. The Alarm System: Watching for the Drop

Once they have the "mood score" for each bucket (time window), they watch the line graph.

  • Normal Behavior: The line wiggles a little bit, like a heartbeat.
  • The Anomaly: Suddenly, the line takes a nose dive.
  • The Alarm: The system is programmed to scream "ALARM!" only when the mood drops sharper than usual. It's not looking for a bad day; it's looking for a sudden crash in happiness.
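A minimal version of this alarm is a rolling-baseline test: flag a window whose score falls far below the recent average. The baseline length and the 3-sigma threshold here are illustrative choices, not the paper's parameters:

```python
from statistics import mean, pstdev

def detect_drops(window_scores, baseline=5, k=3.0):
    """Flag window i when its mood score falls more than k standard
    deviations below the mean of the previous `baseline` windows."""
    alarms = []
    for i in range(baseline, len(window_scores)):
        history = window_scores[i - baseline:i]
        mu, sigma = mean(history), pstdev(history)
        # Only a drop sharper than the usual wiggle triggers the alarm.
        if sigma > 0 and window_scores[i] < mu - k * sigma:
            alarms.append(i)
    return alarms

scores = [0.30, 0.28, 0.32, 0.31, 0.29, -0.50, 0.30]
print(detect_drops(scores))  # → [5]: only the crash window is flagged
```

Note that the window after the crash is not flagged: the crash itself widens the baseline's spread, so only the sudden nose dive, not the recovery, looks anomalous.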

5. The "Why" Detective: Topic Awareness

Here is the clever part. Sometimes the mood drops, but why?

  • Did everyone hate the food?
  • Did the planes stop flying?
  • Was the customer service rude?

The authors added a sorting hat to their buckets. They didn't just mix all comments together; they separated them into categories (like "Lost Luggage," "Late Flights," "Rude Staff").

  • Now, instead of just knowing "The mood is bad," the system can say: "The mood is bad specifically because of Lost Luggage."
  • This is like a doctor who doesn't just say "You have a fever," but says "You have a fever because of an infection in your ear." It tells the airline exactly what to fix.
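Topic awareness only changes the bucket key: aggregate per (window, topic) pair instead of per window. The topic labels below are illustrative; in practice they would come from a separate topic classifier:

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_topic(comments):
    """Average sentiment per (window, topic) pair so a mood drop can
    be traced to a specific complaint category.
    Each comment is a (window_id, topic, score) triple."""
    buckets = defaultdict(list)
    for window, topic, score in comments:
        buckets[(window, topic)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

comments = [
    (0, "late_flights", 0),  (0, "lost_luggage", 1),
    (1, "late_flights", 0),  (1, "lost_luggage", -1),
    (1, "lost_luggage", -1),
]
moods = aggregate_by_topic(comments)
# Only the "lost_luggage" mood collapses in window 1;
# "late_flights" stays flat, so the system can name the culprit.
```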

The Real-World Test

They tested this on real social media data from an airline.

  • The Result: The system successfully spotted moments when the mood crashed.
  • The Proof: When they looked at those crash moments, they found real, coherent stories. For example, when the system flagged a "crash," it turned out there was a massive wave of complaints about a specific flight delay or a baggage handling issue. It wasn't random noise; it was a real problem.

The Big Takeaway

This paper teaches us that stability is better than perfection.
You don't need a perfect AI that understands every single joke or typo. You just need a system that groups feelings together, smooths out the noise, and watches for sudden, dramatic changes.

In short: Don't listen to the shouting of one person; listen to the roar of the crowd, and pay attention when that roar suddenly turns into a scream. That's when you know something is wrong.
