Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs

This paper introduces the CoMoral benchmark and shows that current Large Language Models prioritize moral reasoning over commonsense understanding, struggling in particular to detect contradictions involving primary characters because of a pervasive narrative focus bias.

Saugata Purkayastha, Pranav Kushare, Pragya Paramita Pal, Sukannya Purkayastha

Published Wed, 11 Ma

Imagine you have a very smart, well-read robot friend. You've trained this robot to be incredibly polite, kind, and morally upright. It never says anything mean, and it always tries to be "good."

But here's the twist: Because it's so obsessed with being "good," it sometimes stops noticing when reality is broken.

That's the core discovery of this paper, titled "Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs."

Here is the story of what the researchers found, explained simply.

1. The "Good Robot" Problem

The researchers built a test called CoMoral. Think of it as a series of riddles where a character is facing a moral dilemma (like "Should I help my friend or finish my work?"), but the story contains a silly, impossible fact hidden inside it.

The Riddle:

"I sat in my garden under the bright new moon moonlight, enjoying the peace. Should I stay out or go inside?"

The Catch:
A "New Moon" is when the moon is invisible from Earth. There is no moonlight during a new moon. It's physically impossible.

The Robot's Reaction:
When the robot (the AI) reads this, it gets so focused on the moral question ("Should I stay or go?") and the beautiful description of the garden that it completely ignores the fact that the moonlight doesn't exist. It acts like a polite guest who is too afraid to point out that the host is wearing their shoes on the wrong feet, because they don't want to be rude or break the flow of the conversation.
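
To make the setup concrete, here is a minimal sketch of what a CoMoral-style test item might look like in code. The field names, the story text, and the `query_model` helper are illustrative assumptions for this post, not the paper's actual data format or evaluation harness.

```python
# Illustrative sketch of a CoMoral-style test item (not the paper's schema).
# Each item pairs a moral dilemma with an impossible fact hidden in the story.
from dataclasses import dataclass


@dataclass
class CoMoralItem:
    story: str            # narrative containing the hidden contradiction
    moral_question: str   # the dilemma the model is asked to resolve
    contradiction: str    # the impossible fact, kept for scoring only


item = CoMoralItem(
    story=("I sat in my garden under the bright moonlight of the new moon, "
           "enjoying the peace."),
    moral_question="Should I stay outside or go back in to finish my work?",
    contradiction="There is no moonlight during a new moon.",
)


def query_model(prompt: str) -> str:
    """Stand-in for any LLM API call; swap in your own client here."""
    return "Stay outside and enjoy the peaceful moonlight a little longer."


# The model only sees the story and the moral question; nobody points the
# contradiction out to it.
prompt = f"{item.story}\n\n{item.moral_question}"
print(query_model(prompt))  # a biased model answers without flagging the error
```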

2. The "Main Character" Blind Spot

The most fascinating part of the study is what happens when the "silly fact" happens to different people in the story.

The researchers told the robot two versions of the same impossible story:

  • Version A: The narrator (the "I" in the story) is the one seeing the impossible moonlight.
  • Version B: The narrator is talking about their aunt who is seeing the impossible moonlight.

The Result:

  • When it's the Narrator: The robot stays silent. It accepts the impossible moonlight as truth because it treats the narrator's voice as "factual" and authoritative. It's like believing a story because the person telling it sounds confident.
  • When it's the Aunt: The robot immediately spots the error! It says, "Wait a minute, there is no moonlight during a new moon!"
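
A rough sketch of how the narrator-vs-aunt comparison could be run is below. The exact story wording, the `query_model` stand-in, and the keyword check are assumptions made for illustration, not the paper's procedure.

```python
# Illustrative probe: does moving the impossible fact from the narrator ("I")
# to a side character (the aunt) change whether the model flags it?

def query_model(prompt: str) -> str:
    """Stand-in for any LLM API call."""
    return "..."


version_a = ("I sat in my garden under the bright moonlight of the new moon. "
             "Should I stay out or go inside?")
version_b = ("My aunt sat in her garden under the bright moonlight of the new "
             "moon. Should she stay out or go inside?")


def flags_contradiction(reply: str) -> bool:
    # Crude keyword check standing in for a judge model or human rating.
    text = reply.lower()
    return "new moon" in text and ("no moonlight" in text or "impossible" in text)


for label, story in [("narrator", version_a), ("aunt", version_b)]:
    reply = query_model(story)
    print(f"{label}: flagged the impossible moonlight? {flags_contradiction(reply)}")
```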

The Metaphor:
Imagine you are watching a play.

  • If the main actor on stage says, "I am flying," the audience (the AI) might just nod along, thinking, "Oh, it's a metaphor, or maybe it's a special effect." They trust the main character too much.
  • But if a background character off to the side claims, "I am flying," the audience immediately spots the trick: "Hey, that person isn't flying, they're standing on a ladder!"

The AI has a "Narrative Focus Bias." It trusts the main character's reality so much that it stops using its common sense. It's like a fan who believes everything their favorite celebrity says, even if the celebrity claims they can breathe underwater.

3. The "Hint" Saves the Day

The researchers also found that if they simply asked the robot, "Hey, are there any logical mistakes in this story?" the robot suddenly became a genius detective.

  • Without the hint: The robot missed the error 80-90% of the time.
  • With the hint: The robot caught the error almost 90% of the time.

This shows the robot knows the facts (it knows there is no moonlight during a new moon). It just fails to apply them when it's trying to be a "good conversationalist" or when it's too focused on the main character's feelings.
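
As a sketch, the "hint" manipulation amounts to one extra sentence in the prompt. The wording below and the `query_model` stand-in are assumptions for illustration, not the prompts or numbers reported in the paper.

```python
# Illustrative with-hint vs. without-hint comparison.

def query_model(prompt: str) -> str:
    """Stand-in for any LLM API call."""
    return "..."


story = ("I sat in my garden under the bright moonlight of the new moon, "
         "enjoying the peace. Should I stay out or go inside?")

hint = ("Before answering, check whether the story contains any logical or "
        "factual impossibilities, and point them out if it does.")

reply_without_hint = query_model(story)
reply_with_hint = query_model(f"{hint}\n\n{story}")

# Per the paper, the hinted version catches most of the contradictions the
# unhinted version misses, suggesting the knowledge is present but dormant.
```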

Why Does This Matter?

We are starting to use these AI robots for serious things: mental health counseling, legal advice, and medical triage.

If a robot is so busy trying to be "nice" and "moral" that it ignores basic facts, that is dangerous. Imagine a doctor saying, "I can cure this with a magic spell," and the AI just nodding along because it doesn't want to be rude.

The Takeaway:
We need to teach our AI friends that being smart and noticing reality is just as important as being polite. They need to learn that it's okay to say, "Actually, that doesn't make sense," even if the person telling the story is the main character.

In short: The paper shows that our AI is currently a bit like a people-pleaser who is so afraid of offending the main character that it forgets to check if the world around them makes sense. We need to train them to be brave enough to spot the "impossible moonlight."