On the Concept of Violence: A Comparative Study of Human and AI Judgments

This study systematically compares human judgments with Large Language Model classifications across 22 morally ambiguous scenarios to investigate how AI systems operationalize the concept of violence, and what that reveals about their growing role in shaping public reasoning about harm and social norms.

Original authors: Mariachiara Stellato, Francesco Lancia, Chiara Galeazzi, Nico Curti

Published 2026-02-20

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Idea: Who Defines "Violence"?

Imagine you are at a dinner party, and someone asks, "Is that action violent?"

  • Human Answer: "Well, it depends. If they did it while smiling, maybe not. If they did it in anger, yes. If they were interrupted before they could finish, maybe it doesn't count." Humans are messy, nuanced, and love to say, "It depends."
  • AI Answer: "I have processed the data. The answer is Violence." or "The answer is Not Violence." AI loves to pick a side and stick to it.

This study asked: When we ask computers to judge what is violent, do they think like us, or do they have a completely different moral compass?

How They Did It: The Radio Game Show

The researchers didn't start in a sterile lab. They started on an Italian radio show called Chiacchiericcio (which means "chatter").

  1. The Human Test: The host read 22 tricky scenarios to the listeners (like "A comedian insults an audience member" or "Protesters block a road"). Over 3,000 people voted on whether these were "Violence," "Not Violence," or "It Depends."
  2. The Robot Test: The researchers took those same 22 scenarios and asked 18 different AI chatbots (like Llama, Mistral, and others) to vote on them. They forced the AIs to pick just one answer, just like the humans. (A sketch of what that forced-choice setup might look like follows below.)
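
To make the setup concrete, here is a minimal Python sketch of how such a forced-choice classification could be wired up. This is not the authors' code: the model names, the prompt wording, and the `query_model` stub are all illustrative assumptions standing in for a real LLM backend.

```python
# Minimal sketch of the forced-choice setup, NOT the authors' actual code.
# Model names, prompt wording, and `query_model` are illustrative stand-ins.

SCENARIOS = [
    "A comedian insults an audience member during a show.",
    "Protesters block a road during a demonstration.",
    # ...the study used 22 such scenarios in total
]

MODELS = ["llama-3-8b", "mistral-7b"]  # hypothetical identifiers

PROMPT = (
    "Classify the following scenario. Reply with exactly one label: "
    "Violence, Not Violence, or It Depends.\n\nScenario: {scenario}"
)

def query_model(model: str, prompt: str) -> str:
    # Placeholder: swap in a real call to your LLM backend here.
    # A canned reply keeps the sketch runnable end to end.
    return "Violence"

def parse_label(reply: str) -> str | None:
    # Check longer labels first: "violence" is a substring of
    # "not violence", so the matching order matters.
    for label in ("Not Violence", "It Depends", "Violence"):
        if label.lower() in reply.lower():
            return label
    return None  # model refused or rambled; re-ask or discard

votes = {
    (model, scenario): parse_label(query_model(model, PROMPT.format(scenario=scenario)))
    for model in MODELS
    for scenario in SCENARIOS
}
print(votes)
```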

What They Found: The "Robot vs. Human" Clash

The results were fascinating because the robots and humans agreed on the "easy" stuff but fought over the "gray areas."

1. The "It Depends" Problem

Humans love the middle ground. When things were ambiguous, humans often voted "It Depends."

  • The AI Twist: The AIs hated the middle ground. They rarely chose "It Depends." Instead, they squeezed those ambiguous situations into either "Violence" or "Not Violence."
  • Analogy: Imagine a human looking at a gray sky and saying, "It might rain." The AI looks at the same sky and forces a binary choice: "It is raining" OR "It is sunny." The AI is less comfortable with uncertainty. (One simple way to measure that gap is sketched right after this list.)
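
To see what that "middle ground" gap looks like in numbers, here is a tiny sketch comparing how often each group of judges picks "It Depends" for a single scenario. The vote counts below are made-up placeholders for illustration only; the real tallies are in the paper.

```python
from collections import Counter

# Hypothetical vote tallies for one ambiguous scenario (NOT real data).
# The study had ~3,000 human voters and 18 AI models per scenario.
human_votes = ["Violence"] * 1200 + ["Not Violence"] * 900 + ["It Depends"] * 900
ai_votes = ["Violence"] * 10 + ["Not Violence"] * 7 + ["It Depends"] * 1

def depends_rate(votes: list[str]) -> float:
    """Fraction of judges who chose the middle option."""
    counts = Counter(votes)
    return counts["It Depends"] / len(votes)

print(f"Humans picked 'It Depends' {depends_rate(human_votes):.0%} of the time")
print(f"AIs picked 'It Depends' {depends_rate(ai_votes):.0%} of the time")
```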

2. The "Keyboard Warrior" Gap (Online Insults)

  • The Scenario: Someone sends a mean message or insults a group on social media.
  • Human View: 90% of humans said, "Yes, that is violence." They see the emotional harm and the psychological attack.
  • AI View: Only 50% of the AIs called it violence. Many AIs said, "That's just words. No one got hit with a fist."
  • The Takeaway: Humans understand that words can hurt just as much as fists. The AIs, however, seem to have a "physical bias." They are trained to recognize punches and kicks, but they struggle to recognize the violence of a nasty comment.

3. The "Interrupted Villain" Paradox

  • The Scenario: A speaker is about to say something terrible (like "We should eliminate a group of people"), but the host cuts them off before they can finish.
  • Human View: Humans said, "No, that's not violence. They didn't actually say it! They were stopped."
  • AI View: The AIs said, "Yes, that is violence."
  • The Takeaway: The AIs focused on the intent and the words that would have been said. Humans focused on the outcome (nothing bad actually happened because they were stopped). The AI is judging the idea; the human is judging the action.

Why Do the Robots Think This Way?

The study found that bigger, smarter AI models didn't necessarily act more like humans. A model with 10 billion parameters wasn't necessarily better at understanding "violence" than one with 1 billion.

Instead, it comes down to how they were trained.

  • The "Safety Filter" Effect: AI companies train their bots to be "safe" and "neutral." This often means they are taught to avoid controversial topics.
  • The Result: When an AI is unsure, it doesn't say, "I'm confused." It picks the "safest" answer its training nudges it toward. This makes it look rigid and sometimes wrong compared to the fluid, context-aware human mind.

The Big Warning: Don't Trust the Robot Judge

The authors end with a crucial warning for all of us.

The Trap: Because AI speaks so confidently and fluently, we tend to trust it like a teacher or a judge. We think, "The computer said it's not violence, so it must be true."

The Reality: The AI isn't a moral expert. It's a statistical guesser. It's like a very well-read parrot that has memorized millions of books but doesn't actually understand the pain of a human heart.

  • Search Engines (The Old Way): If you Google a question, you get 10 different links. You have to read them, compare them, and decide for yourself. You see the disagreement.
  • Chatbots (The New Way): The AI gives you one perfect answer. It hides all the disagreement and uncertainty. It makes you feel like there is only one "correct" truth.

The Bottom Line

This study shows that while AI is getting better at mimicking humans, it still misses the nuance of human morality.

  • Humans see violence in words, in silence, and in context.
  • AI sees violence mostly in physical actions and struggles with the "gray areas."

The Lesson: When you ask an AI, "Is this violent?", don't treat its answer as the final verdict. Treat it as a second opinion from a very literal, slightly confused robot. The real judgment still belongs to us.
