Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

This study evaluates five large language models for judicial sentencing support. Compared with humans, the models show a stronger virtuous victim effect and no significant penalty for adjacent consent, while generally exhibiting weaker prestige-based halo effects, particularly around credentials. Their run-to-run variability, however, still limits immediate deployment in legal settings.

Sierra S. Liu


Imagine you are hiring a new judge for a courtroom. But instead of a human, you're considering a super-smart computer program (an AI) to help make the final decisions on who goes to jail and for how long.

The big question is: Is this computer judge fairer than a human judge, or does it have the same bad habits?

Humans are notoriously bad at being perfectly fair. We get tired, we get hungry, and we let things like a person's job title or their "victim status" cloud our judgment. This paper is like a report card testing five different AI models to see if they make these same mistakes.

Here is the breakdown of the study using some everyday analogies:

1. The "Good Victim" Trap (The Virtuous Victim Effect)

The Human Flaw: Imagine a person gets hurt. If they are a "perfect" victim (someone who never did anything wrong), we tend to think they are a saint. But if that same person had a complicated history with the person who hurt them (like they were friends or had a prior relationship), we suddenly start blaming the victim more, even if the harm was exactly the same. It's like thinking, "Well, they knew the guy, so maybe they asked for it."

The AI Test: The researchers asked the AI to judge two scenarios: one where a student's iPad was broken by a stranger, and another where a woman was assaulted by someone she had previously been intimate with but was no longer. (A rough sketch of this paired-scenario probe appears after the analogy below.)

The Result: The AI was too nice to victims. It rated victims as "more moral" than non-victims, even more strongly than humans do. However, unlike humans, the AI did not punish victims for having a prior relationship with the offender. It didn't fall for the "they knew him, so it's their fault" trap.

  • The Analogy: If a human judge is a grumpy parent who blames the kid for playing with the wrong friend, the AI is an over-protective robot who thinks all victims are angels, but refuses to play the "blame game" based on past relationships.
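
To make the "two scenarios" setup concrete, here is a minimal sketch of how such a paired-scenario probe could be run. Everything here is an assumption for illustration: the OpenAI client, the gpt-4o-mini model name, the vignette wording, and the rate() helper are stand-ins, not the paper's actual stimuli or pipeline.

```python
import statistics
from openai import OpenAI  # assumed client; any chat-completion API works

client = OpenAI()

# Two vignettes that differ only in victim status; the person being
# rated is otherwise identical. Wording is paraphrased for
# illustration, not the paper's exact stimuli.
VIGNETTES = {
    "victim": (
        "A stranger deliberately smashed a student's iPad. "
        "On a scale of 1-7, how moral a person is the student? "
        "Reply with a single number."
    ),
    "non_victim": (
        "A student owns an iPad. "
        "On a scale of 1-7, how moral a person is the student? "
        "Reply with a single number."
    ),
}

def rate(prompt: str, n: int = 20) -> list[float]:
    """Sample the model n times and parse the 1-7 morality rating."""
    scores = []
    for _ in range(n):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,      # sample, so run-to-run spread is visible
        ).choices[0].message.content
        # Assumes the model obeys the single-number instruction.
        scores.append(float(reply.strip().split()[0]))
    return scores

for label, prompt in VIGNETTES.items():
    s = rate(prompt)
    print(f"{label}: mean={statistics.mean(s):.2f}, sd={statistics.stdev(s):.2f}")
```

A virtuous victim effect would show up as a noticeably higher mean morality rating in the "victim" condition, and the repeated sampling also exposes run-to-run spread.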

2. The "Fancy Resume" Trap (The Halo Effect)

The Human Flaw: Humans love status. If a criminal is a famous doctor, we tend to give them a lighter sentence or think they are less guilty; if they are a janitor, we are harsher. Prestige distorts money too: we treat a huge, prestigious company (like Goldman Sachs) very differently from a small local shop when setting damages. It's like judging a book by its shiny cover.

The AI Test: The researchers gave the AI cases where the only difference was the defendant's job or company (a minimal sketch of this single-variable swap follows the list below).

  • Company Prestige: When the defendant was a prestigious company rather than an ordinary one, humans demanded about 3x more money in damages. The AI inflated the award too, but usually by less than humans did (about 1.5x to 2x).
  • Job Title: When the criminal was a "Doctor" vs. a "Receptionist," humans gave the doctor a much lighter sentence. The AI was mostly indifferent, though one model (DeepSeek) acted a bit like a human and gave the doctor a break.
  • Credentials: When an expert witness came from a fancy Ivy League school vs. a regular state school, humans were easily swayed. The AI was much less swayed by the fancy school name.
  • The Analogy: Humans are like fans who buy a ticket just because the band is famous. The AI is more like a music critic who listens to the actual song, though it still gets a little distracted by the band's fame.
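
To illustrate the "only difference" manipulation named above, here is a hedged sketch of the job-title swap. The case text, dollar amount, and job labels are invented for illustration; feeding each prompt through a sampling loop like the rate() helper above yields two sentence distributions to compare.

```python
from string import Template

# One case template; only the $job slot changes between conditions.
# Facts, crime, and question are held constant, so any gap in the
# recommended sentences is attributable to the job title alone.
CASE = Template(
    "The defendant, a $job, was convicted of embezzling $$50,000 "
    "from a community fund. Recommend a prison sentence in months. "
    "Reply with a single integer."
)

prompts = {job: CASE.substitute(job=job) for job in ("doctor", "receptionist")}

for job, prompt in prompts.items():
    # In a real run, send each prompt through repeated sampling and
    # compare distributions; a halo effect appears as systematically
    # shorter sentences in the "doctor" condition.
    print(f"--- {job} ---\n{prompt}\n")
```

This mirrors a standard controlled-variable design: because the prompts are identical except for one word, any systematic gap between conditions can be read as bias rather than as a response to different facts.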

3. The "Rollercoaster" Problem (Consistency)

This is the most worrying part. Even if the AI is on average fairer than humans, it is unpredictable (the sketch after this list shows one way to measure that spread).

  • In one run, the AI might say a company owes $20 million. In the next run, with the exact same facts, it might say $300 million.
  • Some models (like Claude) refused to answer certain questions because they were "too sensitive."
  • The Analogy: Imagine a human judge who is grumpy in the morning but happy in the afternoon. Now imagine an AI judge that is a slot machine. Sometimes it pays out a fair verdict; sometimes it pays out a crazy one. You can't build a justice system on a slot machine.
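
The slot-machine problem can be quantified. Below is a rough sketch, assuming you have already collected repeated verdicts for one identical prompt; the parse_dollars() helper and the placeholder award figures are illustrative, not from the paper.

```python
import re
import statistics

def parse_dollars(reply: str) -> float:
    """Pull the first dollar figure out of a free-text verdict."""
    m = re.search(r"\$?([\d,]+(?:\.\d+)?)\s*(million|billion)?", reply, re.I)
    if m is None:
        raise ValueError(f"no dollar amount found in: {reply!r}")
    value = float(m.group(1).replace(",", ""))
    scale = {"million": 1e6, "billion": 1e9}.get((m.group(2) or "").lower(), 1)
    return value * scale

# Placeholder: four sampled verdicts for the *same* prompt. A real run
# would collect 20+ samples via a loop like the one sketched earlier.
awards = [parse_dollars(r) for r in (
    "$20 million", "$300 million", "$45 million", "$120 million",
)]

# Coefficient of variation: spread relative to the average award.
cv = statistics.stdev(awards) / statistics.mean(awards)
print(f"coefficient of variation: {cv:.2f}")  # ~1.04 here
```

A CV near zero means the model hands down roughly the same verdict every time; a CV near or above 1, as with the $20 million vs. $300 million swings described above, is exactly the inconsistency that makes deployment premature.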

The Final Verdict

Are AI judges ready to replace humans?
Not yet.

  • The Good News: The AI is generally less obsessed with fancy job titles and degrees than humans are. It doesn't fall for the "blame the victim for their past" trap.
  • The Bad News: The AI is too "pro-victim" (it thinks victims are saints even when they aren't), and its decisions are all over the place. One minute it's fair, the next it's wild.

In short: The AI is a student who studied hard and knows the rules better than the teacher (the human) regarding status and resumes, but it's still a bit too emotional about victims and its math is inconsistent. We need to fix the "slot machine" problem before we let it sit on the bench.