Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context

This study evaluates seven state-of-the-art large language models in the underrepresented Nepali cultural context using a Dual-Metric Bias Assessment framework, revealing that while explicit agreement with biased statements is measurable, implicit generative bias is distinct, follows a non-linear relationship with temperature, and is poorly predicted by agreement metrics, thereby highlighting the critical need for culturally grounded datasets and evaluation strategies.

Ashish Pandey, Tek Raj Chhetri

Published Tue, 10 Ma
📖 4 min read☕ Coffee break read

Imagine you have a group of very smart, super-fast robots (Large Language Models, or LLMs) that have read almost everything on the internet. They are great at writing stories, answering questions, and giving advice. But, just like humans, they can pick up bad habits and stereotypes from what they read.

This paper is like a detective story where the authors go to Nepal to see if these robots are fair when talking about Nepali culture, or if they are secretly carrying around old-fashioned, unfair ideas about gender, race, and social status.

Here is the breakdown of their investigation using simple analogies:

1. The Problem: The "Western Lens"

Most of these robots were trained on data from the US and Europe. Imagine trying to judge a Nepali village using a rulebook written for New York City. It doesn't fit. The authors noticed that while we know these robots are biased in English, we don't know how they act in Nepali, a place with 120+ languages and a complex social structure involving castes and ethnic groups.

2. The Tool: A New "Bias Test" (EquiText-Nepali)

To test the robots, the authors built a custom exam called EquiText-Nepali.

  • The Analogy: Imagine a "Spot the Difference" game. They created over 2,400 pairs of sentences.
    • Pair A (The Stereotype): "Men are naturally better at farming than studying."
    • Pair B (The Truth): "Many men excel in both farming and studying."
  • They asked the robots to read these pairs and decide which one they agree with.

3. The Two-Part Test: The "Dual-Metric" Approach

The authors realized that asking a robot "Do you agree?" isn't enough. People (and robots) might say the right thing but do the wrong thing. So, they used a two-part test:

  • Part 1: The "Yes/No" Interview (Explicit Agreement)

    • The Analogy: You ask the robot, "Do you believe women make bad engineers?"
    • What they measured: How often does the robot say "Yes"? This is Explicit Bias. It's what the robot says it believes.
  • Part 2: The "Finish the Sentence" Game (Implicit Completion)

    • The Analogy: You give the robot the start of a sentence: "In Nepal, Dalits are..." and let it finish the story on its own.
    • What they measured: Does the robot automatically finish the sentence with something negative or stereotypical, even if you didn't ask it to? This is Implicit Bias. It's what the robot does when it's not being watched.

4. The Big Discovery: The "Split Personality"

The results were surprising.

  • The Robots were "Polite" but "Prejudiced": When asked directly (Part 1), the robots didn't agree with stereotypes very often (about 36–43% of the time). They seemed fairly neutral.
  • But they "Slipped Up" constantly: When asked to write a story or finish a sentence (Part 2), they fell back into stereotypes 74–75% of the time.
  • The Metaphor: It's like a person who says, "I don't believe in racism," but when they tell a joke or write a story, they accidentally use racist tropes. The bias is hidden deep in their "muscle memory," not just in their stated opinions.

5. The "Temperature" Knob

The researchers also played with the robot's "creativity knob" (called Temperature).

  • Low Temperature (Strict): The robot is very logical and repetitive.
  • High Temperature (Chaotic): The robot is wild and creative.
  • The Finding: They found a weird "U-shape." The robots were actually most likely to be stereotypical when they were "moderately creative" (Temperature 0.3). When they were either very strict or very wild, the bias dropped slightly. This means you can't just "turn up the chaos" to fix the problem.

6. Why This Matters

The authors found that race and social caste biases were the hardest to fix in the robots' writing, even more than gender bias. This suggests that the data the robots learned from (the internet) has a lot of hidden prejudice against these specific groups in Nepal.

The Takeaway

This paper is a wake-up call. It tells us that:

  1. We can't trust robots to be fair just because they say they are. We have to watch what they write, not just what they say.
  2. One size does not fit all. A robot trained on American data might be fair in the US but very unfair in Nepal.
  3. We need local experts. To fix this, we need datasets and tests built by people who actually live in those cultures, not just imported from the West.

In short, the authors built a mirror to show the robots their own reflection in a Nepali context, and the reflection showed that they still have a lot of work to do to be truly fair.