Gender Bias in MT for a Genderless Language: New Benchmarks for Basque

This paper introduces two new benchmarks, WinoMTeus and FLORES+Gender, to evaluate gender bias in machine translation involving Basque, revealing that current large language models and MT systems exhibit a systematic preference for masculine forms when translating between genderless and gendered languages.

Amaia Murillo, Olatz Perez-de-Viñaspre, Naiara Perez

Published Tue, 10 Ma

Imagine you have a group of very smart, but slightly biased, robots. These robots are trained by reading almost everything ever written on the internet. Because the internet has a lot of old-fashioned ideas about men and women, these robots often accidentally learn those stereotypes too.

For example, if you ask a robot to translate a sentence about a "nurse" from a language that doesn't care about gender (like Basque) into a language that does (like Spanish or French), the robot might guess, "Oh, nurses are usually women, so I'll use the female word." But if you ask about a "mechanic," it might guess, "Mechanics are usually men, so I'll use the male word."

The problem is, the robots often get this wrong based on reality. In the real world, there are plenty of male nurses and female mechanics, but the robots stick to their old stereotypes.

This paper is like a detective report from a team of researchers in the Basque Country. They wanted to see if these robots were being fair when dealing with the Basque language, which is unique because it doesn't have "male" or "female" words for jobs or people.

Here is how they investigated, using two creative "tests":

Test 1: The "Job Swap" (WinoMTeus)

The Setup: Imagine you have a list of jobs in Basque where the word is neutral (it doesn't say "male nurse" or "female nurse"). It just says "nurse."
The Experiment: The researchers asked the robots to translate these neutral jobs into Spanish and French.
The Trap: Since Spanish and French must pick a gender (you can't say "the nurse" without saying "the male nurse" or "the female nurse"), the robot has to make a guess.
The Reality Check: The researchers compared the robots' guesses against real-life statistics from the Basque Country. They asked: "Did the robot guess that 90% of nurses are men, even though in real life, 96% are women?"

The Verdict: The robots were guilty! They had a strong habit of defaulting to the "male" version, even for jobs that are mostly done by women in real life. It's like a robot that thinks every doctor is a man and every secretary is a woman, just because it read too many old books.
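To make the "reality check" concrete, here is a minimal sketch of the kind of accounting Test 1 implies: tally how often a system picks the masculine form for a gender-neutral Basque profession word, and flag professions where that pick contradicts the real-world majority. The profession names, percentages, and system choices below are invented placeholders, not the paper's actual data or code.

```python
# Hypothetical WinoMTeus-style check. All numbers are illustrative.

# Fraction of workers who are women, per profession (invented figures).
real_world_pct_women = {"nurse": 0.96, "mechanic": 0.05, "teacher": 0.75}

# Gender the MT system chose when forced to pick one in Spanish/French.
system_choice = {"nurse": "masculine", "mechanic": "masculine", "teacher": "masculine"}

def masculine_default_rate(choices):
    """Share of professions the system translated with the masculine form."""
    masc = sum(1 for g in choices.values() if g == "masculine")
    return masc / len(choices)

def mismatches(choices, pct_women, threshold=0.5):
    """Professions where the system's pick contradicts the majority gender."""
    out = []
    for job, gender in choices.items():
        majority = "feminine" if pct_women[job] > threshold else "masculine"
        if gender != majority:
            out.append(job)
    return out

print(masculine_default_rate(system_choice))            # 1.0 — always masculine
print(mismatches(system_choice, real_world_pct_women))  # ['nurse', 'teacher']
```

A real evaluation would run this over the full benchmark and a detector for grammatical gender in the output, but the comparison logic is the same: system choices on one side, demographic statistics on the other.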

Test 2: The "Mirror Test" (FLORES+Gender)

The Setup: This time, they did the reverse. They took sentences from English and Spanish where the gender was clearly marked (e.g., "The male driver" vs. "The female driver").
The Experiment: They asked the robots to translate these into neutral Basque.
The Question: Does the robot translate the sentence better if the person in the story is a man? Does it stumble more if the person is a woman?
The Analogy: Imagine a translator who is slightly more confident and fluent when talking about men, but gets a little nervous and makes more mistakes when talking about women.

The Verdict: The results were a bit mixed, but there was a hint of bias. In some cases, the robots translated sentences about men slightly better than sentences about women. It's as if the robot's "muscle memory" is stronger for male stories because it has seen them more often in its training data.
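The "mirror test" boils down to splitting a test set by the gender marked in the source and scoring each half separately. The sketch below assumes that setup with invented Basque sentences and a toy token-overlap score standing in for a real metric like chrF or BLEU; none of it is the paper's actual data or tooling.

```python
# Hypothetical FLORES+Gender-style comparison with invented examples.

def token_f1(hypothesis, reference):
    """Toy quality proxy: F1 over shared tokens (real work would use chrF/BLEU)."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    shared = len(hyp & ref)
    if shared == 0:
        return 0.0
    precision, recall = shared / len(hyp), shared / len(ref)
    return 2 * precision * recall / (precision + recall)

# (source_gender, system_output, reference) triples — invented sentences.
samples = [
    ("masculine", "gidaria garaiz iritsi zen",   "gidaria garaiz iritsi zen"),
    ("masculine", "medikua lanean ari da",       "medikua lanean ari da gaur"),
    ("feminine",  "gidaria berandu iritsi zen",  "gidaria garaiz iritsi zen"),
    ("feminine",  "medikua atseden hartzen",     "medikua lanean ari da gaur"),
]

def mean_score(gender):
    """Average translation quality over sentences with the given source gender."""
    scores = [token_f1(hyp, ref) for g, hyp, ref in samples if g == gender]
    return sum(scores) / len(scores)

gap = mean_score("masculine") - mean_score("feminine")
print(f"masc avg={mean_score('masculine'):.2f}  "
      f"fem avg={mean_score('feminine'):.2f}  gap={gap:.2f}")
```

A positive gap on a large, matched test set is the "stronger muscle memory for male stories" the verdict describes; on matched sentence pairs, the gap isolates the effect of gender from sentence difficulty.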

The Big Picture

The researchers found that even though Basque is a language that naturally treats men and women equally, the robots translating into or out of Basque are bringing their own baggage with them. They are acting like a broken mirror that distorts reality to fit an old stereotype.

Why does this matter?
If we use these robots to translate job ads, news, or medical advice, they might accidentally tell a woman she can't be a mechanic or tell a man he can't be a nurse. This paper is a wake-up call: we need to build better "glasses" for these robots so they can see the real world, not just the biased world they were trained on.

In short: The robots are smart, but they are also a bit sexist. The researchers built new tools to catch them in the act, proving that we need to teach them to be fairer, especially for languages like Basque that deserve to be treated with respect.