Idiom Understanding as a Tool to Measure the Dialect Gap

This paper introduces three new benchmark datasets for French idioms. Using them, it shows that while large language models perform well on Metropolitan French, they struggle with Quebec French, revealing a significant dialect gap, and it establishes regional idiom understanding as a reliable metric for measuring disparities in dialectal competence.

David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury

Published Tue, 10 Ma

Imagine you have a super-smart robot librarian who has read almost every book in the world. You'd think this robot could understand any story, right? Well, this paper says: "Not so fast."

The researchers discovered that while this robot is a genius at reading standard, "fancy" French (like the kind spoken in Paris), it gets completely lost when trying to understand the colorful, slang-filled French spoken in Quebec.

Here is the story of their discovery, broken down into simple parts:

1. The "Idiom Test" (The Secret Code)

To test the robot's true understanding, the researchers didn't ask it to solve math problems. Instead, they gave it idioms.

Think of idioms like secret handshakes or inside jokes of a culture.

  • If you say to an American, "It's raining cats and dogs," they know you mean a heavy storm.
  • If you say to a Quebecer, "Attache ta tuque avec de la broche" (literally: "Fasten your toque [hat] with a wire"), they know you mean "Brace yourself for trouble."

If you only taught the robot standard French, it would look at "toque" and "wire" and think you were talking about a hat repair shop. It would miss the real meaning because that meaning comes from local history and culture, not just grammar rules.

2. Building the "Trap" Datasets

The researchers built three giant "traps" to catch the robots off guard:

  • The Quebec Trap (QFrCoRE): 4,633 tricky Quebec phrases.
  • The Quebec Word Trap (QFrCoRT): 171 specific Quebec words.
  • The Paris Control (MFrCoE): 4,938 standard French phrases (to see how well the robots do when they aren't tricked).

They asked 111 different AI models (from big companies like OpenAI and Google to smaller open-source ones) to play a multiple-choice game: "Here is a phrase. Which definition is correct?"
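
The multiple-choice game above can be sketched as a simple evaluation loop. This is a minimal illustration, not the authors' actual harness: the dataset record and the `ask_model` function are hypothetical stand-ins (here the "model" just guesses at random).

```python
import random

# Hypothetical record format: an idiom plus candidate definitions.
dataset = [
    {
        "idiom": "Attache ta tuque avec de la broche",
        "choices": [
            "Repair a hat with wire",
            "Brace yourself for trouble",  # the correct, figurative meaning
            "Dress warmly for winter",
            "Tie down loose equipment",
        ],
        "answer": 1,  # index of the correct choice
    },
]

def ask_model(idiom, choices):
    """Stand-in for a real LLM call; here it just picks a choice at random."""
    return random.randrange(len(choices))

def evaluate(dataset):
    """Fraction of idioms for which the model picks the correct definition."""
    correct = sum(
        ask_model(ex["idiom"], ex["choices"]) == ex["answer"] for ex in dataset
    )
    return correct / len(dataset)

accuracy = evaluate(dataset)
print(f"Accuracy: {accuracy:.2f}")
```

In a real run, `ask_model` would prompt each of the 111 models with the idiom and the shuffled definitions, and the resulting accuracy would be compared across the Quebec and Metropolitan datasets.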

3. The Big Reveal: The "Dialect Gap"

The results were shocking. It's like finding out the robot librarian can read Shakespeare perfectly but can't understand a joke told by a local comedian.

  • The "Elite" Robots: The most expensive, proprietary robots (the ones you pay for) did okay, but they still struggled.
  • The "Open-Source" Robots: The free, community-built robots did terribly. About 66% of all the models performed significantly worse on the Quebec phrases than on the standard French ones.
  • The "Random Guess" Problem: Over 40% of the models performed worse than if you had just closed your eyes and picked an answer at random!
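
"Worse than random" is easy to quantify: with k answer options per question, blind guessing scores 1/k on average. Assuming four options per question (a detail not stated in this summary), the chance baseline works out as follows:

```python
def random_baseline(k):
    """Expected accuracy of uniform random guessing over k options."""
    return 1 / k

# Assuming four options per multiple-choice question (hypothetical detail):
baseline = random_baseline(4)
print(f"Chance baseline: {baseline:.0%}")

# A model scoring below this baseline is doing worse than blind guessing.
```

So a model scoring under 25% on a four-option quiz is literally losing to a coin-flip strategy, which is what the paper reports for over 40% of the models tested.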

The Analogy: Imagine a chef who is a master at cooking a perfect, classic French steak. But if you ask them to cook a traditional Quebec poutine, they might burn the cheese curds or forget the gravy entirely. They know the ingredients (words), but they don't know the recipe (culture).

4. Why Does This Happen?

The researchers found that making the robots "bigger" or "smarter" didn't fix the problem.

  • Size doesn't matter: A giant robot with 100 billion parameters still failed just as hard as a small one.
  • Reasoning doesn't help: Even robots designed to "think" harder couldn't figure out the jokes.
  • The Real Culprit: Training Data. The robots were trained mostly on data from France and the US. They simply hadn't read enough Quebec books, websites, or news articles to learn the local slang.

5. The Unfair World (The "AI Colonization")

This is the most serious part of the paper. The researchers call this "AI Colonization."

Here is the dilemma for a Quebecer:

  1. Option A: Use a free, open-source AI. Result: The AI misunderstands you, gets your slang wrong, and sounds like a clueless tourist.
  2. Option B: Use a paid, premium AI. Result: The AI understands you, but it costs money, and you have to send your private conversations to a big tech company.

The Conclusion: To be understood by AI, people are forced to stop speaking their natural dialect and start speaking the "prestige" dialect (Standard French). It's like being forced to stop speaking with a local accent just to get a job interview with a robot.

The Takeaway

This paper proves that knowing a language isn't just about grammar; it's about culture.

If we want AI to be truly helpful for everyone, we can't just teach it the "standard" version of a language. We have to teach it the local jokes, the regional slang, and the inside jokes of every community. Otherwise, the AI will always be a tourist in the world, never a local.