Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments

This paper introduces the Multilingual Reasoning Gym, a procedurally generated framework that extends the original Reasoning Gym to 14 languages with native-speaker validation, enabling the scalable creation of parallel, verifiable reasoning problems for training and evaluating multilingual models.

Konstantin Dobler, Simon Lehnerer, Federico Scozzafava, Jonathan Janke, Mohamed Ali

Published 2026-03-12

Imagine you are trying to teach a robot how to solve puzzles. For a long time, you've only been able to teach it using English puzzles. The robot gets really good at English math and logic, but if you switch the puzzle to Japanese, German, or Swahili, the robot gets confused because it hasn't practiced those languages.

This paper introduces a new tool called the Multilingual Reasoning Gym. Think of it as a giant, magical puzzle factory that can instantly create millions of unique brain-teasers in 14 different languages.

Here is how it works, broken down with some everyday analogies:

1. The Problem: The "Static Library" vs. The "Infinite Factory"

Before this, researchers had to use "Static Libraries" (like a bookshelf). They would take a book of math problems, translate the whole book into French, then translate the whole book into Spanish.

  • The Catch: There are only so many pages in a book. Once the robot has read all the translated books, it memorizes the answers instead of learning how to think. Also, some languages are "underserved," meaning there are very few translated books for them.

The Solution: The Multilingual Reasoning Gym is an Infinite Factory.
Instead of writing out every single puzzle, the researchers wrote a set of master templates (like a recipe).

  • The Recipe: "Take two numbers, add them, and ask for the sum."
  • The Magic: The factory can use this one recipe to bake 1,000,000 different cakes (puzzles) instantly. It can bake them in English, then immediately bake 1,000,000 different cakes in Japanese, using the exact same recipe logic.
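
The recipe idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the gym's actual code: the names `addition_recipe`, `make_puzzles`, and `check_answer` are hypothetical, and the real templates are richer than a single sentence. The point is that one small function plus a random number generator yields as many puzzles as you ask for, each with an answer that can be checked automatically.

```python
import random


def addition_recipe(rng: random.Random) -> dict:
    """One 'recipe': draw two numbers and ask for their sum (hypothetical example)."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return {"question": f"What is {a} + {b}?", "answer": str(a + b)}


def make_puzzles(count: int, seed: int = 0) -> list[dict]:
    """Bake `count` puzzles from the same recipe; the seed makes the batch reproducible."""
    rng = random.Random(seed)
    return [addition_recipe(rng) for _ in range(count)]


def check_answer(puzzle: dict, response: str) -> bool:
    """Verify a response by exact match against the stored answer."""
    return response.strip() == puzzle["answer"]


puzzles = make_puzzles(1000, seed=42)
print(puzzles[0]["question"])
```

Because the answer is computed together with the question, nobody has to write an answer key by hand, which is what makes the puzzles verifiable at scale.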

2. The Challenge: It's Not Just "Google Translate"

You can't just run a machine translator on these recipes. If you translate a math problem literally, it often sounds weird or breaks the rules of the language.

  • The Analogy: Imagine a recipe that says, "Add an 's' to the end of the word." In English, that works for "cat" → "cats." But in German or Japanese, you can't just add an "s": German forms plurals differently ("Katze" → "Katzen"), and Japanese nouns usually aren't marked for plural at all.
  • The Fix: The team didn't just translate the words; they re-wrote the instructions for each language.
    • They made sure Japanese used the correct punctuation (like full-width commas).
    • They swapped English math terms for the specific terms used in German schools.
    • They even changed how they asked for answers so the robot wouldn't get confused by grammar rules that don't exist in English.

They used a team of human experts (native speakers) to taste-test the recipes, ensuring the puzzles sounded natural and fair, not like a robot wrote them.
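
As a rough sketch of what per-language instructions might look like in code, each language below gets its own prompt wording and its own list separator instead of a word-for-word translation. The table `SORT_TEMPLATES`, the function `render_sort_task`, the sorting task itself, and the exact wording are all assumptions made for illustration; the paper's real templates were written and checked by native speakers.

```python
# Hypothetical per-language templates for a "sort these numbers" task.
# Each entry carries its own phrasing and its own list separator.
SORT_TEMPLATES = {
    "en": {"prompt": "Sort these numbers in ascending order: {items}", "sep": ", "},
    "de": {"prompt": "Sortiere diese Zahlen in aufsteigender Reihenfolge: {items}", "sep": ", "},
    # Japanese uses a full-width colon and the Japanese comma, not ASCII punctuation.
    "ja": {"prompt": "次の数を小さい順に並べてください：{items}", "sep": "、"},
}


def render_sort_task(numbers: list[int], lang: str) -> str:
    """Render the same underlying task with language-specific wording and punctuation."""
    spec = SORT_TEMPLATES[lang]
    items = spec["sep"].join(str(n) for n in numbers)
    return spec["prompt"].format(items=items)


for lang in SORT_TEMPLATES:
    print(render_sort_task([3, 1, 2], lang))
```

Keeping punctuation and phrasing inside the per-language entry, and not in shared code, is one way to let a native speaker review and adjust a single language without touching the others.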

3. The Result: A Global Training Ground

Now, researchers can train their AI models using this gym.

  • Fair Play: Because the factory uses the same "seed" (the same starting point) for all languages, they can generate a puzzle in English and the exact same puzzle in Korean at the same time. This lets them test if the AI is actually smart, or if it's just good at English.
  • Adjustable Difficulty: Just like a video game, you can turn the dial from "Easy" (for a beginner robot) to "Hard" (for a pro robot) instantly, in any language.
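
The two ideas above, a shared seed and a difficulty dial, can be sketched together. Everything named here (`PROMPTS`, `LEVELS`, `generate`, and the two difficulty settings) is a hypothetical stand-in and not the gym's real interface. The sketch assumes the numbers are drawn from the seed alone, so changing the language changes only the wording around them.

```python
import random

# Hypothetical prompt wording; the Korean phrasing avoids attaching a particle
# to the number, so it needs no agreement with the expression that follows.
PROMPTS = {
    "en": "Calculate: {expr}",
    "ko": "다음을 계산하세요: {expr}",
}

# Hypothetical difficulty dial: (number of terms, largest value per term).
LEVELS = {"easy": (2, 9), "hard": (4, 999)}


def generate(seed: int, lang: str, difficulty: str = "easy") -> tuple[str, int]:
    """Return (question, answer); the seed alone decides the numbers."""
    n_terms, max_value = LEVELS[difficulty]
    rng = random.Random(seed)
    terms = [rng.randint(1, max_value) for _ in range(n_terms)]
    expr = " + ".join(str(t) for t in terms)
    return PROMPTS[lang].format(expr=expr), sum(terms)


question_en, answer_en = generate(seed=7, lang="en", difficulty="hard")
question_ko, answer_ko = generate(seed=7, lang="ko", difficulty="hard")
print(question_en)
print(question_ko)
print(answer_en == answer_ko)
```

With this design, a gap in accuracy between the English and Korean versions of the same seed points at the language, not at the puzzle, since the underlying problem is identical.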

Why Does This Matter?

Think of AI models as students.

  • Before: The student only studied in an English classroom. They were great at English math but failed when the teacher switched to Spanish.
  • Now: The Multilingual Reasoning Gym puts the student in a classroom where they can practice math, logic, and coding in 14 different languages. The goal is that when the AI talks to a user in Swahili or Thai, it reasons just as well as when it talks to a user in English.

In short: This paper gives us a machine that can generate a practically unlimited supply of native-speaker-validated logic puzzles in 14 languages, helping us build AI that is smart and fair for everyone, not just English speakers.