ReasonXL: Shifting LLM Reasoning Language Without Sacrificing Performance

Imagine you have a brilliant, multilingual robot named Reasoning-Rob. He can solve complex math problems, write code, and explain science, but he has a strange quirk: no matter what language you speak to him, he always thinks in English inside his head before answering in your language.

If you ask him a question in German, he translates your question to English, does all the hard thinking in English, and then translates the answer back to German.

The Problem:
This creates a few issues:

Trust: If you don't speak English, you can't see how he solved the problem. You just see the answer, which feels like magic rather than logic.
Errors: Sometimes, translating a tricky math problem into English and back loses the "flavor" or specific meaning of the original question.
Bias: He's just better at English. His English answers are usually more accurate than his German or French ones.

The paper introduces a project called ReasonXL to fix this. Here's how the researchers did it, explained with some everyday analogies:

1. The Massive Library (The Dataset)

To teach Reasoning-Rob to think in other languages, the researchers couldn't just tell him "Please think in German." They needed to show him examples of thinking in German.

They built ReasonXL, a massive digital library containing over 2 million examples for five languages (English, German, French, Italian, and Spanish).

The Analogy: Imagine they took a huge pile of English textbooks where the author wrote out their step-by-step thinking process, and they hired a team of expert translators to rewrite every single step of the thinking process into the other languages.
The Result: Now, instead of just having "Question -> Answer," they have "Question -> Thinking Process in German -> Answer."
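To make the "Question -> Thinking Process in German -> Answer" shape concrete, here is a minimal sketch of what one library entry might look like. The class name, field names, and the `<think>` markers are assumptions made for illustration; the summary does not describe the actual ReasonXL file format.

```python
from dataclasses import dataclass


@dataclass
class ReasoningExample:
    """One hypothetical library entry (field names are assumptions)."""
    language: str   # language code of the thinking process, e.g. "de"
    question: str
    reasoning: str  # step-by-step thinking, written in `language`
    answer: str


def to_training_text(ex: ReasoningExample) -> str:
    """Join question, thinking, and answer into one string.

    The <think> markers are an assumed layout, not the dataset's real one.
    """
    return (
        f"Question: {ex.question}\n"
        f"<think>\n{ex.reasoning}\n</think>\n"
        f"Answer: {ex.answer}"
    )


example = ReasoningExample(
    language="de",
    question="Was ist 12 mal 3?",
    reasoning="12 mal 3 ist 12 + 12 + 12. Das ergibt 36.",
    answer="36",
)
print(to_training_text(example))
```

The point of the sketch is only that the thinking step sits between question and answer and is written in the same language as the question.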

2. The Two-Step Training Camp (The Method)

They didn't just dump this library into the robot's brain. They used a two-step training camp to rewire his brain without breaking his intelligence.

Step A: The "Language Swap" (Supervised Fine-Tuning)

What happened: They showed the robot the new library and said, "From now on, when you see a German question, you must write your thoughts in German."
The Side Effect: It worked! The robot started thinking in German. But, like a student who is so focused on speaking the new language that they forget the math, his answers got a bit worse. He was thinking in German, but his logic was shaky.

Step B: The "Coach's Whistle" (Reinforcement Learning)

What happened: They gave the robot a second round of training. This time, they acted like a strict coach. The robot would try to solve a math problem in German. If the answer was right and the thinking was in German, the coach gave a high-five (a reward). If the answer was wrong or he slipped back into English, the coach gave a gentle tap on the wrist (a penalty).
The Result: This "coach" helped the robot regain his sharp logic while keeping his new habit of thinking in German. After this round, he was thinking in German and getting the answers right, often even better than before!
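The coach's rule can be sketched as a small reward function: a high-five only when the answer is right and the thinking is in the target language, a penalty otherwise. This is a toy illustration, not the paper's implementation. The reward values (1.0 and -1.0), the function names, and the stopword-based language guesser are all assumptions; a real system would use a proper language-identification model and a more careful answer checker.

```python
import re

# Tiny stopword lists standing in for a real language-identification model
# (an assumption for this sketch).
STOPWORDS = {
    "en": {"the", "is", "and", "of", "so", "we", "then"},
    "de": {"der", "die", "das", "ist", "und", "wir", "dann"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords appear most often in `text`."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def reward(reasoning: str, answer: str, gold_answer: str, target_lang: str) -> float:
    """High-five (1.0) only if the answer is right AND the thinking is on-language."""
    correct = answer.strip() == gold_answer.strip()
    on_language = guess_language(reasoning) == target_lang
    return 1.0 if (correct and on_language) else -1.0


german = "Das ist 12 und 12 und 12, dann ist das 36."
english = "The sum is 12 and 12 and 12, so the total is 36."

print(reward(german, "36", "36", "de"))   # 1.0: right answer, German thinking
print(reward(english, "36", "36", "de"))  # -1.0: slipped back into English
print(reward(german, "35", "36", "de"))   # -1.0: wrong answer
```

Tying the reward to both conditions at once is what lets this step sharpen the logic without letting the robot drift back to English.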

3. The "Brain Scan" (The Discovery)

The researchers didn't stop at the results; they looked inside the robot's "brain" (the neural network) to see how the change happened. They found three fascinating things:

The "Switch" Layer: They discovered that the robot's brain has a specific "switch" located in the middle layers (around layers 6–8). This is where the decision to speak German or English is actually made. It's like a traffic light that decides which language highway the thoughts travel on.
The "Refinement" Layers: The top layers of the brain (the later ones) are where the actual heavy lifting happens—solving the math, writing the code, and refining the logic.
The Efficiency Surprise: They found that the second training step (the "Coach's Whistle") was incredibly efficient. It managed to completely change the robot's behavior by tweaking very few parts of the brain, whereas the first step (the "Language Swap") required changing a lot of weights. It's like the first step built a new road, but the second step just installed a few smart traffic signs that made the whole system run perfectly.
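One simple way to picture "tweaking very few parts of the brain" is to compare a model's weights before and after a training step and count how many actually moved. The sketch below does that on made-up toy numbers. The function name, the tolerance, and every weight value are invented for illustration and are not results from the paper, which may measure this differently.

```python
def fraction_changed(before: dict, after: dict, tol: float = 1e-6) -> float:
    """Share of weights that moved by more than `tol` between two snapshots."""
    changed = 0
    total = 0
    for name, old_values in before.items():
        for old, new in zip(old_values, after[name]):
            total += 1
            if abs(new - old) > tol:
                changed += 1
    return changed / total


# Toy snapshots with invented values (NOT measurements from the paper).
base = {"layer0": [0.10, -0.20, 0.30, 0.40], "layer1": [0.50, 0.60, -0.70, 0.80]}
after_sft = {"layer0": [0.15, -0.25, 0.30, 0.45], "layer1": [0.55, 0.60, -0.75, 0.85]}
after_rl = {"layer0": [0.15, -0.25, 0.30, 0.45], "layer1": [0.55, 0.61, -0.75, 0.85]}

print(fraction_changed(base, after_sft))      # 0.75: the "new road" step moves many weights
print(fraction_changed(after_sft, after_rl))  # 0.125: the "traffic signs" step moves few
```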

Why Does This Matter?

Before this, it was easy to assume that a reasoning model had to think in English to be at its smartest. This paper presents evidence that it doesn't.

For Users: You can now get AI assistants that explain their reasoning in your native language, making them easier to trust and understand.
For AI: It shows that we can teach AI to be "native" in any language without losing its smarts.

In a nutshell: The researchers built a giant library of "thinking in other languages," taught a robot to use it, and discovered that the robot's brain has a specific "switch" for language and a "workshop" for logic. They showed that an AI can be just as smart in German, French, or Spanish as it is in English, as long as we give it the right tools to learn.

