Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?

This paper investigates whether large language models can collaborate on a shared reasoning trajectory, a setting the authors call "off-trajectory reasoning." It finds that even high-performing models struggle to recover from misleading traces or to build on guidance from stronger collaborators, and it identifies specific post-training factors, such as teacher model quality and data selection, that critically shape these collaborative capabilities.

Aochong Oliver Li, Tanya Goyal

Published 2026-03-04

Imagine you are teaching a group of students how to solve a very difficult math problem. In the traditional classroom (which is how most AI models are trained today), each student works alone. They are given a problem, they think through it step-by-step on their own, and then they write down the answer. If they get it right, they get a gold star.

This paper asks a different question: What happens if we put these students in a group project where they have to share their thought process in real-time?

The researchers call this "Off-Trajectory Reasoning." It's like asking: Can a smart student look at a friend's messy, confusing, or even wrong notes, ignore the bad parts, and still solve the problem? Or, can a struggling student look at a genius's notes and actually learn from them to solve a problem they couldn't do alone?

To test this, the researchers created two fun "games" (tests) to see how well 15 different AI models (ranging from small, simple ones to massive, complex ones) handle these group dynamics.

The Two Games

1. The "Red Herring" Game (Recoverability)

Imagine you are solving a math problem. You are doing great! Then, suddenly, a classmate whispers a completely wrong idea into your ear, like, "Wait, actually, the answer is 350 years old because of carbon dating!" (even though you are solving an algebra equation).

  • The Test: Can you ignore that weird distraction, realize it doesn't make sense, and go back to your original correct path?
  • The Shocking Result: The researchers found that the smartest students (the biggest AI models) were actually the most fragile. When a smart model got distracted, it often got confused and gave up or followed the wrong path. The "smaller," less famous models were surprisingly better at shaking off the distraction and staying on track. It's like a genius student getting so flustered by a silly comment that they forget how to do basic math, while a regular student just shrugs it off and keeps working.

2. The "Mentor" Game (Guidability)

Now, imagine you are stuck on a problem you can't solve. A genius student hands you their first few steps of the solution. They show you the right way to start.

  • The Test: Can you take those correct first steps and finish the rest of the problem on your own?
  • The Shocking Result: Almost none of the models could do this effectively. Even when the "genius" gave them the perfect start, the struggling models couldn't build on it. They seemed to hit a "ceiling." It's like giving a student the first three lines of a poem written by Shakespeare; they still can't finish the poem in Shakespeare's style. They just couldn't leverage the help to go beyond their own limits.

The "Why" Behind the Failure

The researchers didn't just stop at the results; they played detective to find out why this was happening. They looked at how these AI models were trained.

  1. The "Bad Teacher" Effect: When a small model is trained by copying a "Teacher" model (a process called distillation), it doesn't just copy the answers; it copies the habits. If the Teacher model is easily distracted or fragile, the Student model inherits those bad habits, even if the Student is only shown the correct answers during training.
  2. The "Practice Makes Perfect" vs. "Practice Makes Robust" Problem: Most AI models are trained using a method called Supervised Fine-Tuning (SFT), which is like showing them a textbook of perfect solutions. This makes them great at solo tests. But the researchers found that using Reinforcement Learning (RL)—which is like letting the model try, fail, get corrected, and try again—made them much better at recovering from mistakes. It taught them how to fix errors, not just what the right answer looks like.
  3. The "Less is More" Trap: Some recent trends suggest that training AI on a tiny amount of "super high-quality" data is better than using a lot of data. The researchers found that while this made the models good at solo tests, it made them very unstable in group settings. They became unpredictable; sometimes they worked great, sometimes they failed miserably.

The Big Takeaway

The paper concludes that being good at taking a solo test does not mean you are good at collaborating.

Currently, we are building AI models that are incredible at working alone but poor at working with others, whether other AI models or humans. They are easily confused by wrong turns and cannot really build on a partner's help.

The Lesson for the Future:
If we want AI to work in teams—where a human and an AI, or a big AI and a small AI, work together to solve problems—we can't just train them to be "smart" in isolation. We need to specifically train them to be resilient (able to bounce back from distractions) and coachable (able to learn from others' hints). We need to teach them that it's okay to get confused, and that the best way to learn is not just by memorizing the right answer, but by practicing how to recover when things go wrong.
