TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

The paper introduces TERMINATOR, an inference-time early-stopping strategy for Large Reasoning Models that leverages predictable first-answer positions to train a model for identifying optimal reasoning lengths, thereby reducing Chain-of-Thought overthinking by 14%–55% across diverse benchmarks while maintaining performance.

Alliot Nagle, Jakhongir Saydaliev, Dhia Garbaya, Michael Gastpar, Ashok Vardhan Makkuva, Hyeji Kim

Published 2026-03-16

Imagine you are asking a brilliant, over-enthusiastic student to solve a math problem.

The Problem: The "Overthinking" Student
This student (the AI) is incredibly smart. When you ask, "What is 2 plus 3?", they don't just say "5." Instead, they start a long monologue: "Let me think... 2 is a number. 3 is a number. If I put them together... hmm... wait, let me double-check. Is 2 plus 3 the same as 3 plus 2? Yes. Okay, I'm pretty sure it's 5. But just to be safe, let me write out the addition table again. And maybe check the history of numbers..."

They keep talking for thousands of words, even though they figured out the answer ("5") in the first sentence. This is called "overthinking." It wastes time, costs money (because computers use electricity to generate every word), and slows everything down.

The Goal: The "Stop" Button
Researchers wanted to teach this student when to stop talking. They knew there was a "perfect moment" to cut them off: right after they said "5" but before they started rambling about addition tables. If you cut them off there, they still get the right answer, but you save a sizable share of the time and energy (14% to 55% in the paper's experiments).

The hard part? The student doesn't know when they are done. They just keep going until they run out of things to say.

The Solution: TERMINATOR
The paper introduces a new tool called TERMINATOR. Think of TERMINATOR not as a killer robot, but as a super-attentive TA (Teaching Assistant) sitting right next to the student.

Here is how it works, using a few analogies:

1. The "Hindsight" Training

First, the researchers needed to teach the TA what "done" looks like. They went back and looked at thousands of past conversations.

  • The Trick: They asked the student, "When was the very first time you actually said the answer?"
  • The Lesson: They marked that exact moment as the "Golden Stop Point." They taught the TA: "If the student says the answer, and then keeps talking, that's just fluff. Stop them immediately after the answer."
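
To make the "Golden Stop Point" concrete, here is a minimal Python sketch of hindsight labeling. It assumes the student's monologue is already split into words and that the final answer shows up as a single word. The function names and the 0/1 labels are illustrative choices, not the paper's actual procedure.

```python
from typing import List, Optional


def first_answer_index(words: List[str], final_answer: str) -> Optional[int]:
    """Return the position of the first word that equals the final answer."""
    for i, word in enumerate(words):
        # Ignore trailing punctuation such as "5." or "5,"
        if word.strip(".,!?") == final_answer:
            return i
    return None  # the answer never appeared in the trace


def stop_labels(words: List[str], final_answer: str) -> List[int]:
    """Label each position: 0 = keep reasoning, 1 = answer already stated."""
    idx = first_answer_index(words, final_answer)
    if idx is None:
        return [0] * len(words)
    return [0 if i < idx else 1 for i in range(len(words))]


trace = "2 plus 3 is 5 . But let me double-check".split()
print(first_answer_index(trace, "5"))
print(stop_labels(trace, "5"))
```

Everything from the first "5" onward gets label 1, which is the "fluff" region the TA is taught to cut.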

2. Reading the "Brain Waves"

You might think the TA just listens for the word "5." But the student might say "5" early on by accident, then change their mind. So, the TA looks deeper.

  • The Confidence Meter: The researchers noticed that when the student finally figures out the answer, their "confidence" spikes. It's like a sudden burst of energy.
  • The "Thinking" Tokens: The student uses specific filler words like "hmm," "wait," or "let me check" before they are sure. Once they have the answer, they stop using those words and start using words like "therefore" or "so."
  • The TA's Job: TERMINATOR is a tiny, super-fast detector that watches these "brain waves" (confidence levels) and "talking habits" (word choices) in real-time.
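
The two signals above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "confidence" is taken to be the highest softmax probability over the student's next-word scores, and the hesitation word list is a hypothetical one built from the examples in this summary (multi-word phrases like "let me check" are left out for simplicity). The real detector's inputs may differ.

```python
import math
from typing import List, Sequence

# Hypothetical list of "thinking" words, taken from the examples in the text.
HESITATION_WORDS = {"hmm", "wait"}


def token_confidence(logits: Sequence[float]) -> float:
    """Highest softmax probability over the raw next-word scores."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def hesitation_rate(words: List[str]) -> float:
    """Fraction of words that are hesitation words."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.lower().strip(".,!?") in HESITATION_WORDS)
    return hits / len(words)


# An unsure step (two equally likely words) versus a confident one.
print(token_confidence([0.0, 0.0]))
print(token_confidence([10.0, 0.0, 0.0]))
print(hesitation_rate(["Hmm,", "wait", "so", "5"]))
```

High confidence together with a low hesitation rate is the pattern the TA is watching for.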

3. The "Sliding Window" Decision

TERMINATOR doesn't make a decision based on just one word. It looks at the last 10 words the student said.

  • If the TA sees a pattern where the student is confident and has stopped using "hmm" words, the TA raises a red flag.
  • Once the flag is raised enough times (a "majority vote"), the TA slams the Stop Button.
  • The student is forced to stop generating new words and immediately output the final answer.
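
Here is a minimal Python sketch of the sliding-window vote described above. The window of 10 comes from the text; the "more than half" threshold and the function name are assumptions made for illustration, and the per-word flags would come from the detector rather than being hard-coded.

```python
from collections import deque
from typing import Iterable, Optional


def first_stop_step(
    flags: Iterable[bool], window: int = 10, threshold: float = 0.5
) -> Optional[int]:
    """Return the first step at which a majority of the last `window` flags are raised."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` flags
    for step, flag in enumerate(flags):
        recent.append(bool(flag))
        # Vote only once the window is full, and require a strict majority.
        if len(recent) == window and sum(recent) / window > threshold:
            return step
    return None  # the Stop Button was never pressed


# Ten unsure words followed by ten "I think they're done" words.
flags = [False] * 10 + [True] * 10
print(first_stop_step(flags))
```

Requiring a majority over a window, rather than reacting to a single flag, keeps one stray early "5" from ending the reasoning too soon.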

Why is this a big deal?

Imagine you are paying for a taxi ride.

  • The Old Way: The driver takes you to your destination, but then keeps driving around the neighborhood for another hour just to "make sure they didn't miss a turn." You pay for the extra hour.
  • The TERMINATOR Way: A smart co-pilot sits in the back. The moment the driver says, "We're here," the co-pilot says, "Great, stop the car!" You arrive at the same time, but you only pay for the trip you actually needed.

The Results:
The paper tested this on hard math, coding, and science problems.

  • Speed: It cut the length of the student's reasoning by 14% to 55%.
  • Accuracy: The answers were just as good as if the student had kept talking to the very end.
  • Versatility: It worked on different types of "students" (different AI models) and different types of problems.
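
To get a feel for what a 14% to 55% cut means, here is a small back-of-the-envelope calculation in Python. The 10,000-word monologue is a made-up example, not a figure from the paper.

```python
def words_after_cut(total_words: int, percent_saved: int) -> int:
    """Words still generated after trimming `percent_saved` percent of the trace."""
    return total_words * (100 - percent_saved) // 100


monologue = 10_000  # hypothetical length of one long reasoning trace
for percent in (14, 55):
    kept = words_after_cut(monologue, percent)
    print(f"{percent}% saved: {kept} words generated, {monologue - kept} skipped")
```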

In a Nutshell:
TERMINATOR is a smart "stop-watch" for AI. It learns to recognize the moment an AI has solved a problem and cuts off the unnecessary rambling, saving time and money while keeping the answers just as accurate. It turns an over-enthusiastic student into an efficient one.
