Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

This position paper argues that anthropomorphizing intermediate token generation as "reasoning traces" or "thoughts" is a dangerous misconception: it obscures the true nature of language models, hinders their effective use, and leads to flawed research. The authors urge the community to abandon such metaphors.

Subbarao Kambhampati, Karthik Valmeekam, Siddhant Bhambri, Vardhan Palod, Lucas Saldyt, Kaya Stechly, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas

Published 2026-03-09

The Core Message: Stop Treating AI "Thinking" Like Human Thinking

Imagine you have a very smart, very fast robot that can solve complex math problems or write code. When you ask it a question, it doesn't just spit out the answer immediately. Instead, it takes a moment to "talk to itself," generating a long stream of text before giving you the final result.

The AI research community has started calling this internal monologue "Reasoning" or "Thoughts." They treat it like a human sitting down with a pencil and paper, working through the steps, having "Aha!" moments, and correcting their own mistakes.

This paper argues that this is a dangerous misconception.

The authors, a team of researchers from Arizona State University, are saying: "Stop anthropomorphizing (giving human traits to) these intermediate tokens." They believe that calling these text strings "thoughts" is not just a harmless metaphor; it's actively confusing and dangerous because it makes us trust the AI too much when we shouldn't.


The Analogy: The "Scripted Actor" vs. The "Real Thinker"

To understand why the authors are worried, let's use an analogy.

The Current View (The "Human" View):
Imagine you are watching a magician. Before pulling a rabbit out of a hat, the magician mumbles, "Let's see... the rabbit is hungry, I need to check the hat, okay, here we go!" You assume the mumbling is the magician actually thinking about the trick. You trust the rabbit because the mumbling sounded logical.

The Authors' View (The "Scripted" View):
The authors argue that the AI isn't a magician thinking. It's more like a method actor who has memorized a script.

  • The AI has read millions of human stories where people say "Hmm," "Wait a minute," or "Aha!" when solving problems.
  • The AI learned that outputting these specific words before the answer tends to earn a "reward": during training it is scored on whether the final answer is right, not on whether the text before it makes sense (see the sketch below).
  • So, it generates a long, rambling script that sounds like thinking. It might say "Aha!" not because it had a sudden realization, but because the word "Aha!" statistically leads to the correct answer in its training data.
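
To make the reward idea concrete, here is a minimal, hypothetical sketch of an outcome-only reward (the function name and answer format are made up, not taken from the paper or any real training system): the score depends solely on the final answer line, no matter what the "thinking" text before it says.

```python
def outcome_reward(model_output: str, correct_answer: str) -> float:
    """Score 1.0 if the last line states the right answer, else 0.0.
    The 'thinking' text above the final line is never inspected.
    Illustrative sketch only; not the training code of any real system."""
    final_line = model_output.strip().splitlines()[-1].strip()
    return 1.0 if final_line == f"Answer: {correct_answer}" else 0.0

# Both outputs below earn the same reward, even though only one "makes sense".
sensible = "I add 2 and 2 to get 4.\nAnswer: 4"
gibberish = "Hmm... wait... aha! The rabbit is hungry.\nAnswer: 4"
print(outcome_reward(sensible, "4"), outcome_reward(gibberish, "4"))  # 1.0 1.0
```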

The Danger:
If you believe the AI is "thinking," you might trust a wrong answer just because the "thinking" part sounded convincing. It's like trusting the magician's rabbit because his mumbling sounded right, even though he actually pulled it out of a hidden sleeve.


The Evidence: Why "Thinking" is a Myth

The paper provides several pieces of evidence to prove that these "thoughts" aren't real reasoning:

1. The "Swapped Script" Experiment
Researchers trained models on "nonsense" scripts. Imagine teaching a student to solve math problems, but instead of the correct steps you show them a script that says "Add 2 + 2 and get 5" and then magically ends with the correct answer, 4.

  • Result: The AI still learned to get the right answer (4), even though the "thinking" part was completely wrong.
  • Meaning: The AI doesn't care whether the "thoughts" make sense. It only cares that the pattern of "Script + Answer" leads to a reward (a rough sketch of this setup follows below).
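
Concretely, the setup looks something like the toy sketch below (hypothetical helper names and formatting; not the actual experimental code from the paper): every training example pairs a deliberately meaningless trace with the correct final answer.

```python
import random

def make_swapped_trace_example(problem: str, correct_answer: str,
                               noise_vocab: list[str]) -> str:
    """Build a training example whose intermediate 'trace' is meaningless
    noise but whose final answer is still correct.
    Hypothetical illustration, not the paper's actual training pipeline."""
    fake_trace = " ".join(random.choices(noise_vocab, k=30))
    return f"Problem: {problem}\nTrace: {fake_trace}\nAnswer: {correct_answer}"

# A model fine-tuned on many such examples can still learn to emit the
# right answer, even though the trace it imitates carries no real reasoning.
print(make_swapped_trace_example(
    problem="2 + 2 = ?",
    correct_answer="4",
    noise_vocab=["hmm", "wait", "aha", "carry", "five", "therefore"],
))
```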

2. The "Aha!" Moment
DeepSeek's famous AI (R1) was praised for having "Aha!" moments in its text.

  • The Reality: The AI doesn't have an internal state that changes when it says "Aha!" It's just a token (a word) in a sequence. It's like a parrot saying "I'm happy!" when it sees a banana, not because it feels joy, but because that's the sound associated with bananas.

3. Length Doesn't Equal Effort
We often think, "Wow, this AI wrote 500 words of thinking; it must be working really hard!"

  • The Reality: The paper shows that models often produce longer, more rambling traces simply because of how they were trained, even on simple problems; sometimes they babble for pages just to fill space. Trace length is a side effect of training, not a measure of how "smart" or careful the solution is.

Why Does This Matter? (The "False Confidence" Trap)

The authors are worried about three main things:

  1. False Trust: If users believe the AI is "thinking," they will trust its answers blindly. If the AI says, "I calculated this carefully," but it actually just guessed and wrote a fancy story to justify it, the user might make a bad decision based on that answer.
  2. Bad Research: Scientists are wasting time trying to make these "thoughts" more human-readable or trying to fix the "logic" in the text. But if the text isn't actually logic, they are trying to fix a ghost. They should be focusing on making the final answer correct, not the intermediate chatter.
  3. The "Black Box" Problem: Some companies (like OpenAI) hide their intermediate tokens because they know they aren't interpretable. They show a "summary" instead. The authors argue this is honest. But other companies (like DeepSeek) show the full "thinking" text, which tricks people into thinking they understand how the AI works.

The Call to Action: What Should We Do?

The authors propose a simple shift in mindset:

  • Stop calling it "Thinking": Call it "Intermediate Tokens" or "Derivational Traces." It's just data the model generates to help itself, not a window into a human-like mind.
  • Don't trust the "Reasoning": If you need to trust an AI's answer, don't look at its "thought process." Look at the answer itself and verify it with a separate tool, like a calculator or a code checker (see the sketch after this list).
  • Let the AI be weird: If the AI solves a problem best by generating gibberish or non-human symbols, let it do that! We shouldn't force it to sound like a human just to make us feel comfortable.
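
As a concrete illustration of "verify the answer, not the trace," here is a minimal sketch, assuming a simple arithmetic task (the verifier function is made up for illustration), that checks a model's claimed result with an independent evaluator and never reads the "thinking" text at all.

```python
import ast
import operator

def verify_arithmetic(expression: str, claimed_answer: float) -> bool:
    """Safely evaluate a simple arithmetic expression and compare it to the
    answer the model claimed. The model's 'reasoning' text is irrelevant here."""
    allowed = {ast.Add: operator.add, ast.Sub: operator.sub,
               ast.Mult: operator.mul, ast.Div: operator.truediv}

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in allowed:
            return allowed[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")

    return abs(_eval(ast.parse(expression, mode="eval")) - claimed_answer) < 1e-9

print(verify_arithmetic("12 * (3 + 4)", 84))   # True
print(verify_arithmetic("12 * (3 + 4)", 96))   # False
```

The same idea generalizes: run the model's code against unit tests, hand its plan to a solver or simulator, and so on. Trust comes from the external check, not from how plausible the trace sounds.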

The Bottom Line

The AI is a super-fast pattern matcher, not a little person inside a computer.

When it generates a long chain of text before an answer, it's not "thinking" in the way we do. It's performing a complex dance it learned from its training data to maximize its chances of being right. Treating this dance as "human reasoning" is a dangerous illusion that makes us trust machines we don't truly understand.

The paper's final advice: Stop looking for a human soul in the machine's code. Focus on whether the answer is right, not on how the machine talks about getting there.
