Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

This paper demonstrates that while linear probes suggest a Transformer trained on base-digit extraction computes staged arithmetic intermediates, causal tests reveal that the actual computational route relies on separate input streams that combine late, highlighting a significant divergence between representational evidence and causal mechanism.

Original authors: Ishita Darade, Sushrut Thorat

Published 2026-05-22. Author reviewed.

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you have a very smart, but mysterious, robot chef. You give it a recipe card with three ingredients: a big number (N), a base (B), and a specific "slot" number (D). The chef's job is to report the digit that sits in slot D once the big number has been written out in base B.

For example, if the big number is 255, the base is 16, and you ask for slot 0, the chef has to work out that 255 is written FF in base 16, so the digit in slot 0 is F, which is 15.

The researchers in this paper wanted to peek inside the chef's kitchen to see how it solves this puzzle. They had a very specific theory about how the chef should be thinking, and they wanted to see if that's actually what was happening.

Here is the story of what they found, broken down into simple steps:

1. The Chef is a Genius at the Task

First, they checked if the robot could actually do the job. They trained it on thousands of examples and then tested it on new, unseen numbers.

  • The Result: The robot was nearly perfect (99.83% accuracy). It knew exactly what answer to give. So, we know it can solve the problem.

2. The "Official Recipe" Theory (What we thought was happening)

The math problem has a clear, step-by-step solution, like a strict recipe in a cookbook. To get the answer, you theoretically need to follow these steps:

  1. Calculate a helper number (B^D).
  2. Divide the big number by that helper.
  3. Round down.
  4. Take the remainder after dividing by the base (B).
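The four steps above amount to the standard digit-extraction formula, digit = floor(N / B^D) mod B. Here is a minimal Python sketch of that recipe. The function name `extract_digit` is made up for illustration, and the sketch assumes slot 0 is the least significant digit; it shows the textbook algorithm, not the network's internals.

```python
def extract_digit(n: int, base: int, slot: int) -> int:
    """Return the digit of n at position `slot` (0 = least significant) in `base`."""
    helper = base ** slot      # step 1: the helper number B^D
    quotient = n // helper     # steps 2 and 3: divide, then round down
    return quotient % base     # step 4: the remainder modulo B is the digit


print(extract_digit(255, 16, 0))   # 255 is FF in base 16, so this prints 15
print(extract_digit(1234, 10, 2))  # the hundreds digit of 1234, so this prints 2
```

The values `helper` and `quotient` are the "intermediate dishes" that the rest of the story keeps coming back to.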

The researchers thought the robot was probably following this Official Recipe. They used a tool called a "Linear Probe" (think of it like a head chef peeking into the kitchen) to scan the robot's workspace.

  • The Finding: The probe showed that the robot's kitchen did contain these exact numbers. The "helper number" and the "rounded-down number" were clearly visible sitting in bowls on the counter, just like intermediate dishes in a complex cooking process.
  • The Trap: With these ingredients sitting on the counter, it is tempting to assume the robot is using them to cook the dish. It looked like the robot was following the recipe perfectly.
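To make "linear probe" concrete, here is a small self-contained sketch. Everything in it is an assumption for illustration: the "activations" are synthetic vectors built by the hypothetical `toy_activations` function, not states from the paper's Transformer, and the probe is fit by ordinary least squares. It shows only that a value (here the rounded-down quotient) can be read off by a linear map when it is linearly encoded.

```python
import random


def solve(a, y):
    """Solve the square system a @ w = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_probe(acts, targets):
    """Least-squares probe weights; the last entry is the bias term."""
    rows = [h + [1.0] for h in acts]  # append a constant feature for the bias
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def probe_predict(weights, h):
    """Apply the probe: dot product with the activation plus the bias."""
    return sum(w * x for w, x in zip(weights, h)) + weights[-1]


def toy_activations(n, base, slot):
    """Synthetic stand-in for a hidden state: the quotient is mixed in linearly."""
    q = n // base ** slot
    return [0.5 * q + 2.0 * slot, float(slot), float(base)]


def sample():
    n, base, slot = random.randrange(1000), random.randrange(2, 17), random.randrange(3)
    return toy_activations(n, base, slot), float(n // base ** slot)


random.seed(0)
train = [sample() for _ in range(200)]
held_out = [sample() for _ in range(50)]

weights = fit_linear_probe([h for h, _ in train], [q for _, q in train])
max_err = max(abs(probe_predict(weights, h) - q) for h, q in held_out)
print(f"largest held-out probe error: {max_err:.2e}")
```

A near-zero error says the quotient is decodable from the toy state. It says nothing yet about whether anything downstream reads it, which is exactly the trap described above.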

3. The Reality Check (The Causal Test)

This is where the paper gets interesting. Just because the chef has the ingredients on the counter doesn't mean it's using them to make the final decision.

To find out what the chef was actually using, the researchers performed a "kitchen audit" using two methods:

  • Method A: The Closed Station (Ablation)
    They tried to "close" specific prep stations in the kitchen that were supposed to pass the "helper numbers" to the final dish.

    • The Result: Surprisingly, closing the stations that held the complex math didn't hurt the chef much. But when they closed the very first station where the chef looked at the "slot number" (D), the chef immediately forgot how to answer. It didn't matter if the complex math ingredients were sitting on the counter or not; the chef ignored them.
  • Method B: The Swap (Patching)
    They took a "guest" chef who had a different "slot number" (D) but the same big number and base. They swapped the prep station signals from the guest chef into the original robot's kitchen.

    • The Result: The original robot suddenly gave the guest chef's answer. But this only happened if the slot number (D) was different. If they swapped the big number (N) or the base (B), the robot didn't care.
    • The Conclusion: The robot wasn't using the intermediate values of the Official Recipe to decide the answer. In these tests, the decisive signal came straight from the "slot number" (D).
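The logic of both audits can be sketched with a deliberately contrived toy. This is not the paper's model or its experimental code: the `forward` function, the stream names, and the `override` mechanism are all hypothetical. The toy writes the quotient into its internal state, so a probe could find it, but the final readout combines only the three raw input streams.

```python
def forward(n, base, slot, override=None):
    """Toy 'model': returns its internal streams and its answer."""
    streams = {"n": n, "base": base, "slot": slot}
    streams["quotient"] = n // base ** slot  # represented, but never read below
    if override:
        streams.update(override)  # an intervention on one or more internal streams
    # Late combination: the readout uses only the three raw streams.
    answer = (streams["n"] // streams["base"] ** streams["slot"]) % streams["base"]
    return streams, answer


clean_streams, clean_answer = forward(1234, 10, 2)  # hundreds digit: 2
donor_streams, donor_answer = forward(1234, 10, 1)  # tens digit: 3

# Ablation: overwrite a stream with a fixed value and see whether the answer moves.
_, ablate_quotient = forward(1234, 10, 2, override={"quotient": 0})
_, ablate_slot = forward(1234, 10, 2, override={"slot": 0})

# Patching: copy a stream from the donor run into the clean run.
_, patch_slot = forward(1234, 10, 2, override={"slot": donor_streams["slot"]})
_, patch_quotient = forward(1234, 10, 2, override={"quotient": donor_streams["quotient"]})

print(clean_answer, donor_answer)       # 2 3
print(ablate_quotient, ablate_slot)     # 2 4
print(patch_slot, patch_quotient)       # 3 2
```

Ablating or patching the quotient leaves the answer at 2, while intervening on the slot stream changes it, and patching in the donor's slot yields the donor's answer. That is the signature of a value that is present in the state without being on the causal path.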

4. The "Hidden Path" Discovery

Finally, they mapped out the actual path the information took through the kitchen.

  • What they expected: A single, organized assembly line where N, B, and D all meet, get mixed together into a complex math formula, and then produce the answer.
  • What they found: The robot has three separate, small prep stations. One station handles the big number, one handles the base, and one handles the slot number. These stations work independently for almost the entire cooking process. They only combine their ingredients at the very last second, right at plating, just before the answer is written down. The robot didn't build the complex "helper numbers" and pass them along; it just kept the ingredients separate until the very end.

The Big Lesson: "Represented" is not "Computed"

The paper's main title says it all: "Represented Is Not Computed."

  • Represented: The robot's kitchen contained the complex math numbers. If you looked at the counter, you could see them clearly (like finding a recipe card on the counter).
  • Computed: The robot did not use those numbers to cook the dish. It took a shortcut.

The Analogy:
Imagine the chef has the official recipe card sitting on the counter, with every step clearly written out (the "represented" math).

  • The Probe: You walk into the kitchen and see the recipe on the counter and say, "Aha! You're using the recipe!"
  • The Reality: The chef actually memorized the dish years ago and is cooking on instinct. The recipe is sitting there, but the chef never looks at it. If you took the recipe away, the dish would still come out the same. If you swapped it for a different recipe, the chef wouldn't notice.

Summary:
The robot solved the math problem almost perfectly, and its internal state even held the intermediate math steps in a way that looked like it was following the rules. But when the researchers tested what actually caused the robot to give its answer, they found it was bypassing those intermediate steps: the inputs traveled along separate routes and were combined only at the very end, with the requested "slot" acting as the decisive signal.

The paper warns us: Just because we can find a piece of information inside a neural network (like finding a recipe on the counter), it doesn't mean the network is actually using that information to make decisions. We need to test the cause, not just look at the contents.
