Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

This paper employs interpretability techniques on the off-by-one addition task to reveal that large language models achieve task-level generalization through a reusable "function induction" mechanism, where multiple attention heads collaboratively learn and compose abstract functions to solve unseen problems.

Qinyuan Ye, Robin Jia, Xiang Ren

Published 2026-03-05
📖 4 min read · ☕ Coffee break read

Imagine you have a very smart, well-read robot assistant. You've taught it how to do basic math: $1+1=2$, $2+2=4$. It's a whiz at this.

Now, you give it a new, weird rule: "Hey, from now on, whenever you add two numbers, add one extra to the answer." So, $1+1$ should be $3$, and $2+2$ should be $5$. You show it a few examples:

  • $1+1=3$
  • $2+2=5$
  • $3+3=?$

Most humans would instantly get the pattern and say "7". The paper asks: How does the robot figure this out? Does it just memorize the examples, or does it actually learn a new "mental trick"?
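The hidden rule can be sketched in a few lines. This is a toy illustration of the task itself, not the paper's code; the function name is made up:

```python
def off_by_one_add(a: int, b: int) -> int:
    """The hidden rule: ordinary addition, plus one extra."""
    return a + b + 1

# The few-shot examples the model sees, followed by the query.
for a, b in [(1, 1), (2, 2), (3, 3)]:
    print(f"{a}+{b}={off_by_one_add(a, b)}")
```

The model is shown $1+1=3$ and $2+2=5$, and must infer on its own that $3+3=7$.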

The researchers used a special "X-ray vision" (called mechanistic interpretability) to look inside the robot's brain while it was solving this puzzle. Here is what they found, explained simply:

1. The Robot Has a "Pattern Detective" (The Induction Head)

In previous studies, scientists found that robots have a specific part of their brain that acts like a pattern detective. If you write "Apple, Banana, Apple, ___", this detective spots that "Apple" was followed by "Banana" and guesses the next word is "Banana."

The researchers found that for this math trick, the robot uses a super-charged version of this detective. Instead of just copying words, this detective learns to copy rules. It looks at the examples, realizes, "Oh, the rule here isn't just 'add the numbers,' it's 'add the numbers AND THEN ADD ONE.'"
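The basic "pattern detective" behavior can be written as a tiny lookup rule. This is a hedged, toy rendering of the classic induction pattern (find the last occurrence of the current token, then copy whatever followed it), not the paper's implementation:

```python
def induction_predict(tokens):
    """Predict the next token by copying what followed the
    most recent earlier occurrence of the current token."""
    current = tokens[-1]
    # Scan backwards for a previous occurrence of `current`.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the token that followed it
    return None  # no earlier occurrence to copy from

print(induction_predict(["Apple", "Banana", "Apple"]))  # → Banana
```

The paper's "super-charged" version operates one level up: instead of copying the literal next token, it copies the transformation that turned the previous input into its answer.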

2. The "Construction Crew" Analogy

The most fascinating discovery is how the robot builds this new rule. It doesn't use one single brain cell to do the whole job. Instead, it uses a construction crew of about six tiny workers (called "attention heads") working in parallel.

Think of the "+1" rule as a complex instruction manual. No single worker knows the whole manual. Instead:

  • Worker A writes down: "Add 1 to the tens place (+10)."
  • Worker B writes down: "Take 9 away from the ones place (−9)."
  • Worker C writes down: "Make sure the number gets bigger."

Individually, their notes look like gibberish. But when you stack all their notes on top of each other, they perfectly form the complete instruction: "Add 1."

The robot is essentially composing a new function out of tiny, reusable parts. It's like building a new Lego castle using the same bricks you used to build a house, just arranged differently.
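The "stacking of notes" is just summation of partial contributions. Here is a toy illustration with made-up numbers (these are not the paper's measured values): no single head's output encodes "+1" on its own, but the contributions add up to it:

```python
# Hypothetical per-head contributions to the final answer.
head_outputs = {
    "head_A": +10.0,  # pushes the tens place up
    "head_B": -9.0,   # pulls the ones place back down
    "head_C": 0.0,    # nudges the result to stay in range
}

# The composed effect of the whole "construction crew".
total = sum(head_outputs.values())
print(total)  # → 1.0, i.e. the complete "+1" function
```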

3. The "Universal Remote Control"

The researchers tested if this "construction crew" was a one-time thing or if the robot used it for other things too. They gave the robot different puzzles:

  • Shifting Letters: Instead of math, shift every letter in the alphabet by one (A becomes B, B becomes C).
  • Base-8 Math: Doing math in a different number system (like how computers think).

The Result: The exact same "construction crew" of brain cells jumped into action! They didn't need to learn new workers; they just rearranged their existing notes to solve these new problems.
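All three puzzles share the same abstract skeleton, a "shift by one" function applied in different domains. A minimal sketch of that shared structure (helper names are illustrative, not from the paper):

```python
def shift_digit(n: int) -> int:
    """Off-by-one arithmetic: the '+1' twist."""
    return n + 1

def shift_letter(c: str) -> str:
    """Caesar-style shift: A -> B, B -> C, ... (wraps Z -> A)."""
    return chr((ord(c) - ord("A") + 1) % 26 + ord("A"))

def add_base8(a: str, b: str) -> str:
    """Addition carried out in base 8."""
    return oct(int(a, 8) + int(b, 8))[2:]

print(shift_digit(1 + 1))   # off-by-one: 1+1 gives 3
print(shift_letter("A"))    # A becomes B
print(add_base8("7", "1"))  # 7+1 in base 8 is 10
```

The paper's finding is that the model reuses one internal mechanism across all of these, rather than learning each variant from scratch.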

Why Does This Matter?

This is a big deal for two reasons:

  1. It's Not Just Memorizing: The robot isn't just copying answers from its memory. It's actually reasoning. It's taking a basic skill (addition), spotting a twist in the rules, and building a new mental tool on the fly to handle it.
  2. It's Flexible: The robot has a "universal remote control" inside its brain. It can take a small, reusable mechanism (like "shift this value") and plug it into different tasks (math, language, puzzles).

The Big Picture

Imagine you are learning to cook. First, you learn to boil water. Then, someone tells you, "Now, add a pinch of salt to the water."

  • Old View: The robot just memorized "Boil water + Salt = Boiled Salt Water."
  • This Paper's View: The robot realized, "I have a 'boil' module and an 'add salt' module. I can snap them together to make a new dish."

The paper shows that Large Language Models (LLMs) are getting really good at this "snapping together" of mental modules. They are learning to be composable, meaning they can take old skills and mix them in new ways to solve problems they've never seen before. This is a huge step toward understanding how AI can be truly smart and adaptable, rather than just a giant database of facts.