Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
This paper applies interpretability techniques to the off-by-one addition task and shows that large language models achieve task-level generalization through a reusable "function induction" mechanism, in which multiple attention heads work together to learn and compose abstract functions to solve unseen problems.
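
To make the task concrete, the following is a minimal sketch. It assumes that off-by-one addition means an in-context prompt whose demonstrated answers equal the true sum plus one (for example, 1+1=3 and 2+2=5), so that the model must infer the extra +1 step from the examples. The helper names and the prompt format are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical helpers illustrating the off-by-one addition task.
# Assumption: each demonstration shows "a+b=" followed by (a + b + 1).

def off_by_one_prompt(pairs, query):
    """Build a few-shot prompt whose shown answers are the true sum plus one."""
    lines = [f"{a}+{b}={a + b + 1}" for a, b in pairs]
    qa, qb = query
    lines.append(f"{qa}+{qb}=")  # the model is expected to complete this line
    return "\n".join(lines)


def off_by_one_target(query):
    """Answer that is consistent with the demonstrated off-by-one pattern."""
    a, b = query
    return a + b + 1


prompt = off_by_one_prompt([(1, 1), (2, 2)], (3, 3))
print(prompt)
print(off_by_one_target((3, 3)))
```

Under this assumption, a model that only performs standard addition would complete the final line with 6, whereas a model that has induced the additional +1 function from the demonstrations would answer 7.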