Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer
Linear probes suggest that a Transformer trained on base-digit extraction computes staged arithmetic intermediates. Causal tests show otherwise: the computation the model actually performs relies on separate input streams that combine late. The intermediates are decodable from the activations but are not what the model uses, so representational evidence and causal mechanism diverge.
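The distinction between decodable and causally used can be illustrated with a toy sketch. This is not the paper's model, task, or method. It assumes a hypothetical three-dimensional hidden state `H` in which two slots carry separate inputs `a` and `b`, a third slot carries a candidate intermediate `z = a + b`, and a hypothetical linear `readout` ignores the third slot. A least-squares probe recovers `z` almost perfectly, yet patching the `z` slot leaves the output unchanged, while patching an input stream does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two separate input streams and a candidate intermediate (all hypothetical).
a = rng.normal(size=n)
b = rng.normal(size=n)
z = a + b

# Toy hidden state: slot 0 holds a, slot 1 holds b, slot 2 holds z.
H = np.stack([a, b, z], axis=1) + 0.01 * rng.normal(size=(n, 3))

# Assumed downstream computation: reads only the two input slots.
w_out = np.array([1.0, -1.0, 0.0])

def readout(hidden):
    """Linear readout that ignores slot 2 (the z slot)."""
    return hidden @ w_out

# Representational test: fit a linear probe for z and measure R^2.
X = np.hstack([H, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)
pred = X @ coef
r2 = 1.0 - np.sum((z - pred) ** 2) / np.sum((z - z.mean()) ** 2)

# Causal test: overwrite one slot with values from other examples
# and measure how much the output moves.
perm = rng.permutation(n)

def patch_effect(slot):
    patched = H.copy()
    patched[:, slot] = H[perm, slot]
    return np.max(np.abs(readout(patched) - readout(H)))

effect_z = patch_effect(2)  # patch the candidate intermediate
effect_a = patch_effect(0)  # patch one input stream

print("probe R^2 for z:", r2)
print("max output change, patching z slot:", effect_z)
print("max output change, patching a slot:", effect_a)
```

In this sketch the probe succeeds only because `z` is present in the activations. The patching comparison is what shows which slots the output depends on.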