Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation
Using mechanistic interpretability, this paper shows that knowledge distillation does more than compress a teacher model into a smaller student: it fundamentally restructures the student's internal circuits, which reorganize and discard components and concentrate computation on fewer of them. The authors argue that quantifying these internal functional shifts requires new metrics that go beyond output similarity.
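To make the distinction concrete, here is a minimal hypothetical sketch (not the paper's actual metrics) contrasting an output-level measure (KL divergence between teacher and student predictions) with an internal one: how concentrated each model's reliance on individual hidden units is, estimated by single-unit ablation. The toy MLPs, the ablation-based importance score, and the normalized-entropy concentration measure are all illustrative assumptions.

```python
# Hypothetical sketch: output similarity vs. an internal "reliance
# concentration" metric. All sizes, data, and the entropy-based measure
# are illustrative choices, not the paper's definitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class MLP(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.fc1 = nn.Linear(16, hidden)
        self.fc2 = nn.Linear(hidden, 4)

    def forward(self, x, ablate=None):
        h = torch.relu(self.fc1(x))
        if ablate is not None:          # zero out one hidden unit ("component")
            h = h.clone()
            h[:, ablate] = 0.0
        return self.fc2(h)

teacher, student = MLP(hidden=64), MLP(hidden=8)
x = torch.randn(256, 16)

with torch.no_grad():
    # Output-level similarity: KL(teacher || student) over predictions.
    t_logits, s_logits = teacher(x), student(x)
    output_kl = F.kl_div(F.log_softmax(s_logits, -1),
                         F.softmax(t_logits, -1),
                         reduction="batchmean")

    def reliance_concentration(model, hidden):
        """Ablate each hidden unit; measure how unevenly importance is spread.

        Importance = mean shift in the output distribution when that unit is
        zeroed. Lower normalized entropy means reliance is concentrated on
        fewer components.
        """
        base = F.softmax(model(x), -1)
        imp = torch.stack([
            (F.softmax(model(x, ablate=i), -1) - base).abs().mean()
            for i in range(hidden)
        ])
        p = imp / imp.sum()
        entropy = -(p * (p + 1e-12).log()).sum()
        return entropy / torch.log(torch.tensor(float(hidden)))  # in [0, 1]

    print(f"output KL (teacher vs student): {output_kl:.4f}")
    print(f"teacher reliance entropy: {reliance_concentration(teacher, 64):.3f}")
    print(f"student reliance entropy: {reliance_concentration(student, 8):.3f}")
```

Two models can score well on the output-level measure while diverging sharply on the internal one, which is the gap the paper's proposed metrics are meant to capture.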