AMiD: Knowledge Distillation for LLMs with α-mixture Assistant Distribution
This paper introduces AMiD, a unified framework for knowledge distillation in large language models. AMiD employs a novel α-mixture assistant distribution that systematically generalizes both the interpolation path between teacher and student and the choice of divergence, thereby overcoming the training instability of prior, fragmented approaches and achieving superior performance.
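For intuition, one natural parametrization of an α-mixture assistant is the α-power-mean interpolation between the teacher distribution p_T and the student distribution p_S (a minimal sketch of the general idea; the exact form and normalization used in AMiD may differ):

    q_{α,λ}(y) ∝ ( (1 − λ) · p_T(y)^α + λ · p_S(y)^α )^{1/α},   λ ∈ [0, 1],

where λ controls the position along the interpolation path and α selects its shape: α = 1 recovers the arithmetic mixture, while the limit α → 0 yields the normalized geometric mixture, so a single α parameter sweeps through a family of assistant distributions that previously required separate, ad hoc formulations.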