MoE Lens -- An Expert Is All You Need
This paper analyzes the DeepSeekMoE model and finds that Mixture-of-Experts (MoE) architectures exhibit highly concentrated specialization: a single dominant expert can often approximate the performance of the full expert ensemble. This suggests significant opportunities for inference optimization through targeted expert pruning.
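To make the core idea concrete, the sketch below shows a toy MoE layer where each token's routing can be restricted from top-k experts down to only the single top-scoring expert; if specialization is as concentrated as the paper argues, the two outputs should stay close. This is a minimal illustrative sketch, not DeepSeekMoE's actual implementation: the module names, dimensions, and the `top_k`/`k` parameters are assumptions introduced here for illustration.

```python
# Minimal sketch (assumed layout, not the paper's code): a toy MoE layer whose
# routing can be narrowed from top-k experts to the single dominant expert.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor, k: int | None = None) -> torch.Tensor:
        """x: (tokens, d_model). Pass k=1 to route to the dominant expert only."""
        k = k or self.top_k
        scores = F.softmax(self.router(x), dim=-1)             # (tokens, n_experts)
        weights, idx = scores.topk(k, dim=-1)                  # k best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    torch.manual_seed(0)
    moe = ToyMoE()
    tokens = torch.randn(16, 64)
    full = moe(tokens)         # standard top-2 routing
    single = moe(tokens, k=1)  # "an expert is all you need" routing
    # If expert specialization is concentrated, this gap should be small.
    print("mean |top-2 - top-1| output gap:", (full - single).abs().mean().item())
```

In this framing, targeted expert pruning amounts to dropping experts that are rarely the dominant choice, so the remaining single-expert path carries most of the ensemble's behavior at a fraction of the inference cost.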