MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
MoE-SpAc is an inference framework for Mixture-of-Experts (MoE) models on heterogeneous edge devices. It repurposes speculative decoding as a predictive sensor for memory management: the experts activated during speculation indicate which experts are worth keeping in fast memory. Combined with dynamic workload balancing and asynchronous execution, this yields significant throughput improvements.
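To make the "predictive sensor" idea concrete, here is a minimal sketch of how expert activations observed during speculation could drive expert placement. It is an illustration under stated assumptions, not the MoE-SpAc algorithm: the function name `plan_expert_placement`, the use of a raw activation count as the utility score, and the two-tier split into accelerator-resident and host-resident experts are all hypothetical choices made for this example.

```python
from collections import Counter


def plan_expert_placement(draft_activations, fast_slots):
    """Split experts into a fast tier and a slow tier.

    draft_activations: one list of expert IDs per speculated token
        (assumed to come from the draft pass of speculative decoding).
    fast_slots: number of experts that fit in fast (accelerator) memory.

    Utility here is simply how often an expert was activated during
    speculation. This is an assumed stand-in for a real utility estimate.
    """
    utility = Counter()
    for experts in draft_activations:
        utility.update(experts)

    # Highest utility first; ties are broken by expert ID for determinism.
    ranked = sorted(utility, key=lambda e: (-utility[e], e))
    return ranked[:fast_slots], ranked[fast_slots:]


# Four speculated tokens, each routed to two experts.
draft = [[0, 3], [3, 5], [3, 0], [7, 5]]
fast, slow = plan_expert_placement(draft, fast_slots=2)
print(fast, slow)  # [3, 0] [5, 7]
```

In this toy setup, experts 3 and 0 would stay in fast memory while experts 5 and 7 would be served from the slower tier. A real system would also have to overlap the resulting transfers with computation, which this sketch does not attempt.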