SpecFuse: Ensembling Large Language Models via Next-Segment Prediction
The paper introduces SpecFuse (referred to as SpecEM in the abstract), a training-free ensemble framework that enhances large language model performance by enabling segment-level semantic collaboration through speculative decoding and dynamically adjusting model weights via an online feedback mechanism to prioritize stronger contributors.