Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
This paper shows that SpeechLLM backbones struggle with conversational disfluencies because they are biased toward semantic abstraction over structural fidelity. Performance varies by architecture, and fine-tuning, although it achieves state-of-the-art results, often compromises generalization.