Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
This paper introduces a systematic framework for Large Audio-Language Models that reformulates ambiguous emotion recognition as a distributional reasoning problem. The framework combines an ambiguity-aware objective with structured chain-of-thought supervision, and the authors report significant performance gains on standard benchmarks.
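To make the distributional framing concrete, the sketch below shows one common way to score a predicted emotion distribution against disagreeing annotator votes. It is an illustration under stated assumptions, not the paper's actual objective: the summary does not specify the loss, so KL divergence is swapped in as a stand-in, and the class list, the votes, the logits, and the helper names `soft_label`, `softmax`, and `kl_divergence` are all hypothetical.

```python
import math

# Hypothetical label set; the summary does not list the emotion classes.
CLASSES = ["angry", "happy", "neutral", "sad"]


def soft_label(votes, classes):
    """Turn per-annotator votes into a probability distribution over classes."""
    total = len(votes)
    return [votes.count(c) / total for c in classes]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); classes with zero target mass contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Assumed example: four annotators disagree on one utterance.
votes = ["sad", "neutral", "sad", "angry"]
target = soft_label(votes, CLASSES)  # [0.25, 0.0, 0.25, 0.5]

# Assumed model scores for the same utterance, one per class.
predicted = softmax([0.2, -1.0, 0.3, 1.1])

# Stand-in ambiguity-aware loss: lower means the predicted distribution
# is closer to the annotator distribution.
loss = kl_divergence(target, predicted)
print(f"target={target} loss={loss:.4f}")
```

Unlike a majority-vote label, which would collapse this utterance to "sad", the soft target keeps the minority "neutral" and "angry" votes, so a model is penalized for being overconfident on ambiguous inputs.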