MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations
This paper introduces MuSaG, the first German multimodal sarcasm dataset, featuring aligned text, audio, and video annotations drawn from television shows. Benchmarking on the dataset shows that current models struggle to use audio cues the way humans do, which highlights a critical gap for future research on multimodal sarcasm detection.