SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning
SarcasmMiner is a reinforcement-learning-based post-training framework for audio-visual sarcasm reasoning in foundation models. It combines a dual-track distillation strategy with a generative reward model and group relative policy optimization (GRPO) to make sarcasm reasoning more robust and to reduce hallucinations.
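
To make the group relative policy optimization step concrete, the following is a minimal sketch of the group-relative advantage computation as GRPO is commonly formulated: several responses are sampled for the same input, each is scored, and every score is normalized against the mean and standard deviation of its own group. The summary above does not specify how SarcasmMiner scores or normalizes responses, so the function name `group_relative_advantages`, the `eps` stabilizer, and the example scores are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group it was sampled with.

    Assumption: the common GRPO form A_i = (r_i - mean(r)) / (std(r) + eps).
    The reward values would come from a reward model; here they are plain floats.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four responses sampled for one clip.
scores = [0.9, 0.2, 0.5, 0.4]
advantages = group_relative_advantages(scores)
```

Responses scored above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.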