When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?

This paper presents the first empirical study on training LLMs to abstain from answering in temporal question answering by combining Chain-of-Thought supervision with Reinforcement Learning, demonstrating that this approach significantly outperforms existing models in accuracy and reliability while revealing the limitations of implicit reasoning cues and supervised fine-tuning.

Xinyu Zhou, Chang Jin, Carsten Eickhoff + 2 more · 2026-03-05 · cs.AI

Rewards as Labels: Revisiting RLVR from a Classification Perspective

This paper proposes "Rewards as Labels" (REAL), a novel framework that reformulates Reinforcement Learning with Verifiable Rewards as a classification problem to address gradient misassignment and domination issues in methods like GRPO, thereby achieving superior training stability and performance on mathematical reasoning benchmarks compared to state-of-the-art baselines.

Zepeng Zhai, Meilin Chen, Jiaxuan Zhao + 3 more · 2026-03-05 · cs.LG

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

This paper introduces the first NLP dataset for Meenzerisch, the endangered dialect of Mainz, and demonstrates that current large language models struggle to generate or define its words, achieving accuracy below 10% even with few-shot learning and rule extraction, thereby highlighting an urgent need for further research and resources to preserve German dialects.

Minh Duc Bui, Manuel Mager, Peter Herbert Kann + 1 more · 2026-03-05 · cs.CL