When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark

This paper presents a cost- and latency-aware benchmark demonstrating that while tool-augmented planning significantly improves accuracy for complex knowledge-intensive tasks like Event-QA, it often incurs prohibitive latency costs and offers no benefit—or even degrades performance—for tasks like persuasive response generation where simple one-shot prompting is more efficient.
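To make the trade-off concrete, here is a minimal sketch of cost- and latency-aware method selection in the spirit of the benchmark; the method names, accuracies, latencies, and costs are invented for illustration and are not numbers from the paper.

```python
from dataclasses import dataclass

@dataclass
class Result:
    method: str
    accuracy: float   # task accuracy in [0, 1]
    latency_s: float  # mean wall-clock seconds per query
    cost_usd: float   # mean API cost per query

def best_under_budget(results, max_latency_s):
    """Pick the most accurate method whose mean latency fits the budget."""
    feasible = [r for r in results if r.latency_s <= max_latency_s]
    return max(feasible, key=lambda r: r.accuracy) if feasible else None

# Illustrative numbers only: planning wins on accuracy but pays in latency.
results = [
    Result("one-shot prompt", accuracy=0.62, latency_s=1.1, cost_usd=0.002),
    Result("tool-augmented planning", accuracy=0.78, latency_s=9.4, cost_usd=0.021),
]
```

Under a tight latency budget the cheap one-shot method is selected even though it is less accurate, which is exactly the kind of regime the benchmark is designed to expose.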

Subha Ghoshal, Ali Al-Bustami · 2026-03-06 · cs

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments

This paper introduces a new dataset derived from football highlight reels to evaluate foundation models' ability to identify contextually important video moments, revealing that current state-of-the-art models perform near chance levels due to their reliance on single dominant modalities and failure to effectively synthesize cross-modal information.

Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle · 2026-03-06 · cs

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

This paper introduces On-Policy Self-Distillation (OPSD), a framework where a single large language model acts as both teacher and student by leveraging privileged reasoning traces to supervise its own weaker policy, thereby achieving superior mathematical reasoning performance and significantly higher token efficiency compared to traditional off-policy distillation and reinforcement learning methods.
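A minimal sketch of the self-distillation objective as described above: one model is scored twice on the same on-policy tokens, once as "teacher" (conditioned on a privileged reasoning trace) and once as "student" (without it), and the student is pushed toward the teacher's per-token distribution. This is an illustrative assumption about the training signal, not the paper's actual implementation; all names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def opsd_loss(student_logits, teacher_logits):
    """Mean per-token forward KL(teacher || student) over a rollout.

    student_logits, teacher_logits: arrays of shape (tokens, vocab),
    produced by the SAME model with and without the privileged trace
    in its context.
    """
    p = softmax(teacher_logits)              # teacher: sees the trace
    log_p = np.log(p)
    log_q = np.log(softmax(student_logits))  # student: trace withheld
    return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))
```

Because the tokens being scored are sampled by the student itself, the supervision is on-policy, unlike classic distillation from a separate teacher's off-policy outputs.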

Siyan Zhao, Zhihui Xie, Mengchen Liu + 4 more · 2026-03-06 · cs

Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

This paper introduces a simulation-based clinical red teaming framework that pairs AI psychotherapists with dynamic patient agents to evaluate mental health support systems, revealing critical safety gaps, such as validating delusions and failing to de-escalate suicide risk, in AI agents tested against Alcohol Use Disorder scenarios.

Ian Steenstra, Paola Pedrelli, Weiyan Shi + 2 more · 2026-03-06 · cs

Why Are Linear RNNs More Parallelizable?

This paper establishes a theoretical foundation for the superior parallelizability of linear RNNs by showing that they correspond to log-depth arithmetic circuits ($\mathsf{NC}^1$-complete), whereas nonlinear RNNs can solve $\mathsf{L}$- and $\mathsf{P}$-complete problems and are therefore believed to be inherently sequential, explaining why linear variants can be parallelized as efficiently as transformers while traditional nonlinear RNNs cannot.

William Merrill, Hongjian Jiang, Yanhong Li + 2 more · 2026-03-06 · cs