SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving

This paper proposes SToRM, a novel framework that employs a lightweight importance predictor, supervised training with pseudo-labels, and an anchor-context merging module to significantly reduce visual token redundancy in multi-modal LLMs for autonomous driving, achieving up to 30x computational savings while maintaining end-to-end performance comparable to using all tokens.

Seo Hyun Kim, Jin Bok Park, Do Yeon Koo, Hogun Park, Il Yong Chun · 2026-03-10 · 💻 cs

Accelerating Robotic Reinforcement Learning with Agent Guidance

This paper introduces Agent-guided Policy Search (AGPS), a framework that replaces human supervisors with a multimodal agent acting as a semantic world model to provide precise corrective guidance, thereby significantly improving sample efficiency and scalability in robotic reinforcement learning compared to traditional Human-in-the-Loop methods.

Haojun Chen, Zili Zou, Chengdong Ma, Yaoxiang Pu, Haotong Zhang, Yuanpei Chen, Yaodong Yang · 2026-03-10 · 💻 cs

To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models

This paper introduces M2RL, a comprehensive study that compares mixed multi-task training against separate training with model merging for multi-domain Reinforcement Learning with Verifiable Rewards (RLVR), revealing that reasoning-intensive domains exhibit synergistic effects with minimal interference and providing mechanistic insights through extensive experiments.

Haoqing Wang, Xiang Long, Ziheng Li, Yilong Xu, Tingguang Li, Yehui Tang · 2026-03-10 · 💻 cs

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

The paper introduces SkillsBench, a comprehensive benchmark demonstrating that while curated agent skills significantly boost LLM performance across diverse domains—often allowing smaller models to match larger ones—self-generated skills offer no benefit and effects vary widely by task.

Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Binxu Li, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi Liu, Xuanqing Liu, Haoran Lyu, Ze Ma, Bowei Wang, Runhui Wang, Tianyu Wang, Wengao Ye, Yue Zhang, Hanwen Xing, Yiqi Xue, Steven Dillmann, Han-chung Lee · 2026-03-10 · 💻 cs

A Geometric Taxonomy of Hallucinations in LLMs

This paper proposes a geometric taxonomy of LLM hallucinations into three distinct types (unfaithfulness, confabulation, and factual error) and introduces corresponding detection metrics, the Semantic Grounding Index and Directional Grounding Index, which effectively identify unfaithful and confabulated outputs while revealing that apparent signals for factual errors in existing benchmarks often stem from stylistic annotation confounds rather than genuine geometric distinctions.

Javier Marín · 2026-03-10 · 💬 cs.CL

Can a Lightweight Automated AI Pipeline Solve Research-Level Mathematical Problems?

This paper demonstrates that a lightweight, automated AI pipeline integrating next-generation large language models with citation-based verification can successfully generate and solve sophisticated, research-grade mathematical problems, including previously unpublished questions, with verified results and open-sourced tools.

Lve Meng (University of Science and Technology of China, Zhongguancun Academy), Weilong Zhao (Université Paris Cité), Yanzhi Zhang (Zhongguancun Academy), Haoxiang Guan (Zhongguancun Academy), Jiyan He (Zhongguancun Academy) · 2026-03-10 · 🔢 math

Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

This paper introduces the Mean Velocity Policy (MVP), a novel one-step generative policy that employs an Instantaneous Velocity Constraint (IVC) to theoretically guarantee high expressiveness while achieving state-of-the-art performance and significantly faster training and inference speeds on challenging robotic manipulation tasks compared to existing flow-based baselines.

Guojian Zhan, Letian Tao, Pengcheng Wang, Yixiao Wang, Yiheng Li, Yuxin Chen, Hongyang Li, Masayoshi Tomizuka, Shengbo Eben Li · 2026-03-10 · 🤖 cs.LG

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

CogitoRAG is a novel Retrieval-Augmented Generation framework inspired by human episodic memory that enhances complex reasoning and reduces hallucinations by extracting semantic gists into a multi-dimensional knowledge graph, utilizing query decomposition and entity diffusion for associative retrieval, and employing a fusion-based reranking algorithm to deliver high-density evidence.

Pengcheng Zhou, Haochen Li, Zhiqiang Nie, JiaLe Chen, Qing Gong, Weizhen Zhang, Chun Yu · 2026-03-10 · 💬 cs.CL

Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering

This paper introduces CondMedQA, the first benchmark for conditional biomedical question answering, and proposes Condition-Gated Reasoning (CGR), a framework that constructs condition-aware knowledge graphs to dynamically prune reasoning paths based on patient-specific factors, thereby improving the reliability of medical decision-making.

Jash Rajesh Parekh, Wonbin Kweon, Joey Chan, Rezarta Islamaj, Robert Leaman, Pengcheng Jiang, Chih-Hsuan Wei, Zhizheng Wang, Zhiyong Lu, Jiawei Han · 2026-03-10 · 💬 cs.CL

Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

This paper establishes a comprehensive multi-KPI benchmark for Multi-Agent Reinforcement Learning in urban energy management using the CityLearn environment, demonstrating that Decentralized Training with Decentralized Execution (DTDE) consistently outperforms Centralized Training with Decentralized Execution (CTDE) in both average and worst-case performance while offering greater resilience and sustainability.

Aymen Khouja, Imen Jendoubi, Oumayma Mahjoub, Oussama Mahfoudhi, Ruan De Kock, Siddarth Singh, Claude Formanek · 2026-03-10 · 🤖 cs.LG

MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation

The paper introduces MrBERT, a family of efficient, open-source multilingual encoders built on the ModernBERT architecture that achieve state-of-the-art performance in specific languages and specialized domains while leveraging Matryoshka Representation Learning to reduce inference and storage costs.

Daniel Tamayo, Iñaki Lacunza, Paula Rivera-Hidalgo, Severino Da Dalt, Javier Aula-Blasco, Aitor Gonzalez-Agirre, Marta Villegas · 2026-03-10 · 🤖 cs.LG

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

This paper introduces ARLArena, a unified framework that systematically analyzes training instability in agentic reinforcement learning to derive SAMPO, a stable optimization method that ensures consistent performance across diverse agentic tasks.

Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Deng, Renliang Sun, Alexander Taylor, Yanqiao Zhu, Jason Cong, Yizhou Sun, Wei Wang · 2026-03-10 · 💻 cs