VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

VTool-R1 is a novel framework that leverages reinforcement learning to train vision-language models to generate multimodal chains of thought by strategically interleaving text with intermediate visual reasoning steps using Python-based editing tools, thereby enhancing performance on structured visual tasks without requiring process-based supervision.

Mingyuan Wu, Jingcheng Yang, Jize Jiang + 6 more · 2026-03-06 · 💻 cs

Continuous Chain of Thought Enables Parallel Exploration and Reasoning

This paper introduces Continuous Chain of Thought (CoT2), a framework that replaces discrete token sampling with continuously-valued tokens to enable parallel exploration of multiple reasoning traces, offering theoretical guarantees for solving combinatorial problems and demonstrating improved performance through novel supervision and policy optimization strategies.

Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang + 3 more · 2026-03-06 · 💻 cs
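The core idea behind continuous-token reasoning can be illustrated in a few lines: instead of committing to one sampled token per step, the model feeds forward the probability-weighted mixture of token embeddings, keeping multiple candidate reasoning paths in superposition. The sketch below is a generic illustration of that mechanism, not the paper's CoT2 implementation; all tensor shapes and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 5, 4
embeddings = rng.normal(size=(vocab_size, d_model))  # token embedding table
logits = rng.normal(size=vocab_size)                 # model's next-token scores

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

probs = softmax(logits)

# Discrete CoT: commit to a single token embedding (argmax or sampling).
discrete_token = embeddings[np.argmax(probs)]

# Continuous CoT: feed the expectation over the whole vocabulary instead,
# so every candidate continuation contributes to the next hidden state.
continuous_token = probs @ embeddings

assert continuous_token.shape == discrete_token.shape
```

Because the mixture is differentiable in the logits, supervision and policy-gradient updates can flow through the reasoning trace itself, which is what makes training such continuous traces tractable.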

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

The paper introduces SealQA, a new benchmark comprising three challenging flavors (Seal-0, Seal-Hard, and LongSeal) designed to evaluate search-augmented language models on fact-seeking tasks with noisy or conflicting web results, revealing that even frontier models struggle significantly with reasoning accuracy, robustness to noise, and long-context document retrieval.

Thinh Pham, Nguyen Nguyen, Pratibha Zunjare + 3 more · 2026-03-06 · 💻 cs

HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

This paper introduces Poly2Graph, an automated pipeline for generating HSG-12M, a pioneering 16.7-million-scale dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra, which bridges condensed matter physics and geometry-aware graph learning by preserving vital geometric information often discarded in existing benchmarks.

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi + 2 more · 2026-03-06 · 🔬 cond-mat.mes-hall

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

This paper introduces EDINET-Bench, a challenging open-source benchmark derived from ten years of Japanese financial reports to evaluate LLMs on complex tasks like fraud detection and earnings forecasting, revealing that current models struggle significantly without specialized scaffolding and highlighting the need for more realistic evaluation frameworks.

Issa Sugiura, Takashi Ishida, Taro Makino + 4 more · 2026-03-06 · 💻 cs

From Bandit Regret to FDR Control: Online Selective Generation with Adversarial Feedback Unlocking

This paper proposes ExSUL, a novel online learning framework that enables selective generation for large language models to robustly control the False Discovery Rate (FDR) and achieve optimal regret bounds in non-stationary and adversarial environments by converting bandit regret into FDR guarantees and unlocking additional learning signals from partial user feedback.

Minjae Lee, Yoonjae Jung, Sangdon Park · 2026-03-06 · 💻 cs
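For readers unfamiliar with FDR control, the classical (offline) Benjamini-Hochberg procedure is the standard reference point: it bounds the expected fraction of false discoveries among all selections at a target level q. The snippet below implements plain BH as background; it is not ExSUL, which extends this style of guarantee to the online, adversarial-feedback setting described above.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Classic offline BH: with sorted p-values p_(1) <= ... <= p_(m),
    find the largest k with p_(k) <= k*q/m and reject the k smallest.
    Returns a boolean rejection mask aligned with the input order."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    below = sorted_p <= (np.arange(1, m + 1) * q / m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing the line
        reject[order[: k + 1]] = True
    return reject

mask = benjamini_hochberg([0.001, 0.02, 0.04, 0.2, 0.9], q=0.1)
# rejects the three smallest p-values at level q = 0.1
```

In the selective-generation setting, "rejecting a hypothesis" corresponds to letting the model answer rather than abstain, which is why an FDR bound translates into a bound on the rate of confidently wrong generations.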

Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics

This paper introduces Structured Kolmogorov-Arnold Neural ODEs (SKANODEs), a framework that combines structured state-space modeling with Kolmogorov-Arnold Networks to accurately recover interpretable physical latent states and discover compact symbolic governing equations for nonlinear dynamical systems, outperforming black-box neural ODEs and classical identification methods across synthetic and real-world datasets.

Wei Liu, Kiran Bacsa, Loon Ching Tang + 1 more · 2026-03-06 · 🔬 physics
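The Kolmogorov-Arnold ingredient can be made concrete with a toy layer: where an MLP applies fixed nonlinearities to weighted sums, a KAN-style layer places a learnable univariate function on each input-output edge and sums them. The sketch below uses cubic polynomials as the univariate functions purely for illustration; it is a generic KAN-flavored layer, not the SKANODE architecture, and every name in it is an assumption of this example.

```python
import numpy as np

class ToyKALayer:
    """Toy Kolmogorov-Arnold layer: y_j = sum_i phi_ij(x_i), where each
    univariate phi_ij is a learnable cubic polynomial on edge (i, j)."""

    def __init__(self, n_in, n_out, rng):
        # coeffs[i, j, k] is the coefficient of x_i**k on edge (i, j).
        self.coeffs = rng.normal(scale=0.1, size=(n_in, n_out, 4))

    def __call__(self, x):
        # powers[i, k] = x_i**k, then contract over input i and degree k.
        powers = np.stack([x ** k for k in range(4)], axis=-1)
        return np.einsum("ik,ijk->j", powers, self.coeffs)

rng = np.random.default_rng(1)
layer = ToyKALayer(n_in=2, n_out=3, rng=rng)
y = layer(np.array([0.5, -1.0]))
assert y.shape == (3,)
```

Because each edge function is an explicit low-degree polynomial (or spline, in full KANs), the fitted functions can be read off and simplified into symbolic expressions, which is the property that makes such layers attractive for equation discovery.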

Learning Physical Systems: Symplectification via Gauge Fixing in Dirac Structures

This paper introduces Presymplectification Networks (PSNs), a novel framework that restores non-degenerate symplectic geometry for constrained and dissipative mechanical systems by learning a symplectification lift via Dirac structures, thereby enabling accurate, structure-preserving long-term prediction of complex multibody dynamics like those of the ANYmal quadruped robot.

Aristotelis Papatheodorou, Pranav Vaidhyanathan, Natalia Ares + 1 more · 2026-03-06 · 💻 cs

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

MuRating is a scalable framework that transfers high-quality English data-quality signals to a unified multilingual evaluator via pairwise comparisons and translation, enabling the selection of balanced, high-quality datasets that significantly improve the performance of multilingual large language models on both English and non-English benchmarks.

Zhixun Chen, Ping Guo, Wenhan Han + 10 more · 2026-03-06 · 💻 cs