NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

NC-Bench introduces a theory-grounded benchmark that evaluates the conversational competence of large language models by assessing their ability to manage the form and structure of natural interactions across basic, retrieval-augmented, and complex multi-turn scenarios, revealing that while models excel at basic answering, they struggle significantly with repair and complex sequence management tasks.

Robert J. Moore, Sungeun An, Farhan Ahmed, Jay Pankaj Gala · Tue, 10 Ma · 💬 cs.CL

DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models

DevBench is a realistic, telemetry-driven benchmark comprising 1,800 instances across six languages that evaluates LLMs on code completion tasks with a focus on ecological validity, contamination-free assessment, and detailed diagnostic insights to guide practical model selection and development.
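
DevBench's exact scoring protocol is not given in this summary; code-generation benchmarks commonly report pass@k, so here is a minimal sketch of the standard unbiased estimator (Chen et al., 2021) as an illustrative stand-in, not DevBench's confirmed metric:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """n samples generated per task, c of them passed the tests;
    returns the chance a random size-k subset contains >=1 passing sample."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-subset: guaranteed pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```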

Pareesa Ameneh Golnari, Adarsh Kumarappan, Wen Wen, Xiaoyu Liu, Gabriel Ryan, Yuting Sun, Shengyu Fu, Elsie Nallipogu · Tue, 10 Ma · 🤖 cs.LG

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

This paper introduces MAS-Orchestra, a training-time framework that optimizes multi-agent system orchestration via function-calling reinforcement learning, alongside the MASBENCH benchmark, to demonstrate that multi-agent benefits are task-dependent and to achieve significant performance gains with over 10x efficiency on complex reasoning tasks.
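
The summary describes orchestration via function calling; a hypothetical sketch of that pattern follows, with sub-agents exposed to an orchestrator LLM as tools. The `orchestrator_llm` callable, agent names, and schemas are assumptions for illustration, not MAS-Orchestra's actual API:

```python
# Sub-agents registered as plain callables (stand-ins for real agents).
AGENTS = {
    "solver": lambda task: f"[solver output for: {task}]",
    "critic": lambda task: f"[critique of: {task}]",
}
TOOLS = [{"name": n, "parameters": {"task": "string"}} for n in AGENTS]
TOOLS.append({"name": "final_answer", "parameters": {"answer": "string"}})

def orchestrate(orchestrator_llm, question, max_calls=8):
    """orchestrator_llm(transcript, tools=...) is assumed to return one tool call."""
    transcript = [{"role": "user", "content": question}]
    for _ in range(max_calls):
        call = orchestrator_llm(transcript, tools=TOOLS)
        if call["name"] == "final_answer":
            return call["arguments"]["answer"]
        result = AGENTS[call["name"]](call["arguments"]["task"])
        transcript.append({"role": "tool", "name": call["name"], "content": result})
    return None  # budget exhausted without a final answer
```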

Zixuan Ke, Yifei Ming, Austin Xu, Ryan Chin, Xuan-Phi Nguyen, Prathyusha Jwalapuram, Jiayu Wang, Semih Yavuz, Caiming Xiong, Shafiq Joty · Tue, 10 Ma · 💬 cs.CL

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

This paper proposes a novel data-rate-aware continuous-flow architecture for CNN inference on FPGAs that mitigates hardware underutilization caused by data reduction in pooling and strided convolution layers by interleaving signals and sharing resources, thereby enabling the high-throughput implementation of complex models like MobileNet on a single device.
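
The underutilization the paper targets comes from simple rate arithmetic; the back-of-envelope sketch below (illustrative numbers, not the paper's) shows why interleaving restores full utilization after a pooling layer:

```python
# A 2x2 stride-2 pooling layer cuts the downstream sample rate by 4x, so a
# fully pipelined multiplier fed by it would idle 75% of the time.
pool_factor = 4                       # 2x2 pooling, stride 2
rate_after_pool = 1.0 / pool_factor   # valid samples per clock cycle

util_dedicated = rate_after_pool                             # 0.25
util_interleaved = min(1.0, pool_factor * rate_after_pool)   # 1.00: four
# independent channel streams time-multiplexed onto one shared multiplier
print(f"dedicated: {util_dedicated:.2f}, interleaved: {util_interleaved:.2f}")
```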

Tobias Habermann, Michael Mecik, Zhenyu Wang, César David Vera, Martin Kumm, Mario Garrido · Tue, 10 Ma · 🤖 cs.LG

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

MeanCache is a training-free framework that accelerates Flow Matching inference by replacing instantaneous velocity caching with an average-velocity approach using cached Jacobian-vector products and a trajectory-stability scheduling strategy, achieving significant speedups (up to 4.56X) while maintaining high generation quality across models like FLUX.1 and HunyuanVideo.
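
A minimal sketch of the average-velocity idea under stated assumptions: a first-order expansion gives the interval-average velocity as u plus half the step times du/dt, with the total derivative along the trajectory obtained from a single Jacobian-vector product. MeanCache's exact estimator, caching, and scheduling may differ; this shows only the underlying mechanics:

```python
import torch
from torch.func import jvp

def mean_velocity_step(u_fn, x, t, dt):
    """One sampler step using an average velocity over [t, t+dt].
    u_fn(x, t) is the model's instantaneous velocity field; t is a scalar tensor."""
    u0 = u_fn(x, t)
    # Total derivative du/dt = (du/dx)·u + du/dt via one JVP; this is the
    # quantity a cache could reuse across nearby steps.
    _, du_dt = jvp(u_fn, (x, t), (u0, torch.ones_like(t)))
    u_bar = u0 + 0.5 * dt * du_dt   # average velocity over the interval
    return x + dt * u_bar           # second-order-accurate update
```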

Huanlin Gao, Ping Chen, Fuyuan Shi, Ruijia Wu, Li YanTao, Qiang Hui, Yuren You, Ting Lu, Chao Tan, Shaoan Zhao, Zhaoxiang Liu, Fang Zhao, Kai Wang, Shiguo Lian · Tue, 10 Ma · 🤖 cs.LG

Impact of LLM News Sentiment Analysis on Stock Price Movement Prediction

This paper evaluates the impact of LLM-based news sentiment analysis on stock price prediction, demonstrating that DeBERTa outperforms other models and that an ensemble approach achieves 80% accuracy, while sentiment features provide modest improvements to various time-series forecasting architectures.
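
An illustrative sketch (not the paper's pipeline) of how a sentiment feature typically enters such a setup: a daily news-sentiment score appended to lagged-return features before fitting any next-day direction classifier:

```python
import numpy as np

def build_features(returns, sentiment, n_lags=5):
    """returns, sentiment: 1-D arrays aligned by trading day;
    sentiment[t] is an aggregate LLM-derived score in [-1, 1] (assumed)."""
    X, y = [], []
    for t in range(n_lags, len(returns) - 1):
        X.append(np.concatenate([returns[t - n_lags:t], [sentiment[t]]]))
        y.append(int(returns[t + 1] > 0))   # next-day up/down label
    return np.array(X), np.array(y)
```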

Walid Siala (SnT, University of Luxembourg, Luxembourg), Ahmed Khanfir (RIADI, ENSI, University of Manouba, Tunisia; SnT, University of Luxembourg, Luxembourg), Mike Papadakis (SnT, University of Luxembourg, Luxembourg) · Tue, 10 Ma · 💻 cs

Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? A Study of Hierarchical Gating and Calibration

This paper investigates whether Schwartz higher-order values improve sentence-level human value detection, finding that while hierarchical gating offers limited benefits, calibration techniques and hybrid ensembles significantly boost performance, suggesting the value hierarchy is more effective as an inductive bias than a rigid routing mechanism.
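
The summary credits "calibration techniques" without naming them; temperature scaling is one common post-hoc choice, sketched below as an illustrative stand-in rather than the paper's confirmed method:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def fit_temperature(logits, labels):
    """Fit a single temperature T on held-out (logits, labels) by minimizing NLL;
    divide test-time logits by T before softmax."""
    def nll(T):
        z = logits / T
        logp = z - logsumexp(z, axis=1, keepdims=True)
        return -logp[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```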

Víctor Yeste, Paolo Rosso · Tue, 10 Ma · 🤖 cs.LG

Diffusion-Guided Pretraining for Brain Graph Foundation Models

This paper proposes a unified diffusion-guided pretraining framework for brain graph foundation models that overcomes the limitations of existing methods by using diffusion to preserve semantic connectivity patterns during augmentation and to enable topology-aware global reconstruction, thereby achieving robust and transferable representations across diverse neuroimaging datasets.
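
A hypothetical sketch of diffusion used as augmentation on a connectivity matrix, via the standard DDPM forward kernel with symmetrized noise so the graph stays undirected; the paper's actual augmentation and reconstruction objectives may differ:

```python
import torch

def diffuse_adjacency(A, alpha_bar_t):
    """A: symmetric connectivity matrix; alpha_bar_t: cumulative noise
    schedule value in (0, 1) at diffusion step t."""
    noise = torch.randn_like(A)
    # Symmetrize; dividing by sqrt(2) keeps off-diagonal variance at 1.
    noise = (noise + noise.transpose(-1, -2)) / 2**0.5
    return alpha_bar_t**0.5 * A + (1 - alpha_bar_t)**0.5 * noise
```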

Xinxu Wei, Rong Zhou, Lifang He, Yu Zhang · Tue, 10 Ma · 🤖 cs.LG

To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models

This paper introduces M2RL, a comprehensive study comparing mixed multi-task training versus separate training with model merging for multi-domain Reinforcement Learning with Verifiable Rewards (RLVR), revealing that reasoning-intensive domains exhibit synergistic effects with minimal interference and providing mechanistic insights through extensive experiments.
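
For the merging arm of the comparison, a minimal sketch of parameter-space model merging (a weighted "model soup" over per-domain checkpoints); whether M2RL merges in exactly this way is not stated in the summary:

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average a list of compatible state dicts; defaults to uniform weights."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        k: sum(w * sd[k].float() for w, sd in zip(weights, state_dicts))
        for k in state_dicts[0]
    }
```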

Haoqing Wang, Xiang Long, Ziheng Li, Yilong Xu, Tingguang Li, Yehui Tang · Tue, 10 Ma · 💻 cs

A Geometric Taxonomy of Hallucinations in LLMs

This paper proposes a geometric taxonomy of LLM hallucinations into three distinct types (unfaithfulness, confabulation, and factual error) and introduces corresponding detection metrics, the Semantic Grounding Index and Directional Grounding Index, which effectively identify unfaithful and confabulated outputs while revealing that apparent signals for factual errors in existing benchmarks often stem from stylistic annotation confounds rather than genuine geometric distinctions.
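
A hypothetical grounding-style score, not the paper's actual SGI or DGI definition: measure how much of a response embedding lies in the subspace spanned by the source-context sentence embeddings, with low values suggesting the response drifts from its context:

```python
import numpy as np

def grounding_score(response_vec, context_vecs):
    """response_vec: (d,) embedding; context_vecs: (m, d) sentence embeddings."""
    Q, _ = np.linalg.qr(context_vecs.T)   # orthonormal basis of the context span
    proj = Q @ (Q.T @ response_vec)       # projection of response onto that span
    return float(np.linalg.norm(proj) / np.linalg.norm(response_vec))
```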

Javier Marín · Tue, 10 Ma · 💬 cs.CL

Can a Lightweight Automated AI Pipeline Solve Research-Level Mathematical Problems?

This paper demonstrates that a lightweight, automated AI pipeline integrating next-generation large language models with citation-based verification can successfully generate and solve sophisticated, research-grade mathematical problems, including previously unpublished questions, with verified results and open-sourced tools.

Lve Meng (University of Science and Technology of China; Zhongguancun Academy), Weilong Zhao (Université Paris Cité), Yanzhi Zhang (Zhongguancun Academy), Haoxiang Guan (Zhongguancun Academy), Jiyan He (Zhongguancun Academy) · Tue, 10 Ma · 🔢 math