TaoSR1: The Thinking Model for E-commerce Relevance Search

TaoSR1 is a novel framework that enables the direct deployment of Large Language Models for e-commerce relevance search. It employs a three-stage training pipeline of Chain-of-Thought fine-tuning, Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) to overcome reasoning errors and hallucinations, achieving superior performance in both offline benchmarks and online human evaluations.

Chenhe Dong, Shaowei Yao, Pengkun Jiao, Jianhui Yang, Yiming Jin, Zerui Huang, Xiaojiang Zhou, Dan Ou, Haihong Tang, Bo Zheng · Wed, 11 Ma · cs.AI
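The middle stage of the pipeline, DPO, optimizes a standard preference loss over chosen/rejected response pairs. As a general illustration of that objective (not TaoSR1's actual implementation, and with hypothetical log-probability inputs):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_* are total log-probabilities of the chosen/rejected
    responses under the policy being trained; ref_logp_* are the
    same quantities under the frozen reference model. beta scales
    the implicit reward margin.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form
    return math.log1p(math.exp(-margin))

# The loss shrinks as the policy prefers the chosen response
# more strongly than the reference model does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```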

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

This paper introduces EgoCross, a comprehensive benchmark comprising 1,000 QA pairs across four challenging domains (surgery, industry, extreme sports, and animal perspective) to evaluate and expose the poor cross-domain generalization capabilities of current Multimodal Large Language Models in egocentric video question answering.

Yanjun Li, Yuqian Fu, Tianwen Qian, Qi'ao Xu, Silong Dai, Danda Pani Paudel, Luc Van Gool, Xiaoling Wang · Wed, 11 Ma · cs.AI

Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method

This paper proposes SFDA-PFT, a lightweight source-free domain adaptation method that utilizes a pretrained translator to map subject-specific style features in the latent space, enabling effective facial expression recognition on unlabeled neutral target data without requiring source data or unstable image synthesis.

Masoumeh Sharafi, Soufiane Belharbi, Muhammad Osama Zeeshan, Houssem Ben Salem, Ali Etemad, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger · Wed, 11 Ma · cs.AI

Debiasing International Attitudes: LLM Agents for Simulating US-China Perception Changes

This study introduces an LLM-agent framework to simulate U.S. citizens' attitudes toward China from 2005 to 2025, demonstrating that while subjective news framing has a modest impact on negative attitudes, a "devil's advocate" agent is the most effective mechanism for debiasing opinions and producing more human-like cognitive outcomes.

Nicholas Sukiennik, Yichuan Xu, Yuqing Kan, Jinghua Piao, Yuwei Yan, Chen Gao, Yong Li · Wed, 11 Ma · cs.AI

On the mechanical creation of mathematical concepts

The paper models mathematical problem-solving as a belief-update loop and distinguishes implicit concept formation, which optimizes search within a fixed vocabulary, from explicit concept creation, which introduces new moves to resolve otherwise unsolvable problems. It argues that while current AI excels at the former, achieving the latter is essential for machines to replicate the distinctive nature of mathematical discovery.

Asvin G · Wed, 11 Ma · cs.AI

OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering

The paper introduces OPENXRD, a comprehensive benchmark framework featuring 217 expert-curated X-ray diffraction questions that evaluates how large language and multimodal models assimilate domain-specific context, revealing that mid-sized models benefit most from high-quality reference materials while very large models often exhibit saturation or interference.

Ali Vosoughi, Ayoub Shahnazari, Yufeng Xi, Zeliang Zhang, Griffin Hess, Chenliang Xu, Niaz Abdolrahim · Wed, 11 Ma · cs.AI

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

This paper proposes CORA, a cooperative game-theoretic credit assignment method that utilizes core allocation and coalition sampling to effectively distribute global advantages among agents in multi-agent reinforcement learning, thereby overcoming the limitations of uniform sharing and enhancing coordinated optimal behavior.

Mengda Ji, Genjiu Xu, Keke Jia, Zekun Duan, Yong Qiu, Jianjun Ge, Mingqiang Li · Wed, 11 Ma · cs.AI
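The core is a standard solution concept from cooperative game theory: an allocation is in the core if it distributes the grand coalition's value efficiently and no sub-coalition could do better by deviating. As a self-contained illustration of that definition (not CORA's actual credit-assignment code, with a hypothetical three-player game):

```python
from itertools import combinations

def in_core(payoff, value, players, tol=1e-9):
    """Check whether `payoff` (dict player -> share) lies in the core
    of the game `value` (dict frozenset-of-players -> coalition value):
    efficiency, plus no coalition able to do better on its own."""
    grand = frozenset(players)
    if abs(sum(payoff[p] for p in players) - value[grand]) > tol:
        return False  # not efficient: shares don't sum to v(N)
    for r in range(1, len(players)):
        for coal in combinations(players, r):
            s = frozenset(coal)
            if sum(payoff[p] for p in coal) < value.get(s, 0.0) - tol:
                return False  # coalition s would profitably deviate
    return True

# Hypothetical three-player game where the equal split is in the core
v = {frozenset(): 0.0,
     frozenset('A'): 0.0, frozenset('B'): 0.0, frozenset('C'): 0.0,
     frozenset('AB'): 1.0, frozenset('AC'): 1.0, frozenset('BC'): 1.0,
     frozenset('ABC'): 3.0}
print(in_core({'A': 1.0, 'B': 1.0, 'C': 1.0}, v, 'ABC'))  # True
```

CORA's contribution, per the summary, is making this kind of allocation tractable in multi-agent RL via coalition sampling rather than enumerating every coalition as above.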

UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models

The paper introduces UltraEdit, a training-, subject-, and memory-free approach for lifelong language model editing that achieves unprecedented scalability and efficiency by computing parameter shifts in a single step, enabling 7B models to be edited on consumer GPUs with over 2 million updates while outperforming existing methods in speed, memory usage, and accuracy.

Xiaojie Gu, Ziying Huang, Jia-Chen Gu, Kai Zhang · Wed, 11 Ma · cs.AI

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

This paper introduces Stepwise Guided Policy Optimization (SGPO), a framework that enhances Group Relative Policy Optimization (GRPO) by utilizing a step-wise judge model to provide learning signals from all-negative sample groups, thereby enabling large language models to learn from incorrect reasoning and improving performance across various reasoning benchmarks.

Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin · Wed, 11 Ma · cs.AI
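The all-negative-group problem SGPO targets follows directly from how GRPO computes advantages: each sampled response's reward is normalized by its own group's statistics. A minimal sketch of that standard computation (not the paper's code):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled response's
    reward by the mean and standard deviation of its own group.
    When every reward in the group is identical (e.g. an
    all-negative group where every rollout fails), the advantages
    collapse to zero and no learning signal remains -- the failure
    mode SGPO's step-wise judge is designed to fill."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# An all-negative group yields no gradient signal under plain GRPO
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]
```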

MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers

This paper introduces MCP Bridge, a lightweight, LLM-agnostic RESTful proxy that enables Model Context Protocol servers to run in resource-constrained environments with enhanced security, while also presenting a fine-tuned Qwen3 model that achieves state-of-the-art performance on the MCPToolBench++ benchmark through advanced reinforcement learning techniques.

Arash Ahmadi, Sarah Sharif, Yaser M. Banad · Wed, 11 Ma · cs.AI

A Consequentialist Critique of Binary Classification Evaluation: Theory, Practice, and Tools

This paper critiques the prevalent reliance on fixed-threshold metrics in machine learning evaluation by advocating for a consequentialist framework that prioritizes proper scoring rules like the Brier score, supported by a new decision-theoretic mapping, a practical Python package called `briertools`, and a clipped Brier score variant to bridge the gap between theoretical utility and current practices.

Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A. Fukuyama, Ashia C. Wilson · Wed, 11 Ma · cs.AI
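The Brier score itself is standard: the mean squared error between predicted probabilities and binary outcomes, which (unlike a fixed-threshold accuracy) is a proper scoring rule. A minimal sketch of the base metric, independent of the `briertools` package and its clipped variant:

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities (in [0, 1])
    and binary outcomes (0 or 1). Lower is better; 0 is perfect.
    Being a proper scoring rule, it is minimized in expectation by
    reporting the true probability."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

print(brier_score([0.9, 0.2, 0.8], [1, 0, 1]))  # approximately 0.03
```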

GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics

GateLens is a reasoning-enhanced LLM agent that utilizes Relational Algebra as a formal intermediate representation to bridge the gap between natural language and executable code, enabling fast, transparent, and highly accurate analysis of complex tabular data in automotive software release analytics without requiring few-shot examples or complex agent orchestration.

Arsham Gholamzadeh Khoee, Shuai Wang, Robert Feldt, Dhasarathy Parthasarathy, Yinan Yu · Wed, 11 Ma · cs.AI
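Using relational algebra as an intermediate step means a natural-language question is first mapped to operators like selection (σ) and projection (π) before any code is generated. A toy illustration of those two operators over hypothetical release data (the table, column names, and query are invented; this is not GateLens's actual representation):

```python
def select(rows, pred):
    """Relational selection (sigma): keep rows satisfying pred."""
    return [r for r in rows if pred(r)]

def project(rows, cols):
    """Relational projection (pi): keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]

# Hypothetical release-analytics table
releases = [
    {"module": "brakes", "tests_passed": 41, "gate": "open"},
    {"module": "infotainment", "tests_passed": 12, "gate": "closed"},
]

# "Which modules have an open gate?" expressed as sigma then pi
print(project(select(releases, lambda r: r["gate"] == "open"), ["module"]))
# [{'module': 'brakes'}]
```

Because each operator has a precise meaning, the intermediate plan can be inspected before execution, which is the transparency benefit the summary describes.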

LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains

The paper introduces LLM-Advisor, a prompt-based framework that leverages large language models as non-decisive post-processing advisors to significantly improve the cost efficiency of path planning across diverse terrains without modifying underlying planners, while addressing hallucination risks and demonstrating superior performance over zero-shot LLM approaches.

Ling Xiao, Toshihiko Yamasaki · Wed, 11 Ma · cs.AI