TaoSR1: The Thinking Model for E-commerce Relevance Search

TaoSR1 is a novel framework that enables the direct deployment of Large Language Models for e-commerce relevance search. It employs a three-stage training pipeline of Chain-of-Thought fine-tuning, Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) to overcome reasoning errors and hallucinations, achieving superior performance in both offline benchmarks and online human evaluations.

Chenhe Dong, Shaowei Yao, Pengkun Jiao, Jianhui Yang, Yiming Jin, Zerui Huang, Xiaojiang Zhou, Dan Ou, Haihong Tang, Bo Zheng · Wed, 11 Ma · cs.AI
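The middle stage of the pipeline, DPO, optimizes a standard preference loss over chosen/rejected response pairs. As a general illustration of that objective (not TaoSR1's actual implementation, and with hypothetical log-probability inputs):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_* are total log-probabilities of the chosen/rejected
    responses under the policy being trained; ref_logp_* are the
    same quantities under the frozen reference model. beta scales
    the implicit reward margin.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form
    return math.log1p(math.exp(-margin))

# The loss shrinks as the policy prefers the chosen response
# more strongly than the reference model does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```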

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

This paper introduces EgoCross, a comprehensive benchmark comprising 1,000 QA pairs across four challenging domains (surgery, industry, extreme sports, and animal perspective) to evaluate and expose the poor cross-domain generalization capabilities of current Multimodal Large Language Models in egocentric video question answering.

Yanjun Li, Yuqian Fu, Tianwen Qian, Qi'ao Xu, Silong Dai, Danda Pani Paudel, Luc Van Gool, Xiaoling Wang · Wed, 11 Ma · cs.AI

Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method

This paper proposes SFDA-PFT, a lightweight source-free domain adaptation method that utilizes a pretrained translator to map subject-specific style features in the latent space, enabling effective facial expression recognition on unlabeled neutral target data without requiring source data or unstable image synthesis.

Masoumeh Sharafi, Soufiane Belharbi, Muhammad Osama Zeeshan, Houssem Ben Salem, Ali Etemad, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger · Wed, 11 Ma · cs.AI

Debiasing International Attitudes: LLM Agents for Simulating US-China Perception Changes

This study introduces an LLM-agent framework to simulate U.S. citizens' attitudes toward China from 2005 to 2025, demonstrating that while subjective news framing has a modest impact on negative attitudes, a "devil's advocate" agent is the most effective mechanism for debiasing opinions and producing more human-like cognitive outcomes.

Nicholas Sukiennik, Yichuan Xu, Yuqing Kan, Jinghua Piao, Yuwei Yan, Chen Gao, Yong Li · Wed, 11 Ma · cs.AI

On the mechanical creation of mathematical concepts

The paper models mathematical problem-solving as a belief-update loop and distinguishes implicit concept formation, which optimizes search within a fixed vocabulary, from explicit concept creation, which introduces new moves to resolve otherwise unsolvable problems. It argues that while current AI excels at the former, achieving the latter is essential for machines to replicate the distinctive nature of mathematical discovery.

Asvin G · Wed, 11 Ma · cs.AI

OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering

The paper introduces OPENXRD, a comprehensive benchmark framework featuring 217 expert-curated X-ray diffraction questions that evaluates how large language and multimodal models assimilate domain-specific context, revealing that mid-sized models benefit most from high-quality reference materials while very large models often exhibit saturation or interference.

Ali Vosoughi, Ayoub Shahnazari, Yufeng Xi, Zeliang Zhang, Griffin Hess, Chenliang Xu, Niaz Abdolrahim · Wed, 11 Ma · cs.AI

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

This paper proposes CORA, a cooperative game-theoretic credit assignment method that utilizes core allocation and coalition sampling to effectively distribute global advantages among agents in multi-agent reinforcement learning, thereby overcoming the limitations of uniform sharing and enhancing coordinated optimal behavior.

Mengda Ji, Genjiu Xu, Keke Jia, Zekun Duan, Yong Qiu, Jianjun Ge, Mingqiang Li · Wed, 11 Ma · cs.AI
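The core is a standard solution concept from cooperative game theory: an allocation is in the core if it distributes the grand coalition's value efficiently and no sub-coalition could do better by deviating. As a self-contained illustration of that definition (not CORA's actual credit-assignment code, with a hypothetical three-player game):

```python
from itertools import combinations

def in_core(payoff, value, players, tol=1e-9):
    """Check whether `payoff` (dict player -> share) lies in the core
    of the game `value` (dict frozenset-of-players -> coalition value):
    efficiency, plus no coalition able to do better on its own."""
    grand = frozenset(players)
    if abs(sum(payoff[p] for p in players) - value[grand]) > tol:
        return False  # not efficient: shares don't sum to v(N)
    for r in range(1, len(players)):
        for coal in combinations(players, r):
            s = frozenset(coal)
            if sum(payoff[p] for p in coal) < value.get(s, 0.0) - tol:
                return False  # coalition s would profitably deviate
    return True

# Hypothetical three-player game where the equal split is in the core
v = {frozenset(): 0.0,
     frozenset('A'): 0.0, frozenset('B'): 0.0, frozenset('C'): 0.0,
     frozenset('AB'): 1.0, frozenset('AC'): 1.0, frozenset('BC'): 1.0,
     frozenset('ABC'): 3.0}
print(in_core({'A': 1.0, 'B': 1.0, 'C': 1.0}, v, 'ABC'))  # True
```

CORA's contribution, per the summary, is making this kind of allocation tractable in multi-agent RL via coalition sampling rather than enumerating every coalition as above.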

UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models

The paper introduces UltraEdit, a training-, subject-, and memory-free approach for lifelong language model editing that achieves unprecedented scalability and efficiency by computing parameter shifts in a single step, enabling 7B models to be edited on consumer GPUs with over 2 million updates while outperforming existing methods in speed, memory usage, and accuracy.

Xiaojie Gu, Ziying Huang, Jia-Chen Gu, Kai Zhang · Wed, 11 Ma · cs.AI

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

This paper introduces Stepwise Guided Policy Optimization (SGPO), a framework that enhances Group Relative Policy Optimization (GRPO) by utilizing a step-wise judge model to provide learning signals from all-negative sample groups, thereby enabling large language models to learn from incorrect reasoning and improving performance across various reasoning benchmarks.

Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin · Wed, 11 Ma · cs.AI
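The all-negative-group problem SGPO targets follows directly from how GRPO computes advantages: each sampled response's reward is normalized by its own group's statistics. A minimal sketch of that standard computation (not the paper's code):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled response's
    reward by the mean and standard deviation of its own group.
    When every reward in the group is identical (e.g. an
    all-negative group where every rollout fails), the advantages
    collapse to zero and no learning signal remains -- the failure
    mode SGPO's step-wise judge is designed to fill."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# An all-negative group yields no gradient signal under plain GRPO
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]
```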

MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers

This paper introduces MCP Bridge, a lightweight, LLM-agnostic RESTful proxy that enables Model Context Protocol servers to run in resource-constrained environments with enhanced security, while also presenting a fine-tuned Qwen3 model that achieves state-of-the-art performance on the MCPToolBench++ benchmark through advanced reinforcement learning techniques.

Arash Ahmadi, Sarah Sharif, Yaser M. Banad · Wed, 11 Ma · cs.AI

A Consequentialist Critique of Binary Classification Evaluation: Theory, Practice, and Tools

This paper critiques the prevalent reliance on fixed-threshold metrics in machine learning evaluation by advocating for a consequentialist framework that prioritizes proper scoring rules like the Brier score, supported by a new decision-theoretic mapping, a practical Python package called `briertools`, and a clipped Brier score variant to bridge the gap between theoretical utility and current practices.

Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A. Fukuyama, Ashia C. Wilson · Wed, 11 Ma · cs.AI
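The Brier score itself is standard: the mean squared error between predicted probabilities and binary outcomes, which (unlike a fixed-threshold accuracy) is a proper scoring rule. A minimal sketch of the base metric, independent of the `briertools` package and its clipped variant:

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities (in [0, 1])
    and binary outcomes (0 or 1). Lower is better; 0 is perfect.
    Being a proper scoring rule, it is minimized in expectation by
    reporting the true probability."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

print(brier_score([0.9, 0.2, 0.8], [1, 0, 1]))  # approximately 0.03
```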

GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics

GateLens is a reasoning-enhanced LLM agent that utilizes Relational Algebra as a formal intermediate representation to bridge the gap between natural language and executable code, enabling fast, transparent, and highly accurate analysis of complex tabular data in automotive software release analytics without requiring few-shot examples or complex agent orchestration.

Arsham Gholamzadeh Khoee, Shuai Wang, Robert Feldt, Dhasarathy Parthasarathy, Yinan Yu · Wed, 11 Ma · cs.AI
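Using relational algebra as an intermediate step means a natural-language question is first mapped to operators like selection (σ) and projection (π) before any code is generated. A toy illustration of those two operators over hypothetical release data (the table, column names, and query are invented; this is not GateLens's actual representation):

```python
def select(rows, pred):
    """Relational selection (sigma): keep rows satisfying pred."""
    return [r for r in rows if pred(r)]

def project(rows, cols):
    """Relational projection (pi): keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]

# Hypothetical release-analytics table
releases = [
    {"module": "brakes", "tests_passed": 41, "gate": "open"},
    {"module": "infotainment", "tests_passed": 12, "gate": "closed"},
]

# "Which modules have an open gate?" expressed as sigma then pi
print(project(select(releases, lambda r: r["gate"] == "open"), ["module"]))
# [{'module': 'brakes'}]
```

Because each operator has a precise meaning, the intermediate plan can be inspected before execution, which is the transparency benefit the summary describes.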

LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains

The paper introduces LLM-Advisor, a prompt-based framework that leverages large language models as non-decisive post-processing advisors to significantly improve the cost efficiency of path planning across diverse terrains without modifying underlying planners, while addressing hallucination risks and demonstrating superior performance over zero-shot LLM approaches.

Ling Xiao, Toshihiko Yamasaki · Wed, 11 Ma · cs.AI