Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking

This paper proposes a dynamic knowledge fusion framework for multi-domain dialogue state tracking that addresses challenges in modeling dialogue history and data scarcity by using a contrastive learning-based encoder to select relevant slots and leveraging their structured information as contextual prompts to improve tracking accuracy and generalization.

Haoxiang Su, Ruiyu Fang, Liting Jiang, Xiaomeng Huang, Shuangyong Song · Thu, 12 Ma · cs.CL
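The summary above mentions selecting relevant slots by comparing encoder representations. As an illustration only (the paper's actual encoder and training objective are not described here), a minimal sketch of similarity-based slot selection, with made-up slot names and toy embeddings, might look like:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_relevant_slots(context_emb, slot_embs, top_k=2):
    # Rank candidate slots by similarity to the dialogue-context embedding
    # and keep the top-k as candidates for the contextual prompt.
    scored = sorted(
        ((name, cosine(context_emb, emb)) for name, emb in slot_embs.items()),
        key=lambda t: t[1],
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]

# Toy embeddings: slot names and vectors are illustrative, not from the paper.
context = [0.9, 0.1, 0.0]
slots = {
    "hotel-area": [1.0, 0.0, 0.0],
    "train-day": [0.0, 1.0, 0.0],
    "restaurant-food": [0.8, 0.2, 0.1],
}
print(select_relevant_slots(context, slots))  # ['hotel-area', 'restaurant-food']
```

A contrastively trained encoder would be expected to produce embeddings where relevant slots score high and distractors score low under exactly this kind of similarity ranking.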

Aligning Large Language Models with Searcher Preferences

This paper introduces SearchLLM, the first large language model designed for open-ended generative search on platforms like RedNote, which utilizes a hierarchical multi-dimensional reward system and Gated Aggregation Strategy with GRPO to balance safety, factual grounding, and user alignment, resulting in measurable improvements in generation quality and user engagement.

Wei Wu, Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong · Thu, 12 Ma · cs.CL
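The summary mentions a "Gated Aggregation Strategy" balancing safety, factual grounding, and user alignment. The paper's actual formulation is not given here; one hypothetical reading (a hard safety gate multiplying a weighted sum of the remaining reward dimensions, with all names and weights invented for illustration) could be sketched as:

```python
def gated_reward(safety, dims, weights):
    # Hypothetical gated aggregation: an unsafe generation scores zero
    # regardless of its quality on the other reward dimensions, while a
    # safe one receives the weighted sum of those dimensions.
    gate = 1.0 if safety >= 0.5 else 0.0
    return gate * sum(weights[k] * dims[k] for k in dims)

weights = {"factual": 0.5, "alignment": 0.5}

# Safe, well-grounded answer: reward flows through the gate.
print(gated_reward(0.9, {"factual": 0.8, "alignment": 0.6}, weights))  # ≈ 0.7

# Unsafe answer: gated to zero even though other dimensions are high.
print(gated_reward(0.2, {"factual": 0.9, "alignment": 0.9}, weights))  # 0.0
```

The design point such a gate illustrates is that safety acts as a veto rather than one more additive term that quality can trade off against.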

Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent

The paper introduces PULSE, a medical reasoning agent that integrates a domain-tuned large language model with scientific literature retrieval to achieve expert-competitive diagnostic accuracy across varying disease incidences, while demonstrating both its potential to enhance physician decision-making and the risks of automation bias in collaborative workflows.

Zhongzhen Huang, Yan Ling, Hong Chen, Ye Feng, Li Wu, Linjie Mu, Shaoting Zhang, Xiaofan Zhang, Kun Qian, Xiaomu Li · Thu, 12 Ma · cs.CL

VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

The paper introduces VERI-DPO, an evidence-aware alignment framework that leverages claim verification to mine preference pairs for Direct Preference Optimization, significantly reducing unsupported claims and improving the faithfulness of clinical summaries while maintaining informative length.

Weixin Liu, Congning Ni, Qingyuan Song, Susannah L. Rose, Christopher Symons, Murat Kantarcioglu, Bradley A. Malin, Zhijun Yin · Thu, 12 Ma · cs.CL
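The DPO objective that the mined preference pairs feed into is standard and publicly documented: the policy is pushed to widen its log-probability margin for the preferred (here, evidence-supported) response over the dispreferred one, relative to a frozen reference model. A minimal per-pair sketch (log-probabilities here are toy values, not from the paper):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Standard DPO loss for one preference pair:
    #   -log sigmoid( beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)] )
    # logp_* are the policy's sequence log-probs for the preferred (w) and
    # dispreferred (l) responses; ref_logp_* are the reference model's.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy already favors the faithful summary relative to the reference:
# positive margin, loss below log(2).
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # ≈ 0.513

# No preference signal yet: zero margin gives the maximum-entropy loss log(2).
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # ≈ 0.693
```

In VERI-DPO's setting, the claim-verification step is what decides which summary of a pair is `w` and which is `l`; the optimization itself is this off-the-shelf objective.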

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

The paper introduces IH-Challenge, a reinforcement learning dataset designed to enhance instruction hierarchy robustness in frontier LLMs, which significantly improves their ability to prioritize instructions against conflicts and adversarial attacks while maintaining helpfulness and minimizing capability regression.

Chuan Guo (Michael Pokorny), Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil Kandpal, Milad Nasr, Rai, Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel, Kai Xiao · Thu, 12 Ma · cs.AI

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

This paper introduces Group Relative Reward Rescaling (GR³), a novel reinforcement learning method that effectively mitigates length inflation in large language models by reframing length control as a multiplicative rescaling paradigm, thereby achieving lossless optimization and superior performance compared to existing baselines without compromising downstream capabilities.

Zichao Li, Jie Lou, Fangchen Dong, Zhiyuan Fan, Mengjie Ren, Hongyu Lin, Xianpei Han, Debing Zhang, Le Sun, Yaojie Lu, Xing Yu · Thu, 12 Ma · cs.LG
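The group-relative baseline that GR³ builds on is the well-known GRPO normalization: rewards for a group of sampled responses to the same prompt are standardized against the group mean and standard deviation. A sketch of that standard step, plus one *hypothetical* multiplicative length rescaling in the spirit the summary describes (the paper's actual rescaling function is not specified here):

```python
import math

def group_relative_advantages(rewards):
    # GRPO-style normalization: each response is scored relative to its
    # peers in the same sampled group, (r - mean) / (std + eps).
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + 1e-8) for r in rewards]

def length_rescaled_advantages(rewards, lengths, target_len):
    # Hypothetical multiplicative rescaling: overshooting a target length
    # scales the reward down before group normalization, rather than
    # subtracting an additive length penalty.
    scaled = [r * min(1.0, target_len / l) for r, l in zip(rewards, lengths)]
    return group_relative_advantages(scaled)

# Two correct and two incorrect responses in a group of four.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # ≈ [1.0, -1.0, 1.0, -1.0]
```

The multiplicative form keeps a zero-reward response at zero (no spurious gradient from length alone), which is one intuition for why rescaling can control length without distorting the task reward.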

Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

This paper empirically demonstrates that contrary to the hypothesis that moral reasoning alignment requires diversity-seeking algorithms, standard reward-maximizing RLVR methods are equally or more effective because high-reward moral responses exhibit a concentrated distribution in semantic space similar to logical reasoning tasks.

Zhaowei Zhang, Xiaohan Liu, Xuekai Zhu, Junchao Huang, Ceyao Zhang, Zhiyuan Feng, Yaodong Yang, Xiaoyuan Yi, Xing Xie · Thu, 12 Ma · cs.AI

Emulating Clinician Cognition via Self-Evolving Deep Clinical Research

The paper introduces DxEvolve, a self-evolving diagnostic agent that emulates clinician cognition through an interactive deep clinical research workflow, autonomously requisitioning examinations and externalizing experience to achieve superior diagnostic accuracy and governed continual improvement compared to existing AI models.

Ruiyang Ren, Yuhao Wang, Yunsen Liang, Lan Luo, Jing Liu, Haifeng Wang, Cong Feng, Yinan Zhang, Chunyan Miao, Ji-Rong Wen, Wayne Xin Zhao · Thu, 12 Ma · cs.AI

EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

This paper introduces EvoSchema, a comprehensive benchmark featuring a novel taxonomy of ten schema perturbation types to evaluate and enhance the robustness of text-to-SQL models against real-world database schema evolution, revealing that table-level changes significantly impact performance and demonstrating that training on diverse schema designs improves model resilience.

Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li · Thu, 12 Ma · cs.CL
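The summary describes a taxonomy of schema perturbation types for stress-testing text-to-SQL models. As a concrete illustration of one such perturbation (a column rename; the schema and names below are invented, and the benchmark's ten actual types are not enumerated here):

```python
def rename_column(schema, table, old, new):
    # One illustrative perturbation: rename a column. A text-to-SQL model
    # trained against the original schema must still ground the same
    # natural-language question in the renamed column.
    schema[table] = [new if c == old else c for c in schema[table]]
    return schema

# Toy schema, represented as table -> list of column names.
schema = {"singer": ["singer_id", "name", "country"]}
print(rename_column(schema, "singer", "country", "nation"))
# {'singer': ['singer_id', 'name', 'nation']}
```

Table-level perturbations (splitting or merging tables) change join paths as well as names, which is consistent with the paper's finding that they hurt performance more than column-level edits.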