Can RL Improve Generalization of LLM Agents? An Empirical Study

This paper empirically demonstrates that while Reinforcement Fine-Tuning (RFT) enables LLM agents to generalize well across varying task difficulties within a single environment, it struggles with cross-environment transfer due to interface and semantic shifts, though sequential and mixture training strategies can effectively mitigate forgetting and improve overall generalization.
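
The summary contrasts sequential and mixture training schedules; a toy sketch of the two follows, in which the environment names and the `rft_update` stub are illustrative assumptions, not the paper's actual setup.

```python
import random

# Hypothetical environment suites; the names are illustrative, not the paper's.
ENVS = {
    "env_a": ["easy", "medium", "hard"],
    "env_b": ["easy", "medium", "hard"],
}

def rft_update(policy, env, difficulty):
    """Stand-in for one Reinforcement Fine-Tuning step on (env, difficulty)."""
    policy[(env, difficulty)] = policy.get((env, difficulty), 0) + 1
    return policy

def train_sequential(policy, steps_per_env=100):
    # Sequential strategy: finish one environment entirely before the next.
    for env, levels in ENVS.items():
        for _ in range(steps_per_env):
            policy = rft_update(policy, env, random.choice(levels))
    return policy

def train_mixture(policy, total_steps=200):
    # Mixture strategy: resample the environment at every step, so earlier
    # environments keep appearing and are not forgotten.
    for _ in range(total_steps):
        env = random.choice(list(ENVS))
        policy = rft_update(policy, env, random.choice(ENVS[env]))
    return policy
```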

Zhiheng Xi, Xin Guo, Jiaqi Liu, Jiazheng Zhang, Yutao Fan, Zhihao Zhang, Shichun Liu, Mingxu Chai, Xiaowei Shi, Yitao Zhai, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang · 2026-03-13 · cs.AI

An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies

This paper argues that to maintain creative agency while collaborating with emerging intelligent technologies like LLMs, designers must engage in introspection, develop a structural understanding of the technology's capabilities, and deliberately adjust the human-technology working relationship.

Pei-Ying Lin, Julie Heij, Iris Borst, Britt Joosten, Kristina Andersen, Wijnand IJsselsteijn · 2026-03-13 · cs.AI

Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability

The paper proposes Slow-Fast Inference (SFI), a training-free framework that accelerates long-context autoregressive decoding by dynamically alternating between low-cost fast steps using stable sparse memory and occasional slow steps that refresh context at semantic boundaries, achieving significant throughput gains without compromising generation quality.
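
A minimal sketch of the alternating decode loop described above; the keep-last-k `sparsify` rule, the punctuation-based boundary test, and the `step_fn` stand-in for a model forward pass are all assumptions, since the summary does not specify SFI's actual cache policy.

```python
import re

SENT_END = re.compile(r"[.!?]$")

def sparsify(cache, keep_last=32):
    # Toy "stable support" rule: keep only the most recent cache entries.
    return list(cache[-keep_last:])

def is_semantic_boundary(token):
    # Toy boundary test: sentence-ending punctuation triggers a slow step.
    return bool(SENT_END.search(token))

def decode(step_fn, prompt_tokens, max_tokens=50):
    """Alternate cheap fast steps on a sparse cache with slow refreshes.

    `step_fn(cache) -> token` stands in for one model forward pass over
    whatever cache it is given.
    """
    full_cache = list(prompt_tokens)      # slow path: the complete context
    sparse_cache = sparsify(full_cache)   # fast path: stable sparse memory
    out = []
    for _ in range(max_tokens):
        token = step_fn(sparse_cache)     # fast step: sparse memory only
        out.append(token)
        full_cache.append(token)
        if is_semantic_boundary(token):
            sparse_cache = sparsify(full_cache)  # slow step: refresh context
        else:
            sparse_cache.append(token)
    return out
```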

Xingyu Xie, Zhaochen Yu, Yue Liao, Tao Wang, Kim-Chuan Toh, Shuicheng Yan · 2026-03-13 · cs.LG

LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments

LoV3D is a novel 3D vision-language pipeline that enhances Alzheimer's disease prognosis by grounding longitudinal MRI analysis in regional volume assessments and checking outputs with a clinically weighted verifier, achieving state-of-the-art diagnostic accuracy and generalizability while significantly reducing hallucinations through automated, annotation-free training.
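
One way to picture the grounding step is as code that turns longitudinal segmentations into a textual volume assessment the vision-language model can condition on; the region list, units, and report wording below are illustrative assumptions, not LoV3D's prompt format.

```python
def volume_report(segmentation_t0, segmentation_t1, voxel_mm3, regions):
    """Turn two timepoint segmentations into a textual volume assessment.

    `segmentation_*` map a region name to a voxel count; the report is
    handed to the vision-language model as grounded context.
    """
    lines = []
    for r in regions:
        v0 = segmentation_t0[r] * voxel_mm3
        v1 = segmentation_t1[r] * voxel_mm3
        change = 100.0 * (v1 - v0) / v0
        lines.append(f"{r}: {v0:.0f} -> {v1:.0f} mm^3 ({change:+.1f}%)")
    return "\n".join(lines)

# e.g. volume_report({"hippocampus": 4000}, {"hippocampus": 3700},
#                    1.0, ["hippocampus"]) reports a -7.5% change.
```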

Zhaoyang Jiang, Zhizhong Fu, David McAllister, Yunsoo Kim, Honghan Wu · 2026-03-13 · cs.AI

Resource-Efficient Iterative LLM-Based NAS with Feedback Memory

This paper proposes a resource-efficient, closed-loop Neural Architecture Search framework that leverages frozen large language models with a Markov-inspired feedback memory and dual-LLM specialization to iteratively generate and refine compact convolutional neural networks on a single consumer-grade GPU, achieving significant accuracy improvements on image classification benchmarks without requiring cloud infrastructure or model fine-tuning.
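
The closed loop reads naturally as generate, evaluate, critique, remember; a toy sketch follows, assuming the two frozen LLMs are plain `str -> str` callables and that the Markov-inspired memory conditions only on the latest round (both assumptions, not the paper's templates).

```python
def nas_loop(generator_llm, critic_llm, train_and_eval, rounds=10):
    """Closed-loop NAS with two frozen LLMs and a one-step feedback memory.

    `generator_llm` and `critic_llm` are str -> str callables;
    `train_and_eval(arch) -> accuracy` trains the proposed CNN briefly.
    """
    memory = []        # (architecture, accuracy, critique) per round
    best = (None, 0.0)
    for _ in range(rounds):
        prompt = "Propose a compact CNN architecture for image classification."
        if memory:     # Markov-inspired: condition only on the latest round
            arch, acc, critique = memory[-1]
            prompt += f" Previous: {arch} (acc {acc:.3f}). Feedback: {critique}"
        arch = generator_llm(prompt)      # frozen generator proposes
        acc = train_and_eval(arch)        # short run on one consumer GPU
        critique = critic_llm(f"Critique this architecture: {arch} (acc {acc:.3f}).")
        memory.append((arch, acc, critique))
        if acc > best[1]:
            best = (arch, acc)
    return best
```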

Xiaojie Gu, Dmitry Ignatov, Radu Timofte · 2026-03-13 · cs.LG

A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

This paper proposes a robust Multi-Agent Reinforcement Learning framework for traffic signal control that integrates turning ratio randomization, an exponential phase duration adjustment action space, and a neighbor-based MAPPO observation scheme to significantly reduce average waiting time and improve generalization in dynamic traffic scenarios.
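
The exponential action space is the most concrete of the three ingredients; a sketch in which an integer action rescales the current phase duration by `base ** action` (the base and the clipping bounds are illustrative assumptions).

```python
def adjust_phase_duration(duration, action, base=2.0, min_s=5.0, max_s=90.0):
    """Exponentially adjust a traffic-signal green-phase duration.

    An action a in {-2, -1, 0, 1, 2} rescales the duration by base**a,
    so coarse corrections stay cheap while fine ones remain possible.
    """
    new_duration = duration * (base ** action)
    return max(min_s, min(max_s, new_duration))

# e.g. adjust_phase_duration(20.0, 1) -> 40.0; action -1 halves it to 10.0.
```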

Sheng-You Huang, Hsiao-Chuan Chang, Yen-Chi Chen, Ting-Han Wei, I-Hau Yeh, Sheng-Yao Kuan, Chien-Yao Wang, Hsuan-Han Lee, I-Chen Wu · 2026-03-13 · cs.AI

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM Agents

This paper identifies the "information self-locking" phenomenon in reinforcement-learning-trained LLM agents, where deficient action selection and belief tracking form a feedback loop that stifles information gathering, and addresses it by injecting directional critiques that reallocate learning signals, significantly improving active reasoning performance.
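
One plausible reading of "injecting directional critiques to reallocate learning signals" is an advantage-reweighting pass over collected rollout steps; the critique format and the fixed bonus in this sketch are assumptions, not the paper's protocol.

```python
def reweight_advantages(steps, critic_llm, bonus=0.5):
    """Toy reallocation of learning signal via directional critiques.

    `steps` is a list of dicts with keys "state", "action", "advantage";
    `critic_llm(state, action) -> str` returns a critique. Treating a
    critique that starts with "ASK" as endorsing information gathering
    is an assumed convention for illustration.
    """
    for step in steps:
        critique = critic_llm(step["state"], step["action"])
        if critique.startswith("ASK"):
            # Boost credit for information-gathering actions, breaking the
            # self-locking loop in which the agent stops querying.
            step["advantage"] += bonus
        else:
            step["advantage"] -= bonus
    return steps
```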

Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng · 2026-03-13 · cs.AI

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

This paper introduces Minimax Deep Deterministic Policy Gradient (MMDDPG), a framework that employs a fractional objective to stabilize the minimax optimization between a user policy and an adversarial disturbance policy, thereby learning robust control strategies that maintain performance under external perturbations and model uncertainties in continuous environments.
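
A toy alternating update illustrating what a fractional minimax objective could look like; the ratio form `J = num / (den + eps)` is an assumption about the objective's shape, which the summary does not give.

```python
import torch

def minimax_step(objective, user_params, adv_params, lr=1e-3, eps=1e-3):
    """One alternating gradient step on a toy fractional objective.

    `objective()` must return (num, den) as differentiable scalars built
    from the current parameters of both policies.
    """
    num, den = objective()
    j = num / (den + eps)
    grads = torch.autograd.grad(j, user_params)
    with torch.no_grad():            # user policy ascends J
        for p, g in zip(user_params, grads):
            p += lr * g
    num, den = objective()           # rebuild graph after in-place update
    j = num / (den + eps)
    grads = torch.autograd.grad(j, adv_params)
    with torch.no_grad():            # adversarial disturbance policy descends J
        for p, g in zip(adv_params, grads):
            p -= lr * g
```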

Taeho Lee, Donghwan Lee · 2026-03-13 · cs.LG

SommBench: Assessing Sommelier Expertise of Language Models

The paper introduces SommBench, a multilingual benchmark developed in collaboration with professional sommeliers to evaluate the sensory expertise of language models across wine theory, feature completion, and food-wine pairing tasks, revealing that while models excel at theoretical knowledge, they struggle with more complex sensory judgment challenges.
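
A minimal harness for the three task families; the sample items and the substring-match scoring below are illustrative stand-ins, not SommBench's actual data or protocol.

```python
# Toy items for the three SommBench task families (illustrative only).
TASKS = {
    "theory": [("Which grape dominates red Burgundy?", "pinot noir")],
    "feature_completion": [("Tasting note: high acidity, flinty, citrus. Grape?", "chardonnay")],
    "pairing": [("Suggest a classic wine for oysters.", "chablis")],
}

def evaluate(model, tasks=TASKS):
    """Score a model per task family; `model` is any str -> str callable."""
    scores = {}
    for task, items in tasks.items():
        hits = sum(gold in model(question).lower() for question, gold in items)
        scores[task] = hits / len(items)
    return scores
```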

William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie Dupouy, Gianluca Barmina, Andrea Blasi Núñez, Peter Schneider-Kamp, Kristian Koštál, Michal Ries, Lukas Galke Poech · 2026-03-13 · cs.CL