Can RL Improve Generalization of LLM Agents? An Empirical Study

This paper empirically demonstrates that while Reinforcement Fine-Tuning (RFT) enables LLM agents to generalize well across varying task difficulties within a single environment, it struggles with cross-environment transfer due to interface and semantic shifts, though sequential and mixture training strategies can effectively mitigate forgetting and improve overall generalization.
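
The summary contrasts sequential and mixture training schedules; a toy sketch of the two follows, in which the environment names and the `rft_update` stub are illustrative assumptions, not the paper's actual setup.

```python
import random

# Hypothetical environment suites; the names are illustrative, not the paper's.
ENVS = {
    "env_a": ["easy", "medium", "hard"],
    "env_b": ["easy", "medium", "hard"],
}

def rft_update(policy, env, difficulty):
    """Stand-in for one Reinforcement Fine-Tuning step on (env, difficulty)."""
    policy[(env, difficulty)] = policy.get((env, difficulty), 0) + 1
    return policy

def train_sequential(policy, steps_per_env=100):
    # Sequential strategy: finish one environment entirely before the next.
    for env, levels in ENVS.items():
        for _ in range(steps_per_env):
            policy = rft_update(policy, env, random.choice(levels))
    return policy

def train_mixture(policy, total_steps=200):
    # Mixture strategy: resample the environment at every step, so earlier
    # environments keep appearing and are not forgotten.
    for _ in range(total_steps):
        env = random.choice(list(ENVS))
        policy = rft_update(policy, env, random.choice(ENVS[env]))
    return policy
```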

Zhiheng Xi, Xin Guo, Jiaqi Liu, Jiazheng Zhang, Yutao Fan, Zhihao Zhang, Shichun Liu, Mingxu Chai, Xiaowei Shi, Yitao Zhai, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang · 2026-03-13 · cs.AI

An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies

This paper argues that to maintain creative agency while collaborating with emerging intelligent technologies like LLMs, designers must engage in introspection, develop a structural understanding of the technology's capabilities, and deliberately adjust the human-technology working relationship.

Pei-Ying Lin, Julie Heij, Iris Borst, Britt Joosten, Kristina Andersen, Wijnand IJsselsteijn · 2026-03-13 · cs.AI

Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability

The paper proposes Slow-Fast Inference (SFI), a training-free framework that accelerates long-context autoregressive decoding by dynamically alternating between low-cost fast steps using stable sparse memory and occasional slow steps that refresh context at semantic boundaries, achieving significant throughput gains without compromising generation quality.
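
A minimal sketch of the alternating decode loop described above; the keep-last-k `sparsify` rule, the punctuation-based boundary test, and the `step_fn` stand-in for a model forward pass are all assumptions, since the summary does not specify SFI's actual cache policy.

```python
import re

SENT_END = re.compile(r"[.!?]$")

def sparsify(cache, keep_last=32):
    # Toy "stable support" rule: keep only the most recent cache entries.
    return list(cache[-keep_last:])

def is_semantic_boundary(token):
    # Toy boundary test: sentence-ending punctuation triggers a slow step.
    return bool(SENT_END.search(token))

def decode(step_fn, prompt_tokens, max_tokens=50):
    """Alternate cheap fast steps on a sparse cache with slow refreshes.

    `step_fn(cache) -> token` stands in for one model forward pass over
    whatever cache it is given.
    """
    full_cache = list(prompt_tokens)      # slow path: the complete context
    sparse_cache = sparsify(full_cache)   # fast path: stable sparse memory
    out = []
    for _ in range(max_tokens):
        token = step_fn(sparse_cache)     # fast step: sparse memory only
        out.append(token)
        full_cache.append(token)
        if is_semantic_boundary(token):
            sparse_cache = sparsify(full_cache)  # slow step: refresh context
        else:
            sparse_cache.append(token)
    return out
```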

Xingyu Xie, Zhaochen Yu, Yue Liao, Tao Wang, Kim-Chuan Toh, Shuicheng Yan · 2026-03-13 · cs.LG

LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments

LoV3D is a novel 3D vision-language pipeline that enhances Alzheimer's disease prognosis by grounding longitudinal MRI analysis in regional volume assessments and checking outputs with a clinically weighted verifier, achieving state-of-the-art diagnostic accuracy and generalizability while significantly reducing hallucinations through automated, annotation-free training.
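
One way to picture the grounding step is as code that turns longitudinal segmentations into a textual volume assessment the vision-language model can condition on; the region list, units, and report wording below are illustrative assumptions, not LoV3D's prompt format.

```python
def volume_report(segmentation_t0, segmentation_t1, voxel_mm3, regions):
    """Turn two timepoint segmentations into a textual volume assessment.

    `segmentation_*` map a region name to a voxel count; the report is
    handed to the vision-language model as grounded context.
    """
    lines = []
    for r in regions:
        v0 = segmentation_t0[r] * voxel_mm3
        v1 = segmentation_t1[r] * voxel_mm3
        change = 100.0 * (v1 - v0) / v0
        lines.append(f"{r}: {v0:.0f} -> {v1:.0f} mm^3 ({change:+.1f}%)")
    return "\n".join(lines)

# e.g. volume_report({"hippocampus": 4000}, {"hippocampus": 3700},
#                    1.0, ["hippocampus"]) reports a -7.5% change.
```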

Zhaoyang Jiang, Zhizhong Fu, David McAllister, Yunsoo Kim, Honghan Wu · 2026-03-13 · cs.AI

Resource-Efficient Iterative LLM-Based NAS with Feedback Memory

This paper proposes a resource-efficient, closed-loop Neural Architecture Search framework that leverages frozen large language models with a Markov-inspired feedback memory and dual-LLM specialization to iteratively generate and refine compact convolutional neural networks on a single consumer-grade GPU, achieving significant accuracy improvements on image classification benchmarks without requiring cloud infrastructure or model fine-tuning.
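
The closed loop reads naturally as generate, evaluate, critique, remember; a toy sketch follows, assuming the two frozen LLMs are plain `str -> str` callables and that the Markov-inspired memory conditions only on the latest round (both assumptions, not the paper's templates).

```python
def nas_loop(generator_llm, critic_llm, train_and_eval, rounds=10):
    """Closed-loop NAS with two frozen LLMs and a one-step feedback memory.

    `generator_llm` and `critic_llm` are str -> str callables;
    `train_and_eval(arch) -> accuracy` trains the proposed CNN briefly.
    """
    memory = []        # (architecture, accuracy, critique) per round
    best = (None, 0.0)
    for _ in range(rounds):
        prompt = "Propose a compact CNN architecture for image classification."
        if memory:     # Markov-inspired: condition only on the latest round
            arch, acc, critique = memory[-1]
            prompt += f" Previous: {arch} (acc {acc:.3f}). Feedback: {critique}"
        arch = generator_llm(prompt)      # frozen generator proposes
        acc = train_and_eval(arch)        # short run on one consumer GPU
        critique = critic_llm(f"Critique this architecture: {arch} (acc {acc:.3f}).")
        memory.append((arch, acc, critique))
        if acc > best[1]:
            best = (arch, acc)
    return best
```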

Xiaojie Gu, Dmitry Ignatov, Radu Timofte · 2026-03-13 · cs.LG

A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

This paper proposes a robust Multi-Agent Reinforcement Learning framework for traffic signal control that integrates turning ratio randomization, an exponential phase duration adjustment action space, and a neighbor-based MAPPO observation scheme to significantly reduce average waiting time and improve generalization in dynamic traffic scenarios.
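
The exponential action space is the most concrete of the three ingredients; a sketch in which an integer action rescales the current phase duration by `base ** action` (the base and the clipping bounds are illustrative assumptions).

```python
def adjust_phase_duration(duration, action, base=2.0, min_s=5.0, max_s=90.0):
    """Exponentially adjust a traffic-signal green-phase duration.

    An action a in {-2, -1, 0, 1, 2} rescales the duration by base**a,
    so coarse corrections stay cheap while fine ones remain possible.
    """
    new_duration = duration * (base ** action)
    return max(min_s, min(max_s, new_duration))

# e.g. adjust_phase_duration(20.0, 1) -> 40.0; action -1 halves it to 10.0.
```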

Sheng-You Huang, Hsiao-Chuan Chang, Yen-Chi Chen, Ting-Han Wei, I-Hau Yeh, Sheng-Yao Kuan, Chien-Yao Wang, Hsuan-Han Lee, I-Chen Wu · 2026-03-13 · cs.AI

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM Agents

This paper identifies the "information self-locking" phenomenon in reinforcement-learning-trained LLM agents, where deficient action selection and belief tracking form a feedback loop that stifles information gathering, and addresses it by injecting directional critiques that reallocate learning signals, significantly improving active reasoning performance.
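
One plausible reading of "injecting directional critiques to reallocate learning signals" is an advantage-reweighting pass over collected rollout steps; the critique format and the fixed bonus in this sketch are assumptions, not the paper's protocol.

```python
def reweight_advantages(steps, critic_llm, bonus=0.5):
    """Toy reallocation of learning signal via directional critiques.

    `steps` is a list of dicts with keys "state", "action", "advantage";
    `critic_llm(state, action) -> str` returns a critique. Treating a
    critique that starts with "ASK" as endorsing information gathering
    is an assumed convention for illustration.
    """
    for step in steps:
        critique = critic_llm(step["state"], step["action"])
        if critique.startswith("ASK"):
            # Boost credit for information-gathering actions, breaking the
            # self-locking loop in which the agent stops querying.
            step["advantage"] += bonus
        else:
            step["advantage"] -= bonus
    return steps
```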

Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng · 2026-03-13 · cs.AI

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

This paper introduces Minimax Deep Deterministic Policy Gradient (MMDDPG), a framework that employs a fractional objective to stabilize the minimax optimization between a user policy and an adversarial disturbance policy, thereby learning robust control strategies that maintain performance under external perturbations and model uncertainties in continuous environments.
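
A toy alternating update illustrating what a fractional minimax objective could look like; the ratio form `J = num / (den + eps)` is an assumption about the objective's shape, which the summary does not give.

```python
import torch

def minimax_step(objective, user_params, adv_params, lr=1e-3, eps=1e-3):
    """One alternating gradient step on a toy fractional objective.

    `objective()` must return (num, den) as differentiable scalars built
    from the current parameters of both policies.
    """
    num, den = objective()
    j = num / (den + eps)
    grads = torch.autograd.grad(j, user_params)
    with torch.no_grad():            # user policy ascends J
        for p, g in zip(user_params, grads):
            p += lr * g
    num, den = objective()           # rebuild graph after in-place update
    j = num / (den + eps)
    grads = torch.autograd.grad(j, adv_params)
    with torch.no_grad():            # adversarial disturbance policy descends J
        for p, g in zip(adv_params, grads):
            p -= lr * g
```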

Taeho Lee, Donghwan Lee · 2026-03-13 · cs.LG

SommBench: Assessing Sommelier Expertise of Language Models

The paper introduces SommBench, a multilingual benchmark developed in collaboration with professional sommeliers to evaluate the sensory expertise of language models across wine theory, feature completion, and food-wine pairing tasks, revealing that while models excel at theoretical knowledge, they struggle with more complex sensory judgment challenges.
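
A minimal harness for the three task families; the sample items and the substring-match scoring below are illustrative stand-ins, not SommBench's actual data or protocol.

```python
# Toy items for the three SommBench task families (illustrative only).
TASKS = {
    "theory": [("Which grape dominates red Burgundy?", "pinot noir")],
    "feature_completion": [("Tasting note: high acidity, flinty, citrus. Grape?", "chardonnay")],
    "pairing": [("Suggest a classic wine for oysters.", "chablis")],
}

def evaluate(model, tasks=TASKS):
    """Score a model per task family; `model` is any str -> str callable."""
    scores = {}
    for task, items in tasks.items():
        hits = sum(gold in model(question).lower() for question, gold in items)
        scores[task] = hits / len(items)
    return scores
```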

William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie Dupouy, Gianluca Barmina, Andrea Blasi Núñez, Peter Schneider-Kamp, Kristian Koštál, Michal Ries, Lukas Galke Poech · 2026-03-13 · cs.CL