cs.AI papers | Gist.Science

Resource-constrained Amazons chess decision framework integrating large language models and graph attention

This paper proposes a lightweight hybrid framework for the Game of the Amazons that integrates Graph Attention Autoencoders, Stochastic Graph Genetic Algorithms, and GPT-4o-mini to overcome resource constraints, achieving decision accuracy improvements of 15%–56% over baselines and outperforming its teacher model by effectively denoising LLM outputs through structural graph reasoning.

Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski | 2026-03-12 | cs.AI

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

The paper introduces IH-Challenge, a reinforcement learning dataset designed to enhance instruction hierarchy robustness in frontier LLMs, which significantly improves their ability to prioritize instructions against conflicts and adversarial attacks while maintaining helpfulness and minimizing capability regression.

Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil Kandpal, Milad Nasr, Rai (Michael Pokorny), Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel, Kai Xiao | 2026-03-12 | cs.AI

UAV-MARL: Multi-Agent Reinforcement Learning for Time-Critical and Dynamic Medical Supply Delivery

This paper presents a Multi-Agent Reinforcement Learning framework using Proximal Policy Optimization to coordinate UAV fleets for time-critical medical supply delivery, demonstrating that classical PPO outperforms asynchronous and sequential strategies in dynamically prioritizing tasks and reallocating resources under uncertain conditions using real-world geographic data.

Islam Guven, Mehmet Parlak | 2026-03-12 | cs.LG

Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation

This study evaluates 11 promptable foundation models for musculoskeletal CT segmentation across four anatomical regions, revealing that while specific models like SAM and nnInteractive perform best under ideal conditions, all models exhibit significant sensitivity to human prompting variations, leading to performance drops and highlighting the challenge of selecting robust models for real-world clinical applications.

Caroline Magg, Maaike A. ter Wee, Johannes G. G. Dobbe, Geert J. Streekstra, Leendert Blankevoort, Clara I. Sánchez, Hoel Kervadec | 2026-03-12 | cs.AI

SCORE: Replacing Layer Stacking with Contractive Recurrent Depth

The paper introduces SCORE, a lightweight deep learning architecture that replaces traditional layer stacking with a weight-shared, ODE-inspired contractive recurrent update mechanism to improve training stability, accelerate convergence, and reduce parameter counts across various model types.

Guillaume Godin | 2026-03-12 | Author reviewed | cs.LG

Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

This paper introduces a novel language-guided framework that leverages pretrained vision-language models and a specialized adapter to achieve zero-shot, generative detection and localization of subsurface defects in carbon fiber-reinforced polymers using active infrared thermography, thereby eliminating the need for costly, task-specific training datasets while significantly improving signal-to-noise ratios and detection accuracy.

Mohammed Salah, Eman Ouda, Giuseppe Dell'Avvocato, Fabrizio Sarasini, Ester D'Accardi, Jorge Dias, Davor Svetinovic, Stefano Sfarra, Yusra Abdulrahman | 2026-03-12 | eess

Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents

This paper proposes a novel self-finetuning framework that enables Generative AI agents to autonomously learn continuous control for dynamic Radio Access Network slicing by distilling long-horizon experiences into model parameters via a bi-perspective reflection mechanism, thereby outperforming traditional Reinforcement Learning and standard LLM-based agents in sample efficiency and multi-objective optimization without relying on handcrafted reward signals.

Yuanhao Li, Haozhe Wang, Geyong Min, Nektarios Georgalas, Wang Miao | 2026-03-12 | cs.AI

CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents

This paper presents CUAAudit, a large-scale meta-evaluation demonstrating that while Vision-Language Models can serve as autonomous auditors for Computer-Use Agents with strong accuracy and calibration, their significant performance degradation in complex environments and notable inter-model disagreement reveal fundamental limitations that necessitate explicit accounting for evaluator reliability and uncertainty in real-world deployments.

Marta Sumyk, Oleksandr Kosovan | 2026-03-12 | cs.AI

Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

This paper empirically demonstrates that contrary to the hypothesis that moral reasoning alignment requires diversity-seeking algorithms, standard reward-maximizing RLVR methods are equally or more effective because high-reward moral responses exhibit a concentrated distribution in semantic space similar to logical reasoning tasks.

Zhaowei Zhang, Xiaohan Liu, Xuekai Zhu, Junchao Huang, Ceyao Zhang, Zhiyuan Feng, Yaodong Yang, Xiaoyuan Yi, Xing Xie | 2026-03-12 | cs.AI

Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

This paper establishes a mathematical framework called Gradient Flow Drifting that proves the equivalence between the recently proposed Drifting Model and the Wasserstein gradient flow of the forward KL divergence under KDE approximation, while extending the approach to a mixed-divergence strategy on Riemannian manifolds to simultaneously mitigate mode collapse and blurring.

Jiarui Cao, Zixuan Wei, Yuxin Liu | 2026-03-12 | cs.LG

Recover to Predict: Progressive Retrospective Learning for Variable-Length Trajectory Prediction

This paper proposes the Progressive Retrospective Framework (PRF), a plug-and-play method that utilizes a cascade of retrospective units and a rolling-start training strategy to effectively address the challenge of variable-length trajectory prediction in autonomous driving by progressively aligning features from incomplete observations with complete ones.

Hao Zhou, Lu Qi, Jason Li, Jie Zhang, Yi Liu, Xu Yang, Mingyu Fan, Fei Luo | 2026-03-12 | cs.AI

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

This paper introduces a novel framework for self-improving LLM agents that automatically extracts structured learnings from execution trajectories—categorizing them into strategy, recovery, and optimization tips—and injects them via adaptive memory retrieval to significantly boost task completion rates, particularly on complex scenarios.

Gaodan Fang, Vatche Isahagian, K. R. Jayaram, Ritesh Kumar, Vinod Muthusamy, Punleuk Oum, Gegi Thomas | 2026-03-12 | cs.AI

Reinforcement Learning with Conditional Expectation Reward

This paper proposes Conditional Expectation Reward (CER), a novel reinforcement learning method that utilizes the large language model itself as an implicit verifier to provide soft, graded reward signals, thereby overcoming the limitations of rule-based verification and enabling effective reasoning training across both mathematical and general free-form answer domains.

Changyi Xiao, Caijun Xu, Yixin Cao | 2026-03-12 | cs.LG

Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection

This paper proposes a novel, explainable approach to detect and eliminate neural network backdoors by analyzing active paths within the model, demonstrating its effectiveness through experiments on intrusion detection systems.

Eirik Høyheim, Magnus Wiik Eckhoff, Gudmund Grov, Robert Flood, David Aspinall | 2026-03-12 | cs.AI

Interleaving Scheduling and Motion Planning with Incremental Learning of Symbolic Space-Time Motion Abstractions

This paper proposes a novel framework that interleaves task scheduling and motion planning through an incremental learning loop, where symbolic feedback from motion feasibility checks guides the scheduler to generate efficient, collision-free plans for multi-object navigation in shared workspaces.

Elisa Tosello, Arthur Bit-Monnot, Davide Lusuardi, Alessandro Valentini, Andrea Micheli | 2026-03-12 | cs.AI

Are Video Reasoning Models Ready to Go Outside?

This paper introduces ROVA, a training framework that enhances video reasoning models' robustness against real-world disturbances like weather and occlusion through difficulty-aware adaptive training and a robustness-aware consistency reward, validated by the new PVRBench benchmark which demonstrates significant performance improvements over existing models.

Yangfan He, Changgyu Boo, Jaehong Yoon | 2026-03-12 | cs.AI

FAME: Formal Abstract Minimal Explanation for Neural Networks

The paper introduces FAME, a novel abductive explanation method for large neural networks that utilizes dedicated perturbation domains and LiRPA-based bounds to efficiently generate formal abstract minimal explanations, demonstrating superior performance in explanation size and runtime compared to VERIX+.

Ryma Boumazouza, Raya Elsaleh, Melanie Ducoffe, Shahaf Bassan, Guy Katz | 2026-03-12 | cs.AI

Emulating Clinician Cognition via Self-Evolving Deep Clinical Research

The paper introduces DxEvolve, a self-evolving diagnostic agent that emulates clinician cognition through an interactive deep clinical research workflow, autonomously requisitioning examinations and externalizing experience to achieve superior diagnostic accuracy and governed continual improvement compared to existing AI models.

Ruiyang Ren, Yuhao Wang, Yunsen Liang, Lan Luo, Jing Liu, Haifeng Wang, Cong Feng, Yinan Zhang, Chunyan Miao, Ji-Rong Wen, Wayne Xin Zhao | 2026-03-12 | cs.AI

A Platform-Agnostic Multimodal Digital Human Modelling Framework: Neurophysiological Sensing in Game-Based Interaction

This paper presents a platform-agnostic Digital Human Modelling framework that integrates OpenBCI Galea biosensing with a reproducible SuperTux game environment to provide structured, temporally aligned multimodal data for future AI-driven, ethics-approved research in accessibility and inclusive interaction design.

Daniel J. Buxton, Mufti Mahmud, Jordan J. Bird, Thomas Hughes-Roberts, David J. Brown | 2026-03-12 | cs.AI

Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

This paper introduces Contract And Conquer (CAC), a black-box adversarial attack method that combines knowledge distillation on an expanding dataset with precise search space contraction to provably compute adversarial examples within a fixed number of iterations, outperforming existing state-of-the-art approaches on ImageNet.

Anna Chistyakova, Mikhail Pautov | 2026-03-12 | cs.LG
