Understanding the Dynamics of Demonstration Conflict in In-Context Learning
This paper investigates how large language models process conflicting demonstrations in in-context learning. It reveals a two-phase computational structure: early layers encode both the correct and the incorrect rule, while late layers commit to a prediction. The paper further identifies specific attention heads responsible for this vulnerability and shows that targeted ablation of those heads significantly improves performance.
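As a minimal sketch of the kind of targeted head ablation described above, the snippet below zeroes out chosen attention heads in a Hugging Face GPT-2 model by hooking the attention output projection; the `(layer, head)` pairs in `HEADS_TO_ABLATE`, the prompt, and the checkpoint are hypothetical illustrations, not the heads or setup identified in the paper.

```python
# Sketch: ablate specific attention heads in GPT-2 via forward pre-hooks.
# Assumes HF transformers' GPT-2, where head outputs are concatenated
# before the attention block's output projection (attn.c_proj).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

# Hypothetical (layer, head) pairs standing in for the heads the paper
# identifies as driving sensitivity to conflicting demonstrations.
HEADS_TO_ABLATE = [(9, 6), (10, 2)]

n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

def make_ablation_hook(head_idx):
    # Zero one head's slice of the concatenated head outputs *before*
    # the output projection, which removes that head's contribution
    # to the residual stream.
    def hook(module, inputs):
        hidden = inputs[0].clone()
        hidden[..., head_idx * head_dim : (head_idx + 1) * head_dim] = 0.0
        return (hidden,) + inputs[1:]
    return hook

handles = [
    model.transformer.h[layer].attn.c_proj.register_forward_pre_hook(
        make_ablation_hook(head)
    )
    for layer, head in HEADS_TO_ABLATE
]

# Run the model with the chosen heads silenced; the demonstrations
# here are a toy example of a conflicting in-context prompt.
prompt = "fruit -> sweet\nlemon -> sour\napple ->"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(tokenizer.decode(logits[0, -1].argmax().item()))

# Remove the hooks to restore the unmodified model.
for h in handles:
    h.remove()
```

Comparing the model's predictions with and without the hooks installed is one way to measure how much the ablated heads contribute to the conflict-induced errors.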