cs.AI papers | Gist.Science

Progressive Refinement Regulation for Accelerating Diffusion Language Model Decoding

This paper proposes Progressive Refinement Regulation (PRR), a dynamic, trajectory-grounded framework that learns a token-wise controller to adaptively regulate the denoising process based on empirical convergence progress, thereby substantially accelerating Diffusion Language Model decoding while preserving generation quality.

Lipeng Wan, Jianhui Gu, Junjie Ma + 4 more2026-03-06💻 cs

Augmenting representations with scientific papers

This paper introduces a contrastive learning framework that aligns X-ray spectra with scientific literature to create shared multimodal representations, significantly improving the estimation of physical variables and enabling the discovery of rare astrophysical sources through integrated data analysis.

Nicolò Oreste Pinciroli Vago, Rocco Di Tella, Carolina Cuesta-Lázaro + 3 more2026-03-06✓ Author reviewed ⓘ🔭 astro-ph

Projected Hessian Learning: Fast Curvature Supervision for Accurate Machine-Learning Interatomic Potentials

The paper introduces Projected Hessian Learning (PHL), a scalable framework that enables efficient, curvature-informed training of machine-learning interatomic potentials by utilizing stochastic Hessian-vector products instead of explicit Hessian matrices, thereby achieving full-second-order accuracy with significantly reduced computational cost and memory requirements.

Austin Rodriguez, Justin S. Smith, Sakib Matin + 3 more2026-03-06🔬 physics

Discovering mathematical concepts through a multi-agent system

This paper presents a multi-agent system that autonomously discovers mathematical concepts, such as homology, by dynamically interweaving conjecture generation, proof attempts, and counterexample analysis, demonstrating that optimizing these local processes effectively aligns with human notions of mathematical interestingness.

Daattavya Aggarwal, Oisin Kim, Carl Henrik Ek + 1 more2026-03-06🔢 math

Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks

This paper investigates temporal drift in the FreshStack retrieval benchmark by comparing 2024 and 2025 corpus snapshots, finding that while relevant documents migrate between repositories, retrieval model rankings remain highly stable, suggesting that such benchmarks can remain reliable despite temporal corpus evolution.

Nathan Kuissi, Suraj Subrahmanyan, Nandan Thakur + 1 more2026-03-06💻 cs

Invariant Causal Routing for Governing Social Norms in Online Market Economies

This paper proposes Invariant Causal Routing (ICR), a causal governance framework that leverages counterfactual reasoning and invariant causal discovery to identify stable, interpretable policy rules for steering emergent social norms in online market economies across heterogeneous environments.

Xiangning Yu, Qirui Mi, Xiao Xue + 4 more2026-03-06💻 cs

Adaptive Memory Admission Control for LLM Agents

This paper proposes Adaptive Memory Admission Control (A-MAC), a framework that decomposes memory value into five interpretable factors to enable transparent, efficient, and domain-adaptive long-term memory management for LLM agents, achieving superior precision-recall tradeoffs and reduced latency compared to state-of-the-art systems.

Guilin Zhang, Wei Jiang, Xiejiashan Wang + 5 more2026-03-06💻 cs

Why Do Neural Networks Forget: A Study of Collapse in Continual Learning

This study investigates the correlation between catastrophic forgetting and structural collapse in continual learning by measuring weight and activation effective rank across various architectures and strategies, revealing that forgetting is strongly linked to the loss of model plasticity and that different methods preserve capacity and performance with varying efficiency.

Yunqin Zhu, Jun Jin2026-03-06💻 cs

Self-Attribution Bias: When AI Monitors Go Easy on Themselves

This paper identifies "self-attribution bias" in agentic systems, demonstrating that language model monitors are significantly less likely to flag high-risk or low-quality actions when evaluating their own previously generated outputs compared to identical actions presented by a user, a flaw that can lead to the deceptive overestimation of monitor reliability in real-world deployments.

Dipika Khullar, Jack Hopkins, Rowan Wang + 1 more2026-03-06💻 cs

ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model

The paper proposes ECG-MoE, a hybrid Mixture-of-Experts foundation model that integrates beat-level morphology and rhythm analysis with cardiac period awareness to achieve state-of-the-art performance on five clinical ECG tasks while significantly reducing inference time.

Yuhao Xu, Xiaoda Wang, Yi Wu + 3 more2026-03-06💻 cs

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

This paper introduces GOLF, a reinforcement learning framework that leverages group-level natural language feedback from external critiques and intra-group attempts to generate actionable refinements, thereby significantly improving sample efficiency and exploration in sparse-reward environments compared to traditional scalar reward methods.

Lei Huang, Xiang Cheng, Chenxiao Zhao + 6 more2026-03-06💻 cs

Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development

This paper introduces Vibe Code Bench, a novel benchmark featuring 100 web application specifications evaluated by autonomous browser agents, which reveals that even the best frontier models achieve only 58.0% accuracy on end-to-end development tasks and highlights self-testing and evaluator alignment as critical factors for success.

Hung Tran, Langston Nashold, Rayan Krishnan + 2 more2026-03-06💻 cs

Towards automated data analysis: A guided framework for LLM-based risk estimation

This paper proposes a human-guided framework that leverages Large Language Models to automate dataset risk estimation by identifying schema properties, generating clustering code, and interpreting results, thereby addressing the limitations of manual auditing and fully automated AI approaches.

Panteleimon Rodis2026-03-06💻 cs

When Agents Persuade: Propaganda Generation and Mitigation in LLMs

This study demonstrates that LLM-based agents can be prompted to generate propaganda using various rhetorical techniques, but their susceptibility to such manipulation can be significantly mitigated through fine-tuning methods, with ORPO proving to be the most effective approach.

Julia Jose, Ritik Roongta, Rachel Greenstadt2026-03-06💻 cs

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

This paper introduces RoboMME, a large-scale standardized benchmark and a suite of memory-augmented VLA models designed to systematically evaluate and advance robotic generalist policies in long-horizon, history-dependent manipulation tasks.

Yinpei Dai, Hongze Fu, Jayjun Lee + 6 more2026-03-06💻 cs

When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift

This paper proposes augmenting Proximal Policy Optimization with temporal sequence models, particularly Transformers, to enable robust reinforcement learning under sensor drift and partial observability by inferring missing information from history, a claim supported by theoretical bounds on reward degradation and empirical success on MuJoCo benchmarks.

Kevin Vogt-Lowell, Theodoros Tsiligkaridis, Rodney Lafuente-Mercado + 4 more2026-03-06💻 cs

GIANT - Global Path Integration and Attentive Graph Networks for Multi-Agent Trajectory Planning

This paper introduces GIANT, a novel multi-robot trajectory planning framework that combines global path integration with attentive graph neural networks to achieve robust, high-success collision avoidance in complex, dynamic environments.

Jonas le Fevre Sejersen, Toyotaro Suzumura, Erdal Kayacan2026-03-06💻 cs

Neuro-Symbolic Financial Reasoning via Deterministic Fact Ledgers and Adversarial Low-Latency Hallucination Detector

This paper introduces VeNRA, a neuro-symbolic financial reasoning system that replaces probabilistic text retrieval with a strictly typed Universal Fact Ledger and employs an adversarially trained Sentinel model to audit execution traces, thereby eliminating hallucinations and arithmetic errors in high-stakes financial domains.

Pedram Agand2026-03-06💻 cs

Using Vision + Language Models to Predict Item Difficulty

This study demonstrates that a multimodal approach combining vision and language models (GPT-4.1-nano) to analyze both visualization images and text features significantly outperforms unimodal methods in predicting the difficulty of data literacy test items for U.S. adults, achieving a mean absolute error of 0.224.

Samin Khan2026-03-06💻 cs

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks

This paper introduces PulseFocus, a training-free inference-time method that mitigates diffuse attention patterns and positional biases in reasoning VLMs by structuring chain-of-thought generation into interleaved planning and focus blocks, thereby significantly improving performance on multi-image benchmarks.

Chenjun Li2026-03-06💻 cs

← Previous Next →