cs.AI papers | Gist.Science

Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

This paper argues that AI agents equipped with specialized skills can augment, but not fully replace, social scientists by executing codifiable research tasks autonomously through "vibe researching," while highlighting the enduring necessity of human theoretical originality and tacit knowledge alongside the profession's emerging risks of stratification and pedagogical crisis.

Yongjun Zhang | 2026-03-10 | 💻 cs

A Mathematical Theory of Agency and Intelligence

This paper introduces "bipredictability" (P) as a fundamental, bounded measure of shared information between observations, actions, and outcomes to distinguish mere agency from true intelligence, demonstrating that current AI systems lack the self-monitoring feedback loops necessary for adaptive learning and proposing a thalamocortical-inspired architecture to restore it.

Wael Hafez, Chenan Wei, Rodrigo Pena, Amir Nazeri, Cameron Reid | 2026-03-10 | 🔢 math

Autoregressive Visual Decoding from EEG Signals

The paper introduces AVDE, a lightweight and efficient autoregressive framework that leverages contrastive learning and multi-scale token prediction to decode EEG signals into coherent images, outperforming state-of-the-art methods with significantly fewer parameters while mimicking the hierarchical nature of human visual perception.

Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye | 2026-03-10 | 🤖 cs.LG

Decomposing Physician Disagreement in HealthBench

This paper analyzes physician disagreement in the HealthBench dataset, revealing that while the majority of variance is structural and irreducible, a small but actionable portion stems from reducible uncertainties like missing context, suggesting that improving evaluation design to close information gaps could meaningfully reduce disagreement on borderline medical AI cases.

Satya Borgohain, Roy Mariathas | 2026-03-10 | 💻 cs

CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion

CeRA overcomes the linear performance ceiling of Low-Rank Adaptation (LoRA) in complex reasoning tasks by introducing a weight-level parallel adapter with SiLU gating and structural dropout to induce manifold expansion, thereby achieving superior spectral efficiency and preventing rank collapse.

Hung-Hsuan Chen | 2026-03-10 | 🤖 cs.LG

On Sample-Efficient Generalized Planning via Learned Transition Models

This paper proposes a sample-efficient approach to generalized planning that learns explicit neural transition models to predict intermediate world states, demonstrating superior out-of-distribution performance and data efficiency compared to direct action-sequence prediction methods.

Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava | 2026-03-10 | 💻 cs

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

This paper addresses the scarcity of expert textual relevance labels in large-scale app store search by leveraging a specialized, fine-tuned LLM to generate millions of high-quality labels, which, when used to augment the production ranker, significantly improves both offline metrics and real-world conversion rates, particularly for tail queries lacking reliable behavioral data.

Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad, Sean Suchter, Venkat Sundaranatha | 2026-03-10 | 🤖 cs.LG

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

This paper introduces Attn-QAT, the first systematic 4-bit quantization-aware training framework for attention mechanisms that ensures stable FP4 training and inference by matching low-precision recomputation in the backward pass and correcting implicit precision assumptions, thereby eliminating quality drops and delivering up to 1.5x speedup on FP4-capable GPUs without relying on outlier-mitigation heuristics.

Peiyuan Zhang, Matthew Noto, Wenxuan Tan, Chengquan Jiang, Will Lin, Wei Zhou, Hao Zhang | 2026-03-10 | 🤖 cs.LG

PEPA: a Persistently Autonomous Embodied Agent with Personalities

This paper introduces PEPA, a three-layer cognitive architecture that leverages personality traits to enable embodied agents to autonomously generate goals and sustain long-term operation in dynamic environments without relying on external task specifications.

Kaige Liu, Yang Li, Lijun Zhu, Weinan Zhang | 2026-03-10 | 💻 cs

How Well Do Multimodal Models Reason on ECG Signals?

This paper introduces a reproducible, scalable framework for evaluating multimodal models on ECG signals by decomposing reasoning into "Perception" (verified via code generation) and "Deduction" (verified via retrieval against clinical criteria) to address the limitations of existing manual or superficial evaluation methods.

Maxwell A. Xu, Harish Haresamudram, Catherine W. Liu, Patrick Langer, Jathurshan Pradeepkumar, Wanting Mao, Sunita J. Ferns, Aradhana Verma, Jimeng Sun, Paul Schmiedmayer, Xin Liu, Daniel McDuff, Emily B. Fox, James M. Rehg | 2026-03-10 | 🤖 cs.LG

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

This paper proposes a conformal prediction framework that ensures safe, domain-specific deployment of LLMs for medical entity extraction by adapting calibration thresholds to counteract the distinct underconfidence observed in structured FDA labels and overconfidence in free-text radiology reports, thereby achieving target coverage guarantees with manageable rejection rates across diverse clinical settings.

Manil Shrestha, Edward Kim | 2026-03-10 | 💬 cs.CL

Extended Empirical Validation of the Explainability Solution Space

This technical report extends the empirical validation of the Explainability Solution Space (ESS) framework by demonstrating its domain-independent applicability and systematic adaptability to diverse governance roles and stakeholder configurations through a cross-domain evaluation involving both employee attrition and urban resource allocation systems.

Antoni Mestre, Manoli Albert, Miriam Gil, Vicente Pelechano | 2026-03-10 | 💻 cs

Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy

This paper proposes a tractable two-layer framework combining a Hidden Markov Model for inferring rival energy states and a Deep Q-Network for decision-making to optimize 2026 Formula 1 energy strategies under partial observability, specifically addressing the "counter-harvest trap" where opponents deliberately mask their deployment signals.

Kalliopi Kleisarchaki | 2026-03-10 | 🤖 cs.LG
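
As background for the Hidden Markov Model layer mentioned in the summary above, the following is a minimal sketch of one generic HMM filtering step (predict, then correct with the observation likelihood). It is not the paper's model: the function name `hmm_filter_step`, the two-state setup, and all numbers are hypothetical.

```python
def hmm_filter_step(belief, transition, likelihood):
    """Update a belief over hidden states after one observation.

    belief[i]        : prior probability of hidden state i
    transition[i][j] : probability of moving from state i to state j
    likelihood[j]    : probability of the observation given state j
    """
    n = len(belief)
    # Predict: push the prior belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight each state by how well it explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical two-state example, e.g. a rival in a "low" or "high" state.
prior = [0.5, 0.5]
stay_put = [[1.0, 0.0], [0.0, 1.0]]   # identity transition, for illustration
obs_likelihood = [0.9, 0.1]           # observation favours state 0
posterior = hmm_filter_step(prior, stay_put, obs_likelihood)
```

The resulting posterior is the kind of belief vector a downstream decision policy could consume in place of the unobserved state.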

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is an end-to-end agent framework that automates single-cell perturbation modeling by combining an LLM-driven semantic unifier to resolve metadata incompatibilities and an adaptive Monte Carlo Tree Search engine to synthesize architectures that handle distribution shifts, thereby achieving high execution success and outperforming expert baselines without manual engineering.

Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun | 2026-03-10 | 💻 cs

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

This paper proposes a novel LLM-driven closed-loop framework that maps natural language instructions to executable rules and semantically annotates options to enhance the data efficiency, interpretability, and cross-environment transferability of Deep Reinforcement Learning, with experimental validation showing superior performance in constraint compliance and skill reuse.

Chang Yao, Jinghui Qin, Kebing Jin, Hankz Hankui Zhuo | 2026-03-10 | 💻 cs

A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment

This paper presents a computationally efficient, detection-gated deep learning pipeline that achieves state-of-the-art robustness and cross-dataset generalization in glottal segmentation from high-speed videoendoscopy, enabling reliable extraction of clinical biomarkers for distinguishing healthy from pathological vocal function.

Harikrishnan Unnikrishnan | 2026-03-10 | 🤖 cs.LG

Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta

This paper proposes a robust framework combining the hybrid CoAtNet architecture with model soups ensembling to effectively classify Intangible Cultural Heritage images from the Mekong Delta, achieving state-of-the-art performance on the ICH-17 dataset by reducing variance and enhancing generalization in data-scarce, high-similarity settings.

Quoc-Khang Tran, Minh-Thien Nguyen, Nguyen-Khang Pham | 2026-03-10 | 🤖 cs.LG
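
For context on the "model soups" idea named in the summary above, here is a minimal sketch of a uniform soup: the element-wise average of the weights of several fine-tuned models that share one architecture. It is not the paper's pipeline. The function name `uniform_soup` is hypothetical, and plain Python lists stand in for real weight tensors.

```python
def uniform_soup(state_dicts):
    """Average parameters element-wise across models with identical keys.

    Each state dict maps a parameter name to a flat list of floats.
    """
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = set(state_dicts[0])
    for sd in state_dicts:
        if set(sd) != keys:
            raise ValueError("all models must share the same parameter names")
    n = len(state_dicts)
    soup = {}
    for name in state_dicts[0]:
        # zip(*...) groups the i-th element of this parameter across models.
        columns = zip(*(sd[name] for sd in state_dicts))
        soup[name] = [sum(values) / n for values in columns]
    return soup


# Two hypothetical fine-tuned checkpoints of the same tiny model.
model_a = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
souped = uniform_soup([model_a, model_b])
```

Unlike a prediction-level ensemble, the soup is a single set of weights, so inference costs the same as running one model.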

Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

This paper introduces a diagnostic framework demonstrating that in memory-augmented LLM agents, retrieval quality is the dominant factor influencing performance on the LoCoMo benchmark, often outweighing the impact of sophisticated write strategies and suggesting that raw chunked storage can outperform expensive, lossy alternatives.

Boqin Yuan, Yue Su, Kun Yao | 2026-03-10 | 🤖 cs.AI

Agentified Assessment of Logical Reasoning Agents

This paper introduces an agentified assessment framework that utilizes an assessor agent to ensure reproducible and robust evaluation of logical reasoning systems, demonstrating its effectiveness by benchmarking an auto-formalization agent that achieves 86.70% accuracy on a solver-verified FOLIO dataset, significantly outperforming a chain-of-thought baseline.

Zhiyu Ni, Yifeng Xiao, Zheng Liang | 2026-03-10 | 💻 cs

Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers

This paper introduces GramCol and a motion-feature selection algorithm to generate Interpretable Motion-Attentive Maps (IMAPs) that effectively localize both motion and non-motion concepts in Video Diffusion Transformers without requiring gradient calculations or parameter updates.

Youngjun Jun, Seil Kang, Woojung Han, Seong Jae Hwang | 2026-03-10 | 🤖 cs.LG
