cs.AI papers | Gist.Science

ABD: Default Exception Abduction in Finite First Order Worlds

This paper introduces ABD, a benchmark for default-exception abduction in finite first-order worlds that evaluates ten frontier LLMs on their ability to generate sparse, satisfiability-restoring formulas across three observation regimes, revealing that while models achieve high validity, they struggle with parsimony and exhibit distinct generalization failures.

Serafim Batzoglou | 2026-03-10 | Author reviewed | cs

INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

This paper introduces INDUCTION, a benchmark that evaluates how well AI models synthesize compact, generalizable first-order logic formulas that explain target predicates across small finite relational worlds, revealing distinct performance patterns and generalization strategies among recent frontier models.

Serafim Batzoglou | 2026-03-10 | cs

Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

This paper establishes a comprehensive multi-KPI benchmark for Multi-Agent Reinforcement Learning in urban energy management using the CityLearn environment, demonstrating that Decentralized Training with Decentralized Execution (DTDE) consistently outperforms Centralized Training with Decentralized Execution (CTDE) in both average and worst-case performance while offering greater resilience and sustainability.

Aymen Khouja, Imen Jendoubi, Oumayma Mahjoub, Oussama Mahfoudhi, Ruan De Kock, Siddarth Singh, Claude Formanek | 2026-03-10 | cs.LG

MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation

The paper introduces MrBERT, a family of efficient, open-source multilingual encoders built on the ModernBERT architecture that achieves state-of-the-art performance in specific languages and specialized domains while leveraging Matryoshka Representation Learning to reduce inference and storage costs.

Daniel Tamayo, Iñaki Lacunza, Paula Rivera-Hidalgo, Severino Da Dalt, Javier Aula-Blasco, Aitor Gonzalez-Agirre, Marta Villegas | 2026-03-10 | cs.LG

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

This paper introduces ARLArena, a unified framework that systematically analyzes training instability in agentic reinforcement learning to derive SAMPO, a stable optimization method that ensures consistent performance across diverse agentic tasks.

Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Deng, Renliang Sun, Alexander Taylor, Yanqiao Zhu, Jason Cong, Yizhou Sun, Wei Wang | 2026-03-10 | cs

CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints

CryoNet.Refine is a novel one-step diffusion model that automates and accelerates the refinement of atomic structures against cryo-EM density maps, outperforming traditional tools like Phenix in both model-map correlation and geometric quality while supporting diverse protein and nucleic acid complexes.

Fuyao Huang, Xiaozhu Yu, Kui Xu, Qiangfeng Cliff Zhang | 2026-03-10 | cs

Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

This paper argues that AI agents equipped with specialized skills can augment, but not fully replace, social scientists by executing codifiable research tasks autonomously through "vibe researching," while highlighting the enduring necessity of human theoretical originality and tacit knowledge alongside the profession's emerging risks of stratification and pedagogical crisis.

Yongjun Zhang | 2026-03-10 | cs

A Mathematical Theory of Agency and Intelligence

This paper introduces "bipredictability" (P) as a fundamental, bounded measure of shared information between observations, actions, and outcomes to distinguish mere agency from true intelligence, demonstrating that current AI systems lack the self-monitoring feedback loops necessary for adaptive learning and proposing a thalamocortical-inspired architecture to restore it.

Wael Hafez, Chenan Wei, Rodrigo Pena, Amir Nazeri, Cameron Reid | 2026-03-10 | math

Autoregressive Visual Decoding from EEG Signals

The paper introduces AVDE, a lightweight and efficient autoregressive framework that leverages contrastive learning and multi-scale token prediction to decode EEG signals into coherent images, outperforming state-of-the-art methods with significantly fewer parameters while mimicking the hierarchical nature of human visual perception.

Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye | 2026-03-10 | cs.LG

Decomposing Physician Disagreement in HealthBench

This paper analyzes physician disagreement in the HealthBench dataset, revealing that while the majority of variance is structural and irreducible, a small but actionable portion stems from reducible uncertainties like missing context, suggesting that improving evaluation design to close information gaps could meaningfully reduce disagreement on borderline medical AI cases.

Satya Borgohain, Roy Mariathas | 2026-03-10 | cs

CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion

CeRA overcomes the linear performance ceiling of Low-Rank Adaptation (LoRA) in complex reasoning tasks by introducing a weight-level parallel adapter with SiLU gating and structural dropout to induce manifold expansion, thereby achieving superior spectral efficiency and preventing rank collapse.

Hung-Hsuan Chen | 2026-03-10 | cs.LG

On Sample-Efficient Generalized Planning via Learned Transition Models

This paper proposes a sample-efficient approach to generalized planning that learns explicit neural transition models to predict intermediate world states, demonstrating superior out-of-distribution performance and data efficiency compared to direct action-sequence prediction methods.

Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava | 2026-03-10 | cs

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

This paper addresses the scarcity of expert textual relevance labels in large-scale app store search by leveraging a specialized, fine-tuned LLM to generate millions of high-quality labels, which, when used to augment the production ranker, significantly improves both offline metrics and real-world conversion rates, particularly for tail queries lacking reliable behavioral data.

Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad, Sean Suchter, Venkat Sundaranatha | 2026-03-10 | cs.LG

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

This paper introduces Attn-QAT, the first systematic 4-bit quantization-aware training framework for attention mechanisms that ensures stable FP4 training and inference by matching low-precision recomputation in the backward pass and correcting implicit precision assumptions, thereby eliminating quality drops and delivering up to 1.5x speedup on FP4-capable GPUs without relying on outlier-mitigation heuristics.

Peiyuan Zhang, Matthew Noto, Wenxuan Tan, Chengquan Jiang, Will Lin, Wei Zhou, Hao Zhang | 2026-03-10 | cs.LG

PEPA: a Persistently Autonomous Embodied Agent with Personalities

This paper introduces PEPA, a three-layer cognitive architecture that leverages personality traits to enable embodied agents to autonomously generate goals and sustain long-term operation in dynamic environments without relying on external task specifications.

Kaige Liu, Yang Li, Lijun Zhu, Weinan Zhang | 2026-03-10 | cs

How Well Do Multimodal Models Reason on ECG Signals?

This paper introduces a reproducible, scalable framework for evaluating multimodal models on ECG signals by decomposing reasoning into "Perception" (verified via code generation) and "Deduction" (verified via retrieval against clinical criteria) to address the limitations of existing manual or superficial evaluation methods.

Maxwell A. Xu, Harish Haresamudram, Catherine W. Liu, Patrick Langer, Jathurshan Pradeepkumar, Wanting Mao, Sunita J. Ferns, Aradhana Verma, Jimeng Sun, Paul Schmiedmayer, Xin Liu, Daniel McDuff, Emily B. Fox, James M. Rehg | 2026-03-10 | cs.LG

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

This paper proposes a conformal prediction framework that ensures safe, domain-specific deployment of LLMs for medical entity extraction by adapting calibration thresholds to counteract the distinct underconfidence observed in structured FDA labels and overconfidence in free-text radiology reports, thereby achieving target coverage guarantees with manageable rejection rates across diverse clinical settings.

Manil Shrestha, Edward Kim | 2026-03-10 | cs.CL

Extended Empirical Validation of the Explainability Solution Space

This technical report extends the empirical validation of the Explainability Solution Space (ESS) framework by demonstrating its domain-independent applicability and systematic adaptability to diverse governance roles and stakeholder configurations through a cross-domain evaluation involving both employee attrition and urban resource allocation systems.

Antoni Mestre, Manoli Albert, Miriam Gil, Vicente Pelechano | 2026-03-10 | cs

Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy

This paper proposes a tractable two-layer framework combining a Hidden Markov Model for inferring rival energy states and a Deep Q-Network for decision-making to optimize 2026 Formula 1 energy strategies under partial observability, specifically addressing the "counter-harvest trap" where opponents deliberately mask their deployment signals.

Kalliopi Kleisarchaki | 2026-03-10 | cs.LG

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is an end-to-end agent framework that automates single-cell perturbation modeling by combining an LLM-driven semantic unifier to resolve metadata incompatibilities and an adaptive Monte Carlo Tree Search engine to synthesize architectures that handle distribution shifts, thereby achieving high execution success and outperforming expert baselines without manual engineering.

Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun | 2026-03-10 | cs
