cs.AI papers | Gist.Science

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

This paper proposes a conformal prediction framework that ensures safe, domain-specific deployment of LLMs for medical entity extraction by adapting calibration thresholds to counteract the distinct underconfidence observed in structured FDA labels and overconfidence in free-text radiology reports, thereby achieving target coverage guarantees with manageable rejection rates across diverse clinical settings.

Manil Shrestha, Edward Kim · 2026-03-10 · 💬 cs.CL
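
As a rough illustration of per-domain calibration, the sketch below computes a standard split-conformal threshold from a held-out calibration set. It assumes a nonconformity score where higher means less trustworthy (for example, one minus the model's confidence); the paper's actual score function and rejection rule are not given in this summary, and the function names and sample scores are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold for nonconformity scores at miscoverage level alpha."""
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this alpha: accept everything.
        return math.inf
    return sorted(cal_scores)[rank - 1]

def accept(score, threshold):
    """Keep an extraction if its score is within the threshold, otherwise reject it."""
    return score <= threshold

# Hypothetical calibration scores for two domains, calibrated separately.
calibration = {
    "fda_labels": [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80],
    "radiology": [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.20, 0.90],
}
thresholds = {d: conformal_threshold(s, alpha=0.2) for d, s in calibration.items()}
```

Calibrating each domain on its own scores is what lets the threshold move in opposite directions for an underconfident and an overconfident domain.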

Extended Empirical Validation of the Explainability Solution Space

This technical report extends the empirical validation of the Explainability Solution Space (ESS) framework by demonstrating its domain-independent applicability and systematic adaptability to diverse governance roles and stakeholder configurations through a cross-domain evaluation involving both employee attrition and urban resource allocation systems.

Antoni Mestre, Manoli Albert, Miriam Gil, Vicente Pelechano · 2026-03-10 · 💻 cs

Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy

This paper proposes a tractable two-layer framework combining a Hidden Markov Model for inferring rival energy states and a Deep Q-Network for decision-making to optimize 2026 Formula 1 energy strategies under partial observability, specifically addressing the "counter-harvest trap" where opponents deliberately mask their deployment signals.

Kalliopi Kleisarchaki · 2026-03-10 · 🤖 cs.LG

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is an end-to-end agent framework that automates single-cell perturbation modeling by combining an LLM-driven semantic unifier to resolve metadata incompatibilities and an adaptive Monte Carlo Tree Search engine to synthesize architectures that handle distribution shifts, thereby achieving high execution success and outperforming expert baselines without manual engineering.

Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun · 2026-03-10 · 💻 cs

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

This paper proposes a novel LLM-driven closed-loop framework that maps natural language instructions to executable rules and semantically annotates options to enhance the data efficiency, interpretability, and cross-environment transferability of Deep Reinforcement Learning, with experimental validation showing superior performance in constraint compliance and skill reuse.

Chang Yao, Jinghui Qin, Kebing Jin, Hankz Hankui Zhuo · 2026-03-10 · 💻 cs

A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment

This paper presents a computationally efficient, detection-gated deep learning pipeline that achieves state-of-the-art robustness and cross-dataset generalization in glottal segmentation from high-speed videoendoscopy, enabling reliable extraction of clinical biomarkers for distinguishing healthy from pathological vocal function.

Harikrishnan Unnikrishnan · 2026-03-10 · 🤖 cs.LG

Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta

This paper proposes a robust framework combining the hybrid CoAtNet architecture with model soups ensembling to effectively classify Intangible Cultural Heritage images from the Mekong Delta, achieving state-of-the-art performance on the ICH-17 dataset by reducing variance and enhancing generalization in data-scarce, high-similarity settings.

Quoc-Khang Tran, Minh-Thien Nguyen, Nguyen-Khang Pham · 2026-03-10 · 🤖 cs.LG
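
For readers unfamiliar with model soups, the idea is to average the weights of several fine-tuned models that share one architecture, rather than averaging their predictions. The sketch below shows the simplest (uniform) variant on plain Python lists; whether the paper uses uniform or greedy soups is not stated in this summary, and the parameter names here are hypothetical.

```python
def uniform_soup(state_dicts):
    """Average parameters elementwise across models with identical keys and shapes."""
    if not state_dicts:
        raise ValueError("need at least one model")
    n = len(state_dicts)
    soup = {}
    for key in state_dicts[0]:
        # zip(*...) pairs up the i-th weight of this parameter from every model.
        columns = zip(*(sd[key] for sd in state_dicts))
        soup[key] = [sum(values) / n for values in columns]
    return soup

# Two hypothetical fine-tuned checkpoints, flattened to lists of floats.
model_a = {"w": [1.0, 2.0], "b": [0.0]}
model_b = {"w": [3.0, 4.0], "b": [1.0]}
souped = uniform_soup([model_a, model_b])
```

Because the result is a single set of weights, inference costs the same as one model, unlike a prediction-level ensemble.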

Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

This paper introduces a diagnostic framework demonstrating that in memory-augmented LLM agents, retrieval quality is the dominant factor influencing performance on the LoCoMo benchmark, often outweighing the impact of sophisticated write strategies and suggesting that raw chunked storage can outperform expensive, lossy alternatives.

Boqin Yuan, Yue Su, Kun Yao · 2026-03-10 · 🤖 cs.AI

Agentified Assessment of Logical Reasoning Agents

This paper introduces an agentified assessment framework that utilizes an assessor agent to ensure reproducible and robust evaluation of logical reasoning systems, demonstrating its effectiveness by benchmarking an auto-formalization agent that achieves 86.70% accuracy on a solver-verified FOLIO dataset, significantly outperforming a chain-of-thought baseline.

Zhiyu Ni, Yifeng Xiao, Zheng Liang · 2026-03-10 · 💻 cs

Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers

This paper introduces GramCol and a motion-feature selection algorithm to generate Interpretable Motion-Attentive Maps (IMAPs) that effectively localize both motion and non-motion concepts in Video Diffusion Transformers without requiring gradient calculations or parameter updates.

Youngjun Jun, Seil Kang, Woojung Han, Seong Jae Hwang · 2026-03-10 · 🤖 cs.LG

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

This paper provides the first theoretical proof that Adam's second-moment normalization yields significantly sharper high-probability convergence guarantees ($\delta^{-1/2}$ dependence) compared to SGD ($\delta^{-1}$ dependence) under the classical bounded variance model, thereby explaining its empirical superiority.

Ruinan Jin, Yingbin Liang, Shaofeng Zou · 2026-03-10 · 🤖 cs.LG

Information Routing in Atomistic Foundation Models: How Task Alignment and Equivariance Shape Linear Disentanglement

This paper introduces Compositional Probe Decomposition (CPD) to demonstrate that linear disentanglement of geometric and compositional information in atomistic foundation models is primarily driven by task alignment rather than architecture, revealing a significant performance gradient where models trained on specific properties like HOMO-LUMO gaps outperform energy-trained models and exhibit symmetry-dependent information routing.

Joshua Steier · 2026-03-10 · 🤖 cs.LG

No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models

This paper demonstrates that Contamination Detection via output Distribution (CDD) is largely ineffective for small language models (70M–410M parameters) because it fails to detect verbatim memorization, whereas probability-based methods like perplexity and Min-k% Prob consistently outperform it across various benchmarks.

Omer Sela (Tel Aviv University) · 2026-03-10 · 💬 cs.CL
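
The two probability-based baselines named above can be sketched from per-token log-probabilities alone. This is a minimal illustration assuming the usual definitions (Min-k% Prob as the mean log-probability of the k% least likely tokens, perplexity as the exponentiated mean negative log-probability); the paper's exact settings are not given in this summary, and the sample values are hypothetical.

```python
import math

def min_k_prob(token_logprobs, k=0.2):
    """Mean log-probability of the k fraction of tokens the model found least likely."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    n = max(1, math.floor(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

def perplexity(token_logprobs):
    """Exponentiated mean negative log-probability over the sequence."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical natural-log token probabilities for one benchmark sample.
logprobs = [-0.5, -4.0, -0.25, -2.0, -0.25]
score = min_k_prob(logprobs, k=0.4)
ppl = perplexity(logprobs)
```

A Min-k% score closer to zero, or a lower perplexity, is usually read as weak evidence that the sample appeared in training data; a decision threshold still has to be chosen per model.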

Agentic SPARQL: Evaluating SPARQL-MCP-powered Intelligent Agents on the Federated KGQA Benchmark

This paper explores the potential of SPARQL-MCP-powered intelligent agents for federated Knowledge Graph Question Answering by extending an existing benchmark to evaluate agentic capabilities in endpoint discovery, schema exploration, and query formulation across multiple data sources.

Daniel Dobriy, Frederik Bauer, Amr Azzam + 2 more · 2026-03-10 · 🤖 cs.AI

Right Move, Right Time: Multi-Sport Space Evaluation Platform for Ultimate Frisbee, Basketball, and Soccer

This paper introduces an open, sport-agnostic platform that standardizes player tracking data to evaluate usable space and optimal off-ball run timing across Ultimate Frisbee, basketball, and soccer, demonstrating a practical path for consistent spatial analysis in invasion sports.

Shunsuke Iwashita, Titouan Jeannot, Braden Eberhard + 4 more · 2026-03-10 · 🤖 cs.AI

Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning

This paper introduces two shortfall-aware reinforcement learning frameworks, RLOP and QLBS, which outperform traditional parametric models in reducing tail risk and hedging shortfalls for SPY and XOP options, thereby offering a more robust approach to autonomous derivatives risk management.

Minxuan Hu, Ziheng Chen, Jiayu Yi + 1 more · 2026-03-10 · 💰 q-fin

Isotonic Layer: A Universal Framework for Generic Recommendation Debiasing

This paper introduces the Isotonic Layer, a novel differentiable framework that integrates piecewise linear fitting and learnable embeddings into neural architectures to enforce global monotonicity, thereby enabling granular, context-aware debiasing and improved calibration for large-scale recommendation systems.

Hailing Cheng, Yafang Yang, Hemeng Tao, Fengyu Zhang · 2026-03-10 · 🤖 cs.LG

ARC-AGI-2 Technical Report

This paper presents a transformer-based system that significantly advances ARC performance by integrating a compact task encoding, symmetry-based data augmentation, test-time LoRA adaptation, and multi-perspective decoding to enable efficient neural inference and human-level generalization from few examples.

Wallyson Lemes de Oliveira, Mekhron Bobokhonov, Matteo Caorsi, Aldo Podestà, Gabriele Beltramo, Luca Crosato, Matteo Bonotto, Federica Cecchetto, Hadrien Espic, Dan Titus Salajan, Stefan Taga, Luca Pana, Joe Carthy · 2026-03-10 · 💬 cs.CL

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

This paper demonstrates that current LLM-as-a-Judge frameworks fail to reliably measure adversarial robustness due to unaccounted distribution shifts that degrade performance to near-random levels, often leading to inflated attack success rates, and proposes new benchmarks to address these evaluation flaws.

Leo Schwinn, Moritz Ladenburger, Tim Beyer, Mehrnaz Mofakhami, Gauthier Gidel, Stephan Günnemann · 2026-03-10 · 💬 cs.CL

Distributionally Robust Geometric Joint Chance-Constrained Optimization: Neurodynamic Approaches

This paper introduces a two-time scale neurodynamic duplex approach utilizing projection equations to solve distributionally robust geometric joint chance-constrained optimization problems with unknown distributions, demonstrating convergence to the global optimum through neural networks in applications such as shape optimization and telecommunications.

Ange Valli (L2S), Siham Tassouli (OPTIM), Abdel Lisser (L2S) · 2026-03-10 · 🔢 math
