cs.AI papers | Gist.Science

Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets

This paper introduces XTF, an explainable framework that improves LLM fine-tuning performance by decomposing token contributions into reasoning importance, knowledge novelty, and task relevance to identify and mask noisy tokens, achieving up to a 13.7% improvement across math, code, and medical tasks.

Yuchen Yang, Wenze Lin, Enhao Huang, Zhixuan Chu, Hongbin Zhou, Lan Tao, Yiming Li, Zhan Qin, Kui Ren | 2026-03-10 | cs.CL

LongAudio-RAG: Event-Grounded Question Answering over Multi-Hour Long Audio

LongAudio-RAG is a hybrid edge-cloud framework that enables precise, low-hallucination question answering over multi-hour audio streams by converting recordings into timestamped event records for SQL-based retrieval, which then grounds Large Language Model responses in structured evidence rather than raw audio.

Naveen Vakada, Kartik Hegde, Arvind Krishna Sridhar, Yinyi Guo, Erik Visser | 2026-03-10 | cs.LG

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

CogitoRAG is a novel Retrieval-Augmented Generation framework inspired by human episodic memory that enhances complex reasoning and reduces hallucinations by extracting semantic gists into a multi-dimensional knowledge graph, utilizing query decomposition and entity diffusion for associative retrieval, and employing a fusion-based reranking algorithm to deliver high-density evidence.

Pengcheng Zhou, Haochen Li, Zhiqiang Nie, JiaLe Chen, Qing Gong, Weizhen Zhang, Chun Yu | 2026-03-10 | cs.CL

Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering

This paper introduces CondMedQA, the first benchmark for conditional biomedical question answering, and proposes Condition-Gated Reasoning (CGR), a framework that constructs condition-aware knowledge graphs to dynamically prune reasoning paths based on patient-specific factors, thereby improving the reliability of medical decision-making.

Jash Rajesh Parekh, Wonbin Kweon, Joey Chan, Rezarta Islamaj, Robert Leaman, Pengcheng Jiang, Chih-Hsuan Wei, Zhizheng Wang, Zhiyong Lu, Jiawei Han | 2026-03-10 | cs.CL

Conformal Tradeoffs: Guarantees Beyond Coverage

This paper introduces a framework for operational certification of split conformal predictors that moves beyond marginal coverage, providing finite-sample guarantees for critical deployment metrics such as commitment frequency and error exposure through Small-Sample Beta Correction, an independent audit protocol, and a geometric analysis of Pareto trade-offs.

Petrus H. Zwart | 2026-03-10 | cs.LG

ABD: Default Exception Abduction in Finite First Order Worlds

This paper introduces ABD, a benchmark for default-exception abduction in finite first-order worlds that evaluates ten frontier LLMs on their ability to generate sparse, satisfiability-restoring formulas across three observation regimes, revealing that while models achieve high validity, they struggle with parsimony and exhibit distinct generalization failures.

Serafim Batzoglou | 2026-03-10 | Author reviewed | cs

INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

This paper introduces INDUCTION, a benchmark designed to evaluate the ability of AI models to synthesize compact, generalizable first-order logic formulas that explain target predicates across small finite relational worlds, revealing distinct performance patterns and generalization strategies among recent elite models.

Serafim Batzoglou | 2026-03-10 | cs

Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

This paper establishes a comprehensive multi-KPI benchmark for Multi-Agent Reinforcement Learning in urban energy management using the CityLearn environment, demonstrating that Decentralized Training with Decentralized Execution (DTDE) consistently outperforms Centralized Training with Decentralized Execution (CTDE) in both average and worst-case performance while offering greater resilience and sustainability.

Aymen Khouja, Imen Jendoubi, Oumayma Mahjoub, Oussama Mahfoudhi, Ruan De Kock, Siddarth Singh, Claude Formanek | 2026-03-10 | cs.LG

MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation

The paper introduces MrBERT, a family of efficient, open-source multilingual encoders built on the ModernBERT architecture that achieves state-of-the-art performance in specific languages and specialized domains while leveraging Matryoshka Representation Learning to reduce inference and storage costs.

Daniel Tamayo, Iñaki Lacunza, Paula Rivera-Hidalgo, Severino Da Dalt, Javier Aula-Blasco, Aitor Gonzalez-Agirre, Marta Villegas | 2026-03-10 | cs.LG

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

This paper introduces ARLArena, a unified framework that systematically analyzes training instability in agentic reinforcement learning to derive SAMPO, a stable optimization method that ensures consistent performance across diverse agentic tasks.

Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Deng, Renliang Sun, Alexander Taylor, Yanqiao Zhu, Jason Cong, Yizhou Sun, Wei Wang | 2026-03-10 | cs

CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints

CryoNet.Refine is a novel one-step diffusion model that automates and accelerates the refinement of atomic structures against cryo-EM density maps, outperforming traditional tools like Phenix in both model-map correlation and geometric quality while supporting diverse protein and nucleic acid complexes.

Fuyao Huang, Xiaozhu Yu, Kui Xu, Qiangfeng Cliff Zhang | 2026-03-10 | cs

Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

This paper argues that AI agents equipped with specialized skills can augment, but not fully replace, social scientists by executing codifiable research tasks autonomously through "vibe researching," while highlighting the enduring necessity of human theoretical originality and tacit knowledge alongside the profession's emerging risks of stratification and pedagogical crisis.

Yongjun Zhang | 2026-03-10 | cs

A Mathematical Theory of Agency and Intelligence

This paper introduces "bipredictability" (P) as a fundamental, bounded measure of shared information between observations, actions, and outcomes to distinguish mere agency from true intelligence, demonstrating that current AI systems lack the self-monitoring feedback loops necessary for adaptive learning and proposing a thalamocortical-inspired architecture to restore it.

Wael Hafez, Chenan Wei, Rodrigo Pena, Amir Nazeri, Cameron Reid | 2026-03-10 | math

Autoregressive Visual Decoding from EEG Signals

The paper introduces AVDE, a lightweight and efficient autoregressive framework that leverages contrastive learning and multi-scale token prediction to decode EEG signals into coherent images, outperforming state-of-the-art methods with significantly fewer parameters while mimicking the hierarchical nature of human visual perception.

Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye | 2026-03-10 | cs.LG

Decomposing Physician Disagreement in HealthBench

This paper analyzes physician disagreement in the HealthBench dataset, revealing that while the majority of variance is structural and irreducible, a small but actionable portion stems from reducible uncertainties like missing context, suggesting that improving evaluation design to close information gaps could meaningfully reduce disagreement on borderline medical AI cases.

Satya Borgohain, Roy Mariathas | 2026-03-10 | cs

CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion

CeRA overcomes the linear performance ceiling of Low-Rank Adaptation (LoRA) in complex reasoning tasks by introducing a weight-level parallel adapter with SiLU gating and structural dropout to induce manifold expansion, thereby achieving superior spectral efficiency and preventing rank collapse.

Hung-Hsuan Chen | 2026-03-10 | cs.LG

On Sample-Efficient Generalized Planning via Learned Transition Models

This paper proposes a sample-efficient approach to generalized planning that learns explicit neural transition models to predict intermediate world states, demonstrating superior out-of-distribution performance and data efficiency compared to direct action-sequence prediction methods.

Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava | 2026-03-10 | cs

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

This paper addresses the scarcity of expert textual relevance labels in large-scale app store search by using a specialized, fine-tuned LLM to generate millions of high-quality labels. Augmenting the production ranker with these labels significantly improves both offline metrics and real-world conversion rates, particularly for tail queries that lack reliable behavioral data.

Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad, Sean Suchter, Venkat Sundaranatha | 2026-03-10 | cs.LG

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

This paper introduces Attn-QAT, the first systematic 4-bit quantization-aware training framework for attention mechanisms that ensures stable FP4 training and inference by matching low-precision recomputation in the backward pass and correcting implicit precision assumptions, thereby eliminating quality drops and delivering up to 1.5x speedup on FP4-capable GPUs without relying on outlier-mitigation heuristics.

Peiyuan Zhang, Matthew Noto, Wenxuan Tan, Chengquan Jiang, Will Lin, Wei Zhou, Hao Zhang | 2026-03-10 | cs.LG

PEPA: a Persistently Autonomous Embodied Agent with Personalities

This paper introduces PEPA, a three-layer cognitive architecture that leverages personality traits to enable embodied agents to autonomously generate goals and sustain long-term operation in dynamic environments without relying on external task specifications.

Kaige Liu, Yang Li, Lijun Zhu, Weinan Zhang | 2026-03-10 | cs
