cs.AI papers | Gist.Science

Trust via Reputation of Conviction

This paper proposes a mathematical framework for trust grounded in "conviction" (the likelihood that a source's stance is vindicated by independent consensus), arguing that this regime-independent metric, rather than correctness or faithfulness, provides a robust foundation for evaluating sources, particularly AI agents, through continuous verification and accrued reputation.

Aravind R. Iyengar | 2026-03-10 | 🤖 cs.LG

Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control

This paper proposes two novel streaming deep reinforcement learning algorithms, S2AC and SDAC, that achieve performance comparable to state-of-the-art batch methods while eliminating the need for replay buffers and extensive hyperparameter tuning, thereby enabling efficient on-device finetuning and Sim2Real transfer for continuous control tasks.

Riccardo De Monte, Matteo Cederle, Gian Antonio Susto | 2026-03-10 | 🤖 cs.LG

Don't Look Back in Anger: MAGIC Net for Streaming Continual Learning with Temporal Dependence

The paper introduces MAGIC Net, a novel Streaming Continual Learning approach that combines recurrent neural networks with learnable masks over frozen weights to effectively address concept drift, temporal dependence, and catastrophic forgetting in online data streams.

Federico Giannini, Sandro D'Andrea, Emanuele Della Valle | 2026-03-10 | 🤖 cs.LG

Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

This paper proposes a weakly supervised teacher-student framework with progressive pseudo-mask refinement that leverages sparse annotations and an Exponential Moving Average stabilized teacher network to achieve accurate and generalizable gland segmentation in colorectal histopathology, effectively addressing the scarcity of pixel-level labels.

Hikmat Khan, Wei Chen, Muhammad Khalid Khan Niazi | 2026-03-10 | 💻 cs
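
The Exponential Moving Average teacher mentioned in this summary can be illustrated with a minimal sketch. It assumes model weights are flat lists of floats, and the function name `ema_update` and the `decay` value are hypothetical choices for illustration, not taken from the paper.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight.

    Assumption: weights are flat lists of floats; a real framework
    would apply the same rule tensor by tensor.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy example: the teacher drifts toward the student over three steps.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
print(teacher)
```

A decay close to 1 makes the teacher change slowly, which is the usual reason an EMA teacher is described as stabilizing pseudo-labels.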

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

The paper introduces PostTrainBench, a benchmark evaluating the ability of autonomous AI agents to automate LLM post-training under strict compute constraints, revealing that while frontier agents can outperform official models in specific targeted scenarios, they generally lag behind and exhibit concerning failure modes such as reward hacking and unauthorized data usage.

Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko | 2026-03-10 | 🤖 cs.LG

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

The paper introduces OfficeQA Pro, a challenging enterprise benchmark using a massive corpus of U.S. Treasury Bulletins to demonstrate that current frontier AI agents struggle significantly with grounded, multi-document reasoning, achieving low accuracy even with direct document access and benefiting notably from structured document representations.

Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen, Michael Bendersky, Matei Zaharia, Xing Chen | 2026-03-10 | 💬 cs.CL

A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search

This paper employs an AI-guided evolutionary search framework to identify a new worst-case distribution that establishes a lower bound of 2.0749 for the approximation ratio of the Random-Offerer mechanism in bilateral trade, surpassing previous conjectures and known counterexamples.

Yang Cai, Vineet Gupta, Zun Li, Aranyak Mehta | 2026-03-10 | 🤖 cs.LG

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

This paper introduces Trilobyte, a byte-level tokenization schema that enables tractable lossless compression of full-fidelity (up to 24-bit) audio using autoregressive language models, demonstrating that while these models outperform FLAC at lower bit depths, their compression gains diminish as bit depth increases.

Phillip Long, Zachary Novack, Chris Donahue | 2026-03-10 | 🤖 cs.LG

Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

This paper proposes a joint optimization framework for Hierarchical Split Federated Learning that explicitly accounts for partitioning layers and client-to-aggregator assignments to achieve a 3% accuracy improvement, 20% delay reduction, and 50% overhead reduction compared to state-of-the-art schemes.

Yiannis Papageorgiou, Yannis Thomas, Ramin Khalili, Iordanis Koutsopoulos | 2026-03-10 | 🤖 cs.LG

Agentic Critical Training

The paper proposes Agentic Critical Training (ACT), a reinforcement learning paradigm that enhances large language model agents by rewarding their ability to autonomously judge the quality of actions among alternatives, thereby fostering genuine self-reflection and outperforming traditional imitation learning and knowledge distillation methods across various benchmarks.

Weize Liu, Minghui Liu, Sy-Tuyen Ho, Souradip Chakraborty, Xiyao Wang, Furong Huang | 2026-03-10 | 🤖 cs.LG

A Cognitive Explainer for Fetal ultrasound images classifier Based on Medical Concepts

This paper proposes an interpretable framework that leverages a concept-based graph convolutional neural network to incorporate medical prior knowledge, thereby providing clinicians with transparent, cognition-aligned explanations for fetal ultrasound scan plane detection.

Yingni Wang, Yunxiao Liu, Licong Dong, Xuzhou Wu, Huabin Zhang, Qiongyu Ye, Desheng Sun, Xiaobo Zhou, Kehong Yuan | 2026-03-09 | 🤖 cs.AI

Mean-based incomplete pairwise comparisons method with the reference values

This paper proposes two quantitative methods, extending arithmetic and geometric heuristic estimation, to calculate weight vectors for incomplete pairwise comparison matrices using reference values, while proving the optimality and feasibility of the geometric variant and providing existence conditions for the arithmetic one.

Konrad Kułakowski, Anna Kędzior, Jacek Szybowski, Jiri Mazurek | 2026-03-09 | 🤖 cs.AI

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

This paper reveals a significant performance disparity in which Large Language Models excel at generation tasks but struggle with evaluation, often producing unfaithful judgments in areas where they lack competence, thereby challenging the assumption that generative proficiency guarantees evaluative reliability.

Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh | 2026-03-09 | 💻 cs

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

RAG-Driver is a novel retrieval-augmented multi-modal large language model that leverages in-context learning with expert demonstrations to achieve state-of-the-art, explainable, and zero-shot generalizable autonomous driving without requiring costly retraining or suffering from catastrophic forgetting.

Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd | 2026-03-09 | 🤖 cs.AI

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

This paper derives model-agnostic theoretical lower-bounds for the energy-to-solution metric of ideal neuromorphic learning-in-memory optimizers by analyzing their out-of-equilibrium thermodynamics, demonstrating how matching memory dynamics to optimization processes can overcome energy bottlenecks associated with memory writes and consolidation in large-scale AI workloads.

Zihao Chen, Faiek Ahsan, Johannes Leugering, Gert Cauwenberghs, Shantanu Chakrabartty | 2026-03-09 | 🤖 cs.AI

Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information

This paper proposes a pose-aware in-context visual learning (PA-ICVL) framework that enhances Vision-Language Models' ability to detect semantic structural visual hallucinations in non-photorealistic cartoon images by integrating pose information alongside RGB data, achieving significant performance improvements over RGB-only baselines.

Bumsoo Kim, Wonseop Shin, Kyuchul Lee, Yonghoon Jung, Sanghyun Seo | 2026-03-09 | 🤖 cs.AI

Algorithmic Collusion by Large Language Models

This paper demonstrates that Large Language Model-based pricing agents in oligopoly and auction settings can autonomously achieve supracompetitive prices and profits, a behavior significantly influenced by prompt variations and driven by price-war concerns, thereby posing unique challenges for future AI regulation.

Sara Fish, Yannai A. Gonczarowski, Ran I. Shorrer | 2026-03-09 | 🤖 cs.AI

Computational lexical analysis of Flamenco genres

This study employs computational lexical analysis and machine learning on over 2,000 Flamenco lyrics to accurately classify traditional genres (*palos*), identify their unique semantic fields, and map inter-genre relationships that reveal historical connections and evolutionary patterns within this cultural heritage.

Pablo Rosillo-Rodes, Maxi San Miguel, David Sanchez | 2026-03-09 | 💬 cs.CL

Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition

This paper proposes a novel two-stage active learning pipeline for automatic speech recognition that combines unsupervised x-vector clustering with a supervised Bayesian batch selection method to efficiently identify diverse and informative samples, thereby significantly reducing labeling effort while improving model performance across various test conditions.

Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic | 2026-03-09 | ⚡ eess

My part is bigger than yours -- assessment within a group of peers

This paper presents a method for aggregating peer assessments of individual contributions in collaborative projects by weighting each expert's opinion according to the significance of their contribution, thereby facilitating a fair consensus on reward distribution.

Konrad Kułakowski, Jacek Szybowski | 2026-03-09 | 🤖 cs.AI
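
One plausible reading of "weighting each expert's opinion according to the significance of their contribution" is a fixed-point computation, sketched below. This is an illustrative assumption, not the authors' published procedure: the function `consensus_shares`, the matrix layout (row `i` holds the shares that peer `i` assigns to every peer, each row summing to 1), and the sample numbers are all hypothetical.

```python
def consensus_shares(assessments, iterations=200):
    """Iterate until each peer's share equals the share-weighted
    average of what all peers assigned to them.

    Assumption: assessments[i][j] is the share peer i gives to peer j,
    every entry is positive, and each row sums to 1.
    """
    n = len(assessments)
    shares = [1.0 / n] * n  # start from equal contributions
    for _ in range(iterations):
        updated = [
            sum(shares[i] * assessments[i][j] for i in range(n))
            for j in range(n)
        ]
        total = sum(updated)
        shares = [value / total for value in updated]
    return shares


# Hypothetical assessments from three peers.
assessments = [
    [0.50, 0.30, 0.20],
    [0.40, 0.40, 0.20],
    [0.50, 0.25, 0.25],
]
shares = consensus_shares(assessments)
print([round(s, 3) for s in shares])
```

In this sketch a peer judged to have contributed more also has more influence on the final split, so the result is a self-consistent set of shares rather than a plain average of opinions.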
