From Next Token Prediction to (STRIPS) World Models
This paper investigates whether next-token prediction can learn symbolic STRIPS world models for planning. While a specialized STRIPS Transformer aligns more closely with the STRIPS formalism in theory, a standard Transformer equipped with stick-breaking attention achieves higher training accuracy and better generalization, enabling effective planning across unseen states and goals.
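Stick-breaking attention replaces the softmax over past positions with a sequential stick-breaking process: the key nearest the query claims a sigmoid-gated fraction of the attention mass, the next-nearest claims the same kind of fraction of what remains, and so on. The sketch below is a minimal single-head PyTorch illustration of that mechanism under common formulations of stick-breaking attention, not the paper's implementation; the function name and tensor shapes are illustrative assumptions.

```python
import torch

def stick_breaking_attention(q, k, v):
    """Illustrative single-head, causal stick-breaking attention.

    q, k, v: (T, d) tensors for one head. Weight on key j for query i is
        beta[i, j] * prod_{j < m < i} (1 - beta[i, m]),
    i.e. position j receives whatever fraction of the attention "stick"
    remains after every position closer to i has taken its share.
    """
    T, d = q.shape
    logits = (q @ k.T) / d**0.5
    beta = torch.sigmoid(logits)  # per-(query, key) breaking fraction in (0, 1)

    # Only strictly-past positions (j < i) participate.
    past = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device),
                      diagonal=-1)
    log_rem = torch.log1p(-beta).masked_fill(~past, 0.0)  # log(1 - beta), 0 elsewhere

    # csum[i, j] = sum_{m <= j, m < i} log(1 - beta[i, m]); the row total minus
    # csum gives the log of the stick remaining after positions nearer than j.
    csum = log_rem.cumsum(dim=-1)
    total = csum[:, -1:]
    attn = beta * torch.exp(total - csum)
    attn = attn.masked_fill(~past, 0.0)
    return attn @ v

# Hypothetical usage with random inputs:
q, k, v = (torch.randn(8, 16) for _ in range(3))
out = stick_breaking_attention(q, k, v)  # (8, 16)
```

A notable property of this scheme is that the weights for a query need not sum to one: any unclaimed portion of the stick is simply left unallocated, and nearer keys get first claim, giving the mechanism a built-in recency bias.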