cs.AI papers | Gist.Science

IntrinsicWeather: Controllable Weather Editing in Intrinsic Space

IntrinsicWeather is a diffusion-based framework that achieves controllable weather editing by decomposing images into intrinsic maps (geometry, material, and lighting) for enhanced spatial control and utilizing CLIP-space interpolation for fine-grained weather manipulation, outperforming existing methods on both synthetic and real-world datasets.

Yixin Zhu, Zuo-Liang Zhu, Jian Yang + 3 more · 2026-03-12 · 🤖 cs.AI

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

This paper reveals that the Key-Value (KV) cache used to accelerate Large Language Model inference is vulnerable to privacy attacks that allow attackers to reconstruct sensitive user inputs, and it proposes KV-Cloak, a lightweight and efficient obfuscation defense that effectively prevents such leakage without compromising model accuracy or performance.

Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin · 2026-03-12 · 💬 cs.CL

The Yokai Learning Environment: Tracking Beliefs Over Space and Time

This paper introduces the Yokai Learning Environment (YLE), a new open-source benchmark for zero-shot coordination that overcomes the saturation of the Hanabi Learning Environment by requiring agents to track moving cards and reason under ambiguous hints, thereby revealing that current state-of-the-art methods fail to maintain consistent internal models when paired with unseen partners.

Constantin Ruhdorfer, Matteo Bortoletto, Johannes Forkel, Jakob Foerster, Andreas Bulling · 2026-03-12 · 🤖 cs.AI

From Next Token Prediction to (STRIPS) World Models

This paper investigates whether next-token prediction can learn symbolic STRIPS world models for planning, finding that while a specialized STRIPS Transformer offers theoretical alignment, a standard transformer with stick-breaking attention achieves superior training accuracy and generalization, enabling effective planning across unseen states and goals.

Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner · 2026-03-12 · 🤖 cs.AI
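Some background on the formalism in the STRIPS world-model paper above may help. A STRIPS model describes a state as a set of true atoms. Each action has a precondition set, an add list, and a delete list. The Python sketch below shows only that transition rule and a breadth-first planner that runs on top of it. The toy "rooms" domain and all names are hypothetical, and none of this reproduces the paper's transformer models.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    """A STRIPS operator: preconditions, add list, delete list."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def applicable(state: frozenset, action: Action) -> bool:
    # An action applies when all of its preconditions hold in the state.
    return action.pre <= state


def successor(state: frozenset, action: Action) -> frozenset:
    # STRIPS transition: remove the delete list, then add the add list.
    return (state - action.delete) | action.add


def plan(init: frozenset, goal: frozenset, actions: List[Action]) -> Optional[List[str]]:
    # Breadth-first search over the states the world model can reach.
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for a in actions:
            if applicable(state, a):
                nxt = successor(state, a)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [a.name]))
    return None


def move(src: str, dst: str) -> Action:
    # Hypothetical operator for a toy "rooms" domain.
    return Action(f"move({src},{dst})", frozenset({f"at({src})"}),
                  frozenset({f"at({dst})"}), frozenset({f"at({src})"}))


actions = [move("a", "b"), move("b", "c"), move("b", "a")]
s0 = frozenset({"at(a)", "light_on"})
print(plan(s0, frozenset({"at(c)"}), actions))  # ['move(a,b)', 'move(b,c)']
```

Atoms that no action deletes, such as `light_on` here, carry over unchanged from state to state. This is the frame assumption that STRIPS builds in.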

Global Minimizers of Sigmoid Contrastive Loss

This paper theoretically characterizes the global minimizers of the sigmoid contrastive loss as $(\mathsf{m}, \mathsf{b}_{\mathsf{rel}})$-Constellations. The characterization gives a rigorous explanation for the success of SigLIP models, for the origin of the modality gap, and for the dimensionality needed for high-quality representations. The paper also proposes a reparameterization that improves training dynamics.

Kiril Bangachev, Guy Bresler, Iliyas Noman, Yury Polyanskiy · 2026-03-12 · 🤖 cs.LG
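The sigmoid contrastive loss analyzed in the paper above is the SigLIP objective. It treats every image-text pair in a batch as an independent binary classification: matching pairs get label +1 and all other pairs get -1. Below is a minimal pure-Python sketch of that loss. It assumes L2-normalized embeddings, a temperature t, and a bias b. The paper's constellation characterization and its proposed reparameterization are not reproduced here.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_contrastive_loss(xs, ys, t: float, b: float) -> float:
    """SigLIP-style loss for a batch of paired, L2-normalized embeddings.

    xs[i] and ys[i] are a matching (image, text) pair; every other
    combination (i != j) is treated as a negative.
    """
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            z = 1.0 if i == j else -1.0            # +1 for a match, -1 otherwise
            logit = t * sum(p * q for p, q in zip(x, y)) + b
            total -= log_sigmoid(z * logit)        # per-pair logistic loss
    return total / n                                # normalized by batch size


xs = [[1.0, 0.0], [0.0, 1.0]]
aligned = sigmoid_contrastive_loss(xs, [[1.0, 0.0], [0.0, 1.0]], t=10.0, b=-5.0)
swapped = sigmoid_contrastive_loss(xs, [[0.0, 1.0], [1.0, 0.0]], t=10.0, b=-5.0)
print(aligned < swapped)  # True
```

Each pair is scored independently of the rest of the batch. This is the main structural difference from the softmax-based CLIP loss.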

RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs

RADAR is a lightweight, interpretable routing framework that optimizes the performance-cost tradeoff for reasoning LLMs by leveraging psychometric-inspired item response modeling to dynamically match query difficulties with appropriate model-budget pairs across diverse benchmarks.

Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew S. Lan, Zichao Wang · 2026-03-12 · 🤖 cs.AI
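The RADAR summary does not spell out its estimator, so the following is only a hedged illustration of psychometric-style routing. It uses the standard two-parameter logistic (2PL) item response model, P(correct) = σ(a(θ - d)), where θ is a configuration's ability and d is a query's difficulty. The router picks the cheapest configuration predicted to reach a target success rate. The configuration names, abilities, costs, and the 0.8 target are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str        # hypothetical model-budget pair
    ability: float   # latent ability (theta) on the IRT scale
    cost: float      # hypothetical relative cost per query


def p_correct(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    # Two-parameter logistic (2PL) item response model.
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def route(configs, difficulty: float, target: float = 0.8) -> Config:
    # Cheapest configuration predicted to reach the target success rate;
    # if none does, fall back to the one most likely to succeed.
    ok = [c for c in configs if p_correct(c.ability, difficulty) >= target]
    if ok:
        return min(ok, key=lambda c: c.cost)
    return max(configs, key=lambda c: p_correct(c.ability, difficulty))


configs = [
    Config("small/low-budget", ability=0.0, cost=1.0),
    Config("small/high-budget", ability=1.0, cost=3.0),
    Config("large/high-budget", ability=3.0, cost=10.0),
]
for d in (-2.0, -0.5, 1.5):
    print(d, route(configs, d).name)
```

Because ability and difficulty sit on the same scale, each routing decision can be read off directly: a query goes to a larger model only when the cheaper ones are predicted to fall short of the target.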

BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

This paper introduces a benchmark to reveal significant tool-selection bias in large language models driven by metadata alignment and pre-training exposure, and proposes a lightweight filtering-and-sampling strategy to mitigate these fairness issues while maintaining task coverage.

Thierry Blankenstein, Jialin Yu, Zixuan Li, Vassilis Plachouras, Sunando Sengupta, Philip Torr, Yarin Gal, Alasdair Paren, Adel Bibi · 2026-03-12 · 🤖 cs.AI

MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations

This paper introduces MonitorVLM, a novel vision-language framework that leverages a specialized mining dataset and innovative modules for clause filtering and behavior magnification to significantly outperform baseline models in automatically detecting safety violations from surveillance video streams in mining operations.

Jiang Wu, Sichao Wu, Yinsong Ma, Guangyuan Yu, Haoyuan Xu, Lifang Zheng, Jingliang Duan · 2026-03-12 · 🤖 cs.AI

A Systematic Evaluation of Self-Supervised Learning for Label-Efficient Sleep Staging with Wearable EEG

This paper presents the first systematic evaluation of self-supervised learning for label-efficient sleep staging using wearable EEG, demonstrating that a specialized SSL pipeline significantly outperforms supervised baselines and general-purpose foundation models by achieving clinical-grade accuracy with only 5–10% of labeled data.

Emilio Estevan, María Sierra-Torralba, Eduardo López-Larraz, Luis Montesano · 2026-03-12 · 🤖 cs.AI

HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection

The paper proposes HyWA, a novel Personalized Voice Activity Detection (PVAD) approach that utilizes a hypernetwork to generate personalized weights for selected layers of a standard VAD model, demonstrating consistent performance improvements and enhanced deployment flexibility compared to existing speaker-conditioning methods.

Mahsa Ghazvini Nejad, Hamed Jafarzadeh Asl, Amin Edraki, Mohammadreza Sadeghi, Masoud Asgharian, Yuanhao Yu, Vahid Partovi Nia · 2026-03-12 · ⚡ eess
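The HyWA summary says a hypernetwork generates personalized weights for selected VAD layers. The code below is a hedged sketch of that general idea only, not HyWA's architecture. A single linear hypernetwork maps a speaker embedding to the weight matrix of one layer. Each enrolled speaker therefore gets their own weights for that layer, while the rest of the model stays shared. The dimensions, initialization, and names are assumptions.

```python
import random


def make_hypernet(emb_dim: int, out_dim: int, in_dim: int, seed: int = 0):
    # One linear hypernetwork: each target weight is an affine function of
    # the speaker embedding (H holds the slopes, h0 the speaker-agnostic base).
    rng = random.Random(seed)
    n = out_dim * in_dim
    H = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(n)]
    h0 = [rng.gauss(0.0, 0.1) for _ in range(n)]
    return H, h0


def generate_weights(speaker_emb, H, h0, out_dim: int, in_dim: int):
    # Map the embedding to a flat weight vector, then reshape to out_dim x in_dim.
    flat = [h0[k] + sum(w * e for w, e in zip(H[k], speaker_emb))
            for k in range(out_dim * in_dim)]
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]


def linear(W, x):
    # The personalized layer itself: y = W x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


emb_dim, out_dim, in_dim = 4, 2, 3
H, h0 = make_hypernet(emb_dim, out_dim, in_dim)
frame = [0.5, -1.0, 2.0]                     # hypothetical acoustic features
for speaker in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
    W = generate_weights(speaker, H, h0, out_dim, in_dim)
    print(linear(W, frame))
```

A zero embedding yields the base weights `h0`. Personalization is therefore a learned offset from a shared layer, computed once per speaker rather than per frame.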

Reveal-to-Revise: Explainable Bias-Aware Generative Modeling with Multimodal Attention

This paper introduces "Reveal-to-Revise," an explainable, bias-aware generative framework that unifies cross-modal attention, Grad-CAM++ attribution, and iterative feedback to achieve state-of-the-art performance and fairness in multimodal image generation and text classification tasks.

Noor Islam S. Mohammad, Md Muntaqim Meherab · 2026-03-12 · 🤖 cs.LG

MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion

The paper introduces MVCustom, a novel diffusion-based framework that unifies multi-view camera pose control and prompt-based customization by leveraging a feature-field representation for training and employing depth-aware rendering with consistent latent completion during inference to ensure both geometric consistency and subject fidelity.

Minjung Shin, Hyunin Cho, Sooyeon Go, Jin-Hwa Kim, Youngjung Uh · 2026-03-12 · 🤖 cs.AI

Predicting kernel regression learning curves from only raw data statistics

This paper introduces the Hermite eigenstructure ansatz (HEA), a theoretical framework that accurately predicts kernel regression learning curves on real datasets using only the empirical data covariance and the decomposition of the target function. It does so by approximating the kernel's eigenstructure with Hermite polynomials. The authors also show that MLPs in the feature-learning regime follow similar learning patterns.

Dhruva Karkada, Joseph Turnbull, Yuxi Liu, James B. Simon · 2026-03-12 · 🤖 cs.LG

KV Cache Transform Coding for Compact Storage in LLM Inference

KVTC is a lightweight, model-agnostic transform coder for the Key-Value caches of large language models. It combines PCA-based decorrelation, adaptive quantization, and entropy coding to compress these caches by up to 20$\times$ (higher in some settings). This enables memory-efficient serving with reusable caches while maintaining high reasoning and long-context accuracy.

Konrad Staniszewski, Adrian Łancucki · 2026-03-12 · 💬 cs.CL
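The KVTC summary does not give the pipeline's details: how the transform is calibrated, how bits are allocated, or which entropy coder is used. The NumPy sketch below only illustrates the three named stages on synthetic data, with simple stand-ins:

- PCA fitted on calibration vectors handles decorrelation.
- Top-k truncation with a single uniform quantization step stands in for adaptive quantization.
- The empirical entropy of the symbols stands in for a real entropy coder.

The synthetic data, `k`, and `step` are assumptions.

```python
import numpy as np


def fit_pca(calib: np.ndarray, k: int):
    # Decorrelating transform estimated once on calibration vectors.
    mean = calib.mean(axis=0)
    cov = np.cov(calib - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]            # keep the top-k components
    return mean, eigvecs[:, order]


def encode(x, mean, basis, step: float):
    coeffs = (x - mean) @ basis                      # project onto principal axes
    return np.round(coeffs / step).astype(np.int64)  # uniform scalar quantization


def decode(q, mean, basis, step: float):
    return (q * step) @ basis.T + mean               # dequantize and rotate back


def entropy_bits(symbols: np.ndarray) -> float:
    # Ideal entropy-coded size (bits) under the empirical symbol distribution.
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() * symbols.size)


# Synthetic low-rank "keys" standing in for a real KV cache.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 4))
mixing = rng.normal(size=(4, 64))
keys = latent @ mixing + 0.01 * rng.normal(size=(2000, 64))

mean, basis = fit_pca(keys[:1000], k=4)
q = encode(keys[1000:], mean, basis, step=0.05)
recon = decode(q, mean, basis, step=0.05)

rel_err = np.linalg.norm(recon - keys[1000:]) / np.linalg.norm(keys[1000:])
ratio = keys[1000:].size * 16 / entropy_bits(q)     # versus an fp16 baseline
print(f"relative error {rel_err:.4f}, compression ~{ratio:.1f}x")
```

Decorrelation concentrates the energy in a few coefficients. That concentration is what lets truncation and quantization discard so much without a large reconstruction error.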

Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study

This study evaluates the ability of six LLM-based systems to answer expert-level questions about high-temperature superconductivity using a curated database of 1,726 papers, finding that retrieval-augmented generation (RAG) systems outperform closed models in providing comprehensive, well-supported answers while highlighting both the potential and current limitations of LLMs in specialized scientific domains.

Haoyu Guo, Maria Tikhanovskaya, Paul Raccuglia + 20 more · 2026-03-12 · 🤖 cs.AI

DeepEyesV2: Toward Agentic Multimodal Model

This paper introduces DeepEyesV2, an agentic multimodal model that employs a two-stage training pipeline combining cold-start data curation and reinforcement learning to effectively integrate external tools like code execution and web search for complex real-world reasoning tasks.

Jack Hong, Chenxiao Zhao, ChengLin Zhu, Weiheng Lu, Guohai Xu, Xing Yu · 2026-03-12 · 🤖 cs.AI

What We Don't C: Manifold Disentanglement for Structured Discovery

The paper introduces "What We Don't C," a novel latent flow matching approach that disentangles latent subspaces by explicitly removing information from conditional guidance to generate meaningful residual representations, thereby enabling the discovery and analysis of factors of variation not captured in the conditioning variables.

Brian Rogers, Micah Bowles, Chris J. Lintott, Steve Croft, Oliver N. F. King, James Kostas Ray · 2026-03-12 · 🤖 cs.AI

D-GAP: Improving Out-of-Domain Robustness via Dataset-Agnostic and Gradient-Guided Augmentation in Frequency and Pixel Spaces

The paper proposes D-GAP, a dataset-agnostic and gradient-guided augmentation method that adaptively blends frequency amplitudes and pixel values to reduce domain-specific learning biases and restore spatial details, thereby significantly improving out-of-domain robustness in computer vision models.

Ruoqi Wang, Haitao Wang, Shaojie Guo, Qiong Luo · 2026-03-12 · 🤖 cs.AI

STREAM-VAE: Dual-Path Routing for Slow and Fast Dynamics in Vehicle Telemetry Anomaly Detection

This paper introduces STREAM-VAE, a dual-path variational autoencoder that separates slow drifts and fast spikes in vehicle telemetry data to overcome the limitations of standard reconstruction-based methods and achieve robust anomaly detection across diverse operating modes.

Kadir-Kaan Özer, René Ebeling, Markus Enzweiler · 2026-03-12 · 🤖 cs.LG

REMSA: Foundation Model Selection for Remote Sensing via a Constraint-Aware Agent

This paper introduces REMSA, a constraint-aware agent built on the newly constructed RSFM Database (RS-FMD). REMSA automates the selection of suitable remote sensing foundation models from natural language queries by combining structured metadata retrieval with task-driven decision workflows. It outperforms baselines on a new expert-verified benchmark.

Binger Chen, Tacettin Emre Bök, Behnood Rasti, Volker Markl, Begüm Demir · 2026-03-12 · 🤖 cs.AI
