Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds
This paper presents a first-order analysis showing that cross-entropy training in transformers induces a coupled specialization of attention routing and value updates, a dynamic that functions as a two-timescale EM procedure. This procedure sculpts low-dimensional Bayesian manifolds, explaining how gradient-based optimization enables precise probabilistic reasoning.
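The EM analogy in the abstract can be made concrete with a toy sketch. The construction below (a single-query softmax attention head feeding a cross-entropy loss, with all names, shapes, and dimensions invented purely for illustration and not taken from the paper) shows that the gradient on each value row is scaled by that slot's attention weight: attention weights act like E-step responsibilities, and the responsibility-weighted value update resembles an M-step.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, vocab = 4, 3, 5          # embedding dim, number of slots, vocab size (illustrative)

q = rng.normal(size=d)         # query vector
K = rng.normal(size=(n, d))    # keys, one row per slot
V = rng.normal(size=(n, vocab))  # values: each row contributes per-token logits
target = 2                     # target token index

def softmax(x):
    x = x - x.max()            # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

# E-step analogue: attention weights act like posterior responsibilities.
attn = softmax(K @ q)          # shape (n,), sums to 1

# Forward pass: attended logits and cross-entropy loss on the target token.
logits = attn @ V              # shape (vocab,)
probs = softmax(logits)
loss = -np.log(probs[target])

# Backward pass: d loss / d logits is the usual (probs - one_hot) error.
dlogits = probs.copy()
dlogits[target] -= 1.0

# M-step analogue: the gradient on value row i is the output error
# weighted by that slot's responsibility attn[i].
dV = np.outer(attn, dlogits)   # d loss / d V, shape (n, vocab)
```

Each row of `dV` equals `attn[i] * dlogits`, so slots that attend more strongly receive proportionally larger value updates, which is the responsibility-weighted structure the two-timescale EM reading relies on.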