cs.LG papers | Gist.Science

PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations

This paper introduces PvP, a proprioceptive-privileged contrastive learning framework that enhances data efficiency and robustness in humanoid robot whole-body control by learning compact task-relevant representations without hand-crafted augmentations, supported by the new SRL4Humanoid evaluation framework.

Mingqi Yuan, Tao Yu, Haolin Song, Bo Li, Xin Jin, Hua Chen, Wenjun Zeng | 2026-03-12 | 🤖 cs.LG

Pretrained battery transformer (PBT): A foundation model for universal battery life prediction

This paper introduces the Pretrained Battery Transformer (PBT), a foundation model that leverages battery-knowledge-encoded mixture-of-experts layers to overcome data scarcity and heterogeneity, achieving state-of-the-art universal battery life prediction across diverse chemistries and conditions.

Ruifeng Tan, Weixiang Hong, Jia Li, Jiaqiang Huang, Tong-Yi Zhang | 2026-03-12 | 🤖 cs.LG

NMIRacle: Multi-modal Generative Molecular Elucidation from IR and NMR Spectra

NMIRacle is a novel two-stage generative framework that integrates IR and NMR spectra with count-aware fragment representations to accurately elucidate molecular structures, outperforming existing baselines across varying levels of complexity.

Federico Ottomano, Yingzhen Li, Alex M. Ganose | 2026-03-12 | 🔬 physics

Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures

This paper presents a unifying theoretical framework demonstrating that gradient descent in diverse neural network architectures exhibits a simplicity bias by following saddle-to-saddle dynamics, which iteratively evolve near invariant manifolds to progressively learn solutions of increasing complexity such as higher rank, more kinks, or additional kernels and attention heads.

Yedi Zhang, Andrew Saxe, Peter E. Latham | 2026-03-12 | 🤖 cs.LG

Data relativistic uncertainty framework for low-illumination anime scenery image enhancement

This paper addresses the scarcity of data and the domain gap in low-illumination anime scenery enhancement by introducing a Data Relativistic Uncertainty (DRU) framework that quantifies illumination uncertainty to dynamically recalibrate learning objectives, achieving superior perceptual and aesthetic results compared to state-of-the-art methods.

Yiquan Gao, John See | 2026-03-12 | 🤖 cs.LG

The Bayesian Geometry of Transformer Attention

This paper introduces "Bayesian wind tunnels" to rigorously demonstrate that small transformers perform exact Bayesian inference through a specific geometric mechanism involving residual streams as belief substrates and attention-based routing, a capability that capacity-matched MLPs fundamentally lack.

Naman Agarwal, Siddhartha R. Dalal, Vishal Misra | 2026-03-12 | 📊 stat

Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

This paper provides a first-order analysis demonstrating that cross-entropy training in transformers induces a coupled specialization of attention routing and value updates—functioning as a two-timescale EM procedure—that sculpts low-dimensional Bayesian manifolds, thereby explaining how gradient-based optimization enables precise probabilistic reasoning.

Naman Agarwal, Siddhartha R. Dalal, Vishal Misra | 2026-03-12 | 📊 stat

Geometric Scaling of Bayesian Inference in LLMs

This paper demonstrates that production-grade language models preserve a low-dimensional geometric substrate, specifically an entropy-aligned axis in their last-layer value representations, which encodes Bayesian posterior structures and serves as a privileged readout for uncertainty, even though it is not a singular computational bottleneck for inference.

Naman Agarwal, Siddhartha R. Dalal, Vishal Misra | 2026-03-12 | 🤖 cs.LG

Inferring Clinically Relevant Molecular Subtypes of Pancreatic Cancer from Routine Histopathology Using Deep Learning

The paper introduces PanSubNet, an interpretable deep learning framework that accurately predicts clinically relevant basal-like and classical molecular subtypes of pancreatic ductal adenocarcinoma directly from routine H&E-stained histology slides, offering a cost-effective and rapid alternative to traditional RNA-seq-based methods for precision oncology.

Abdul Rehman Akbar, Alejandro Levya, Ashwini Esnakula, Elshad Hasanov, Anne Noonan, Lingbin Meng, Susan Tsai, Vaibhav Sahai, Midhun Malla, Sarbajit Mukherjee, Upender Manne, Anil Parwani, Wei Chen, Ashish Manne, Muhammad Khalid Khan Niazi | 2026-03-12 | ⚡ eess

Over-Searching in Search-Augmented Large Language Models

This paper systematically evaluates the phenomenon of "over-searching" in search-augmented large language models, where unnecessary tool invocation harms efficiency and accuracy, and proposes the Tokens Per Correctness (TPC) metric along with mitigation strategies to address this issue.

Roy Xie, Deepak Gopinath, David Qiu, Dong Lin, Haitian Sun, Saloni Potdar, Bhuwan Dhingra | 2026-03-12 | 🤖 cs.LG

Sampling via Stochastic Interpolants by Langevin-based Velocity and Initialization Estimation in Flow ODEs

This paper proposes a sampling method for unnormalized Boltzmann densities that simulates a probability flow ODE derived from linear stochastic interpolants. A sequence of Langevin samplers generates the intermediate samples and robustly estimates the velocity field. The paper provides theoretical convergence guarantees and demonstrates effectiveness on challenging multimodal distributions and Bayesian inference tasks.

Chenguang Duan, Yuling Jiao, Gabriele Steidl, Christian Wald, Jerry Zhijian Yang, Ruizhe Zhang | 2026-03-12 | 📊 stat

Error Analysis of Bayesian Inverse Problems with Generative Priors

This paper presents a theoretical analysis establishing quantitative error bounds for Bayesian inverse problems using generative priors, demonstrating that the posterior error inherits the convergence rate of the prior in Wasserstein distance, and validates these findings through numerical experiments on benchmarks and an elliptic PDE inverse problem.

Bamdad Hosseini, Ziqi Huang | 2026-03-12 | 📊 stat

Time series forecasting with Hahn Kolmogorov-Arnold networks

The paper introduces HaKAN, a lightweight and interpretable time series forecasting model that leverages Hahn polynomial-based Kolmogorov-Arnold Networks (KANs) with channel independence and patching to effectively capture both global and local temporal patterns, outperforming recent state-of-the-art Transformer and MLP-based methods.

Md Zahidul Hasan, A. Ben Hamza, Nizar Bouguila | 2026-03-12 | 📊 stat

Breaking the Stochasticity Barrier: An Adaptive Variance-Reduced Method for Variational Inequalities

This paper proposes VR-SDA-A, a novel variance-reduced algorithm that overcomes the stochasticity barrier in non-convex non-concave variational inequalities by integrating recursive momentum with a same-batch curvature verification mechanism, thereby achieving optimal O(ε⁻³) oracle complexity while enabling automated step-size adaptation.

Yungi Jeong, Takumi Otsuka | 2026-03-12 | 🤖 cs.LG

Singular Bayesian Neural Networks

This paper proposes Singular Bayesian Neural Networks, which parameterize weights as low-rank products to induce a singular posterior that captures structured correlations, thereby achieving competitive predictive performance and improved uncertainty calibration with significantly fewer parameters and tighter generalization bounds compared to standard mean-field approaches.

Mame Diarra Toure, David A. Stephens | 2026-03-12 | 📊 stat

Emergence of Distortions in High-Dimensional Guided Diffusion Models

This paper formalizes the loss of diversity in classifier-free guidance as "generative distortion," characterizes its emergence via a high-dimensional phase transition using statistical physics, and proposes a novel guidance schedule with a negative-guidance window to mitigate variance shrinkage while preserving class separability.

Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello | 2026-03-12 | 📊 stat

Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing

This paper establishes a rate-distortion theorem demonstrating that hallucinations in large language models are an inevitable consequence of information-theoretic optimal memory compression when storing sparse facts, forcing the model to confidently assign high scores to non-facts rather than abstain.

Anxin Guo, Jingwei Li | 2026-03-12 | 💬 cs.CL

Grounding Generated Videos in Feasible Plans via World Models

The paper proposes GVP-WM, a planning method that leverages learned action-conditioned world models to ground zero-shot video-generated plans into dynamically feasible action sequences by optimizing latent trajectories that satisfy physical constraints while preserving semantic alignment with the original video.

Christos Ziakas, Amir Bar, Alessandra Russo | 2026-03-12 | 🤖 cs.LG

Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion Models

This paper challenges the assumption that numerical stability governs generation quality in Decentralized Diffusion Models, demonstrating instead that aligning routing decisions with the experts whose training data best matches the current denoising state is the critical factor for achieving high-quality outputs.

Marcos Villagra, Bidhan Roy, Raihan Seraj, Zhiying Jiang | 2026-03-12 | 🤖 cs.LG

A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization

This paper proposes a Contextual Thompson Sampling approach for educational recommender systems that leverages learner data to generate personalized exercise sequences, effectively optimizing skill gain and enabling scalable, adaptive instruction in digital learning environments.

Lukas De Kerpel, Arthur Thuy, Dries F. Benoit | 2026-03-12 | 📊 stat
