LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks
The paper introduces LoRA-Ensemble, a parameter-efficient uncertainty-modelling method that uses Low-Rank Adaptation (LoRA) to build an implicit ensemble inside a single self-attention network: the ensemble members share the frozen backbone weights and differ only in their low-rank adapter matrices. The method matches the accuracy of explicit ensembles, achieves better calibration, and substantially reduces computational and memory costs.
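To make the idea concrete, here is a minimal numpy sketch of an implicit LoRA ensemble, assuming the usual LoRA parameterization (a shared frozen weight `W` plus per-member low-rank updates `B_i @ A_i`, with `B` initialized to zero) and a simple mean over member outputs; the variable names and the single linear layer are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, M = 8, 2, 4  # hidden size, LoRA rank, number of ensemble members

# Shared frozen weight (e.g. one attention projection matrix).
W = rng.standard_normal((d, d))

# Each member i owns only its low-rank factors A_i (r x d) and B_i (d x r),
# so the per-member cost is 2*d*r parameters instead of d*d.
A = 0.1 * rng.standard_normal((M, r, d))
B = np.zeros((M, d, r))  # LoRA convention: B starts at zero

def member_forward(x, i):
    """Forward pass through member i: (W + B_i @ A_i) @ x."""
    return (W + B[i] @ A[i]) @ x

def ensemble_predict(x):
    """Implicit ensemble: average the M members' outputs."""
    return np.mean([member_forward(x, i) for i in range(M)], axis=0)

x = rng.standard_normal(d)
out = ensemble_predict(x)
```

In training, only `A` and `B` would receive gradients (per member), while `W` stays frozen; at inference the spread of the member outputs can serve as an uncertainty estimate.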