cs.LG papers | Gist.Science

Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD

This paper analyzes the convergence and escape dynamics of Stochastic Gradient Descent in one-dimensional landscapes, establishing that while SGD reliably converges to local minima, it may linger near local maxima depending on noise variance and geometry, with specific results provided for the probability of escaping sharp maxima to neighboring minima.

Dmitry Dudukalov, Artem Logachov, Vladimir Lotov + 3 more | 2026-03-05 | 🤖 cs.LG

BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change

This paper introduces the BAH dataset, a multimodal collection of 1,427 videos from 300 participants annotated for ambivalence and hesitancy recognition, alongside baseline benchmarking results that highlight the need for advanced models to support personalized digital health interventions.

Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan + 6 more | 2026-03-05 | 🤖 cs.LG

SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

SafeDPO is a lightweight, theory-driven method that achieves provably optimal safety alignment in Large Language Models by deriving a closed-form solution for safety-constrained objectives, thereby eliminating the need for complex reward models or multi-stage pipelines while maintaining competitive helpfulness.

Geon-Hyeong Kim, Yu Jin Kim, Byoungjip Kim + 4 more | 2026-03-05 | 🤖 cs.AI

Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models

This paper introduces TADA, a targeted diffusion-based augmentation framework that selectively generates synthetic images for hard-to-learn examples to improve classifier generalization with significantly reduced computational overhead compared to full-dataset augmentation.

Dang Nguyen, Jiping Li, Jinghao Zheng + 1 more | 2026-03-05 | 🤖 cs.LG

A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

This paper proposes a computationally efficient supervised filter that scores features by a Gumbel-copula implied upper-tail concordance, identifying features that take extreme values jointly with the positive class. It ranks clinically relevant predictors for diabetes risk across large-scale and clinical datasets, outperforming standard filters and matching strong baselines.

Agnideep Aich, Md Monzur Murshed, Sameera Hewage + 1 more | 2026-03-05 | 🤖 cs.LG

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

This paper introduces Supervised Calibration (SC), a loss-minimization framework that enhances In-Context Learning in Large Language Models by learning optimal per-class affine transformations to correct systematic biases and alter decision boundary orientations, thereby achieving state-of-the-art performance across multiple models and datasets.

Korel Gundem, Juncheng Dong, Dennis Zhang + 2 more | 2026-03-05 | 🤖 cs.AI

An Approximation Theory Perspective on Machine Learning

This paper reviews the historical disconnect between approximation theory and machine learning practice, discusses emerging trends like deep networks and transformers, and introduces novel research enabling function approximation on unknown manifolds without requiring explicit manifold feature learning.

Hrushikesh N. Mhaskar, Efstratios Tsoukanis, Ameya D. Jagtap | 2026-03-05 | 🤖 cs.LG

Structural Vibration Monitoring with Diffractive Optical Processors

This paper presents a low-power, cost-effective diffractive optical system that integrates a passive diffractive layer with a shallow neural network to remotely and accurately reconstruct 3D structural vibration spectra, overcoming the scalability and complexity limitations of traditional Structural Health Monitoring solutions.

Yuntian Wang, Zafer Yilmaz, Yuhang Li + 5 more | 2026-03-05 | 🔬 physics.optics

AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization

The paper presents AutoQD, a theoretically grounded method that automatically discovers diverse, high-performing policies in continuous control tasks by generating behavioral descriptors through random Fourier feature embeddings of policy occupancy measures, thereby eliminating the need for hand-crafted descriptors in Quality-Diversity optimization.

Saeed Hedayatian, Stefanos Nikolaidis | 2026-03-05 | 🤖 cs.AI

Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning

This paper introduces Conflict-Aware Evidential Deep Learning (C-EDL), a lightweight post-hoc method that enhances the robustness of uncertainty quantification against adversarial and out-of-distribution inputs by leveraging diverse task-preserving transformations to detect representational conflict and calibrate predictions without retraining.

Charmaine Barker, Daniel Bethell, Simos Gerasimou | 2026-03-05 | 🤖 cs.AI

Honesty in Causal Forests: When It Helps and When It Hurts

This paper challenges the default use of honest estimation in causal forests. Extensive benchmarking shows that while honesty prevents overfitting, it often increases underfitting and reduces the accuracy of individual-level treatment effect estimates, so its use should be guided by specific goals and empirical evaluation rather than reflexive adoption.

Yanfang Hou, Carlos Fernández-Loría | 2026-03-05 | 🤖 cs.LG

Federated ADMM from Bayesian Duality

This paper proposes a novel Bayesian framework that generalizes federated ADMM by leveraging variational inference duality, yielding both a theoretical unification of ADMM with Gaussian assumptions and practical, high-performance variants like Newton-like and Adam-like updates for diverse distribution families.

Thomas Möllenhoff, Siddharth Swaroop, Finale Doshi-Velez + 1 more | 2026-03-05 | 🤖 cs.LG

On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy

This paper presents a theoretical framework demonstrating that standard sparse autoencoders generally fail to recover ground truth monosemantic features from superposed polysemantic ones, and proposes a reweighted variant (WSAE) with a derived selection principle that significantly improves feature recovery and interpretability.

Jingyi Cui, Qi Zhang, Yifei Wang + 1 more | 2026-03-05 | 🤖 cs.LG

Context Biasing for Pronunciation-Orthography Mismatch in Automatic Speech Recognition

This paper proposes a novel context biasing method for automatic speech recognition that leverages user-provided on-the-fly corrections of substitution errors to effectively resolve pronunciation-orthography mismatches, achieving a 22% to 34% relative improvement in biased word error rates without compromising overall system performance.

Christian Huber, Alexander Waibel | 2026-03-05 | 🤖 cs.LG

UMA: A Family of Universal Models for Atoms

Meta FAIR introduces UMA, a family of universal models for atoms trained on half a billion 3D structures using a novel mixture of linear experts architecture, which achieves state-of-the-art speed and accuracy across diverse chemical domains without requiring fine-tuning.

Brandon M. Wood, Misko Dzamba, Xiang Fu + 15 more | 2026-03-05 | 🤖 cs.LG

UQLM: A Python Package for Uncertainty Quantification in Large Language Models

The paper introduces UQLM, a Python package that leverages state-of-the-art uncertainty quantification techniques to generate confidence scores for detecting hallucinations and enhancing the reliability of Large Language Model outputs.

Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik + 3 more | 2026-03-05 | 🤖 cs.AI

Q-Guided Stein Variational Model Predictive Control via RL-informed Policy Prior

This paper proposes Q-SVMPC, a novel framework that integrates Q-guided Stein variational inference with an RL-informed policy prior to enable diverse, robust, and sample-efficient trajectory optimization in Model Predictive Control, overcoming the mode collapse limitations of existing learning-based MPC methods.

Shizhe Cai, Zeya Yin, Jayadeep Jacob + 1 more | 2026-03-05 | 🤖 cs.AI

Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers

This paper introduces Fast Equivariant Imaging (FEI), a novel unsupervised learning framework that leverages the Augmented Lagrangian method and auxiliary Plug-and-Play denoisers to achieve a 10x training acceleration and improved generalization for deep imaging tasks like X-ray CT reconstruction and inpainting without requiring ground-truth data.

Guixian Xu, Jinglai Li, Junqi Tang | 2026-03-05 | 🤖 cs.LG

Knowing When to Quit: Probabilistic Early Exits for Speech Separation

This paper introduces a probabilistic early-exit framework for single-channel speech separation and enhancement that dynamically scales computational resources based on uncertainty-aware signal quality estimates, enabling efficient deployment on heterogeneous devices without compromising reconstruction performance.

Kenny Falkær Olsen, Mads Østergaard, Karl Ulbæk + 4 more | 2026-03-05 | 🤖 cs.LG

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

This paper employs interpretability techniques on the off-by-one addition task to reveal that large language models achieve task-level generalization through a reusable "function induction" mechanism, where multiple attention heads collaboratively learn and compose abstract functions to solve unseen problems.

Qinyuan Ye, Robin Jia, Xiang Ren | 2026-03-05 | 🤖 cs.AI
