Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning

This paper introduces In-Context RLVR, a method that leverages a model's own in-context learning ability to measure "Demonstration Utility" via Evidence Gain, implicitly reweighting rewards during Reinforcement Learning with Verifiable Rewards (RLVR) training so that high-quality reasoning traces are prioritized over solutions that are merely correct despite flawed reasoning.

Tiehua Mei, Minxuan Lv, Leiyu Pan, Zhenpeng Su, Hongru Hou, Hengrui Chen, Ao Xu, Deqing Yang · 2026-03-11 · cs.LG

A Unified Hierarchical Multi-Task Multi-Fidelity Framework for Data-Efficient Surrogate Modeling in Manufacturing

This paper proposes a novel hierarchical multi-task multi-fidelity (H-MT-MF) framework for Gaussian process-based surrogate modeling that unifies inter-task information sharing and fidelity-dependent uncertainty handling to significantly improve prediction accuracy and data efficiency in manufacturing systems with heterogeneous data sources.
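As background for the structure multi-fidelity surrogates exploit, here is a minimal sketch of the classic autoregressive relation y_high(x) ≈ ρ·y_low(x) + δ(x); this is a generic illustration, not the paper's H-MT-MF model, and a simple polynomial discrepancy stands in for the Gaussian processes such frameworks actually use. All data and names are made up.

```python
import numpy as np

# Classic two-fidelity structure: y_high(x) ~= rho * y_low(x) + delta(x).
# Here delta is fit as a quadratic; real frameworks model it with a GP.
x = np.linspace(0, 1, 20)
y_low = np.sin(8 * x)                          # cheap, biased simulator
y_high = 1.8 * np.sin(8 * x) + 0.3 * x**2      # scarce, accurate observations

# Fit the scale rho and a quadratic discrepancy jointly by least squares.
A = np.column_stack([y_low, np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(A, y_high, rcond=None)
rho = coef[0]
pred = A @ coef
print(rho, np.abs(pred - y_high).max())
```

Because the toy high-fidelity data is exactly representable in this basis, the fit recovers ρ = 1.8 and a near-zero residual; with real heterogeneous data the discrepancy term absorbs what the low-fidelity model misses.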

Manan Mehta, Zhiqiao Dong, Yuhang Yang, Chenhui Shao · 2026-03-11 · cs.LG

GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection

The paper proposes GAST, a novel Parameter-Efficient Fine-Tuning method that unifies data-layer selection and layer-sparse strategies to adaptively match impactful data points with specific model layers, thereby overcoming the limitations of existing single-dimension approaches and achieving superior performance.

Kai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, Penglei Gao · 2026-03-11 · cs.LG

Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

This paper establishes that generative drifting is theoretically equivalent to score matching under Gaussian kernels, providing a spectral and variational framework that explains the empirical superiority of Laplacian kernels, proposes an exponential bandwidth annealing schedule to accelerate convergence, and proves the necessity of the stop-gradient operator through its connection to Wasserstein gradient flows.
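To make the score-matching side of this equivalence concrete, the sketch below (illustrative only, not the paper's construction) evaluates the score ∇log p of a Gaussian-kernel density estimate and compares it with the analytic score of the smoothed density, showing where the kernel bandwidth h enters.

```python
import numpy as np

rng = np.random.default_rng(0)

def kde_score(x, samples, h):
    """Score (gradient of log-density) of a Gaussian-kernel KDE at point x.

    The KDE of N(0,1) samples with bandwidth h is exactly N(0, 1 + h^2),
    so its score at x is -x / (1 + h^2) -- a known target to check against.
    """
    diffs = samples - x                        # x_i - x
    w = np.exp(-diffs**2 / (2 * h**2))         # unnormalised kernel weights
    return (w * diffs).sum() / (h**2 * w.sum())

samples = rng.standard_normal(50_000)          # data ~ N(0, 1)
h = 0.3
x = 1.0
est = kde_score(x, samples, h)
exact = -x / (1 + h**2)                        # analytic score of the smoothed density
print(est, exact)
```

Shrinking h moves the estimated score toward that of the true data density, which is why bandwidth schedules (like the exponential annealing the paper proposes) matter for convergence.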

Erkan Turan, Maks Ovsjanikov · 2026-03-11 · cs.LG

SignalMC-MED: A Multimodal Benchmark for Evaluating Biosignal Foundation Models on Single-Lead ECG and PPG

The paper introduces SignalMC-MED, a comprehensive benchmark utilizing 22,256 synchronized single-lead ECG and PPG visits to evaluate biosignal foundation models across 20 clinical tasks, demonstrating that domain-specific models with multimodal fusion and full-duration signals outperform general time-series approaches while revealing that larger model sizes do not guarantee superior performance.

Fredrik K. Gustafsson, Xiao Gu, Mattia Carletti, Patitapaban Palo, David W. Eyre, David A. Clifton · 2026-03-11 · cs.LG

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

This paper introduces the Overfitting-Underfitting Indicator (OUI) as an efficient, early-stage metric based on hidden neuron activation patterns to distinguish optimal learning rates in PPO actor-critic training, demonstrating its superior ability to prune unpromising runs compared to traditional criteria by revealing distinct structural signatures in actor and critic networks.
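The general idea of reading training health from binary on/off activation patterns can be sketched as follows; this is a generic diagnostic in that spirit, not the paper's exact OUI formula, and all shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def activation_patterns(X, W, b):
    """Binary on/off pattern of a ReLU layer for each input.

    A collapsed network (e.g. from a too-large learning rate) tends to
    funnel inputs into few distinct patterns; a healthy one spreads them
    across many. Counting distinct patterns gives a cheap structural signal.
    """
    pre = X @ W + b
    return (pre > 0).astype(int)               # (n_inputs, n_units)

X = rng.standard_normal((256, 8))              # 256 probe inputs, 8 features
W = rng.standard_normal((8, 16))               # one hidden layer, 16 ReLU units
b = np.zeros(16)
pats = activation_patterns(X, W, b)
n_distinct = len({tuple(row) for row in pats})
print(n_distinct)
```

Tracking a statistic like `n_distinct` over the first few thousand steps is the kind of early-stage, inference-only measurement that makes such indicators cheap enough for pruning runs.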

Alberto Fernández-Hernández, Cristian Pérez-Corral, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí · 2026-03-11 · cs.AI

On the Width Scaling of Neural Optimizers Under Matrix Operator Norms I: Row/Column Normalization and Hyperparameter Transfer

This paper introduces a family of mean-normalized matrix operator norms to derive width-independent smoothness bounds for deep neural networks, leading to the development of MOGA, a row/column-normalized optimizer that enables stable hyperparameter transfer across model widths and outperforms Muon in speed while maintaining competitive performance.

Ruihan Xu, Jiajin Li, Yiping Lu · 2026-03-11 · cs.LG

From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding

The paper proposes C2FMAE, a coarse-to-fine masked autoencoder that resolves the tension between global semantics and local details in self-supervised learning by employing a cascaded decoder and progressive masking curriculum on a newly constructed multi-granular dataset to achieve hierarchical visual understanding and superior performance across various vision tasks.
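For readers unfamiliar with the masked-autoencoder setup this builds on, the standard random-masking step (generic MAE-style masking, not C2FMAE's progressive curriculum) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patches, as in MAE-style pretraining.

    patches: (N, D) array of N flattened patches.
    Returns (kept_patches, kept_idx, mask), where mask[i] == 1 means patch i
    is hidden from the encoder and must be reconstructed by the decoder.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    kept_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=int)
    mask[kept_idx] = 0
    return patches[kept_idx], kept_idx, mask

# A 14x14 grid of patches (196 total), each a flattened 16x16x3 patch.
patches = rng.standard_normal((196, 768))
kept, kept_idx, mask = random_masking(patches)
print(kept.shape, int(mask.sum()))
```

A "progressive masking curriculum" in this setting would vary `mask_ratio` (and which patches are eligible) over training rather than holding it fixed at 0.75.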

Wenzhao Xiang, Yue Wu, Hongyang Yu, Feng Gao, Fan Yang, Xilin Chen · 2026-03-11 · cs.LG

From Data Statistics to Feature Geometry: How Correlations Shape Superposition

This paper challenges the standard view of superposition in neural networks by demonstrating that, unlike in idealized uncorrelated settings where interference is merely noise, realistic feature correlations allow models to arrange features so that interference becomes constructive, thereby naturally forming the semantic clusters and cyclical structures observed in real language models.
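As a concrete picture of interference in superposition, here is a toy-model-style sketch (generic, not the paper's setup): several feature directions packed into fewer dimensions, where the off-diagonal of the Gram matrix W Wᵀ measures pairwise interference. Assigning frequently co-occurring features to nearby directions gives them positive overlap, the kind of constructive arrangement the summary describes.

```python
import numpy as np

# Toy superposition: 6 features embedded in 2 dimensions.
# Rows of W are unit feature directions; off-diagonal entries of W @ W.T
# measure how much pairs of features interfere when superposed.
angles = np.arange(6) * (2 * np.pi / 6)                  # evenly spaced directions
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # shape (6, 2)

G = W @ W.T                                              # Gram matrix
interference = G - np.eye(6)                             # zero out self-overlap
print(np.round(interference, 2))
```

With evenly spaced directions, neighbouring features overlap by cos 60° = 0.5 while opposite ones overlap by −1; whether those overlaps act as noise or as useful (cluster-like) structure depends on which features actually fire together.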

Lucas Prieto, Edward Stevinson, Melih Barsbey, Tolga Birdal, Pedro A. M. Mediano · 2026-03-11 · cs.AI

Task Aware Modulation Using Representation Learning for Upscaling of Terrestrial Carbon Fluxes

The paper introduces Task-Aware Modulation with Representation Learning (TAM-RL), a novel framework that combines spatio-temporal representation learning with physically grounded constraints to significantly improve the accuracy and generalizability of global terrestrial carbon flux estimates compared to existing state-of-the-art methods.

Aleksei Rozanov, Arvind Renganathan, Vipin Kumar · 2026-03-11 · cs.LG

Accounting for shared covariates in semi-parametric Bayesian additive regression trees

This paper proposes a novel extension to semi-parametric Bayesian additive regression trees (BART) that resolves non-identifiability and bias issues by modifying tree-generation moves to allow shared covariates between linear and non-parametric components, thereby enabling the modeling of complex interactions while maintaining competitive performance across simulation and real-world applications.

Estevão B. Prado, Andrew C. Parnell, Keefe Murphy + 3 more · 2026-03-10 · cs.LG