cs.LG papers | Gist.Science

Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments

This paper identifies that learning stagnation in PPO arises from poor sample-based loss estimates due to excessive step sizes relative to gradient noise, proposing that scaling to over one million parallel environments effectively mitigates this issue and enables monotonic performance improvements up to one trillion transitions.

Michael Beukman, Khimya Khetarpal, Zeyu Zheng, Will Dabney, Jakob Foerster, Michael Dennis, Clare Lyle2026-03-09🤖 cs.LG

Agnostic learning in (almost) optimal time via Gaussian surface area

This paper improves the known bounds for agnostic learning of concept classes with bounded Gaussian surface area by demonstrating that a polynomial degree of $\tilde{O}(\Gamma^2 / \varepsilon^2)$ suffices for $\varepsilon$ -approximation, thereby yielding near-optimal complexity for learning polynomial threshold functions in the statistical query model.

Lucas Pesenti, Lucas Slot, Manuel Wiedmer2026-03-09🤖 cs.LG

Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging

This paper demonstrates that Langevin dynamics combined with stochastic weight averaging can achieve optimal sample complexity of $n \gtrsim d^{k^\star/2}$ for recovering a hidden direction in high-dimensional settings like tensor PCA and single-index models, effectively emulating landscape smoothing without explicit regularization.

Stanley Wei, Alex Damian, Jason D. Lee2026-03-09🤖 cs.LG

TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation

TempoSyncDiff is a reference-conditioned latent diffusion framework that employs teacher-student distillation and temporal regularization to enable low-latency, temporally stable, and identity-consistent audio-driven talking head generation suitable for edge deployment.

Soumya Mazumdar, Vineet Kumar Rakesh2026-03-09🤖 cs.AI

Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra

This paper introduces IR-GeoDiff, a latent diffusion model that recovers three-dimensional molecular geometries from infrared spectra by integrating spectral information into molecular representations, thereby addressing the limitations of existing 2D approaches in capturing the relationship between spectral features and 3D structure.

Wenjin Wu, Aleš Leonardis, Linjiang Chen, Jianbo Jiao2026-03-09🤖 cs.LG

Dynamic Momentum Recalibration in Online Gradient Learning

This paper introduces SGDF, an optimizer that applies optimal linear filtering principles to dynamically adjust momentum coefficients in real-time, thereby minimizing mean-squared error to achieve a superior balance between noise suppression and signal preservation compared to conventional momentum methods.

Zhipeng Yao, Rui Yu, Guisong Chang, Ying Li, Yu Zhang, Dazhou Li2026-03-09🤖 cs.LG

Diffusion Language Models Are Natively Length-Aware

This paper proposes a zero-shot mechanism that leverages latent prompt representations to dynamically crop the fixed context window of Diffusion Language Models before generation, significantly reducing computational costs while maintaining or improving performance across diverse tasks.

Vittorio Rossi, Giacomo Cirò, Davide Beltrame, Luca Gandolfi, Paul Röttger, Dirk Hovy2026-03-09🤖 cs.LG

DQE: A Semantic-Aware Evaluation Metric for Time Series Anomaly Detection

This paper proposes DQE, a novel semantic-aware evaluation metric for time series anomaly detection that addresses existing limitations in bias, consistency, and false alarm penalization by introducing a semantic-based partitioning strategy and aggregating scores across the full threshold spectrum to provide more stable, discriminative, and interpretable assessments.

Yuewei Li, Dalin Zhang, Huan Li, Xinyi Gong, Hongjun Chu, Zhaohui Song2026-03-09🤖 cs.LG

Partial Policy Gradients for RL in LLMs

This paper introduces a partial policy gradient method for reinforcement learning in LLMs that optimizes subsets of future rewards to enable the reliable learning and comparison of diverse policy classes, such as greedy, K-step lookahead, and segment policies, which demonstrate varying effectiveness across different persona-alignment conversational tasks.

Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai2026-03-09🤖 cs.AI

Predictive Coding Graphs are a Superset of Feedforward Neural Networks

This paper demonstrates that predictive coding graphs constitute a mathematical superset of feedforward neural networks, thereby strengthening their theoretical foundation in machine learning and highlighting the importance of network topology.

Björn van Zwol2026-03-09🤖 cs.AI

Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations

This paper demonstrates that an ensemble of Graph Neural Networks for regional sea surface temperature forecasting, which introduces diversity through spatially coherent input perturbations like Perlin noise rather than model retraining, achieves well-calibrated probabilistic forecasts with improved uncertainty representation at no additional training cost.

Alejandro J. González-Santana, Giovanny A. Cuervo-Londoño, Javier Sánchez2026-03-09🤖 cs.AI

Efficient Vector Search in the Wild: One Model for Multi-K Queries

The paper introduces OMEGA, a K-generalizable learned top-K search method that leverages a base model trained on K=1 with trajectory-based features and a dynamic refinement procedure to achieve high accuracy and low latency for multi-K vector queries while significantly reducing preprocessing time compared to state-of-the-art methods.

Yifan Peng, Jiafei Fan, Xingda Wei, Sijie Shen, Rong Chen, Jianning Wang, Xiaojian Luo, Wenyuan Yu, Jingren Zhou, Haibo Chen2026-03-09🤖 cs.LG

Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning

This paper proposes a two-stage framework that first trains a contrastive encoder on labeled invented alphabets and then uses teacher-student distillation to learn unsupervised, deformation-invariant embeddings for historically attested scripts, effectively bridging supervised discriminative learning with unsupervised discovery of latent cross-script similarities without requiring ground-truth evolutionary relationships.

Claire Roman, Philippe Meyer2026-03-09🤖 cs.AI

Random Quadratic Form on a Sphere: Synchronization by Common Noise

This paper introduces the Random Quadratic Form (RQF), a stochastic differential equation on a sphere that demonstrates how common noise induces synchronization and token clustering in deep transformers, offering an alternative explanation to self-attention for these phenomena.

Maximilian Engel, Anna Shalova2026-03-09🤖 cs.LG

Topological descriptors of foot clearance gait dynamics improve differential diagnosis of Parkinsonism

This study demonstrates that integrating Topological Data Analysis with machine learning on foot clearance gait dynamics significantly improves the differential diagnosis between idiopathic Parkinson's disease and vascular Parkinsonism, achieving 83% accuracy and revealing sensitivity to levodopa-induced gait changes.

Jhonathan Barrios, Wolfram Erlhagen, Miguel F. Gago, Estela Bicho, Flora Ferreira2026-03-09🤖 cs.LG

FedSCS-XGB -- Federated Server-centric surrogate XGBoost for continual health monitoring

This paper introduces FedSCS-XGB, a novel federated learning protocol for distributed XGBoost that preserves the algorithm's structural advantages and achieves near-centralized performance (within a 1% gap) for continual health monitoring via wearable sensors.

Felix Walger, Mehdi Ejtehadi, Anke Schmeink, Diego Paez-Granados2026-03-09🤖 cs.LG

DC-Merge: Improving Model Merging with Directional Consistency

DC-Merge is a novel model merging method that achieves state-of-the-art performance by first balancing the energy distribution of task vectors through singular value smoothing and then aligning their directional geometries via projection onto a shared orthogonal subspace to preserve multi-task knowledge.

Han-Chen Zhang, Zi-Hao Zhou, Mao-Lin Luo, Shimin Di, Min-Ling Zhang, Tong Wei2026-03-09🤖 cs.LG

Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions

This paper analyzes the gradient flow dynamics of the value-softmax model, a core component of self-attention, to demonstrate that training inherently drives the system toward low-entropy solutions, thereby providing a theoretical explanation for empirical phenomena like attention sinks and massive activations.

Aditya Varre, Mark Rofin, Nicolas Flammarion2026-03-09🤖 cs.LG

SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data

This paper proposes SPPCSO, an adaptive penalized estimation method that integrates single-parametric principal component regression with $L_1$ regularization to achieve stable variable selection and robust coefficient estimation in high-dimensional, highly correlated, and noisy datasets, outperforming traditional approaches in both theoretical bounds and practical applications such as gene discovery.

Ying Hu, Hu Yang2026-03-09🤖 cs.LG

Synthetic Monitoring Environments for Reinforcement Learning

This paper introduces Synthetic Monitoring Environments (SMEs), a configurable suite of continuous control tasks with known optimal policies and exact regret metrics, designed to enable rigorous, white-box diagnostics and systematic analysis of reinforcement learning algorithms' performance under varying conditions.

Leonard Pleiss, Carolin Schmidt, Maximilian Schiffer2026-03-09🤖 cs.LG

← Previous Next →