Designing Experiments to Compare Multi-armed Bandit Algorithms
This paper proposes "Artificial Replay," a new experimental design that reuses rewards recorded from a single policy trajectory to compare multi-armed bandit algorithms. The approach yields unbiased, low-variance comparisons at low cost, significantly reducing the number of user interactions required relative to traditional independent restarts.
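To make the idea of reusing logged rewards concrete, here is a minimal sketch of a replay-style evaluator in the spirit of classical offline replay for bandits. It is an illustration of the general technique, not the paper's exact "Artificial Replay" procedure; the `EpsilonGreedy` algorithm and the `(arm, reward)` log format are assumptions chosen for the example.

```python
import random

class EpsilonGreedy:
    """Toy bandit algorithm used only to exercise the evaluator."""
    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise pick the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def replay_evaluate(algorithm, logged_events):
    """Feed logged (arm, reward) events to the algorithm, counting a
    reward only when the algorithm's choice matches the logged arm."""
    total, matched = 0.0, 0
    for logged_arm, reward in logged_events:
        chosen = algorithm.select_arm()
        if chosen == logged_arm:
            algorithm.update(chosen, reward)
            total += reward
            matched += 1
    return total, matched
```

Because every candidate algorithm is evaluated against the same fixed log, comparing two algorithms requires no additional live user interactions; only the one trajectory that produced the log is ever collected.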