cs.LG papers | Gist.Science

Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

This paper introduces Graph-GRPO, an online reinforcement learning framework that enhances Graph Flow Models through analytical transition probabilities and a localized refinement strategy, achieving state-of-the-art performance in graph generation and molecular optimization tasks.

Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang | 2026-03-12 | 🤖 cs.LG

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

This paper analyzes the learning dynamics of two-layer over-parameterized linear networks under label noise SGD, revealing a two-phase process where noise drives the transition from the lazy to the rich regime to improve generalization, a mechanism that also extends to Sharpness-Aware Minimization (SAM).

Tongcheng Zhang, Zhanpeng Zhou, Mingze Wang, Andi Han, Wei Huang, Taiji Suzuki, Junchi Yan | 2026-03-12 | 🤖 cs.LG

Designing Service Systems from Textual Evidence

This paper introduces PP-LUCB, a cost-efficient algorithm that optimally combines biased LLM-generated proxy scores with selective human audits to identify the best service system configuration while providing statistically valid confidence guarantees and significantly reducing audit costs.

Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, David Simchi-Levi | 2026-03-12 | 🤖 cs.LG

Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression

The paper introduces STemDist, the first dataset distillation method designed for spatio-temporal forecasting that simultaneously compresses both spatial and temporal dimensions through a hybrid cluster-level and subset-based approach, achieving significantly faster training, reduced memory usage, and lower prediction errors compared to existing methods.

Taehyung Kwon, Yeonje Choi, Yeongho Kim, Kijung Shin | 2026-03-12 | 🤖 cs.LG

Domain-Adaptive Health Indicator Learning with Degradation-Stage Synchronized Sampling and Cross-Domain Autoencoder

This paper proposes a domain-adaptive framework featuring degradation-stage synchronized batch sampling and a cross-domain aligned fusion large autoencoder to overcome distribution mismatches and temporal dependency limitations in health indicator learning, achieving significant performance improvements on industrial datasets.

Jungho Choo, Hanbyeol Park, Gawon Lee, Yunkyung Park, Hyerim Bae | 2026-03-12 | 🤖 cs.LG

Adaptive Active Learning for Regression via Reinforcement Learning

This paper proposes Weighted improved Greedy Sampling (WiGS), a reinforcement learning-based active learning framework that dynamically adapts the balance between feature-space diversity and output-space uncertainty to outperform static multiplicative methods, particularly in domains with irregular data density.

Simon D. Nguyen, Troy Russo, Kentaro Hoffman, Tyler H. McCormick | 2026-03-12 | 📊 stat

GGMPs: Generalized Gaussian Mixture Processes

This paper introduces Generalized Gaussian Mixture Processes (GGMPs), a scalable and tractable Gaussian process-based framework that enables multimodal conditional density estimation by combining local mixture fitting, cross-input component alignment, and per-component heteroscedastic GP training to overcome the unimodal limitations of standard GP regression.

Vardaan Tekriwal, Mark D. Risser, Hengrui Luo, Marcus M. Noack | 2026-03-12 | 🤖 cs.LG

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

This paper identifies a coherent rank-one mean bias as the primary cause of numerical instability in low-bit LLM training and demonstrates that simply subtracting this mean restores stability and performance in FP4 quantization, offering a hardware-efficient alternative to complex spectral methods.

Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zhang, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Yuan Cheng, Tun Lu, Fan Yang, Li Shang | 2026-03-12 | 🤖 cs.LG
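
The summary above says only that subtracting the mean restores stability; it does not describe the procedure. As a rough illustration of why a shared offset hurts a coarse number format, the toy below compares plain quantization with quantization of the mean-removed residual. It is a minimal sketch, not the paper's method: the function names, the per-tensor abs-max scaling, the synthetic data, and keeping the mean in full precision are all assumptions made for the example.

```python
import random

# Magnitudes representable in the E2M1 FP4 format (sign handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values):
    """Round values to the FP4 grid after per-tensor abs-max scaling (assumed scheme)."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values)
    scale = peak / FP4_GRID[-1]  # map the largest magnitude onto 6.0
    out = []
    for v in values:
        # Nearest representable magnitude in scaled units.
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append((mag if v >= 0 else -mag) * scale)
    return out


def quantize_centered(values):
    """Hypothetical helper: quantize only the residual, keep the mean in full precision."""
    mean = sum(values) / len(values)
    residual = quantize_fp4([v - mean for v in values])
    return [r + mean for r in residual]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


# Synthetic data: a large shared offset plus small fluctuations.
rng = random.Random(0)
data = [5.0 + rng.gauss(0.0, 0.5) for _ in range(1000)]

err_raw = rms_error(data, quantize_fp4(data))
err_centered = rms_error(data, quantize_centered(data))
print(f"RMS error without centering: {err_raw:.4f}")
print(f"RMS error with centering:    {err_centered:.4f}")
```

In this toy setting the offset pushes every value toward the top of the grid, where representable points are sparse; after centering, most residuals sit near zero, where the grid is densest.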

Unlearning the Unpromptable: Prompt-free Instance Unlearning in Diffusion Models

This paper introduces a prompt-free instance unlearning method for diffusion models that effectively removes specific, unpromptable undesired outputs—such as individual faces or culturally inaccurate depictions—while preserving the model's overall integrity through a novel approach combining image editing, timestep-aware weighting, and gradient surgery.

Kyungryeol Lee, Kyeonghyun Lee, Seongmin Hong, Byung Hyun Lee, Se Young Chun | 2026-03-12 | 🤖 cs.LG

Brenier Isotonic Regression

This paper introduces "Brenier isotonic regression," a novel multi-output regression framework that extends classical isotonic regression by enforcing cyclic monotonicity through optimal transport principles, demonstrating superior performance in probability calibration and generalized linear models.

Han Bao, Amirreza Eshraghi, Yutong Wang | 2026-03-12 | 📊 stat
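
For context on the classical baseline the summary above refers to, one-dimensional isotonic regression can be solved with the pool-adjacent-violators algorithm. The sketch below covers only that classical single-output case; it does not implement the cyclic-monotonicity or optimal-transport extension the paper proposes, and the function name is chosen for the example.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every point in a pooled block takes the block mean.
        fitted.extend([total / count] * count)
    return fitted


print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The violating pair (3, 2) is pooled into its mean 2.5, which is the smallest change that makes the sequence non-decreasing in the least-squares sense.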

Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble

This study introduces a multi-resolution ConvLSTM stacking ensemble framework that effectively mitigates error accumulation and enhances the accuracy of long-horizon retaining wall deformation forecasting by integrating models trained on diverse temporal input resolutions.

Jihoon Kim (Department of Civil and Environmental Engineering, Hongik University, Seoul, Republic of Korea), Heejung Youn (Department of Civil and Environmental Engineering, Hongik University, Seoul, Republic of Korea) | 2026-03-12 | 🤖 cs.LG

Beam-Plasma Collective Oscillations in Intense Charged-Particle Beams: Dielectric Response Theory, Langmuir Wave Dispersion, and Unsupervised Detection via Prometheus

This paper establishes a kinetic field theory for beam-plasma collective oscillations in intermediate-energy charged-particle beams, deriving dispersion relations and critical density thresholds that are validated by a Prometheus beta-VAE analyzing particle-in-cell simulation data to confirm predicted signatures like density-tunable resonances and Friedel oscillations.

Brandon Yee, Wilson Collins, Michael Iofin, Jiayi Fu | 2026-03-12 | 🔬 physics

Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation

This paper introduces a physiology-informed reinforcement learning framework that utilizes low-dimensional muscle synergies as a control constraint to significantly enhance the biomechanical fidelity and generalization of predictive musculoskeletal simulations across diverse locomotion conditions.

Ilseung Park (Carnegie Mellon University), Eunsik Choi (Seoul National University), Jangwhan Ahn (UNC-Chapel Hill and NC State University), Jooeun Ahn (Seoul National University) | 2026-03-12 | 🤖 cs.LG

Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

This paper establishes the convergence of Dual Space Preconditioned Gradient Descent to an interpolating solution for overparameterized linear models using novel Bregman divergence techniques, while characterizing its implicit bias to show that isotropic preconditioners recover standard gradient descent solutions and general preconditioners yield solutions within a constant factor of the standard solution.

Reza Ghane, Danil Akhtiamov, Babak Hassibi | 2026-03-12 | 📊 stat

JEDI: Jointly Embedded Inference of Neural Dynamics

The paper introduces JEDI, a hierarchical model that jointly learns shared embeddings over recurrent neural network weights to robustly infer generalizable, task-specific neural dynamics from noisy, high-dimensional experimental recordings, successfully recovering underlying mechanistic structures and providing insights into motor control.

Anirudh Jamkhandi, Ali Korojy, Olivier Codol, Guillaume Lajoie, Matthew G. Perich | 2026-03-12 | 🧬 q-bio

A Universal Nearest-Neighbor Estimator for Intrinsic Dimensionality

This paper introduces a universal, nearest-neighbor-based estimator for intrinsic dimensionality that achieves state-of-the-art performance through simple calculations and theoretical guarantees of convergence independent of the underlying data distribution.

Eng-Jon Ong, Omer Bobrowski, Gesine Reinert, Primoz Skraba | 2026-03-12 | 🤖 cs.LG

VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

The paper introduces VERI-DPO, an evidence-aware alignment framework that leverages claim verification to mine preference pairs for Direct Preference Optimization, significantly reducing unsupported claims and improving the faithfulness of clinical summarizations while maintaining informative length.

Weixin Liu, Congning Ni, Qingyuan Song, Susannah L. Rose, Christopher Symons, Murat Kantarcioglu, Bradley A. Malin, Zhijun Yin | 2026-03-12 | 💬 cs.CL

A New Tensor Network: Tubal Tensor Train and Its Applications

This paper introduces the tubal tensor train (TTT) decomposition, a novel tensor network model that integrates t-product algebra with the tensor train structure to achieve linear storage scaling for high-order tensors, and validates its effectiveness through efficient algorithms and applications in image/video compression, tensor completion, and hyperspectral imaging.

Salman Ahmadi-Asl, Valentin Leplat, Anh-Huy Phan, Andrzej Cichocki | 2026-03-12 | 🔢 math

Resource-constrained Amazons chess decision framework integrating large language models and graph attention

This paper proposes a lightweight hybrid framework for the Game of the Amazons that integrates Graph Attention Autoencoders, Stochastic Graph Genetic Algorithms, and GPT-4o-mini to overcome resource constraints, achieving decision accuracy improvements of 15%–56% over baselines and outperforming its teacher model by effectively denoising LLM outputs through structural graph reasoning.

Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski | 2026-03-12 | 🤖 cs.AI

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

The paper introduces IH-Challenge, a reinforcement learning dataset designed to enhance instruction hierarchy robustness in frontier LLMs, which significantly improves their ability to prioritize instructions against conflicts and adversarial attacks while maintaining helpfulness and minimizing capability regression.

Chuan Guo (Michael Pokorny), Juan Felipe Ceron Uribe (Michael Pokorny), Sicheng Zhu (Michael Pokorny), Christopher A. Choquette-Choo (Michael Pokorny), Steph Lin (Michael Pokorny), Nikhil Kandpal (Michael Pokorny), Milad Nasr (Michael Pokorny), Rai (Michael Pokorny), Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel, Kai Xiao | 2026-03-12 | 🤖 cs.AI
