cs.LG Papers | Gist.Science

Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

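This paper proposes Graph-GRPO, an online reinforcement learning framework for graph flow models (GFMs). It derives an analytical expression for the transition probabilities to enable fully differentiable RL training, and introduces a local perturbation-and-regeneration strategy for self-improvement, significantly improving generation quality and achieving state-of-the-art performance on tasks such as molecular optimization.

Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang | 2026-03-12 | cs.LG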

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

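Through theoretical analysis and experimental validation, this paper shows that label noise SGD in two-layer overparameterized linear networks drives the model from the "lazy" regime to the "rich" regime and strengthens the alignment between the weights and the ground-truth interpolator, which explains the mechanism behind its improved generalization. The finding is extended to broader optimization algorithms such as sharpness-aware minimization (SAM).

Tongcheng Zhang, Zhanpeng Zhou, Mingze Wang, Andi Han, Wei Huang, Taiji Suzuki, Junchi Yan | 2026-03-12 | cs.LG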

Designing Service Systems from Textual Evidence

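When optimizing service system configurations, LLM-based automatic scoring is biased while human review is costly. To address this, the paper proposes a sequential decision algorithm named PP-LUCB. By combining proxy scores with inverse-propensity-weighted residual estimation, it identifies the best service configuration with high confidence while significantly reducing human audit costs.

Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, David Simchi-Levi | 2026-03-12 | cs.LG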

Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression

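This paper proposes STemDist, the first dataset distillation method designed specifically for spatio-temporal time series forecasting. By compressing the spatial and temporal dimensions in a balanced way and combining coarse-grained clustering with fine-grained subset distillation, it achieves lower prediction error than existing methods while significantly reducing training time and memory consumption.

Taehyung Kwon, Yeonje Choi, Yeongho Kim, Kijung Shin | 2026-03-12 | cs.LG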

Domain-Adaptive Health Indicator Learning with Degradation-Stage Synchronized Sampling and Cross-Domain Autoencoder

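This paper proposes a domain adaptation framework that combines degradation-stage synchronized sampling (DSSBS) with a cross-domain aligned fusion large autoencoder (CAFLAE). By addressing degradation-stage mismatch and the difficulty of capturing long-range temporal dependencies, it significantly improves health indicator learning under varying operating conditions.

Jungho Choo, Hanbyeol Park, Gawon Lee, Yunkyung Park, Hyerim Bae | 2026-03-12 | cs.LG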

Adaptive Active Learning for Regression via Reinforcement Learning

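This paper proposes a new method called weighted improved greedy sampling (WiGS), which uses reinforcement learning to dynamically adjust the balance between exploration and exploitation. It overcomes the limitations of the traditional static multiplicative rule in active learning for regression, and significantly improves sampling efficiency and prediction accuracy when the data distribution is irregular.

Simon D. Nguyen, Troy Russo, Kentaro Hoffman, Tyler H. McCormick | 2026-03-12 | stat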

GGMPs: Generalized Gaussian Mixture Processes

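This paper proposes the generalized Gaussian mixture process (GGMP), which combines local Gaussian mixture fitting, alignment of mixture components across inputs, and training of per-component heteroscedastic Gaussian processes. It performs conditional density estimation for multimodal, heteroscedastic, and strongly non-Gaussian data while remaining computationally tractable.

Vardaan Tekriwal, Mark D. Risser, Hengrui Luo, Marcus M. Noack | 2026-03-12 | cs.LG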

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

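This paper shows that numerical instability in low-bit training of large language models is driven mainly by a rank-one mean bias, and proposes removing that bias with a simple mean subtraction. This significantly improves the stability and performance of FP4-quantized training without requiring a complex SVD decomposition.

Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zhang, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Yuan Cheng, Tun Lu, Fan Yang, Li Shang | 2026-03-12 | cs.LG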

Unlearning the Unpromptable: Prompt-free Instance Unlearning in Diffusion Models

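This paper proposes a surrogate-based, prompt-free instance unlearning method. Using image editing, timestep-aware weighting, and gradient surgery, it lets a diffusion model precisely forget specific instances that cannot be specified through text prompts (such as faces or cultural misrepresentations), while keeping the rest of the model's capabilities intact.

Kyungryeol Lee, Kyeonghyun Lee, Seongmin Hong, Byung Hyun Lee, Se Young Chun | 2026-03-12 | cs.LG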

Brenier Isotonic Regression

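This paper proposes a new multi-output regression method called Brenier isotonic regression. It uses optimal transport theory to turn the cyclic monotonicity constraint into an optimization over convex potential functions, and outperforms existing baselines on tasks such as probability calibration.

Han Bao, Amirreza Eshraghi, Yutong Wang | 2026-03-12 | stat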

Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble

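This study proposes a multi-resolution ConvLSTM ensemble framework. By fusing input data at different temporal scales, it effectively mitigates error accumulation and significantly improves the accuracy and stability of long-horizon forecasts of retaining wall deformation during foundation pit excavation.

Jihoon Kim (Department of Civil,Environmental Engineering, Hongik University, Seoul, Republic of Korea), Heejung Youn (Department of Civil,Environmental Engineering, Hongik University, Seoul, Republic of Korea) | 2026-03-12 | cs.LG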

Beam-Plasma Collective Oscillations in Intense Charged-Particle Beams: Dielectric Response Theory, Langmuir Wave Dispersion, and Unsupervised Detection via Prometheus

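This paper derives the Langmuir wave dispersion relation for intense charged-particle beams by building a kinetic field-theory framework based on the Vlasov-Poisson system, and uses the Prometheus unsupervised learning model to verify collective-oscillation signatures including the plasma frequency, anomalous beam broadening, and Friedel oscillations.

Brandon Yee, Wilson Collins, Michael Iofin, Jiayi Fu | 2026-03-12 | physics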

Chuan Guo (Michael Pokorny), Juan Felipe Ceron Uribe (Michael Pokorny), Sicheng Zhu (Michael Pokorny), Christopher A. Choquette-Choo (Michael Pokorny), Steph Lin (Michael Pokorny), Nikhil Kandpal (Michael Pokorny), Milad Nasr (Michael Pokorny), Rai (Michael Pokorny), Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel, Kai Xiao | 2026-03-12 | cs.AI

cs.LG

Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation

Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

JEDI: Jointly Embedded Inference of Neural Dynamics

A Universal Nearest-Neighbor Estimator for Intrinsic Dimensionality

VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

A New Tensor Network: Tubal Tensor Train and Its Applications

Resource-constrained Amazons chess decision framework integrating large language models and graph attention

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs