Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

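This paper proposes H-EARS, a hybrid energy-aware reward shaping method that combines potential-based reward shaping with energy-aware action regularization. It achieves linear complexity without requiring a full model of the system dynamics, and substantially improves the convergence speed, stability, and energy efficiency of deep reinforcement learning on continuous-control tasks.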

Qijun Liao (School of Mechanical Engineering, University of Science and Technology Beijing, China), Jue Yang (School of Mechanical Engineering, University of Science and Technology Beijing, China), Yiting Kang (School of Mechanical Engineering, University of Science and Technology Beijing, China), Xinxin Zhao (School of Mechanical Engineering, University of Science and Technology Beijing, China), Yong Zhang (Jiangsu XCMG Construction Machinery Research Institute Co., Ltd., China), Mingan Zhao (Jiangsu XCMG Construction Machinery Research Institute Co., Ltd., China) · 2026-03-13 · cs.LG
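A minimal sketch of the two ingredients named in the H-EARS summary, not the authors' implementation. It assumes a user-supplied potential function `phi` and, as a stand-in for the paper's energy term, a squared-action-magnitude penalty weighted by a hypothetical `energy_weight`. The shaping term γΦ(s′) − Φ(s) is the standard potential-based form, which leaves the optimal policy unchanged; the energy penalty is not potential-based, so it deliberately trades some task reward for lower control effort.

```python
from typing import Callable, Sequence


def shaped_reward(
    r: float,
    s: Sequence[float],
    a: Sequence[float],
    s_next: Sequence[float],
    phi: Callable[[Sequence[float]], float],
    gamma: float = 0.99,
    energy_weight: float = 0.01,  # hypothetical weight, not from the paper
    done: bool = False,
) -> float:
    """Environment reward + potential-based shaping - energy penalty (sketch)."""
    # Potential-based shaping F = gamma * Phi(s') - Phi(s).
    # Common convention: the potential of a terminal successor is 0.
    phi_next = 0.0 if done else phi(s_next)
    shaping = gamma * phi_next - phi(s)
    # Assumed energy proxy: squared action magnitude, linear in action dimension.
    energy = sum(x * x for x in a)
    return r + shaping - energy_weight * energy
```

With a potential such as the negative distance to a goal, each step costs time linear in the state and action dimensions, and no dynamics model is consulted.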

AutoScout: Structured Optimization for Automating ML System Configuration

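AutoScout presents a general-purpose optimization framework for machine learning system configuration, covering training, fine-tuning, and inference. It combines hybrid discrete/continuous optimization with hierarchical dependency modeling, adaptive feature prioritization, and multi-fidelity simulator integration, substantially reducing configuration search cost while delivering 2.7× to 3.0× training speedups over expert-tuned configurations.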

Jimmy Shong, Yuhan Ding, Yihan Jiang, Liheng Jing, Haonan Chen, Gaokai Zhang, Aditya Akella, Fan Lai · 2026-03-13 · cs.LG

Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE

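This paper systematically studies how partial rotary position embedding (Partial RoPE) affects the training dynamics and convergence of Transformer models. It finds that applying RoPE to only about 10% of the hidden dimensions matches the performance of full RoPE while saving up to 10× GPU memory, and it offers practical guidance for balancing efficiency against training stability.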

Mohammad Aflah Khan, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander · 2026-03-13 · cs.LG
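A minimal sketch of what "partial" RoPE means for a single vector: only the first `fraction` of the dimensions (rounded down to an even count) are rotated in pairs, and the rest pass through unchanged. The interleaved pairing `(2i, 2i+1)` and the frequency schedule `base**(-2i/d_rot)` are assumptions; real implementations differ in which dimensions they rotate and how they pair them.

```python
import math
from typing import List


def partial_rope(x: List[float], pos: int, fraction: float = 0.1,
                 base: float = 10000.0) -> List[float]:
    """Rotate the first ~fraction of dims of x by position-dependent angles."""
    d = len(x)
    # Number of rotated dimensions, forced to be even so they form pairs.
    d_rot = int(d * fraction) // 2 * 2
    out = list(x)  # dimensions beyond d_rot are left untouched
    for i in range(d_rot // 2):
        theta = base ** (-2 * i / d_rot)  # per-pair frequency (assumed schedule)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # 2-D rotation of the pair (x0, x1).
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out
```

Because each pair is rotated by an angle proportional to position, the dot product of a rotated query at position m and a rotated key at position n depends only on m − n; the unrotated dimensions contribute a position-independent term.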

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

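Through a systematic study, this paper shows that for large pretrained vision-language-action (VLA) models, simple sequential fine-tuning with low-rank adaptation (LoRA) performs remarkably well in continual reinforcement learning: it avoids catastrophic forgetting, preserves zero-shot generalization, and even outperforms more elaborate continual-learning methods.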

Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin · 2026-03-13 · cs.LG
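For reference, a minimal pure-Python sketch of a LoRA-adapted linear layer: the pretrained weight `W` stays frozen and only a low-rank update `B @ A`, scaled by `alpha / rank`, is trained, with `B` initialized to zero so the adapted layer starts identical to the pretrained one. The class name and initialization scale are illustrative choices, and the sketch does not reproduce the paper's sequential fine-tuning recipe.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, w: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = len(w), len(w[0])
        rng = random.Random(seed)
        self.w = w  # frozen pretrained weight, never updated
        # A: rank x d_in, small random init; B: d_out x rank, zero init.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.w, x)
        delta = matvec(self.b, matvec(self.a, x))  # low-rank path B(Ax)
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained: rank * (d_in + d_out) parameters.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

For a 4 × 4 layer with rank 1, only 8 parameters are trainable instead of 16; the saving grows with layer width, which is part of why LoRA is attractive for adapting large VLA models task after task.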

Context-dependent manifold learning: A neuromodulated constrained autoencoder approach

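This paper proposes the neuromodulated constrained autoencoder (NcAE), which uses a neuromodulation mechanism to dynamically adjust its geometric-constraint parameters. This enables adaptive dimensionality reduction that disentangles global context from local manifold representations under varying environmental conditions.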

Jérôme Adriaens (Neuroengineering Lab, Department of Electrical Engineering and Computer Science, University of Liège), Guillaume Drion (Neuroengineering Lab, Department of Electrical Engineering and Computer Science, University of Liège), Pierre Sacré (Neuroengineering Lab, Department of Electrical Engineering and Computer Science, University of Liège) · 2026-03-13 · cs.LG

EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering

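This paper presents EvoFlows, a variable-length sequence-to-sequence protein modeling method based on evolutionary edit flow matching. By controlling insertion, deletion, and substitution operations, it predicts both mutations and their positions. It matches mainstream masked language models in sequence-distribution modeling quality while generating non-trivial, natural-like mutants from a template protein more effectively.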

Nicolas Deutschmann, Constance Ferragu, Jonathan D. Ziegler, Shayan Aziznejad, Eli Bixby · 2026-03-13 · cs.LG
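EvoFlows learns a flow over edit operations; the sketch below illustrates only the variable-length edit representation itself, namely applying insertion, deletion, and substitution edits to a template sequence. The `(op, position, residue)` tuple format, positions indexing the original template, and the assumption that edit positions are distinct are all hypothetical choices for this illustration.

```python
from typing import List, Tuple

# Hypothetical edit format: (operation, position in the template, residue).
Edit = Tuple[str, int, str]


def apply_edits(seq: str, edits: List[Edit]) -> str:
    """Apply sub/ins/del edits whose positions refer to the original sequence."""
    chars = list(seq)
    # Apply right to left so earlier positions are not shifted by later edits.
    for op, pos, res in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "sub":
            chars[pos] = res
        elif op == "ins":
            chars.insert(pos, res)  # new residue goes before the one at pos
        elif op == "del":
            del chars[pos]
        else:
            raise ValueError(f"unknown edit op: {op}")
    return "".join(chars)
```

Unlike masked-token substitution, insertions and deletions change the output length, which is why an edit-based model can propose mutants that a fixed-length masked language model cannot express directly.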

Anomaly detection in time-series via inductive biases in the latent space of conditional normalizing flows

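This paper proposes an anomaly detection method based on conditional normalizing flows. It imposes an explicit inductive bias in the latent space, constraining latent variables to follow prescribed temporal dynamics, and recasts anomaly detection as a statistical consistency test on the distribution of latent trajectories. This addresses a weakness of conventional observation-space likelihood methods, which struggle to detect anomalies that violate temporal structure.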

David Baumgartner, Eliezer de Souza da Silva, Iñigo Urteaga · 2026-03-13 · cs.AI
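A minimal sketch of a latent-trajectory consistency test, under two assumptions not taken from the paper: the flow has already mapped observations to latent values `z` (the flow itself is omitted), and the prescribed latent dynamics are a Gaussian AR(1) process z_t = ρ z_{t−1} + ε_t with ε_t ~ N(0, σ²). Under that null hypothesis the sum of squared standardized innovations is chi-square with T − 1 degrees of freedom; to stay within the standard library, the sketch uses a normal approximation to that distribution.

```python
import math
from statistics import NormalDist
from typing import Sequence


def innovation_statistic(z: Sequence[float], rho: float, sigma: float) -> float:
    """Sum of squared standardized one-step innovations under assumed AR(1) dynamics."""
    return sum(((z[t] - rho * z[t - 1]) / sigma) ** 2 for t in range(1, len(z)))


def is_anomalous(z: Sequence[float], rho: float = 0.9, sigma: float = 1.0,
                 alpha: float = 0.01) -> bool:
    """One-sided test: flag trajectories whose innovations are too large."""
    k = len(z) - 1  # degrees of freedom of the chi-square null
    q = innovation_statistic(z, rho, sigma)
    # Normal approximation: (q - k) / sqrt(2k) is roughly N(0, 1) for large k.
    z_score = (q - k) / math.sqrt(2 * k)
    return z_score > NormalDist().inv_cdf(1 - alpha)
```

The test scores transitions rather than individual points, so a trajectory whose values each lie in a plausible range can still be flagged when its step-to-step changes contradict the prescribed dynamics.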

Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

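This paper proposes a free-energy-based algorithm for social multi-armed bandit learning. It lets an agent autonomously assess and exploit the behavioral policies of non-expert and diverse peers without access to their rewards or to prior specifications, significantly improving individual learning performance while retaining logarithmic regret.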

Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, Reshad Hosseini, Majid Nili Ahmadabadi · 2026-03-13 · stat