More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty

The paper introduces EDU-PRM, an entropy-driven process reward model that automatically identifies reasoning step boundaries using predictive entropy to eliminate manual annotations, achieving state-of-the-art performance with only 1.5% of the training data while significantly improving accuracy and reducing token usage.

Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Huacong Xu, Yuxian Wang, Wu Ning, Qian Chen, Mofan Peng, Zijie Chen, Peishuo Su, Yitong Li | Tue, 10 Ma | cs.LG
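
As a rough illustration of the idea in the EDU-PRM summary above, the sketch below marks candidate step boundaries wherever the predictive entropy of the next-token distribution exceeds a threshold. It is only a minimal sketch under stated assumptions: the function names, the toy distributions, and the threshold value are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def step_boundaries(distributions, threshold):
    """Return token positions whose entropy exceeds the threshold.

    Assumption: a high-entropy position is treated as a candidate
    boundary between reasoning steps. The threshold is hypothetical.
    """
    return [
        i for i, probs in enumerate(distributions)
        if predictive_entropy(probs) > threshold
    ]

# Toy next-token distributions over a 4-token vocabulary (made up).
dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform, maximal entropy (ln 4)
    [0.90, 0.05, 0.03, 0.02],  # fairly confident
    [0.40, 0.30, 0.20, 0.10],  # uncertain
]
boundaries = step_boundaries(dists, threshold=1.0)
print(boundaries)
```

With these toy inputs, only the uniform and the most spread-out distributions cross the threshold, so no manual step annotation is needed to propose split points.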

A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

This paper introduces a vision-based reinforcement learning agent that achieves champion-level performance in Gran Turismo 7 by utilizing an asymmetric actor-critic framework to rely solely on ego-centric camera views and onboard sensors, thereby eliminating the need for external global localization while outperforming the game's built-in drivers.

Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman | Tue, 10 Ma | cs.LG

StablePCA: Distributionally Robust Learning of Shared Representations from Multi-Source Data

This paper introduces StablePCA, a distributionally robust framework for extracting shared low-dimensional representations from multi-source data by maximizing worst-case explained variance, and addresses its inherent nonconvexity through a convex relaxation solved by an efficient Mirror-Prox algorithm with global convergence guarantees and a data-dependent certificate for solution tightness.

Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo | Tue, 10 Ma | cs.LG
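
To make "worst-case explained variance" in the StablePCA summary above concrete, the sketch below compares candidate one-dimensional projection directions by the smallest variance they explain across sources. This is a toy illustration only: the two covariance matrices and all function names are hypothetical, and it evaluates fixed candidate directions rather than implementing the paper's convex relaxation or Mirror-Prox solver.

```python
def explained_variance(cov, v):
    """Variance explained by unit vector v under covariance cov: v^T cov v."""
    n = len(v)
    return sum(v[i] * cov[i][j] * v[j] for i in range(n) for j in range(n))

def worst_case_variance(covs, v):
    """Smallest explained variance of direction v across all sources."""
    return min(explained_variance(cov, v) for cov in covs)

# Two hypothetical sources with different dominant directions.
covs = [
    [[4.0, 0.0], [0.0, 1.0]],  # source 1: most variance along axis 0
    [[1.0, 0.0], [0.0, 3.0]],  # source 2: most variance along axis 1
]

e1 = [1.0, 0.0]                # top direction of the pooled covariance
e2 = [0.0, 1.0]
diag = [2 ** -0.5, 2 ** -0.5]  # unit vector along the diagonal

for name, v in [("e1", e1), ("e2", e2), ("diag", diag)]:
    print(name, worst_case_variance(covs, v))
```

In this toy setting, the direction that ordinary PCA on the pooled covariance would pick (`e1`) explains little variance for the second source, while the diagonal direction has a higher worst-case value.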

Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation

This paper employs mechanistic interpretability to reveal that knowledge distillation not only compresses teacher models into smaller students but also fundamentally restructures their internal circuits by reorganizing, discarding, and relying more heavily on fewer components, necessitating new metrics to quantify these internal functional shifts beyond mere output similarity.

Reilly Haskins, Benjamin Adams | Tue, 10 Ma | cs.LG

EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video

To address data scarcity in imitation learning for dexterous manipulation, this paper introduces EgoDex, the largest and most diverse dataset of its kind, featuring 829 hours of egocentric video captured with Apple Vision Pro and paired with precise, native 3D hand and finger tracking, along with benchmarks for training and evaluating manipulation policies.

Ryan Hoque, Peide Huang, David J. Yoon, Mouli Sivapurapu, Jian Zhang | Tue, 10 Ma | cs.LG

Online Decision-Focused Learning

This paper introduces the first provably convergent online algorithms for decision-focused learning in dynamic environments by regularizing non-differentiable objectives and employing perturbation techniques to handle non-convexity, thereby establishing static and dynamic regret bounds and demonstrating superior performance over standard benchmarks.

Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, Alain Durmus | Tue, 10 Ma | cs.LG

MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision

MAS-ZERO is a novel, self-evolved inference-time framework that automatically designs, critiques, and refines multi-agent system configurations for specific tasks without requiring a validation set, achieving significant performance improvements over manual and existing automatic baselines across reasoning, coding, and agentic benchmarks.

Zixuan Ke, Austin Xu, Yifei Ming, Xuan-Phi Nguyen, Ryan Chin, Caiming Xiong, Shafiq Joty | Tue, 10 Ma | cs.LG

HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

The paper proposes HDLxGraph, a novel framework that integrates Abstract Syntax Trees and Data Flow Graphs into Retrieval Augmented Generation to overcome structural and vocabulary mismatches in Hardware Description Language tasks, while also introducing the HDLSearch benchmark to demonstrate significant improvements in search, debugging, and code completion accuracy over existing baselines.

Pingqing Zheng, Jiayin Qin, Fuqi Zhang, Niraj Chitla, Zishen Wan, Shang Wu, Yu Cao, Caiwen Ding, Yang (Katie) Zhao | Tue, 10 Ma | cs.LG

The Cell Must Go On: Agar.io for Continual Reinforcement Learning

This paper introduces AgarCL, a research platform based on the non-episodic game Agar.io designed to advance continual reinforcement learning by providing a complex, dynamic environment where standard algorithms and existing continual learning methods face significant challenges beyond the traditional stability-plasticity dilemma.

Mohamed A. Mohamed, Kateryna Nekhomiazh, Vedant Vyas, Marcos M. Jose, Andrew Patterson, Marlos C. Machado | Tue, 10 Ma | cs.LG

X-MethaneWet: A Cross-scale Global Wetland Methane Emission Benchmark Dataset for Advancing Science Discovery with AI

This paper introduces X-MethaneWet, the first cross-scale global wetland methane benchmark dataset combining physics-based simulations and real-world observations, and demonstrates how deep learning models enhanced by transfer learning can significantly improve methane flux prediction and climate modeling.

Yiming Sun, Shuo Chen, Shengyu Chen, Chonghao Qiu, Licheng Liu, Youmi Oh, Sparkle L. Malone, Gavin McNicol, Qianlai Zhuang, Chris Smith, Yiqun Xie, Xiaowei Jia | Tue, 10 Ma | cs.LG