cs.LG papers | Gist.Science

More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty

The paper introduces EDU-PRM, an entropy-driven process reward model that automatically identifies reasoning step boundaries using predictive entropy to eliminate manual annotations, achieving state-of-the-art performance with only 1.5% of the training data while significantly improving accuracy and reducing token usage.

Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Huacong Xu, Yuxian Wang, Wu Ning, Qian Chen, Mofan Peng, Zijie Chen, Peishuo Su, Yitong Li | 2026-03-10 🤖 cs.LG

Enhancing Metabolic Syndrome Prediction with Hybrid Data Balancing and Counterfactuals

This paper proposes MetaBoost, a novel hybrid framework combining SMOTE, ADASYN, and CTGAN to optimize data balancing for enhanced Metabolic Syndrome prediction, while utilizing counterfactual analysis to identify blood glucose and triglycerides as the most critical modifiable risk factors.

Sanyam Paresh Shah, Abdullah Mamun, Shovito Barua Soumma + 1 more | 2026-03-10 🤖 cs.AI

Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms

This study demonstrates that while Large Language Models can directly estimate item difficulty for K-5 assessments, a hybrid approach combining LLM-extracted cognitive and linguistic features with tree-based machine learning algorithms yields significantly higher predictive accuracy, offering a scalable alternative to resource-intensive field testing.

Pooya Razavi, Sonya Powers | 2026-03-10 🤖 cs.LG

A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

This paper introduces a vision-based reinforcement learning agent that achieves champion-level performance in Gran Turismo 7 by utilizing an asymmetric actor-critic framework to rely solely on ego-centric camera views and onboard sensors, thereby eliminating the need for external global localization while outperforming the game's built-in drivers.

Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman | 2026-03-10 🤖 cs.LG

Structural Inference: Interpreting Small Language Models with Susceptibilities

This paper introduces a linear response framework that models neural networks as Bayesian statistical mechanical systems to efficiently compute susceptibility-based attribution scores, revealing a low-rank structure that isolates functional modules like multigram and induction heads in small transformers.

Garrett Baker, George Wang, Jesse Hoogland, Daniel Murfet | 2026-03-10 🤖 cs.LG

Learning to Rank Critical Road Segments via Heterogeneous Graphs with Origin-Destination Flow Integration

This paper proposes HetGL2R, a heterogeneous graph learning framework that integrates origin-destination flows, routes, and network topology via a tripartite graph and attribute-guided nodes to effectively rank critical road segments by capturing long-range spatial dependencies and functional similarities.

Ming Xu, Jinrong Xiang, Zilong Xie + 1 more | 2026-03-10 🤖 cs.LG

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

This paper presents a comprehensive review that consolidates fragmented evaluation efforts into a unified taxonomy of approximately 60 benchmarks, surveys AI-agent frameworks and collaboration protocols, and explores real-world applications and future research directions for autonomous AI agents.

Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah | 2026-03-10 🤖 cs.LG

StablePCA: Distributionally Robust Learning of Shared Representations from Multi-Source Data

This paper introduces StablePCA, a distributionally robust framework for extracting shared low-dimensional representations from multi-source data by maximizing worst-case explained variance, and addresses its inherent nonconvexity through a convex relaxation solved by an efficient Mirror-Prox algorithm with global convergence guarantees and a data-dependent certificate for solution tightness.

Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo | 2026-03-10 🤖 cs.LG

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

This paper proposes a penalized pessimistic personalized policy learning (P4L) framework that leverages individual latent variables to derive optimal policies for heterogeneous populations from offline data, achieving fast regret rates under weak coverage assumptions and outperforming existing methods in both simulations and real-world applications.

Rui Miao, Babak Shahbaba, Annie Qu | 2026-03-10 🤖 cs.LG

Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation

This paper employs mechanistic interpretability to reveal that knowledge distillation not only compresses teacher models into smaller students but also fundamentally restructures their internal circuits by reorganizing, discarding, and relying more heavily on fewer components, necessitating new metrics to quantify these internal functional shifts beyond mere output similarity.

Reilly Haskins, Benjamin Adams | 2026-03-10 🤖 cs.LG

Ready2Unlearn: A Learning-Time Approach for Preparing Models with Future Unlearning Readiness

This paper introduces Ready2Unlearn, a proactive, model-agnostic training-time optimization approach that leverages meta-learning principles to prepare machine learning models for efficient and principled future unlearning, shifting the focus from reactive post-deployment algorithms to forward-looking readiness.

Hanyu Duan, Yi Yang, Ahmed Abbasi, Kar Yan Tam | 2026-03-10 🤖 cs.LG

EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video

To address data scarcity in imitation learning for dexterous manipulation, this paper introduces EgoDex, the largest and most diverse dataset of its kind, featuring 829 hours of Apple Vision Pro-captured egocentric videos with precise, native 3D hand and finger tracking, alongside established benchmarks for training and evaluating manipulation policies.

Ryan Hoque, Peide Huang, David J. Yoon, Mouli Sivapurapu, Jian Zhang | 2026-03-10 🤖 cs.LG

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

FreeKV is a training-free framework that combines speculative retrieval, fine-grained correction, and hybrid CPU-GPU memory management to significantly accelerate KV cache retrieval for large language models, achieving up to a 13× speedup over state-of-the-art methods while maintaining near-lossless accuracy.

Guangda Liu, Chengwei Li, Zhenyu Ning, Jing Lin, Yiwu Yao, Danning Ke, Minyi Guo, Jieru Zhao | 2026-03-10 🤖 cs.LG

Online Decision-Focused Learning

This paper introduces the first provably convergent online algorithms for decision-focused learning in dynamic environments by regularizing non-differentiable objectives and employing perturbation techniques to handle non-convexity, thereby establishing static and dynamic regret bounds and demonstrating superior performance over standard benchmarks.

Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, Alain Durmus | 2026-03-10 🤖 cs.LG

Vid2World: Crafting Video Diffusion Models to Interactive World Models

Vid2World is a general framework that transforms pre-trained video diffusion models into interactive world models by implementing causalization techniques and a causal action guidance mechanism to enable high-fidelity, controllable, and autoregressive future prediction across diverse domains.

Siqiao Huang, Jialong Wu, Qixing Zhou, Shangchen Miao, Mingsheng Long | 2026-03-10 🤖 cs.LG

MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision

MAS-ZERO is a novel, self-evolved inference-time framework that automatically designs, critiques, and refines multi-agent system configurations for specific tasks without requiring a validation set, achieving significant performance improvements over manual and existing automatic baselines across reasoning, coding, and agentic benchmarks.

Zixuan Ke, Austin Xu, Yifei Ming, Xuan-Phi Nguyen, Ryan Chin, Caiming Xiong, Shafiq Joty | 2026-03-10 🤖 cs.LG

HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

The paper proposes HDLxGraph, a novel framework that integrates Abstract Syntax Trees and Data Flow Graphs into Retrieval Augmented Generation to overcome structural and vocabulary mismatches in Hardware Description Language tasks, while also introducing the HDLSearch benchmark to demonstrate significant improvements in search, debugging, and code completion accuracy over existing baselines.

Pingqing Zheng, Jiayin Qin, Fuqi Zhang, Niraj Chitla, Zishen Wan, Shang Wu, Yu Cao, Caiwen Ding, Yang (Katie) Zhao | 2026-03-10 🤖 cs.LG

WikiDBGraph: A Data Management Benchmark Suite for Collaborative Learning over Database Silos

This paper introduces WikiDBGraph, a large-scale benchmark suite derived from 100,000 real-world relational databases, designed to evaluate and expose the limitations of existing collaborative learning frameworks in handling the complex, unaligned, and interconnected nature of practical data silos.

Zhaomin Wu, Ziyang Wang, Bingsheng He | 2026-03-10 🤖 cs.LG

The Cell Must Go On: Agar.io for Continual Reinforcement Learning

This paper introduces AgarCL, a research platform based on the non-episodic game Agar.io designed to advance continual reinforcement learning by providing a complex, dynamic environment where standard algorithms and existing continual learning methods face significant challenges beyond the traditional stability-plasticity dilemma.

Mohamed A. Mohamed, Kateryna Nekhomiazh, Vedant Vyas, Marcos M. Jose, Andrew Patterson, Marlos C. Machado | 2026-03-10 🤖 cs.LG

X-MethaneWet: A Cross-scale Global Wetland Methane Emission Benchmark Dataset for Advancing Science Discovery with AI

This paper introduces X-MethaneWet, the first cross-scale global wetland methane benchmark dataset combining physics-based simulations and real-world observations, and demonstrates how deep learning models enhanced by transfer learning can significantly improve methane flux prediction and climate modeling.

Yiming Sun, Shuo Chen, Shengyu Chen, Chonghao Qiu, Licheng Liu, Youmi Oh, Sparkle L. Malone, Gavin McNicol, Qianlai Zhuang, Chris Smith, Yiqun Xie, Xiaowei Jia | 2026-03-10 🤖 cs.LG
