Learning Bayesian and Markov Networks with an Unreliable Oracle

This paper investigates constraint-based structure learning for Markov and Bayesian networks with an unreliable oracle, demonstrating that Markov networks remain uniquely identifiable under bounded oracle errors when the number of vertex-disjoint paths is limited, whereas Bayesian networks cannot tolerate any errors for guaranteed identification, and then providing algorithms for the cases where unique identifiability holds.

Juha Harviainen, Pekka Parviainen, Vidya Sagar Sharma · Wed, 11 Ma · cs.LG

Reconstructing Movement from Sparse Samples: Enhanced Spatio-Temporal Matching Strategies for Low-Frequency Data

This paper proposes four enhancements to the Spatial-Temporal Matching algorithm—dynamic buffering, adaptive observation probability, a redesigned temporal scoring function, and behavioral analysis—to improve the efficiency and accuracy of reconstructing GPS trajectories from sparse, low-frequency data in dense urban environments, as validated by experiments in Milan.

Ali Yousefian, Arianna Burzacchi, Simone Vantini · Wed, 11 Ma · cs.LG

Interactive 3D visualization of surface roughness predictions in additive manufacturing: A data-driven framework

This paper presents a data-driven framework that combines a multilayer perceptron trained on experimental data augmented by a conditional generative adversarial network with an interactive 3D web interface to predict and visualize surface roughness in material extrusion additive manufacturing, enabling optimized process planning and part orientation.

Engin Deniz Erkan, Elif Surer, Ulas Yaman · Wed, 11 Ma · cs.LG

Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning

The paper introduces Reward-Zero, a general-purpose implicit reward mechanism that leverages language embeddings to transform natural-language task descriptions into dense, semantically grounded progress signals, thereby accelerating training, stabilizing learning, and improving generalization for reinforcement learning agents without requiring task-specific reward engineering.

Heng Zhang, Haddy Alchaer, Arash Ajoudani, Yu She · Wed, 11 Ma · cs.LG
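The general idea behind embedding-driven progress rewards can be sketched without the paper's specifics: treat the change in cosine similarity between the current state's embedding and the goal description's embedding as a dense reward. This is a minimal illustration only; the function name `implicit_reward` and the toy 3-d vectors are assumptions, not Reward-Zero's actual mechanism.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def implicit_reward(state_emb, goal_emb, prev_emb):
    """Dense progress signal: how much closer (in cosine similarity)
    the current state's embedding is to the goal than the previous one."""
    return cosine(state_emb, goal_emb) - cosine(prev_emb, goal_emb)

# Toy 3-d "embeddings": moving toward the goal direction earns a
# positive reward, moving away from it a negative one.
goal = [1.0, 0.0, 0.0]
r = implicit_reward([0.9, 0.1, 0.0], goal, [0.5, 0.5, 0.0])
print(r > 0)  # True
```

A signal like this is dense (nonzero at almost every step) even when the underlying task reward is sparse, which is what makes it useful for shaping.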

Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

This paper introduces efficient, representation-based transductive generalization bounds for graph node classification using optimal transport and Wasserstein distances, which not only correlate strongly with empirical performance but also explain the non-monotonic relationship between GNN depth and generalization error through the analysis of distributional transformations.

MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim · Wed, 11 Ma · cs.LG
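Bounds of this kind compare the distributions of learned representations on labeled versus unlabeled nodes. As a hedged background illustration (not the paper's bound), the 1-Wasserstein distance between two equal-size 1-D empirical distributions reduces to the mean absolute difference of sorted samples:

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical
    distributions: mean absolute difference of sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Toy scalar node representations from train vs. test nodes
# (illustrative numbers, not from the paper's experiments).
train_feats = [0.1, 0.4, 0.9, 1.2]
test_feats = [0.2, 0.5, 1.0, 1.4]
dist = wasserstein_1d(train_feats, test_feats)
print(dist)  # ≈ 0.125
```

Real node embeddings are high-dimensional, where optimal transport requires solving a matching problem, but the 1-D case conveys why a small transport distance between representation distributions suggests good transductive generalization.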

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

This paper introduces Test-Time Control (TTC), a hardware-efficient neural layer that embeds finite-horizon optimal control planning directly into pretrained LLMs via a symplectic LQR solver, significantly boosting mathematical reasoning performance without requiring test-time training.

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal · Wed, 11 Ma · cs.LG
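The symplectic solver is the paper's own contribution; as background, a finite-horizon LQR plan can be computed with the standard backward Riccati recursion. The scalar system and cost weights below are illustrative assumptions, not anything from the paper:

```python
def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon scalar LQR: backward Riccati recursion that
    returns the time-varying feedback gains K_0, ..., K_{T-1}."""
    P = Qf
    gains = []
    for _ in range(T):
        K = (B * P * A) / (R + B * P * B)
        P = Q + A * P * A - (A * P * B) ** 2 / (R + B * P * B)
        gains.append(K)
    gains.reverse()  # the recursion runs backward in time
    return gains

# Drive an unstable scalar system x' = 1.2 x + u toward the origin
# with the control u = -K_t x.
gains = lqr_gains(A=1.2, B=1.0, Q=1.0, R=0.1, Qf=1.0, T=20)
x = 5.0
for K in gains:
    x = 1.2 * x + 1.0 * (-K * x)
print(abs(x) < 1e-3)  # True: the state converges close to zero
```

The appeal of embedding such a planner as a layer is that the whole recursion is differentiable, so it can sit inside a pretrained network without any test-time parameter updates.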

Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation

This paper proposes RQRE-OVI, an optimistic value iteration algorithm that computes the unique and smooth Risk-Sensitive Quantal Response Equilibrium (RQRE) in general-sum Markov games with linear function approximation, offering a principled trade-off between performance and robustness that outperforms traditional Nash equilibrium approaches in both theoretical guarantees and empirical stability.

Jake Gonzales, Max Horwitz, Eric Mazumdar, Lillian J. Ratliff · Wed, 11 Ma · cs.LG
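The risk-sensitive RQRE generalizes the plain logit quantal response equilibrium, which can be sketched concretely. For matching pennies the unique logit QRE is the 50/50 mix for both players, and a damped fixed-point iteration finds it; this is a background sketch of ordinary QRE, not the paper's algorithm:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_qre_matching_pennies(lam=1.0, steps=500, damp=0.5):
    """Damped fixed-point iteration for the logit quantal response
    equilibrium of matching pennies (payoffs +1/-1 for match/mismatch).
    The unique QRE has both players mixing 50/50."""
    p, q = 0.9, 0.2  # prob. of playing action 0, players 1 and 2
    for _ in range(steps):
        # Payoff advantage of action 0 over action 1 for each player.
        adv1 = (2 * q - 1) - (1 - 2 * q)  # player 1 wants to match
        adv2 = (1 - 2 * p) - (2 * p - 1)  # player 2 wants to mismatch
        p = damp * p + (1 - damp) * sigmoid(lam * adv1)
        q = damp * q + (1 - damp) * sigmoid(lam * adv2)
    return p, q

p, q = logit_qre_matching_pennies()
print(round(p, 3), round(q, 3))  # both approach 0.5
```

The rationality parameter `lam` interpolates between uniform play (`lam -> 0`) and best response (`lam -> inf`); smoothness in this parameter is part of what makes quantal-response solution concepts more tractable than Nash equilibria.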

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

This paper introduces DCPO, a framework that resolves the inherent gradient conflict between accuracy and calibration in Reinforcement Learning from Verifiable Rewards by decoupling reasoning and confidence objectives, thereby achieving state-of-the-art calibration performance without compromising model accuracy.

Zhengzhao Ma, Xueru Wen, Boxi Cao, Yaojie Lu, Hongyu Lin, Jinglin Yang, Min He, Xianpei Han, Le Sun · Wed, 11 Ma · cs.LG
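Calibration in this setting is typically measured with metrics such as expected calibration error (ECE), which compares a model's stated confidence with its empirical accuracy. A minimal implementation of the standard metric (background, not DCPO itself):

```python
def expected_calibration_error(confidences, corrects, n_bins=10):
    """Standard ECE: bin predictions by confidence, then average the
    |accuracy - mean confidence| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, corrects):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# A model that is always 90% confident but right only half the
# time is badly calibrated.
print(expected_calibration_error([0.9] * 4, [1, 0, 1, 0]))  # ≈ 0.4
```

The gradient conflict the paper describes arises because a verifiable-reward objective pushes confidence toward 1 on correct answers, while a metric like this is minimized only when confidence tracks the true success rate.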

Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon

This paper proposes a data-driven framework that harmonizes heterogeneous driving cycle data and employs statistical and deep learning models to enable efficient, probabilistic prediction of voltage hysteresis factors in silicon-graphite anode batteries, thereby improving state-of-charge estimation and generalizability across different vehicle models.

Runyao Yu, Viviana Kleine, Philipp Gromotka, Thomas Rudolf, Adrian Eisenmann, Gautham Ram Chandra Mouli, Peter Palensky, Jochen L. Cremer · Wed, 11 Ma · cs.LG

Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms

This paper identifies and theoretically proves that unmasked policy gradient algorithms systematically suppress valid actions at unvisited states due to parameter sharing and gradient propagation, a failure mode that action masking avoids and that can be mitigated in unmasked settings through feasibility classification.

Renos Zabounidis, Roy Siegelmann, Mohamad Qadri, Woojun Kim, Simon Stepputtis, Katia P. Sycara · Wed, 11 Ma · cs.LG
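The action-masking baseline that avoids this failure mode is easy to sketch: setting invalid logits to negative infinity gives those actions exactly zero probability, and therefore zero gradient, so parameter sharing cannot drag valid actions down with them. A minimal masked softmax:

```python
import math

def masked_policy(logits, valid_mask):
    """Softmax over logits with invalid actions masked to -inf, so
    they receive exactly zero probability (and zero gradient)."""
    masked = [l if ok else float("-inf")
              for l, ok in zip(logits, valid_mask)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

probs = masked_policy([2.0, 0.5, -1.0], [True, False, True])
print(probs)  # invalid action 1 gets probability exactly 0.0
```

An unmasked policy must instead learn near-zero probabilities for invalid actions from reward alone, which is precisely where the suppression effect at unvisited states can arise.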

PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing

This paper proposes a Reconfigurable Intelligent Surface (RIS)-aided semantic-aware Vehicular Edge Computing framework that utilizes a Proximal Policy Optimization (PPO) and Linear Programming (LP) hybrid scheme to jointly optimize offloading ratios, semantic symbols, and RIS phase shifts, achieving a 40–50% reduction in end-to-end latency compared to existing methods.

Wei Feng, Jingbo Zhang, Qiong Wu, Pingyi Fan, Qiang Fan · Wed, 11 Ma · cs.LG
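The offloading-ratio piece of such a problem can be illustrated with a hedged toy model, not the paper's joint PPO+LP formulation: if the local and offloaded fractions of a task run in parallel, total latency is the maximum of the two parts, and the optimal split equates them. All parameter values below are hypothetical.

```python
def best_offload_ratio(cycles, bits, f_local, f_edge, rate):
    """Latency-optimal split of a task between local and edge
    execution, assuming the two parts run in parallel.
    cycles: CPU cycles needed; bits: data to transmit;
    f_local/f_edge: CPU frequencies (Hz); rate: uplink rate (bit/s)."""
    t_local_full = cycles / f_local              # all-local latency
    t_edge_full = bits / rate + cycles / f_edge  # all-offload latency
    # Equating (1 - a) * t_local_full with a * t_edge_full:
    alpha = t_local_full / (t_local_full + t_edge_full)
    latency = max((1 - alpha) * t_local_full, alpha * t_edge_full)
    return alpha, latency

alpha, latency = best_offload_ratio(
    cycles=1e9, bits=1e6, f_local=1e9, f_edge=1e10, rate=1e7)
print(alpha, latency)  # offload ~83% of the task
```

In the paper's setting the uplink rate itself depends on the RIS phase shifts and the semantic symbol count, which is why a learned policy is layered on top of the per-step linear program.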