Learning Bayesian and Markov Networks with an Unreliable Oracle

This paper investigates constraint-based structure learning for Markov and Bayesian networks with an unreliable oracle, demonstrating that Markov networks remain uniquely identifiable under bounded oracle errors when the number of vertex-disjoint paths is limited, whereas Bayesian networks cannot tolerate any errors for guaranteed identification, and then providing algorithms for the cases where unique identifiability holds.

Juha Harviainen, Pekka Parviainen, Vidya Sagar Sharma · Wed, 11 Ma · cs.LG

Reconstructing Movement from Sparse Samples: Enhanced Spatio-Temporal Matching Strategies for Low-Frequency Data

This paper proposes four enhancements to the Spatial-Temporal Matching algorithm—dynamic buffering, adaptive observation probability, a redesigned temporal scoring function, and behavioral analysis—to improve the efficiency and accuracy of reconstructing GPS trajectories from sparse, low-frequency data in dense urban environments, as validated by experiments in Milan.

Ali Yousefian, Arianna Burzacchi, Simone Vantini · Wed, 11 Ma · cs.LG

Interactive 3D visualization of surface roughness predictions in additive manufacturing: A data-driven framework

This paper presents a data-driven framework that combines a multilayer perceptron trained on experimental data augmented by a conditional generative adversarial network with an interactive 3D web interface to predict and visualize surface roughness in material extrusion additive manufacturing, enabling optimized process planning and part orientation.

Engin Deniz Erkan, Elif Surer, Ulas Yaman · Wed, 11 Ma · cs.LG

Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning

The paper introduces Reward-Zero, a general-purpose implicit reward mechanism that leverages language embeddings to transform natural-language task descriptions into dense, semantically grounded progress signals, thereby accelerating training, stabilizing learning, and improving generalization for reinforcement learning agents without requiring task-specific reward engineering.

Heng Zhang, Haddy Alchaer, Arash Ajoudani, Yu She · Wed, 11 Ma · cs.LG
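The general idea behind embedding-driven progress rewards can be sketched without the paper's specifics: treat the change in cosine similarity between the current state's embedding and the goal description's embedding as a dense reward. This is a minimal illustration only; the function name `implicit_reward` and the toy 3-d vectors are assumptions, not Reward-Zero's actual mechanism.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def implicit_reward(state_emb, goal_emb, prev_emb):
    """Dense progress signal: how much closer (in cosine similarity)
    the current state's embedding is to the goal than the previous one."""
    return cosine(state_emb, goal_emb) - cosine(prev_emb, goal_emb)

# Toy 3-d "embeddings": moving toward the goal direction earns a
# positive reward, moving away from it a negative one.
goal = [1.0, 0.0, 0.0]
r = implicit_reward([0.9, 0.1, 0.0], goal, [0.5, 0.5, 0.0])
print(r > 0)  # True
```

A signal like this is dense (nonzero at almost every step) even when the underlying task reward is sparse, which is what makes it useful for shaping.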

Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

This paper introduces efficient, representation-based transductive generalization bounds for graph node classification using optimal transport and Wasserstein distances, which not only correlate strongly with empirical performance but also explain the non-monotonic relationship between GNN depth and generalization error through the analysis of distributional transformations.

MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim · Wed, 11 Ma · cs.LG
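Bounds of this kind compare the distributions of learned representations on labeled versus unlabeled nodes. As a hedged background illustration (not the paper's bound), the 1-Wasserstein distance between two equal-size 1-D empirical distributions reduces to the mean absolute difference of sorted samples:

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical
    distributions: mean absolute difference of sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Toy scalar node representations from train vs. test nodes
# (illustrative numbers, not from the paper's experiments).
train_feats = [0.1, 0.4, 0.9, 1.2]
test_feats = [0.2, 0.5, 1.0, 1.4]
dist = wasserstein_1d(train_feats, test_feats)
print(dist)  # ≈ 0.125
```

Real node embeddings are high-dimensional, where optimal transport requires solving a matching problem, but the 1-D case conveys why a small transport distance between representation distributions suggests good transductive generalization.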

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

This paper introduces Test-Time Control (TTC), a hardware-efficient neural layer that embeds finite-horizon optimal control planning directly into pretrained LLMs via a symplectic LQR solver, significantly boosting mathematical reasoning performance without requiring test-time training.

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal · Wed, 11 Ma · cs.LG
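The symplectic solver is the paper's own contribution; as background, a finite-horizon LQR plan can be computed with the standard backward Riccati recursion. The scalar system and cost weights below are illustrative assumptions, not anything from the paper:

```python
def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon scalar LQR: backward Riccati recursion that
    returns the time-varying feedback gains K_0, ..., K_{T-1}."""
    P = Qf
    gains = []
    for _ in range(T):
        K = (B * P * A) / (R + B * P * B)
        P = Q + A * P * A - (A * P * B) ** 2 / (R + B * P * B)
        gains.append(K)
    gains.reverse()  # the recursion runs backward in time
    return gains

# Drive an unstable scalar system x' = 1.2 x + u toward the origin
# with the control u = -K_t x.
gains = lqr_gains(A=1.2, B=1.0, Q=1.0, R=0.1, Qf=1.0, T=20)
x = 5.0
for K in gains:
    x = 1.2 * x + 1.0 * (-K * x)
print(abs(x) < 1e-3)  # True: the state converges close to zero
```

The appeal of embedding such a planner as a layer is that the whole recursion is differentiable, so it can sit inside a pretrained network without any test-time parameter updates.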

Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation

This paper proposes RQRE-OVI, an optimistic value iteration algorithm that computes the unique and smooth Risk-Sensitive Quantal Response Equilibrium (RQRE) in general-sum Markov games with linear function approximation, offering a principled trade-off between performance and robustness that outperforms traditional Nash equilibrium approaches in both theoretical guarantees and empirical stability.

Jake Gonzales, Max Horwitz, Eric Mazumdar, Lillian J. Ratliff · Wed, 11 Ma · cs.LG
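The risk-sensitive RQRE generalizes the plain logit quantal response equilibrium, which can be sketched concretely. For matching pennies the unique logit QRE is the 50/50 mix for both players, and a damped fixed-point iteration finds it; this is a background sketch of ordinary QRE, not the paper's algorithm:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_qre_matching_pennies(lam=1.0, steps=500, damp=0.5):
    """Damped fixed-point iteration for the logit quantal response
    equilibrium of matching pennies (payoffs +1/-1 for match/mismatch).
    The unique QRE has both players mixing 50/50."""
    p, q = 0.9, 0.2  # prob. of playing action 0, players 1 and 2
    for _ in range(steps):
        # Payoff advantage of action 0 over action 1 for each player.
        adv1 = (2 * q - 1) - (1 - 2 * q)  # player 1 wants to match
        adv2 = (1 - 2 * p) - (2 * p - 1)  # player 2 wants to mismatch
        p = damp * p + (1 - damp) * sigmoid(lam * adv1)
        q = damp * q + (1 - damp) * sigmoid(lam * adv2)
    return p, q

p, q = logit_qre_matching_pennies()
print(round(p, 3), round(q, 3))  # both approach 0.5
```

The rationality parameter `lam` interpolates between uniform play (`lam -> 0`) and best response (`lam -> inf`); smoothness in this parameter is part of what makes quantal-response solution concepts more tractable than Nash equilibria.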

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

This paper introduces DCPO, a framework that resolves the inherent gradient conflict between accuracy and calibration in Reinforcement Learning from Verifiable Rewards by decoupling reasoning and confidence objectives, thereby achieving state-of-the-art calibration performance without compromising model accuracy.

Zhengzhao Ma, Xueru Wen, Boxi Cao, Yaojie Lu, Hongyu Lin, Jinglin Yang, Min He, Xianpei Han, Le Sun · Wed, 11 Ma · cs.LG
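Calibration in this setting is typically measured with metrics such as expected calibration error (ECE), which compares a model's stated confidence with its empirical accuracy. A minimal implementation of the standard metric (background, not DCPO itself):

```python
def expected_calibration_error(confidences, corrects, n_bins=10):
    """Standard ECE: bin predictions by confidence, then average the
    |accuracy - mean confidence| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, corrects):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# A model that is always 90% confident but right only half the
# time is badly calibrated.
print(expected_calibration_error([0.9] * 4, [1, 0, 1, 0]))  # ≈ 0.4
```

The gradient conflict the paper describes arises because a verifiable-reward objective pushes confidence toward 1 on correct answers, while a metric like this is minimized only when confidence tracks the true success rate.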

Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon

This paper proposes a data-driven framework that harmonizes heterogeneous driving cycle data and employs statistical and deep learning models to enable efficient, probabilistic prediction of voltage hysteresis factors in silicon-graphite anode batteries, thereby improving state-of-charge estimation and generalizability across different vehicle models.

Runyao Yu, Viviana Kleine, Philipp Gromotka, Thomas Rudolf, Adrian Eisenmann, Gautham Ram Chandra Mouli, Peter Palensky, Jochen L. Cremer · Wed, 11 Ma · cs.LG

Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms

This paper identifies and theoretically proves that unmasked policy gradient algorithms systematically suppress valid actions at unvisited states due to parameter sharing and gradient propagation, a failure mode that action masking avoids and that can be mitigated in unmasked settings through feasibility classification.

Renos Zabounidis, Roy Siegelmann, Mohamad Qadri, Woojun Kim, Simon Stepputtis, Katia P. Sycara · Wed, 11 Ma · cs.LG
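The action-masking baseline that avoids this failure mode is easy to sketch: setting invalid logits to negative infinity gives those actions exactly zero probability, and therefore zero gradient, so parameter sharing cannot drag valid actions down with them. A minimal masked softmax:

```python
import math

def masked_policy(logits, valid_mask):
    """Softmax over logits with invalid actions masked to -inf, so
    they receive exactly zero probability (and zero gradient)."""
    masked = [l if ok else float("-inf")
              for l, ok in zip(logits, valid_mask)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

probs = masked_policy([2.0, 0.5, -1.0], [True, False, True])
print(probs)  # invalid action 1 gets probability exactly 0.0
```

An unmasked policy must instead learn near-zero probabilities for invalid actions from reward alone, which is precisely where the suppression effect at unvisited states can arise.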

PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing

This paper proposes a Reconfigurable Intelligent Surface (RIS)-aided semantic-aware Vehicular Edge Computing framework that utilizes a Proximal Policy Optimization (PPO) and Linear Programming (LP) hybrid scheme to jointly optimize offloading ratios, semantic symbols, and RIS phase shifts, achieving a 40–50% reduction in end-to-end latency compared to existing methods.

Wei Feng, Jingbo Zhang, Qiong Wu, Pingyi Fan, Qiang Fan · Wed, 11 Ma · cs.LG
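The offloading-ratio piece of such a problem can be illustrated with a hedged toy model, not the paper's joint PPO+LP formulation: if the local and offloaded fractions of a task run in parallel, total latency is the maximum of the two parts, and the optimal split equates them. All parameter values below are hypothetical.

```python
def best_offload_ratio(cycles, bits, f_local, f_edge, rate):
    """Latency-optimal split of a task between local and edge
    execution, assuming the two parts run in parallel.
    cycles: CPU cycles needed; bits: data to transmit;
    f_local/f_edge: CPU frequencies (Hz); rate: uplink rate (bit/s)."""
    t_local_full = cycles / f_local              # all-local latency
    t_edge_full = bits / rate + cycles / f_edge  # all-offload latency
    # Equating (1 - a) * t_local_full with a * t_edge_full:
    alpha = t_local_full / (t_local_full + t_edge_full)
    latency = max((1 - alpha) * t_local_full, alpha * t_edge_full)
    return alpha, latency

alpha, latency = best_offload_ratio(
    cycles=1e9, bits=1e6, f_local=1e9, f_edge=1e10, rate=1e7)
print(alpha, latency)  # offload ~83% of the task
```

In the paper's setting the uplink rate itself depends on the RIS phase shifts and the semantic symbol count, which is why a learned policy is layered on top of the per-step linear program.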