Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF

This paper proposes and validates exponential reward-weighted SFT as a robust, fully offline post-training method for generative recommenders that eliminates reward hacking and propensity score requirements while offering theoretical guarantees and a controllable tradeoff between robustness and performance.

Keertana Chidambaram, Sanath Kumar Krishnamurthy, Qiuling Xu, Ko-Jen Hsiao, Moumita Bhattacharya · 2026-03-12 · 🤖 cs.LG

Quantum entanglement provides a competitive advantage in adversarial games

This study demonstrates that quantum entanglement serves as a functional resource in competitive reinforcement learning, enabling hybrid quantum-classical agents trained on the game Pong to consistently outperform separable quantum circuits and match or exceed classical baselines by learning structurally distinct features that better model dynamic agent interactions.

Peiyong Wang, Kieran Hymas, James Quach · 2026-03-12 · ⚛️ quant-ph

How to make the most of your masked language model for protein engineering

This paper introduces a flexible stochastic beam search sampling method for masked language models that optimizes protein properties by evaluating entire-sequence neighborhoods, demonstrating through extensive in silico and in vitro antibody engineering experiments that the choice of sampling strategy is at least as critical as the model itself.

Calvin McCarter, Nick Bhattacharya, Sebastian W. Ober, Hunter Elliott · 2026-03-12 · 🧬 q-bio

Data-Driven Integration Kernels for Interpretable Nonlocal Operator Learning

This paper introduces a data-driven integration kernel framework that enhances the interpretability and efficiency of nonlocal operator learning in climate modeling by separating nonlocal information aggregation via learnable weighting functions from local nonlinear prediction, thereby achieving competitive performance with fewer parameters and clearer physical insights.

Savannah L. Ferretti, Jerry Lin, Sara Shamekh, Jane W. Baldwin, Michael S. Pritchard, Tom Beucler · 2026-03-12 · 🤖 cs.LG

Federated Active Learning Under Extreme Non-IID and Global Class Imbalance

This paper introduces FairFAL, an adaptive federated active learning framework that leverages lightweight prediction discrepancy and prototype-guided pseudo-labeling to dynamically select between global and local query models, effectively addressing the challenges of extreme non-IID data and global class imbalance to achieve superior performance over state-of-the-art methods.

Chen-Chen Zong, Sheng-Jun Huang · 2026-03-12 · 🤖 cs.LG

On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

This paper addresses the fixed-budget best-arm identification problem in non-stationary linear bandits by establishing a tighter, arm-set-dependent lower bound on error probability and proposing the Adjacent-BAI algorithm, which utilizes an Adjacent-optimal design to achieve minimax-optimal performance that fully leverages the geometric structure of the arm set.

Leo Maynard-Zhang, Zhihan Xiong, Kevin Jamieson, Maryam Fazel · 2026-03-12 · 📊 stat

Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning

This paper introduces Causal Concept Graphs (CCG), a framework that combines task-conditioned sparse autoencoders with differentiable structure learning to map causal dependencies between interpretable latent features in LLMs, demonstrating through the Causal Fidelity Score that graph-guided interventions significantly enhance stepwise reasoning performance compared to existing tracing and random baselines.

Md Muntaqim Meherab, Noor Islam S. Mohammad, Faiza Feroz · 2026-03-12 · 🤖 cs.LG