Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models

This paper introduces Code-Space Response Oracles (CSRO), a novel framework that replaces black-box deep reinforcement learning oracles with Large Language Models to generate human-readable, interpretable multi-agent policies as code, achieving competitive performance while enabling the discovery of complex, explainable strategies.

Daniel Hennes, Zun Li, John Schultz, Marc Lanctot · 2026-03-12 · 🤖 cs.AI

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

The paper introduces CLIPO, a method that integrates contrastive learning into policy optimization to generalize Reinforcement Learning with Verifiable Rewards (RLVR) by capturing invariant structures across correct reasoning paths, thereby mitigating hallucinations and improving the generalization and robustness of Large Language Models.

Sijia Cui, Pengyu Cheng, Jiajun Song, Yongbo Gai, Guojun Zhang, Zhechao Yu, Jianhe Lin, Xiaoxi Jiang, Guanjun Jiang · 2026-03-12 · 🤖 cs.LG

ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

This paper proposes ReMix, a novel Mixture-of-LoRAs framework that prevents routing imbalance by combining non-learnable routing weights with a Reinforce Leave-One-Out (RLOO) gradient estimator. All active LoRAs therefore contribute equally, and ReMix significantly outperforms state-of-the-art parameter-efficient finetuning methods.

Ruizhong Qiu, Hanqing Zeng, Yinglong Xia, Yiwen Meng, Ren Chen, Jiarui Feng, Dongqi Fu, Qifan Wang, Jiayi Liu, Jun Xiao, Xiangjun Fan, Benyu Zhang, Hong Li, Zhining Liu, Hyunsik Yoo, Zhichen Zeng, Tianxin Wei, Hanghang Tong · 2026-03-12 · 🤖 cs.LG
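
For readers unfamiliar with the RLOO estimator named in the ReMix summary, the following is a minimal sketch of the generic leave-one-out baseline only. It is not ReMix's routing procedure; the function name `rloo_advantages` and the toy rewards are illustrative assumptions. Each sample's baseline is the mean reward of the other samples, and the resulting advantages would weight the log-probability gradients in a REINFORCE-style update.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k >= 2 sampled rewards.

    The advantage of sample i is r_i minus the mean of the other k - 1 rewards.
    Generic RLOO baseline only; how ReMix applies it to routing is not shown.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Toy rewards (hypothetical values, not taken from the paper).
advantages = rloo_advantages([1.0, 2.0, 3.0])
print(advantages)  # [-1.5, 0.0, 1.5]
```

The advantages always sum to zero, which is what makes the baseline variance-reducing without requiring a learned value function.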

DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning

This paper introduces DT-BEHRT, a graph-enhanced transformer model that improves predictive performance and interpretability in electronic health record analysis by explicitly modeling disease trajectories within organ systems and employing a novel pre-training strategy to capture heterogeneous medical code interactions.

Deyi Li, Zijun Yao, Qi Xu, Muxuan Liang, Lingyao Li, Zijian Xu, Mei Liu · 2026-03-12 · 🤖 cs.LG

Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent

This paper establishes a general stability criterion for stochastic mirror descent algorithms to enable valid statistical inference in adaptive bandit settings, introducing regularized-EXP3 variants that simultaneously achieve minimax-optimal regret, nominal confidence interval coverage, and robustness to adversarial corruptions.

Budhaditya Halder, Ishan Sengupta, Koustav Chowdhury, Koulik Khamaru · 2026-03-12 · 📊 stat
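
As background for the regularized-EXP3 variants mentioned in the summary above, here is a minimal sketch of the standard, unregularized EXP3 bandit algorithm. The paper's regularization and its inference procedure are not reproduced; the helper names, the `gamma` value, and the toy reward function are assumptions made for illustration.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration at rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Standard EXP3 with rewards in [0, 1]; returns pull counts and final probabilities."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale by the largest weight to avoid overflow; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls, exp3_probabilities(weights, gamma)


# Toy problem (hypothetical): arm 1 always pays 1, arm 0 always pays 0.
pulls, probs = run_exp3(lambda arm, rng: 1.0 if arm == 1 else 0.0, n_arms=2, n_rounds=500)
print(pulls, probs)
```

Because the sampling probabilities depend on past rewards, the collected data are adaptively sampled, which is the setting in which the summary says ordinary confidence intervals need a stability condition to remain valid.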

Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models

This paper introduces Adaptive Activation Cancellation (AAC), a real-time, training-free inference framework that mitigates hallucinations in large language models by identifying and suppressing hallucination-associated neural activations as structured interference, thereby improving factual accuracy across multiple model scales without degrading general capabilities or fluency.

Eric Yocam, Varghese Vaidyan, Gurcan Comert, Paris Kalathas, Yong Wang, Judith L. Mwakalonge · 2026-03-12 · 💬 cs.CL

Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion

This paper proposes a hybrid Hidden Markov Model that combines Laplace quantile-defined market states with a Poisson-driven jump-duration mechanism to generate synthetic equity excess growth rates that simultaneously preserve heavy-tailed distributions, volatility clustering, and realistic tail-state dwell times, outperforming standard GARCH and HMM models in joint distributional and temporal fidelity.

Abdulrahman Alswaidan, Jeffrey D. Varner · 2026-03-12 · 💰 q-fin

Flexible Cutoff Learning: Optimizing Machine Learning Potentials After Training

This paper introduces Flexible Cutoff Learning (FCL), a method that trains machine learning interatomic potentials with randomly sampled cutoff radii to enable post-training optimization of per-atom cutoffs, thereby significantly reducing computational costs for specific applications without requiring retraining.

Rick Oerder (Institute for Numerical Simulation, University of Bonn, Fraunhofer Institute for Algorithms and Scientific Computing SCAI), Jan Hamaekers (Fraunhofer Institute for Algorithms and Scientific Computing SCAI) · 2026-03-12 · 🔬 cond-mat.mtrl-sci

SDSR: A Spectral Divide-and-Conquer Approach for Species Tree Reconstruction

The paper introduces SDSR, a scalable spectral divide-and-conquer algorithm for species tree reconstruction that achieves up to 10-fold faster runtimes compared to standard methods while maintaining comparable accuracy under the multispecies coalescent model.

Ortal Reshef (Hebrew University of Jerusalem), Ofer Glassman (Weizmann Institute of Science), Or Zuk (Hebrew University of Jerusalem), Yariv Aizenbud (Tel Aviv University), Boaz Nadler (Weizmann Institute of Science), Ariel Jaffe (Hebrew University of Jerusalem) · 2026-03-12 · 🧬 q-bio

Rethinking the Harmonic Loss via Non-Euclidean Distance Layers

This paper extends the harmonic loss framework by systematically evaluating various non-Euclidean distance metrics across vision and language models, demonstrating that cosine-based variants offer superior trade-offs in accuracy, interpretability, and sustainability compared to traditional cross-entropy and Euclidean approaches.

Maxwell Miller-Golub, Kamil Faber, Marcin Pietron, Panpan Zheng, Pasquale Minervini, Roberto Corizzo · 2026-03-12 · 🤖 cs.LG
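
To make the idea of a distance-based loss concrete, the sketch below assumes the usual harmonic-loss form, in which class probabilities are proportional to an inverse power of the distance between an input and each class weight vector, and swaps in cosine distance (1 minus cosine similarity). The exact cosine variants evaluated in the paper are not specified in the summary, so the function names, the exponent `n`, and the `eps` smoothing are illustrative assumptions.

```python
import math


def cosine_distance(x, w, eps=1e-12):
    """1 - cosine similarity, clamped at 0 to guard against rounding error."""
    dot = sum(a * b for a, b in zip(x, w))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_w = math.sqrt(sum(b * b for b in w))
    return max(0.0, 1.0 - dot / (norm_x * norm_w + eps))


def harmonic_probs(x, class_weights, n=2.0, eps=1e-8):
    """Class probabilities proportional to distance ** (-n) (assumed form)."""
    inverse = [(cosine_distance(x, w) + eps) ** (-n) for w in class_weights]
    total = sum(inverse)
    return [v / total for v in inverse]


def harmonic_loss(x, class_weights, target, n=2.0):
    """Negative log-probability of the target class."""
    return -math.log(harmonic_probs(x, class_weights, n)[target])


# Toy example (hypothetical vectors): x points almost exactly at class 0.
x = [1.0, 0.0]
class_weights = [[1.0, 0.1], [0.0, 1.0]]
print(harmonic_probs(x, class_weights))
print(harmonic_loss(x, class_weights, target=0))
```

Unlike a softmax over logits, the probabilities here depend only on how close the input is to each class vector, so each class weight can be read as a prototype direction.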