MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

MM-Zero is the first RL-based framework to enable Vision Language Models to self-evolve from zero data by employing a multi-role system (Proposer, Coder, and Solver) trained with Group Relative Policy Optimization to generate visual concepts, render them via code, and solve multimodal reasoning tasks without any seed images.

Zongxia Li, Hongyang Du, Chengsong Huang, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, Xiaomin Wu, Zhichao Liu, Jiarui Zhang, Fuxiao Liu · 2026-03-11 · 🤖 cs.LG

Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation

This paper proposes RQRE-OVI, an optimistic value iteration algorithm that computes the unique and smooth Risk-Sensitive Quantal Response Equilibrium (RQRE) in general-sum Markov games with linear function approximation, offering a principled trade-off between performance and robustness that outperforms traditional Nash equilibrium approaches in both theoretical guarantees and empirical stability.

Jake Gonzales, Max Horwitz, Eric Mazumdar, Lillian J. Ratliff · 2026-03-11 · 🤖 cs.LG

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

This paper introduces Test-Time Control (TTC), a hardware-efficient neural layer that embeds finite-horizon optimal control planning directly into pretrained LLMs via a symplectic LQR solver, significantly boosting mathematical reasoning performance without requiring test-time training.

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal · 2026-03-11 · 🤖 cs.LG

Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

This paper introduces efficient, representation-based transductive generalization bounds for graph node classification using optimal transport and Wasserstein distances, which not only correlate strongly with empirical performance but also explain the non-monotonic relationship between GNN depth and generalization error through the analysis of distributional transformations.

MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim · 2026-03-11 · 🤖 cs.LG

DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data

This paper introduces DendroNN, a novel dendrocentric neural network that leverages non-differentiable sequence detection and a rewiring phase to efficiently classify event-based spatiotemporal data, achieving competitive accuracy with up to 4x higher energy efficiency than state-of-the-art neuromorphic hardware through a dedicated asynchronous digital architecture.

Jann Krausse, Zhe Su, Kyrus Mama, Maryada, Klaus Knobloch, Giacomo Indiveri, Jürgen Becker · 2026-03-11 · 🤖 cs.AI

Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning

The paper introduces Reward-Zero, a general-purpose implicit reward mechanism that leverages language embeddings to transform natural-language task descriptions into dense, semantically grounded progress signals, thereby accelerating training, stabilizing learning, and improving generalization for reinforcement learning agents without requiring task-specific reward engineering.

Heng Zhang, Haddy Alchaer, Arash Ajoudani, Yu She · 2026-03-11 · 🤖 cs.LG

Interactive 3D visualization of surface roughness predictions in additive manufacturing: A data-driven framework

This paper presents a data-driven framework that combines a multilayer perceptron trained on experimental data augmented by a conditional generative adversarial network with an interactive 3D web interface to predict and visualize surface roughness in material extrusion additive manufacturing, enabling optimized process planning and part orientation.

Engin Deniz Erkan, Elif Surer, Ulas Yaman · 2026-03-11 · 🤖 cs.LG

Democratising Clinical AI through Dataset Condensation for Classical Clinical Models

This paper introduces a differentially private, zero-order optimization framework that extends dataset condensation to non-differentiable clinical models, enabling the creation of compact, privacy-preserving synthetic datasets that facilitate the democratization of clinical data sharing without compromising model utility.

Anshul Thakur, Soheila Molaei, Pafue Christy Nganjimi, Joshua Fieggen, Andrew A. S. Soltan, Danielle Belgrave, Lei Clifton, David A. Clifton · 2026-03-11 · 🤖 cs.AI