Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale

This paper introduces the Robot Control Stack (RCS), a lean and modular software ecosystem that unifies simulation and physical control to bridge the gap between large-scale Vision-Language-Action model training and real-world robot deployment. The authors validate it through extensive evaluations of policies such as Octo, OpenVLA, and Pi Zero.

Tobias Jülg, Pierre Krack, Seongjin Bien, Yannik Blei, Khaled Gamal, Ken Nakahara, Johannes Hechtl, Roberto Calandra, Wolfram Burgard, Florian Walter · 2026-03-11 · 🤖 cs.LG

VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment

The paper proposes VLCE, a knowledge-enhanced framework that integrates external semantic knowledge from ConceptNet and WordNet into a two-stage vision-language pipeline to generate more accurate, domain-specific, and actionable image descriptions for disaster assessment, outperforming general-purpose models on satellite and UAV benchmarks.

Md. Mahfuzur Rahman, Kishor Datta Gupta, Marufa Kamal + 5 more · 2026-03-11 · 🤖 cs.LG

ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse

This paper introduces ZeroSiam, an efficient asymmetric Siamese architecture that prevents model collapse during test-time entropy minimization by employing asymmetric divergence alignment, thereby enhancing adaptation and reasoning performance across diverse vision and language tasks with negligible overhead.

Guohao Chen, Shuaicheng Niu, Deyu Chen, Jiahao Yang, Zitian Zhang, Mingkui Tan, Pengcheng Wu, Zhiqi Shen · 2026-03-11 · 🤖 cs.LG

Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition

This paper introduces General Policy Composition (GPC), a training-free method that improves diffusion-based and flow-based robot policies by convexly combining the distributional scores of multiple pre-trained policies at test time. The authors show, both theoretically and empirically, that this composition yields superior performance and adaptability across diverse tasks.

Jiahang Cao, Yize Huang, Hanzhong Guo, Rui Zhang, Mu Nan, Weijian Mai, Jiaxu Wang, Hao Cheng, Jingkai Sun, Gang Han, Wen Zhao, Qiang Zhang, Yijie Guo, Qihao Zheng, Chunfeng Song, Xiao Li, Ping Luo, Andrew F. Luo · 2026-03-11 · 🤖 cs.LG

Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking

This paper proposes a hybrid control framework that combines Deep Reinforcement Learning (DRL) with robust model-independent bounded extremum seeking to enhance the stability and adaptability of controlling nonlinear time-varying systems, demonstrating its effectiveness through numerical simulations and the automatic tuning of a particle accelerator.

Shaifalee Saxena, Alan Williams, Rafael Fierro, Alexander Scheinker · 2026-03-11 · 🤖 cs.LG

Latent Speech-Text Transformer

The Latent Speech-Text Transformer (LST) improves the efficiency and performance of auto-regressive speech-text models by aggregating speech tokens into latent patches, which aligns sequence granularity with text, reduces computational costs, and achieves significant accuracy gains across speech and text benchmarks.

Yen-Ju Lu, Yashesh Gaur, Wei Zhou, Benjamin Muller, Jesus Villalba, Najim Dehak, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Srinivasan Iyer, Duc Le · 2026-03-11 · 🤖 cs.AI

AlphaApollo: A System for Deep Agentic Reasoning

AlphaApollo is an agentic reasoning system that enhances foundation models' performance on complex, long-horizon tasks by orchestrating multi-turn agentic reasoning, turn-level reinforcement learning for tool-use optimization, and a propose-judge-update evolution loop with verification.

Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Tian Cheng, Jianghangfan Zhang, Tangyu Jiang, Linrui Xu, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han · 2026-03-11 · 🤖 cs.AI

Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels

This paper addresses the challenge of LiDAR-based 3D semantic segmentation under noisy labels and domain shifts by introducing the DGLSS-NL task, establishing a new benchmark, and proposing DuNe, a dual-view framework that achieves state-of-the-art robustness across multiple datasets.

Weitong Kong, Zichao Zeng, Di Wen, Jiale Wei, Kunyu Peng, June Moh Goo, Jan Boehm, Rainer Stiefelhagen · 2026-03-11 · 🤖 cs.LG

RECODE: Reasoning Through Code Generation for Visual Question Answering

The paper introduces RECODE, an agentic framework that enhances visual question answering by reverse-engineering structured visuals into executable code through iterative generation and selection. This transforms ambiguous perceptual tasks into verifiable symbolic reasoning problems, and the framework significantly outperforms existing methods.

Junhong Shen, Mu Cai, Bo Hu, Ameet Talwalkar, David A Ross, Cordelia Schmid, Alireza Fathi · 2026-03-11 · 🤖 cs.AI

RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

RL-100 is a unified real-world reinforcement learning framework that combines diffusion visuomotor policies with a clipped PPO objective and consistency distillation to achieve 100% success across 1,000 diverse robotic manipulation trials, matching or surpassing human experts while demonstrating robust zero-shot generalization and continuous deployment in dynamic environments.

Kun Lei, Huanyu Li, Dongjie Yu, Zhenyu Wei, Lingxiao Guo, Zhennan Jiang, Ziyu Wang, Shiyu Liang, Huazhe Xu · 2026-03-11 · 🤖 cs.AI

Bradley-Terry Policy Optimization for Generative Preference Modeling

This paper introduces Bradley-Terry Policy Optimization (BTPO), a novel framework that derives a consistent Monte Carlo gradient estimator to effectively train large language models with chain-of-thought reasoning on non-verifiable pairwise preference tasks, overcoming the limitations of existing heuristic RL approaches.

Shengyu Feng, Yun He, Shuang Ma, Beibin Li, Yuanhao Xiong, Songlin Li, Karishma Mandyam, Julian Katz-Samuels, Shengjie Bi, Licheng Yu, Hejia Zhang, Karthik Abinav Sankararaman, Han Fang, Yiming Yang, Manaal Faruqui · 2026-03-11 · 🤖 cs.LG

From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors

FALCON addresses the spatial reasoning limitations of existing 2D-based vision-language-action models by leveraging spatial foundation models to inject rich 3D geometric priors directly into the action head, achieving state-of-the-art performance across diverse simulation and real-world tasks without requiring architectural changes or specialized sensors.

Zhengshen Zhang, Hao Li, Yalun Dai, Zhengbang Zhu, Lei Zhou, Chenchen Liu, Dong Wang, Francis E. H. Tay, Sijin Chen, Ziwei Liu, Yuxiao Liu, Xinghang Li, Pan Zhou · 2026-03-11 · 🤖 cs.AI

GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation

The paper proposes GraphKeeper, a novel framework for Graph Domain-Incremental Learning that addresses catastrophic forgetting through knowledge disentanglement and deviation-free preservation, achieving state-of-the-art performance across multiple graph domains while remaining compatible with various graph foundation models.

Zihao Guo, Qingyun Sun, Ziwei Zhang, Haonan Yuan, Huiping Zhuang, Xingcheng Fu, Jianxin Li · 2026-03-11 · 🤖 cs.AI

Structured Matrix Scaling for Multi-Class Calibration

This paper proposes a structured matrix scaling approach for multi-class calibration that draws on theoretical insights from logistic regression and combines structured regularization with robust optimization to manage the bias-variance tradeoff, achieving substantial performance gains over existing methods. An open-source implementation is provided.

Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach · 2026-03-11 · 🤖 cs.AI