Listening with the Eyes: Benchmarking Egocentric Co-Speech Grounding across Space and Time

This paper introduces EcoG-Bench, a rigorous bilingual benchmark for egocentric co-speech grounding that reveals a significant performance gap between humans and state-of-the-art MLLMs, highlighting how multimodal interface limitations, rather than reasoning deficits, hinder the alignment of speech with pointing gestures in situated collaboration.

Weijie Zhou, Xuantang Xiong, Zhenlin Hu, Xiaomeng Zhu, Chaoyang Zhao, Honghui Dong, Zhengyou Zhang, Ming Tang, Jinqiao Wang | 2026-03-10 | 💻 cs

Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs

This paper introduces EvoStage, a novel evolutionary paradigm that leverages large language models with a stagewise, multi-agent approach and real-time feedback to overcome the limitations of black-box modeling, successfully generating algorithm designs that outperform both human experts and existing methods in complex industrial tasks like chip placement and black-box optimization.

Chen Lu, Ke Xue, Chengrui Gao, Yunqi Shi, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou | 2026-03-10 | 💻 cs

Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning

This paper introduces HILA, a Human-In-the-Loop Multi-Agent Collaboration framework that employs Dual-Loop Policy Optimization to train agents with metacognitive policies for dynamically deferring to human experts and continuously improving their reasoning capabilities, thereby overcoming the static knowledge limitations of purely autonomous systems.

Wei Yang, Defu Cao, Jiacheng Pang, Muyan Weng, Yan Liu | 2026-03-10 | 💻 cs

VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

VORL-EXPLORE is a hybrid learning and planning framework for multi-robot exploration in dynamic environments that couples task allocation with motion execution via a shared navigability fidelity signal, enabling adaptive arbitration between global and reactive policies to prevent bottlenecks and ensure robust, collision-free coverage.

Ning Liu, Sen Shen, Zheng Li, Sheng Liu, Dongkun Han, Shangke Lyu, Thomas Braunl | 2026-03-10 | 💻 cs

OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

The paper introduces OSExpert, a computer-use agent that leverages a GUI-based depth-first search exploration algorithm to discover action primitives and self-construct a skill curriculum, significantly improving performance and efficiency on complex tasks and approaching human-expert levels.

Jiateng Liu, Zhenhailong Wang, Rushi Wang, Bingxuan Li, Jeonghwan Kim, Aditi Tiwari, Pengfei Yu, Denghui Zhang, Heng Ji | 2026-03-10 | 💻 cs

Extend Your Horizon: A Device-Agnostic Surgical Tool Tracking Framework with Multi-View Optimization for Augmented Reality

This paper presents a device-agnostic surgical tool tracking framework that fuses multiple sensing modalities within a dynamic scene graph to overcome line-of-sight occlusions and enhance the robustness of augmented reality visualization in operating rooms.

Jiaming Zhang, Mingxu Liu, Hongchao Shu, Ruixing Liang, Yihao Liu, Ojas Taskar, Amir Kheradmand, Mehran Armand, Alejandro Martin-Gomez | 2026-03-10 | 💻 cs

Energy-Efficient Online Scheduling for Wireless Powered Mobile Edge Computing Networks

This paper proposes an energy-efficient online scheduling framework for Wireless Powered Mobile Edge Computing networks that uses Lyapunov optimization and a relax-then-adjust approach to solve the joint wireless power transfer and computation offloading problem, balancing latency against energy consumption with theoretical performance guarantees.

Xingqiu He, Chaoqun You, Yuzhi Yang, Zihan Chen, Yuhang Shen, Tony Q. S. Quek, Yue Gao | 2026-03-10 | 💻 cs

On the Feasibility and Opportunity of Autoregressive 3D Object Detection

The paper introduces AutoReg3D, an autoregressive 3D object detector that reformulates LiDAR-based detection as a sequence generation task using a near-to-far ordering to eliminate reliance on hand-crafted components like anchors and NMS, thereby achieving competitive performance while enabling the integration of advanced language model techniques such as reinforcement learning.

Zanming Huang, Jinsu Yoo, Sooyoung Jeon, Zhenzhen Liu, Mark Campbell, Kilian Q Weinberger, Bharath Hariharan, Wei-Lun Chao, Katie Z Luo | 2026-03-10 | 💻 cs

SI-ChainFL: Shapley-Incentivized Secure Federated Learning for High-Speed Rail Data Sharing

This paper proposes SI-ChainFL, a secure and efficient federated learning framework for high-speed rail data sharing that combines Shapley value-based contribution incentives with a blockchain-driven decentralized aggregation protocol to mitigate free-riding and model poisoning while ensuring robust performance against malicious attacks.

Mingjie Zhao, Cheng Dai, Fei Chen, Xin Chen, Kaoru Ota, Mianxiong Dong, Bing Guo | 2026-03-10 | 💻 cs

CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval

The paper proposes CMMR-VLN, a vision-and-language navigation framework that enhances large language model agents with structured multimodal memory retrieval and reflection-based updates to selectively leverage prior experiences, significantly improving performance in long-horizon and unfamiliar scenarios compared to existing methods.

Haozhou Li, Xiangyu Dong, Huiyan Jiang, Yaoming Zhou, Xiaoguang Ma | 2026-03-10 | 💻 cs

Dual-Horizon Hybrid Internal Model for Low-Gravity Quadrupedal Jumping with Hardware-in-the-Loop Validation

This paper introduces a Dual-Horizon Hybrid Internal Model that enables stable, continuous quadrupedal jumping under lunar gravity using only proprioceptive sensing, validated on the MATRIX hardware-in-the-loop testbed, which emulates reduced gravity and lunar terrain in real time.

Haozhe Xu, Yifei Zhao, Wenhao Feng, Zhipeng Wang, Hongrui Sang, Cheng Cheng, Xiuxian Li, Zhen Yin, Bin He | 2026-03-10 | 💻 cs

SafarDB: FPGA-Accelerated Distributed Transactions via Replicated Data Types

SafarDB is a novel FPGA-accelerated distributed transaction system that co-designs a network-attached replication engine with a custom FPGA network interface to achieve significantly lower latency and higher throughput than state-of-the-art RDMA-based implementations for both Conflict-Free and Well-Coordinated Replicated Data Types.

Javad Saberlatibari, Prithviraj Yuvaraj, Mohsen Lesani, Philip Brisk, Mohammad Sadoghi | 2026-03-10 | 💻 cs

ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation

This paper proposes the ViSA-enhanced framework, a triple-phase collaborative architecture that leverages structured visual prompting to enable Vision-Language Models to perform direct spatial reasoning on image planes, achieving a 70.3% improvement in success rate over state-of-the-art aerial Vision-Language Navigation methods on the CityNav benchmark.

Haoyu Tong, Xiangyu Dong, Xiaoguang Ma, Haoran Zhao, Yaoming Zhou, Chenghao Lin | 2026-03-10 | 💻 cs

It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models

This paper addresses the significant challenge of analog clock reading in state-of-the-art Vision-Language Models by introducing the real-world, diverse TickTockVQA dataset and the Swap-DPO fine-tuning framework, which together substantially improve spatial-temporal reasoning and accuracy under complex visual conditions.

Jaeha Choi, Jin Won Lee, Siwoo You, Jangho Lee | 2026-03-10 | 💻 cs