CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval

The paper proposes CMMR-VLN, a vision-and-language navigation framework that enhances large language model agents with structured multimodal memory retrieval and reflection-based updates to selectively leverage prior experiences, significantly improving performance in long-horizon and unfamiliar scenarios compared to existing methods.

Haozhou Li, Xiangyu Dong, Huiyan Jiang, Yaoming Zhou, Xiaoguang Ma | 2026-03-10 | 💻 cs

Dual-Horizon Hybrid Internal Model for Low-Gravity Quadrupedal Jumping with Hardware-in-the-Loop Validation

This paper introduces a Dual-Horizon Hybrid Internal Model that enables stable, continuous quadrupedal jumping under lunar gravity using only proprioceptive sensing, validated through the MATRIX hardware-in-the-loop testbed which emulates reduced gravity and lunar terrain in real time.

Haozhe Xu, Yifei Zhao, Wenhao Feng, Zhipeng Wang, Hongrui Sang, Cheng Cheng, Xiuxian Li, Zhen Yin, Bin He | 2026-03-10 | 💻 cs

SafarDB: FPGA-Accelerated Distributed Transactions via Replicated Data Types

SafarDB is a novel FPGA-accelerated distributed transaction system that co-designs a network-attached replication engine with a custom FPGA network interface to achieve significantly lower latency and higher throughput for both Conflict-Free and Well-coordinated Replicated Data Types compared to state-of-the-art RDMA-based implementations.

Javad Saberlatibari, Prithviraj Yuvaraj, Mohsen Lesani, Philip Brisk, Mohammad Sadoghi | 2026-03-10 | 💻 cs

ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation

This paper proposes the ViSA-enhanced framework, a triple-phase collaborative architecture that leverages structured visual prompting to enable Vision-Language Models to perform direct spatial reasoning on image planes, achieving a 70.3% improvement in success rate over state-of-the-art aerial Vision-Language Navigation methods on the CityNav benchmark.

Haoyu Tong, Xiangyu Dong, Xiaoguang Ma, Haoran Zhao, Yaoming Zhou, Chenghao Lin | 2026-03-10 | 💻 cs

It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models

This paper addresses the significant challenge of analog clock reading in state-of-the-art Vision-Language Models by introducing the real-world, diverse TickTockVQA dataset and the Swap-DPO fine-tuning framework, which together substantially improve spatial-temporal reasoning and accuracy under complex visual conditions.

Jaeha Choi, Jin Won Lee, Siwoo You, Jangho Lee | 2026-03-10 | 💻 cs

Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared

This paper proposes "Missing No More," a novel dictionary-guided framework for image fusion when the infrared modality is missing. It learns a shared convolutional dictionary that enables interpretable inference and fusion in the coefficient domain, thereby avoiding uncontrolled pixel-space generation while improving perceptual quality and downstream detection performance.

Yafei Zhang, Meng Ma, Huafeng Li, Yu Liu | 2026-03-10 | 💻 cs

Vector Field Augmented Differentiable Policy Learning for Vision-Based Drone Racing

This paper introduces DiffRacing, a novel framework that enhances differentiable policy learning for vision-based drone racing by integrating vector fields to provide stable gradient signals for balancing high-speed gate traversal with obstacle avoidance, while employing a differentiable Delta Action Model to enable robust sim-to-real transfer without explicit system identification.

Yang Su, Feng Yu, Yu Hu, Xinze Niu, Linzuo Zhang, Fangyu Sun, Danping Zou | 2026-03-10 | 💻 cs

Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades

This paper proposes a two-stage cascaded framework that generates controllable complex human motion videos by first using an autoregressive model to synthesize 2D skeleton sequences from text descriptions and then employing a pose-conditioned diffusion model with adaptive layer fusion to render high-fidelity videos, supported by a new synthetic dataset designed to overcome limitations in existing benchmarks.

Ashkan Taghipour, Morteza Ghahremani, Zinuo Li, Hamid Laga, Farid Boussaid, Mohammed Bennamoun | 2026-03-10 | 💻 cs

QualiTeacher: Quality-Conditioned Pseudo-Labeling for Real-World Image Restoration

QualiTeacher introduces a novel framework for real-world image restoration that transforms imperfect pseudo-labels into conditional supervisory signals by explicitly conditioning the student model on estimated label quality, thereby enabling the learning of a quality-graded restoration manifold that avoids artifact mimicry and achieves superior generalization.

Fengyang Xiao, Jingjia Feng, Peng Hu, Dingming Zhang, Lei Xu, Guanyi Qin, Lu Li, Chunming He, Sina Farsiu | 2026-03-10 | 💻 cs

Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout

This paper presents a robust multimodal framework for the 10th ABAW Expression Recognition Challenge that utilizes a dual-branch Transformer with safe cross-attention and modality dropout to dynamically fuse audio and visual data, effectively addressing partial occlusions, missing modalities, and class imbalance to achieve 60.79% accuracy on the Aff-Wild2 validation set.

Jun Yu, Naixiang Zheng, Guoyuan Wang, Yunxiang Zhang, Lingsi Zhu, Jiaen Liang, Wei Huang, Shengping Liu | 2026-03-10 | 💻 cs

Samyama: A Unified Graph-Vector Database with In-Database Optimization, Agentic Enrichment, and Hardware Acceleration

This paper introduces Samyama, a high-performance, unified graph-vector database written in Rust that integrates persistent storage, vector indexing, native optimization solvers, and agentic LLM enrichment into a single engine, achieving competitive throughput and latency on commodity hardware while offering GPU-accelerated enterprise features.

Madhulatha Mandarapu, Sandeep Kunkunuru | 2026-03-10 | 💻 cs

Distributed Coordination Algorithms with Efficient Communication for Open Multi-Agent Systems with Dynamic Communication Links and Processing Delays

This paper proposes and analyzes three communication-efficient distributed algorithms that achieve finite-time quantized average consensus in open multi-agent systems with dynamic directed links, arbitrary bounded processing delays, and continuous node turnover, while establishing novel topological conditions for convergence and demonstrating superior performance through simulations.

Jiaqi Hu, Karl H. Johansson, Apostolos I. Rikos | 2026-03-10 | 💻 cs