CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval

This paper proposes CMMR-VLN, a vision-and-language navigation framework that enhances large language model agents with structured multimodal memory retrieval and reflection-based updates so that they selectively leverage prior experiences, significantly improving performance in long-horizon and unfamiliar scenarios compared to existing methods.

Haozhou Li, Xiangyu Dong, Huiyan Jiang, Yaoming Zhou, Xiaoguang Ma 2026-03-10 💻 cs

Dual-Horizon Hybrid Internal Model for Low-Gravity Quadrupedal Jumping with Hardware-in-the-Loop Validation

This paper introduces a Dual-Horizon Hybrid Internal Model that enables stable, continuous quadrupedal jumping under lunar gravity using only proprioceptive sensing, validated on the MATRIX hardware-in-the-loop testbed, which emulates reduced gravity and lunar terrain in real time.

Haozhe Xu, Yifei Zhao, Wenhao Feng, Zhipeng Wang, Hongrui Sang, Cheng Cheng, Xiuxian Li, Zhen Yin, Bin He 2026-03-10 💻 cs

SafarDB: FPGA-Accelerated Distributed Transactions via Replicated Data Types

SafarDB is a novel FPGA-accelerated distributed transaction system that co-designs a network-attached replication engine with a custom FPGA network interface to achieve significantly lower latency and higher throughput for both Conflict-Free and Well-coordinated Replicated Data Types compared to state-of-the-art RDMA-based implementations.

Javad Saberlatibari, Prithviraj Yuvaraj, Mohsen Lesani, Philip Brisk, Mohammad Sadoghi 2026-03-10 💻 cs

ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation

This paper proposes the ViSA-enhanced framework, a triple-phase collaborative architecture that leverages structured visual prompting to enable Vision-Language Models to perform direct spatial reasoning on image planes, achieving a 70.3% improvement in success rate over state-of-the-art aerial Vision-Language Navigation methods on the CityNav benchmark.

Haoyu Tong, Xiangyu Dong, Xiaoguang Ma, Haoran Zhao, Yaoming Zhou, Chenghao Lin 2026-03-10 💻 cs

It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models

This paper addresses the significant challenge of analog clock reading in state-of-the-art Vision-Language Models by introducing the real-world, diverse TickTockVQA dataset and the Swap-DPO fine-tuning framework, which together substantially improve spatial-temporal reasoning and accuracy under complex visual conditions.

Jaeha Choi, Jin Won Lee, Siwoo You, Jangho Lee 2026-03-10 💻 cs

Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared

This paper proposes "Missing No More," a novel dictionary-guided framework that addresses the challenge of missing infrared modality in image fusion by learning a shared convolutional dictionary to enable interpretable coefficient-domain inference and fusion, thereby avoiding uncontrolled pixel-space generation while improving perceptual quality and downstream detection performance.

Yafei Zhang, Meng Ma, Huafeng Li, Yu Liu 2026-03-10 💻 cs

Vector Field Augmented Differentiable Policy Learning for Vision-Based Drone Racing

This paper introduces DiffRacing, a novel framework that enhances differentiable policy learning for vision-based drone racing by integrating vector fields to provide stable gradient signals for balancing high-speed gate traversal with obstacle avoidance, while employing a differentiable Delta Action Model to enable robust sim-to-real transfer without explicit system identification.

Yang Su, Feng Yu, Yu Hu, Xinze Niu, Linzuo Zhang, Fangyu Sun, Danping Zou 2026-03-10 💻 cs

Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades

This paper proposes a two-stage cascaded framework that generates controllable complex human motion videos by first using an autoregressive model to synthesize 2D skeleton sequences from text descriptions and then employing a pose-conditioned diffusion model with adaptive layer fusion to render high-fidelity videos, supported by a new synthetic dataset designed to overcome limitations in existing benchmarks.

Ashkan Taghipour, Morteza Ghahremani, Zinuo Li, Hamid Laga, Farid Boussaid, Mohammed Bennamoun 2026-03-10 💻 cs

QualiTeacher: Quality-Conditioned Pseudo-Labeling for Real-World Image Restoration

QualiTeacher introduces a novel framework for real-world image restoration that transforms imperfect pseudo-labels into conditional supervisory signals by explicitly conditioning the student model on estimated label quality, thereby enabling the learning of a quality-graded restoration manifold that avoids artifact mimicry and achieves superior generalization.

Fengyang Xiao, Jingjia Feng, Peng Hu, Dingming Zhang, Lei Xu, Guanyi Qin, Lu Li, Chunming He, Sina Farsiu 2026-03-10 💻 cs

Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout

This paper presents a robust multimodal framework for the 10th ABAW Expression Recognition Challenge that utilizes a dual-branch Transformer with safe cross-attention and modality dropout to dynamically fuse audio and visual data, effectively addressing partial occlusions, missing modalities, and class imbalance to achieve 60.79% accuracy on the Aff-Wild2 validation set.

Jun Yu, Naixiang Zheng, Guoyuan Wang, Yunxiang Zhang, Lingsi Zhu, Jiaen Liang, Wei Huang, Shengping Liu 2026-03-10 💻 cs

Samyama: A Unified Graph-Vector Database with In-Database Optimization, Agentic Enrichment, and Hardware Acceleration

This paper introduces Samyama, a high-performance, unified graph-vector database written in Rust that integrates persistent storage, vector indexing, native optimization solvers, and agentic LLM enrichment into a single engine, achieving competitive throughput and latency on commodity hardware while offering GPU-accelerated enterprise features.

Madhulatha Mandarapu, Sandeep Kunkunuru 2026-03-10 💻 cs