Khelte Khelte Shikhi: A Proposed HCI Framework for Gamified Interactive Learning with Minecraft in Bangladeshi Education Systems

This paper proposes a practical, three-tiered HCI framework for deploying localized, gamified Minecraft learning in Bangladesh's resource-constrained schools, addressing critical infrastructure gaps through adaptive offline and low-power solutions while outlining specific curriculum-aligned content and evaluation benchmarks for future pilot testing.

Mohd Ruhul Ameen, Akif Islam, Momen Khandokar Ope · 2026-03-10 · 💻 cs

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

This paper introduces Dream4Drive, a synthetic data generation framework that uses 3D-aware guidance and a fine-tuned driving world model to create diverse, multi-view corner cases, improving downstream perception tasks in autonomous driving with gains that are not negated by increased training epochs.

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang · 2026-03-10 · 💻 cs

LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

The paper introduces LagMemo, a novel navigation system that utilizes a language-enhanced 3D Gaussian Splatting memory to enable efficient multi-modal, open-vocabulary, and multi-goal visual navigation, demonstrating superior performance over state-of-the-art methods on the newly curated GOAT-Core benchmark.

Haotian Zhou, Xiaole Wang, He Li, Zhuo Qi, Jinrun Yin, Haiyu Kong, Jianghuan Xu, Huijing Zhao · 2026-03-10 · 💻 cs

MobiDock: Design and Control of A Modular Self Reconfigurable Bimanual Mobile Manipulator via Robotic Docking

This paper presents MobiDock, a modular self-reconfigurable bimanual mobile manipulator that utilizes an autonomous computer vision-based docking mechanism to physically unite two independent robots, thereby transforming complex multi-robot coordination into a simpler single-system control problem that significantly improves dynamic stability, precision, and operational efficiency.

Xuan-Thuan Nguyen, Khac Nam Nguyen, Ngoc Duy Tran, Thi Thoa Mac, Anh Nguyen, Hoang Hiep Ly, Tung D. Ta · 2026-03-10 · 💻 cs

Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach

This paper proposes a forensic method called "diffusion snap-back reconstruction," which detects AI-generated images by analyzing how perceptual similarity metrics change when an image is perturbed and reconstructed by a diffusion model, achieving high accuracy (AUROC of 0.993) and robustness against common distortions without relying on traditional pixel-level artifacts.

Mohd Ruhul Ameen, Akif Islam · 2026-03-10 · 💻 cs

MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks

This paper introduces MUGSQA, a novel framework comprising a multi-uncertainty-based Gaussian Splatting quality assessment dataset, a unified multi-distance subjective evaluation method, and two benchmarks designed to rigorously assess the robustness of reconstruction methods and the performance of existing quality metrics under varying input conditions.

Tianang Chen, Jian Jin, Shilv Cai, Zhuangzi Li, Weisi Lin · 2026-03-10 · 💻 cs

Counting Through Occlusion: Framework for Open World Amodal Counting

This paper introduces CountOCC, a novel amodal counting framework that overcomes the limitations of existing methods under occlusion by hierarchically reconstructing complete object features through multimodal guidance and visual equivalence objectives, achieving state-of-the-art performance on newly established occlusion-augmented benchmarks.

Safaeid Hossain Arib, Rabeya Akter, Abdul Monaf Chowdhury, Md Jubair Ahmed Sourov, Md Mehedi Hasan · 2026-03-10 · 💻 cs

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

The paper proposes LAMP, a language-augmented multi-agent reinforcement learning framework that employs a "Think-Speak-Decide" pipeline to integrate unstructured language with numerical data, significantly outperforming existing baselines in economic decision-making through improved cumulative returns, robustness, and interpretability.

Heyang Ma, Qirui Mi, Qipeng Yang, Zijun Fan, Bo Li, Haifeng Zhang · 2026-03-10 · 💻 cs

Video2Layout: Recall and Reconstruct Metric-Grounded Cognitive Map for Spatial Reasoning

The paper proposes Video2Layout, a two-stage framework that reconstructs metric-grounded spatial layouts using continuous object boundary coordinates instead of discretized grids, thereby enhancing fine-grained spatial reasoning in Multimodal Large Language Models and achieving superior performance on spatial benchmarks.

Yibin Huang, Wang Xu, Wanyue Zhang, Helu Zhi, Jingjing Huang, Yangbin Xu, Yangang Sun, Conghui Zhu, Tiejun Zhao · 2026-03-10 · 💻 cs