cs papers | Gist.Science

Khelte Khelte Shikhi: A Proposed HCI Framework for Gamified Interactive Learning with Minecraft in Bangladeshi Education Systems

This paper proposes a practical, three-tiered HCI framework for deploying localized, gamified Minecraft learning in Bangladesh's resource-constrained schools, addressing critical infrastructure gaps through adaptive offline and low-power solutions while outlining specific curriculum-aligned content and evaluation benchmarks for future pilot testing.

Mohd Ruhul Ameen, Akif Islam, Momen Khandokar Ope | 2026-03-10 | 💻 cs

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

This paper introduces Dream4Drive, a novel synthetic data generation framework that leverages 3D-aware guidance and a fine-tuned driving world model to create diverse, multi-view corner cases. The synthetic data improves downstream perception tasks in autonomous driving, and these gains are not negated when the number of training epochs is increased.

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang | 2026-03-10 | 💻 cs

MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting

MoE-GS introduces a novel Mixture-of-Experts framework for dynamic Gaussian Splatting that utilizes a Volume-aware Pixel Router to adaptively blend heterogeneous deformation priors for superior novel view synthesis, while addressing efficiency concerns through multi-expert rendering optimizations and knowledge distillation.

In-Hwan Jin, Hyeongju Mun, Joonsoo Kim, Kugjin Yun, Kyeongbo Kong | 2026-03-10 | 💻 cs
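The general idea of a per-pixel router blending several experts can be illustrated with a minimal NumPy sketch. This is a generic Mixture-of-Experts blend under stated assumptions, not the MoE-GS implementation: it assumes each expert has already rendered an RGB image of the same view and that a router outputs one logit per expert per pixel. The names `blend_expert_renders`, `expert_rgb`, and `router_logits` are hypothetical, and the Volume-aware Pixel Router itself is not reproduced.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def blend_expert_renders(expert_rgb: np.ndarray, router_logits: np.ndarray) -> np.ndarray:
    """Blend per-expert renders with per-pixel router weights (hypothetical helper).

    expert_rgb:    (E, H, W, 3) image rendered by each of E experts
    router_logits: (E, H, W) unnormalized router score per expert per pixel
    returns:       (H, W, 3) blended image
    """
    weights = softmax(router_logits, axis=0)               # (E, H, W); sums to 1 over experts
    return (weights[..., None] * expert_rgb).sum(axis=0)   # weighted sum over the expert axis

# Two experts on a 2x2 image: expert 0 renders white, expert 1 renders black,
# and the router prefers expert 0 at every pixel.
experts = np.stack([np.ones((2, 2, 3)), np.zeros((2, 2, 3))])
logits = np.stack([np.full((2, 2), 2.0), np.zeros((2, 2))])
out = blend_expert_renders(experts, logits)
print(out.shape)  # (2, 2, 3)
```

Using a softmax over the expert axis keeps the blend a convex combination at each pixel, so the router can smoothly shift between experts without changing the overall intensity scale.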

Next Generation Cloud-native In-Memory Stores: From Redis to Valkey and Beyond

This study provides a comprehensive experimental benchmark of emerging cloud-native in-memory key-value stores—including Valkey, KeyDB, and Garnet—within Kubernetes environments to evaluate their performance trade-offs, resource efficiency, and long-term viability as alternatives to Redis.

Carl-Johan Fauvelle Munck af Rosenschöld, Feras M. Awaysheh, Ahmad Awad | 2026-03-10 | 💻 cs

Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions

This paper presents HCLA, a human-centered multi-agent system that enhances transparency and accountability in digital asset anomaly detection by reconstructing traceable, expert-style reasoning processes through a conversational workflow that separates evidence scoring from justification, rather than merely explaining black-box models.

Gyuyeon Na, Minjung Park, Hyeonjeong Cha, Sangmi Chai | 2026-03-10 | 💻 cs

AnyPcc: Compressing Any Point Cloud with a Single Universal Model

The paper introduces AnyPcc, a universal point cloud compression framework that achieves state-of-the-art performance across diverse datasets by combining a robust Universal Context Model with an Instance-Adaptive Fine-Tuning strategy to effectively handle varying data densities and out-of-distribution scenarios.

Kangli Wang, Qianxi Yi, Yuqi Ye, Shihao Li, Wei Gao | 2026-03-10 | 💻 cs

Automated Pest Counting in Water Traps through Active Robotic Stirring for Occlusion Handling

This paper proposes an automated pest counting system for water traps that utilizes a robotic arm with adaptive-speed stirring and a confidence-driven closed-loop control mechanism to effectively mitigate occlusion, significantly reducing counting errors and execution time compared to static image methods and constant-speed stirring.

Xumin Gao, Mark Stevens, Grzegorz Cielniak | 2026-03-10 | 💻 cs

CountFormer: A Transformer Framework for Learning Visual Repetition and Structure in Class-Agnostic Object Counting

This paper introduces CountFormer, a transformer-based framework that leverages the DINOv2 foundation model to improve structural consistency and reduce overcounting errors in exemplar-free object counting, achieving competitive performance on the FSC-147 benchmark.

Md Tanvir Hossain, Akif Islam, Mohd Ruhul Ameen | 2026-03-10 | 💻 cs

LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

The paper introduces LagMemo, a novel navigation system that utilizes a language-enhanced 3D Gaussian Splatting memory to enable efficient multi-modal, open-vocabulary, and multi-goal visual navigation, demonstrating superior performance over state-of-the-art methods on the newly curated GOAT-Core benchmark.

Haotian Zhou, Xiaole Wang, He Li, Zhuo Qi, Jinrun Yin, Haiyu Kong, Jianghuan Xu, Huijing Zhao | 2026-03-10 | 💻 cs

SAGE: Structure-Aware Generative Video Transitions between Diverse Clips

SAGE is a zero-shot, structure-aware generative framework that synthesizes visually coherent and motion-consistent video transitions between diverse clips by combining line maps and motion-flow guidance with generative synthesis. It outperforms existing classical and generative methods without requiring fine-tuning or task-specific training data.

Mia Kan, Yilin Liu, Niloy Mitra | 2026-03-10 | 💻 cs

MobiDock: Design and Control of A Modular Self Reconfigurable Bimanual Mobile Manipulator via Robotic Docking

This paper presents MobiDock, a modular self-reconfigurable bimanual mobile manipulator that utilizes an autonomous computer vision-based docking mechanism to physically unite two independent robots, thereby transforming complex multi-robot coordination into a simpler single-system control problem that significantly improves dynamic stability, precision, and operational efficiency.

Xuan-Thuan Nguyen, Khac Nam Nguyen, Ngoc Duy Tran, Thi Thoa Mac, Anh Nguyen, Hoang Hiep Ly, Tung D. Ta | 2026-03-10 | 💻 cs

Vectorized Online POMDP Planning

This paper introduces VOPP, a novel vectorized online POMDP planner that eliminates synchronization bottlenecks by representing all planning data as tensors and performing fully parallelized expectation estimation. VOPP achieves a 20-fold efficiency gain over existing parallel solvers and outperforms state-of-the-art sequential methods while using a planning budget 1000 times smaller.

Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati | 2026-03-10 | 💻 cs
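To illustrate why representing planning data as tensors helps, here is a minimal NumPy sketch that estimates one-step action values Q(b, a) for every action and every belief particle in a single batched call instead of a loop. It is a generic illustration under stated assumptions, not VOPP's algorithm: the toy model functions `step` and `leaf_value` and the helper `q_estimates` are hypothetical, and a real online planner expands a search tree rather than a single step.

```python
import numpy as np

def q_estimates(particles, actions, step, leaf_value, gamma=0.95):
    """Monte Carlo estimate of Q(b, a) for all actions at once (hypothetical helper).

    particles:  (N,) sampled states representing the belief b
    actions:    (A,) candidate actions
    step:       vectorized model (states, actions) -> (next_states, rewards)
                that operates element-wise on an (A, N) grid
    leaf_value: vectorized heuristic value of next states
    returns:    (A,) estimated action values
    """
    s = np.broadcast_to(particles[None, :], (actions.size, particles.size))  # (A, N)
    a = np.broadcast_to(actions[:, None], s.shape)                           # (A, N)
    next_s, r = step(s, a)                    # every action/particle pair in one call
    returns = r + gamma * leaf_value(next_s)  # (A, N) sampled one-step returns
    return returns.mean(axis=1)               # expectation over particles, per action

# Toy 1-D model: the action shifts the state; reward penalizes distance from 0.
def step(s, a):
    nxt = s + a
    return nxt, -np.abs(nxt)

def leaf_value(s):
    return np.zeros_like(s)

rng = np.random.default_rng(0)
belief = rng.normal(loc=2.0, scale=0.1, size=1000)
q = q_estimates(belief, np.array([-2.0, 0.0, 2.0]), step, leaf_value)
print(q.argmax())  # 0: shifting by -2 moves the belief closest to 0
```

Because the expectation is a reduction over a dense (actions × particles) array, there is no per-particle synchronization; the same pattern maps directly onto GPU tensor libraries.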

Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach

This paper proposes a forensic method called "diffusion snap-back reconstruction," which detects AI-generated images by analyzing how perceptual similarity metrics change when an image is perturbed and then reconstructed by a diffusion model. It achieves an AUROC of 0.993 and remains robust to common distortions without relying on traditional pixel-level artifacts.

Mohd Ruhul Ameen, Akif Islam | 2026-03-10 | 💻 cs
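The perturb-and-reconstruct idea can be sketched generically: reconstruct the image at several perturbation strengths, score similarity to the original at each strength, and summarize how the score changes. This is a minimal sketch under loud assumptions, not the authors' pipeline: `reconstruct` is a hypothetical stand-in for a diffusion model's noise-then-denoise pass, PSNR is a simple placeholder for the perceptual metrics the paper analyzes, and the two summary features (mean score and average slope) are illustrative choices that would feed a downstream classifier not shown here.

```python
import numpy as np
from typing import Callable, Sequence

def snap_back_curve(
    image: np.ndarray,
    reconstruct: Callable[[np.ndarray, float], np.ndarray],
    strengths: Sequence[float],
) -> np.ndarray:
    """Similarity between `image` and its reconstruction at each strength.

    `reconstruct(image, t)` is a hypothetical stand-in for a diffusion model
    that noises the image to strength t and denoises it back. PSNR (for images
    in [0, 1]) is a placeholder for perceptual metrics such as LPIPS or SSIM.
    """
    scores = []
    for t in strengths:
        recon = reconstruct(image, t)
        mse = float(np.mean((image - recon) ** 2))
        scores.append(10.0 * np.log10(1.0 / max(mse, 1e-12)))
    return np.asarray(scores)

def curve_features(scores: np.ndarray, strengths: Sequence[float]) -> np.ndarray:
    """Summarize the curve: mean similarity and average slope across strengths."""
    s = np.asarray(strengths, dtype=float)
    slope = (scores[-1] - scores[0]) / (s[-1] - s[0])
    return np.array([scores.mean(), slope])

# Toy reconstructor (hypothetical): pulls pixels toward mid-gray as t grows.
def toy_reconstruct(img, t):
    return img * (1.0 - t) + 0.5 * t

img = np.zeros((8, 8))
strengths = [0.1, 0.3, 0.5]
scores = snap_back_curve(img, toy_reconstruct, strengths)
features = curve_features(scores, strengths)
```

With a real diffusion model, the intuition is that images already on the model's learned manifold "snap back" differently from real photographs as the strength increases, which is what the curve features are meant to capture.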

PhantomFetch: Obfuscating Loads against Prefetcher Side-Channel Attacks

This paper introduces PhantomFetch, a hardware-agnostic defense that secures IP-stride prefetchers against side-channel attacks by obfuscating sensitive load effects to break exploitable couplings, thereby maintaining prefetching performance without requiring hardware modifications.

Xingzhi Zhang, Buyi Lv, Yimin Lu, Kai Bu | 2026-03-10 | 💻 cs

MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks

This paper introduces MUGSQA, a novel framework comprising a multi-uncertainty-based Gaussian Splatting quality assessment dataset, a unified multi-distance subjective evaluation method, and two benchmarks designed to rigorously assess the robustness of reconstruction methods and the performance of existing quality metrics under varying input conditions.

Tianang Chen, Jian Jin, Shilv Cai, Zhuangzi Li, Weisi Lin | 2026-03-10 | 💻 cs

Counting Through Occlusion: Framework for Open World Amodal Counting

This paper introduces CountOCC, a novel amodal counting framework that overcomes the limitations of existing methods under occlusion by hierarchically reconstructing complete object features through multimodal guidance and visual equivalence objectives, achieving state-of-the-art performance on newly established occlusion-augmented benchmarks.

Safaeid Hossain Arib, Rabeya Akter, Abdul Monaf Chowdhury, Md Jubair Ahmed Sourov, Md Mehedi Hasan | 2026-03-10 | 💻 cs

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

The paper proposes LAMP, a language-augmented multi-agent reinforcement learning framework that employs a "Think-Speak-Decide" pipeline to integrate unstructured language with numerical data, significantly outperforming existing baselines in economic decision-making through improved cumulative returns, robustness, and interpretability.

Heyang Ma, Qirui Mi, Qipeng Yang, Zijun Fan, Bo Li, Haifeng Zhang | 2026-03-10 | 💻 cs

Video2Layout: Recall and Reconstruct Metric-Grounded Cognitive Map for Spatial Reasoning

The paper proposes Video2Layout, a two-stage framework that reconstructs metric-grounded spatial layouts using continuous object boundary coordinates instead of discretized grids, thereby enhancing fine-grained spatial reasoning in Multimodal Large Language Models and achieving superior performance on spatial benchmarks.

Yibin Huang, Wang Xu, Wanyue Zhang, Helu Zhi, Jingjing Huang, Yangbin Xu, Yangang Sun, Conghui Zhu, Tiejun Zhao | 2026-03-10 | 💻 cs

Multi-Order Matching Network for Alignment-Free Depth Super-Resolution

This paper proposes the Multi-Order Matching Network (MOMNet), an alignment-free framework that achieves state-of-the-art depth super-resolution by adaptively retrieving and integrating misaligned RGB information through a novel multi-order matching and aggregation mechanism.

Zhengxue Wang, Zhiqiang Yan, Yuan Wu, Guangwei Gao, Xiang Li, Jian Yang | 2026-03-10 | 💻 cs

Learning to Think Fast and Slow for Visual Language Models

This paper introduces DualMindVLM, a visual language model that leverages a dual-mode thinking mechanism to dynamically select between fast, intuitive responses and slow, deliberate reasoning based on problem complexity, thereby achieving state-of-the-art performance with significantly improved token efficiency.

Chenyu Lin, Cheng Chi, Jinlin Wu, Sharon Li, Kaiyang Zhou | 2026-03-10 | 💻 cs
