cs papers | Gist.Science

GLASS: Graph and Vision-Language Assisted Semantic Shape Correspondence

GLASS is a novel unsupervised framework that establishes dense 3D shape correspondence in challenging non-isometric and inter-class scenarios by integrating geometric spectral analysis with semantic priors from vision-language foundation models. It achieves state-of-the-art performance through view-consistent feature extraction, language-injected vertex descriptors, and a graph-assisted contrastive loss.

Qinfeng Xiao, Guofeng Mei, Qilong Liu, Chenyuan Yi, Fabio Poiesi, Jian Zhang, Bo Yang, Yick Kit-lun | 2026-03-10 | 💻 cs

Scaling Test-Time Robustness of Vision-Language Models via Self-Critical Inference Framework

This paper proposes a Self-Critical Inference (SCI) framework that enhances the robustness of Large Vision-Language Models against language bias and sensitivity through multi-round counterfactual reasoning with textual and visual perturbations, alongside a new Dynamic Robustness Benchmark (DRBench) for model-specific evaluation.

Kaihua Tang, Jiaxin Qi, Jinli Ou, Yuhua Zheng, Jianqiang Huang | 2026-03-10 | 💻 cs

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

This paper introduces Holi-Spatial, the first fully automated, large-scale, spatially-aware multimodal dataset constructed from raw video streams without human intervention, which provides 4 million high-quality 3D semantic annotations and spatial QA pairs to significantly enhance the training and performance of Vision-Language Models on spatial reasoning tasks.

Yuanyuan Gao, Hao Li, Yifei Liu, Xinhao Ji, Yuning Gong, Yuanjun Liao, Fangfu Liu, Manyuan Zhang, Yuchen Yang, Dan Xu, Xue Yang, Huaxi Huang, Hongjie Zhang, Ziwei Liu, Xiao Sun, Dingwen Zhang, Zhihang Zhong | 2026-03-10 | 💻 cs

DAISS: Phase-Aware Imitation Learning for Dual-Arm Robotic Ultrasound-Guided Interventions

This paper presents DAISS, a dual-arm robotic system that utilizes a phase-aware imitation learning policy trained on high-fidelity teleoperated demonstrations to automate the complex, asymmetric bimanual coordination required for ultrasound-guided needle interventions.

Feng Li, Pei Liu, Shiting Wang, Ning Wang, Zhongliang Jiang, Nassir Navab, Yuan Bi | 2026-03-10 | 💻 cs

Ref-DGS: Reflective Dual Gaussian Splatting

Ref-DGS is an efficient, rasterization-based framework that achieves state-of-the-art novel view synthesis on reflective scenes by decoupling surface geometry from specular reflections using a dual Gaussian representation and a lightweight adaptive mixing shader, thereby avoiding the high computational cost of explicit ray tracing.

Ningjing Fan, Yiqun Wang, Dongming Yan, Peter Wonka | 2026-03-10 | 💻 cs

FusionRegister: Every Infrared and Visible Image Fusion Deserves Registration

This paper introduces FusionRegister, a general and efficient cross-modality registration framework guided by visual priors that directly corrects misalignment within fused infrared and visible images, thereby enhancing detail alignment and robustness without requiring extensive pre-registration.

Congcong Bian, Haolong Ma, Hui Li, Zhongwei Shen, Xiaoqing Luo, Xiaoning Song, Xiao-Jun Wu | 2026-03-10 | 💻 cs

The Effect of Code Obfuscation on Human Program Comprehension

This study investigates how varying levels of code obfuscation affect human program comprehension in Python and JavaScript. It finds that obfuscation generally increases reasoning time and reduces accuracy, but that its impact is non-monotonic and language-specific: moderate deliberation improves performance, and experience matters more within a specific language than across languages.

Anh H. N. Nguyen, Jack Le, Ilse Lahnstein Coronado, Tien N. Nguyen | 2026-03-10 | 💻 cs

Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers

This survey provides a comprehensive overview of memory mechanisms in autonomous LLM agents from 2022 to early 2026, formalizing a write–manage–read framework, introducing a three-dimensional taxonomy, analyzing key mechanisms and evaluation benchmarks, and outlining critical applications and future challenges.

Pengfei Du | 2026-03-10 | 💻 cs
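To make the write–manage–read loop concrete, here is a minimal Python sketch of an agent memory store. It is a hypothetical illustration, not the survey's formalization: the class names `MemoryItem` and `AgentMemory`, the capacity-based oldest-first eviction, and the word-overlap retrieval are all assumptions chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    step: int  # write order, used to break ties on retrieval


@dataclass
class AgentMemory:
    """Hypothetical store illustrating a write-manage-read loop."""
    capacity: int = 3
    items: list = field(default_factory=list)
    step: int = 0

    def write(self, text: str) -> None:
        # Write: append the new item, then let manage() enforce the budget.
        self.items.append(MemoryItem(text, self.step))
        self.step += 1
        self.manage()

    def manage(self) -> None:
        # Manage: items are kept in write order, so evict from the front
        # until the store fits within its capacity.
        while len(self.items) > self.capacity:
            self.items.pop(0)

    def read(self, query: str, k: int = 1) -> list:
        # Read: rank items by word overlap with the query (newest first on ties).
        q = set(query.lower().split())
        ranked = sorted(
            self.items,
            key=lambda m: (len(q & set(m.text.lower().split())), m.step),
            reverse=True,
        )
        return [m.text for m in ranked[:k]]
```

For example, with `capacity=2`, writing three facts evicts the first one, and `read(...)` returns the stored fact that shares the most words with the query. Real agents would replace each stage with richer policies (summarization, deduplication, embedding retrieval).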

Low-Cost Teleoperation Extension for Mobile Manipulators

This paper presents an open-source, low-cost teleoperation framework for mobile bimanual manipulators that utilizes commodity hardware like smartphones and foot pedals to achieve intuitive whole-body control, demonstrating improved task performance and reduced cognitive load compared to traditional keyboard-based methods.

Danil Belov, Artem Erkhov, Yaroslav Savotin, Tatiana Podladchikova, Pavel Osinenko | 2026-03-10 | 💻 cs

A Primer on Evolutionary Frameworks for Near-Field Multi-Source Localization

This paper introduces two novel model-driven evolutionary frameworks, NEMO-DE and NEEF-DE, that use differential evolution to perform near-field multi-source localization on continuous spherical-wave models with arbitrary array geometries. Because they require neither labeled data nor discretized grids, they overcome the limitations of traditional grid-based subspace methods and data-dependent deep learning approaches.

Seyed Jalaleddin Mousavirad, Parisa Ramezani, Mattias O'Nils, Emil Björnson | 2026-03-10 | 💻 cs
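The optimizer named in the NEMO-DE/NEEF-DE summary is differential evolution. The sketch below is a generic DE/rand/1/bin minimizer applied to a continuous spherical-wave model of a centered uniform linear array; it is not the authors' NEMO-DE or NEEF-DE. The array model (half-wavelength spacing, distances in wavelengths, `u = sin(theta)`), the concentrated least-squares cost, and the names `steering`, `cost`, and `differential_evolution` are assumptions for illustration.

```python
import numpy as np


def steering(u, r, n_elem, spacing=0.5):
    """Near-field (spherical-wave) steering vector of a centered ULA.
    Assumed units: u = sin(theta), range r and spacing in wavelengths."""
    x = (np.arange(n_elem) - (n_elem - 1) / 2) * spacing  # element positions
    dist = np.sqrt(r**2 + x**2 - 2 * r * x * u)             # exact element-to-source distance
    return np.exp(-2j * np.pi * (dist - r))                  # phase relative to array center


def cost(params, y, n_elem):
    """Concentrated least-squares cost for K sources given as [u1, r1, u2, r2, ...]:
    complex amplitudes are solved in closed form, leaving only (u, r) to search."""
    k = len(params) // 2
    A = np.column_stack([steering(params[2 * i], params[2 * i + 1], n_elem) for i in range(k)])
    amps, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum(np.abs(y - A @ amps) ** 2))


def differential_evolution(fun, bounds, pop_size=30, gens=200, F=0.5, CR=0.9, seed=0):
    """Classic DE/rand/1/bin minimizer over box bounds (a generic sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([fun(p) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover, forcing at least one coordinate from the mutant.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            f = fun(trial)
            if f <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

As a usage example, `y = 1.5 * steering(0.2, 4.0, 8)` simulates one noiseless source, and `differential_evolution(lambda p: cost(p, y, 8), [(-0.5, 0.5), (2.0, 10.0)])` searches angle and range continuously. Passing 2K bounds fits K sources; no grid is involved at any point.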

UniUncer: Unified Dynamic Static Uncertainty for End to End Driving

UniUncer is a lightweight, unified framework for end-to-end autonomous driving that jointly estimates and leverages uncertainty for both static map elements and dynamic agents through probabilistic regression, uncertainty-aware query fusion, and adaptive gating, thereby significantly improving trajectory accuracy and planning robustness with minimal computational overhead.

Yu Gao, Jijun Wang, Zongzheng Zhang, Anqing Jiang, Yiru Wang, Yuwen Heng, Shuo Wang, Hao Sun, Zhangfeng Hu, Hao Zhao | 2026-03-10 | 💻 cs

FrameVGGT: Frame Evidence Rolling Memory for streaming VGGT

FrameVGGT addresses the unbounded memory growth in streaming Visual Geometry Transformers by introducing a frame-driven rolling explicit-memory framework that aggregates frame-level evidence into compact prototypes, enabling stable long-sequence 3D perception under strict memory budgets.

Zhisong Xu, Takeshi Oishi | 2026-03-10 | 💻 cs
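A bounded rolling memory of the kind described for FrameVGGT can be sketched as follows. This is a hypothetical illustration, not the paper's method: `PrototypeMemory` assumes one nonzero feature vector per frame and, whenever the budget is exceeded, merges the two most cosine-similar prototypes into their count-weighted mean, so memory size stays constant however long the stream runs.

```python
import numpy as np


class PrototypeMemory:
    """Hypothetical fixed-budget memory of frame-level prototypes."""

    def __init__(self, budget):
        self.budget = budget
        self.protos = []  # list of 1-D feature vectors (assumed nonzero)
        self.counts = []  # number of frames folded into each prototype

    def add_frame(self, feat):
        # Each incoming frame first becomes its own prototype.
        self.protos.append(np.asarray(feat, dtype=float))
        self.counts.append(1)
        if len(self.protos) > self.budget:
            self._merge_closest()

    def _merge_closest(self):
        P = np.stack(self.protos)
        unit = P / np.linalg.norm(P, axis=1, keepdims=True)
        sim = unit @ unit.T               # pairwise cosine similarity
        np.fill_diagonal(sim, -np.inf)    # ignore self-pairs
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        i, j = min(i, j), max(i, j)
        ci, cj = self.counts[i], self.counts[j]
        # Count-weighted mean keeps each prototype an average of its frames.
        self.protos[i] = (ci * self.protos[i] + cj * self.protos[j]) / (ci + cj)
        self.counts[i] = ci + cj
        del self.protos[j], self.counts[j]
```

Merging by weighted mean is one simple aggregation choice; it preserves the total frame count while discarding per-frame detail within each cluster.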

RoboPCA: Pose-centered Affordance Learning from Human Demonstrations for Robot Manipulation

This paper introduces RoboPCA, a pose-centered affordance learning framework that jointly predicts task-appropriate contact regions and poses from human demonstrations via the Human2Afford data curation pipeline, enabling robots to effectively manipulate objects with improved consistency and generalization across tasks and categories.

Zhanqi Xiao, Ruiping Wang, Xilin Chen | 2026-03-10 | 💻 cs

Compressed-Domain-Aware Online Video Super-Resolution

This paper proposes CDA-VSR, a compressed-domain-aware online video super-resolution network that leverages motion vectors, residual maps, and frame types to achieve real-time, high-quality reconstruction with significantly reduced computational cost compared to state-of-the-art methods.

Yuhang Wang, Hai Li, Shujuan Hou, Zhetao Dong, Xiaoyao Yang | 2026-03-10 | 💻 cs

Learning Context-Adaptive Motion Priors for Masked Motion Diffusion Models with Efficient Kinematic Attention Aggregation

This paper introduces the Masked Motion Diffusion Model (MMDM), a diffusion-based framework equipped with a Kinematic Attention Aggregation mechanism that learns context-adaptive motion priors to effectively reconstruct, refine, and complete 3D human motion from incomplete or noisy data.

Junkun Jiang, Jie Chen, Ho Yin Au, Jingyu Xiang | 2026-03-10 | 💻 cs

C²-Explorer: Contiguity-Driven Task Allocation with Connectivity-Aware Task Representation for Decentralized Multi-UAV Exploration

C²-Explorer is a decentralized framework for multi-UAV exploration that addresses communication limitations and inefficient traversal by utilizing connectivity-aware task representation and a contiguity-driven allocation strategy, achieving significant reductions in exploration time and path length compared to state-of-the-art methods.

Xinlu Yan, Mingjie Zhang, Yuhao Fang, Yanke Sun, Jun Ma, Youmin Gong, Boyu Zhou, Jie Mei | 2026-03-10 | 💻 cs

TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

TDM-R1 introduces a novel reinforcement learning paradigm that enables few-step diffusion models to effectively incorporate non-differentiable rewards by decoupling surrogate reward learning from generator training, achieving state-of-the-art performance across various metrics and scaling to powerful models like Z-Image with only 4 inference steps.

Yihong Luo, Tianyang Hu, Weijian Luo, Jing Tang | 2026-03-10 | 💻 cs

PARSE: Part-Aware Relational Spatial Modeling

The paper introduces PARSE, a framework utilizing part-level geometric relations encoded in Part-centric Assembly Graphs to resolve spatial ambiguities, which is validated through the creation of the PARSE-10K dataset and demonstrated to significantly enhance both object layout reasoning in vision-language models and the physical realism of generated 3D scenes.

Yinuo Bai, Peijun Xu, Kuixiang Shao, Yuyang Jiao, Jingxuan Zhang, Kaixin Yao, Jiayuan Gu, Jingyi Yu | 2026-03-10 | 💻 cs

VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription

VoiceSHIELD-Small is a lightweight, real-time model built on Whisper-small that simultaneously transcribes speech and detects malicious content with 99.16% accuracy, offering a faster and more secure alternative to traditional text-based filtering for voice AI systems.

Sumit Ranjan, Sugandha Sharma, Ubaid Abbas, Puneeth N Ail | 2026-03-10 | 💻 cs

YAQIN: Culturally Sensitive, Agentic AI for Mental Healthcare Support Among Muslim Women in the UK

This paper presents YAQIN, a co-designed AI application that integrates Islamic frameworks and user-centered design to provide culturally sensitive mental health support for Muslim women in the UK, addressing gaps in trust and engagement through a faith-aware chatbot and guided journaling tool.

Yasmin Zaraket, Céline Mougenot | 2026-03-10 | 💻 cs
