cs papers | Gist.Science

Contact-Safe Reinforcement Learning with ProMP Reparameterization and Energy Awareness

This paper proposes a contact-safe reinforcement learning framework that combines Proximal Policy Optimization with movement primitives and an energy-aware Cartesian impedance controller to generate robust, safe, and energy-efficient task-space trajectories for complex contact-rich manipulation in 3D environments.

Bingkun Huang, Yuhe Gong, Zewen Yang, Tianyu Ren, Luis Figueredo | 2026-03-09 | cs

Symmetry-Breaking in Multi-Agent Navigation: Winding Number-Aware MPC with a Learned Topological Strategy

This paper introduces WNumMPC, a hierarchical multi-agent navigation framework that combines a reinforcement learning-based planner and a model-based controller to resolve symmetry-induced deadlocks in dense environments by leveraging topological winding numbers for robust, communication-free coordination.

Tomoki Nakao, Kazumi Kasaura, Tadashi Kozuno | 2026-03-09 | cs

FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI

The paper introduces FunnyNodules, a fully parameterized synthetic dataset of lung nodule-like shapes with controllable visual attributes and known decision rules, designed to systematically evaluate and benchmark explainable AI models by verifying whether they learn correct attribute-target relations and align their attention with relevant diagnostic features.

Luisa Gallée, Yiheng Xiong, Meinrad Beer, Michael Götz | 2026-03-09 | cs

Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers

Bi-AQUA is a novel bilateral control-based imitation learning framework for underwater robot arms that integrates transformer-based action chunking with explicit lighting modeling to achieve robust performance in challenging, variable illumination conditions.

Takeru Tsunoori, Masato Kobayashi, Yuki Uranishi | 2026-03-09 | cs

UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification

This paper introduces the Unified Attention-Mamba (UAM) backbone, a flexible architecture that seamlessly integrates Attention and Mamba modules without manual tuning, achieving state-of-the-art performance in both tumor cell classification and image segmentation tasks on public benchmarks.

Taixi Chen, Jingyun Chen, Nancy Guo | 2026-03-09 | cs

EchoVLA: Synergistic Declarative Memory for VLA-Driven Mobile Manipulation

EchoVLA is a memory-enhanced Vision-Language-Action model for mobile manipulation that synergizes scene and episodic declarative memories to improve navigation and task performance, validated by the new MoMani benchmark and demonstrating significant gains over existing baselines in both simulation and real-world settings.

Min Lin, Xiwen Liang, Bingqian Lin, Liu Jingzhi, Zijian Jiao, Kehan Li, Yu Sun, Weijia Liufu, Yuhan Ma, Yuecheng Liu, Shen Zhao, Yuzheng Zhuang, Xiaodan Liang | 2026-03-09 | cs

SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

SyncMV4D is a novel framework that overcomes the limitations of single-view and data-hungry 3D methods by introducing a Multi-view Joint Diffusion model and a Diffusion Points Aligner to simultaneously generate synchronized, realistic multi-view hand-object interaction videos and globally aligned 4D metric motions through a closed-loop coupling of visual appearance and dynamic geometry.

Lingwei Dang, Zonghan Li, Juntong Li, Hongwen Zhang, Liang An, Yebin Liu, Qingyao Wu | 2026-03-09 | cs

Safe Autonomous Lane Changing: Planning with Dynamic Risk Fields and Time-Varying Convex Space Generation

This paper proposes a novel autonomous lane-changing planning framework that integrates dynamic risk fields with time-varying convex feasible spaces and a constrained iLQR solver to achieve safe, efficient, and comfortable trajectories that outperform traditional methods in complex traffic scenarios.

Yijun Lu, Zhihao Lin, Zhen Tian | 2026-03-09 | cs

Reversible Inversion for Training-Free Exemplar-guided Image Editing

This paper introduces ReInversion, a training-free exemplar-guided image editing method that employs a two-stage reversible denoising process and a Mask-Guided Selective Denoising strategy to achieve state-of-the-art performance with minimal computational overhead.

Yuke Li, Lianli Gao, Ji Zhang, Pengpeng Zeng, Lichuan Xiang, Hongkai Wen, Heng Tao Shen, Jingkuan Song | 2026-03-09 | cs

A method for tissue-mask supported whole-body image registration in the UK Biobank

This paper presents a sex-stratified whole-body MR image registration method for the UK Biobank that leverages subcutaneous adipose tissue and muscle masks to significantly outperform existing intensity-based and deep learning approaches in anatomical alignment and correlation analysis accuracy.

Yasemin Utkueri, Elin Lundström, Håkan Ahlström, Johan Öfverstedt, Joel Kullberg | 2026-03-09 | cs

UniTS: Unified Spatio-Temporal Generative Model for Remote Sensing

This paper introduces UniTS, a unified spatio-temporal generative model based on flow matching and diffusion transformers that integrates tasks like cloud removal, change detection, and forecasting into a single framework, significantly outperforming specialized models under challenging conditions.

Yuxiang Zhang, Shunlin Liang, Wenyuan Li, Han Ma, Jianglei Xu, Yichuan Ma, Jiangwei Xie, Wei Li, Mengmeng Zhang, Ran Tao, Xiang-Gen Xia | 2026-03-09 | cs

Safe Model Predictive Diffusion with Shielding

This paper introduces Safe Model Predictive Diffusion (Safe MPD), a training-free planning framework that integrates a safety shield directly into the diffusion denoising process to generate kinodynamically feasible and safe trajectories in real-time, outperforming existing methods in success rate and safety without requiring post-processing corrections.

Taekyung Kim, Keyvan Majd, Hideki Okamoto, Bardh Hoxha, Dimitra Panagou, Georgios Fainekos | 2026-03-09 | cs

Fast-BEV++: Fast by Algorithm, Deployable by Design

Fast-BEV++ is a vision-only Bird's-Eye-View perception framework that resolves the trade-off between accuracy and deployment efficiency by employing a hardware-oriented, kernel-free architecture to achieve a new state-of-the-art 0.488 NDS on nuScenes while delivering real-time inference at over 134 FPS.

Yuanpeng Chen, Hui Song, Sheng Yang, Wei Tao, Shanhui Mo, Shuang Zhang, Xiao Hua, Tiankun Zhao | 2026-03-09 | cs

Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement

Photo3D is a framework that advances photorealistic 3D generation by leveraging GPT-4o-Image data within a structure-aligned multi-view synthesis pipeline to create detail-enhanced datasets, thereby enabling realistic texture refinement while preserving geometric consistency across diverse 3D-native generators.

Xinyue Liang, Zhinyuan Ma, Lingchen Sun, Yanjun Guo, Lei Zhang | 2026-03-09 | cs

Modular Neural Image Signal Processing

This paper introduces a modular, fully learning-based neural image signal processing (ISP) framework that offers unprecedented control over intermediate rendering stages to enhance scalability, generalization, and flexibility, enabling a user-interactive photo-editing tool capable of unlimited post-editable re-rendering with competitive performance across multiple test sets.

Mahmoud Afifi, Zhongling Wang, Ran Zhang, Michael S. Brown | 2026-03-09 | cs

UniCoR: Modality Collaboration for Robust Cross-Language Hybrid Code Retrieval

UniCoR is a novel self-supervised framework that addresses the challenges of insufficient semantic understanding, inefficient modality fusion, and weak cross-language generalization in hybrid code retrieval by employing multi-perspective supervised contrastive learning and representation distribution consistency, thereby achieving state-of-the-art performance on both empirical and large-scale benchmarks.

Yang Yang, Li Kuang, Jiakun Liu, Zhongxin Liu, Yingjie Xia, David Lo | 2026-03-09 | cs

Towards Scalable Pre-training of Visual Tokenizers for Generation

This paper introduces VTP, a unified pre-training framework that optimizes visual tokenizers through joint image-text contrastive, self-supervised, and reconstruction losses to shift the latent space focus from low-level pixel accuracy to high-level semantics, thereby solving the "pre-training scaling problem" and enabling significantly improved, compute-efficient generative performance.

Jingfeng Yao, Yuda Song, Yucong Zhou, Xinggang Wang | 2026-03-09 | cs

Reexamining Paradigms of End-to-End Data Movement

This paper argues that achieving high-performance end-to-end data movement requires shifting focus from raw network bandwidth to a holistic hardware-software co-design approach, introducing the "Drainage Basin Pattern" to identify and resolve bottlenecks across six critical paradigms ranging from network latency to host-side factors.

Chin Fang, Timothy Stitt, Michael J. McManus, Toshio Moriya | 2026-03-09 | Author reviewed | cs

SORS: A Modular, High-Fidelity Simulator for Soft Robots

This paper introduces SORS, a modular, high-fidelity simulator based on the finite element method and constrained nonlinear optimization that accurately models complex soft robot dynamics and contact interactions, effectively bridging the sim-to-real gap for prototyping and control optimization.

Manuel Mekkattu, Mike Y. Michelis, Robert K. Katzschmann | 2026-03-09 | cs

Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding

This paper introduces a lightweight, pretrained history encoder that efficiently compresses long video histories into short embeddings using a frame query objective, enabling content-consistent autoregressive video generation under limited compute and memory constraints.

Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, Anyi Rao, Song Han, Gordon Wetzstein, Maneesh Agrawala | 2026-03-09 | cs
