ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
ExpReS-VLA is a Vision-Language-Action (VLA) model specialized for rapid, memory-efficient on-device adaptation to specific robotic tasks. It combines compressed experience replay, retrieval-augmented generation, and a novel contrastive loss that prevents catastrophic forgetting, yielding significant performance gains on both spatial and long-horizon benchmarks.
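To make the two memory-related ingredients concrete, below is a minimal, illustrative sketch of a compressed experience buffer with similarity-based retrieval. The class name, storage scheme (zlib-compressed pickled experiences), and cosine-similarity lookup are all assumptions for illustration, not the paper's actual implementation:

```python
import pickle
import zlib

import numpy as np


class CompressedReplayBuffer:
    """Illustrative buffer: stores experiences compressed, retrieves the
    most similar past experiences for a query embedding (hypothetical design)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.embeddings: list[np.ndarray] = []  # one vector per experience
        self.payloads: list[bytes] = []         # zlib-compressed pickles

    def add(self, embedding, experience) -> None:
        # Evict the oldest entry once capacity is reached (FIFO policy).
        if len(self.payloads) >= self.capacity:
            self.embeddings.pop(0)
            self.payloads.pop(0)
        self.embeddings.append(np.asarray(embedding, dtype=np.float32))
        self.payloads.append(zlib.compress(pickle.dumps(experience)))

    def retrieve(self, query_embedding, k: int = 3):
        # Rank stored experiences by cosine similarity to the query.
        if not self.payloads:
            return []
        q = np.asarray(query_embedding, dtype=np.float32)
        E = np.stack(self.embeddings)
        sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [pickle.loads(zlib.decompress(self.payloads[i])) for i in top]
```

In a replay-based adaptation loop, retrieved experiences would be mixed into each fine-tuning batch so the model continues to see earlier task data, which is the usual mechanism for mitigating catastrophic forgetting.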