cs.CV papers | Gist.Science

SSL-SLR: Self-Supervised Representation Learning for Sign Language Recognition

This paper proposes SSL-SLR, a self-supervised learning framework for sign language recognition. It addresses the limitations of standard contrastive methods by introducing free-negative pairs and a novel data augmentation technique to better handle video redundancy and shared movements, achieving significant accuracy improvements across various evaluation settings.

Ariel Basso Madjoukeng, Jérôme Fink, Pierre Poitier, Edith Belise Kenmogne, Benoit Frenay | 2026-03-09 | cs

RED: Robust Event-Guided Motion Deblurring with Modality-Specific Disentanglement

This paper introduces RED, a robust event-guided motion deblurring network that employs a robustness-oriented perturbation strategy and a modality-specific disentanglement mechanism to effectively reconstruct sharp images from fragmented event data caused by real-world sensor under-reporting.

Yihong Leng, Siming Zheng, Jinwei Chen, Bo Li, Jiaojiao Li, Peng-Tao Jiang | 2026-03-09 | cs

Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

This paper introduces Kernel VICReg, a novel self-supervised learning framework that extends the VICReg objective into a Reproducing Kernel Hilbert Space to capture nonlinear dependencies and improve representation learning performance on datasets with complex geometric structures.

M. Hadi Sepanj, Benyamin Ghojogh, Saed Moradi, Paul Fieguth | 2026-03-09 | cs.LG

C²Prompt: Class-aware Client Knowledge Interaction for Federated Continual Learning

This paper proposes C²Prompt, a novel federated continual learning method that mitigates temporal and spatial forgetting by introducing a local class distribution compensation mechanism and a class-aware prompt aggregation scheme to enhance class-wise knowledge coherence across distributed clients.

Kunlun Xu, Yibo Feng, Jiangmeng Li, Yongsheng Qi, Jiahuan Zhou | 2026-03-09 | cs.LG

Decision-Driven Semantic Object Exploration for Legged Robots via Confidence-Calibrated Perception and Topological Subgoal Selection

This paper presents a vision-based framework for legged robots that enables robust decision-driven semantic exploration by integrating confidence-calibrated perception, controlled-growth topological memory, and utility-driven subgoal selection to overcome the limitations of conventional geometry-centric navigation in open-world environments.

Guoyang Zhao, Yudong Li, Weiqing Qi, Kai Zhang, Bonan Liu, Kai Chen, Haoang Li, Jun Ma | 2026-03-09 | cs

DeCLIP: Decoupled Prompting for CLIP-based Multi-Label Class-Incremental Learning

DeCLIP is a replay-free, parameter-efficient framework for Multi-Label Class-Incremental Learning that decouples CLIP representations through class-specific prompting and Adaptive Similarity Tempering to effectively mitigate catastrophic forgetting and reduce false-positive rates without violating CLIP's single image-text alignment paradigm.

Kaile Du, Zihan Ye, Junzhou Xie, Yixi Shen, Yuyang Li, Fuyuan Hu, Ling Shao, Guangcan Liu, Joost van de Weijer, Fan Lyu | 2026-03-09 | cs

Beyond Flat Unknown Labels in Open-World Object Detection

The paper introduces BOUND, an open-world object detector that advances beyond simple "unknown" labeling by inferring coarse-grained, hierarchical categories for unseen objects to enable more informed decision-making while maintaining high performance on known classes.

Yuchen Zhang, Yao Lu, Johannes Betz | 2026-03-09 | cs

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

The paper introduces LikePhys, a training-free evaluation method using likelihood preferences to assess intuitive physics understanding in video diffusion models, demonstrating that current models show improving capabilities in physical reasoning as they scale despite challenges with complex dynamics.

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini | 2026-03-09 | cs.AI

CanvasMAR: Improving Masked Autoregressive Video Prediction With Canvas

CanvasMAR enhances masked autoregressive video prediction by introducing a global "canvas" prior and a motion-aware curriculum to generate high-fidelity, coherent videos with fewer sampling steps, achieving performance that rivals advanced diffusion-based methods.

Zian Li, Muhan Zhang | 2026-03-09 | cs.AI

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

The paper introduces 3DThinker, a novel framework that enables vision-language models to perform 3D spatial reasoning from limited views. It aligns the models' internal representations with a 3D foundation model and refines the reasoning process through outcome-based optimization, without requiring explicit 3D prior inputs or labeled 3D training data.

Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan, Xiang An, Yan Feng, Peng Pei, Xunliang Cai, Ruqi Huang | 2026-03-09 | cs.AI

AURASeg: Attention-guided Upsampling with Residual-Assistive Boundary Refinement for Onboard Robot Drivable-Area Segmentation

This paper introduces AURASeg, an attention-guided segmentation framework for onboard robot navigation. It features a Residual Boundary Refinement Module and an Attention Progressive Upsampling Decoder to enhance drivable-area boundary precision and multi-scale feature representation, demonstrating superior performance on multiple datasets and successful deployment on a Jetson Nano.

Narendhiran Vijayakumar, Sridevi. M | 2026-03-09 | cs

Culture in Action: Evaluating Text-to-Image Models through Social Activities

This paper introduces CULTIVate, a benchmark and evaluation framework for assessing the cultural faithfulness of text-to-image models in depicting social activities across 16 countries. It reveals significant performance disparities between Global North and Global South regions and demonstrates that its proposed metrics align more closely with human judgment than existing standards.

Sina Malakouti, Boqing Gong, Adriana Kovashka | 2026-03-09 | cs

Decoupling Bias, Aligning Distributions: Synergistic Fairness Optimization for Deepfake Detection

This paper proposes a dual-mechanism collaborative optimization framework that synergistically integrates structural fairness decoupling and global distribution alignment to enhance both inter-group and intra-group fairness in deepfake detection without compromising overall accuracy.

Feng Ding, Wenhui Yi, Yunpeng Zhou, Xinan He, Hong Rao, Shu Hu | 2026-03-09 | cs

LaxMotion: Rethinking Supervision Granularity for 3D Human Motion Generation

LaxMotion is a novel framework for 3D human motion generation that replaces precise 3D coordinate supervision with a relaxed paradigm based on global trajectories and monocular 2D cues, thereby enhancing model generalization and diversity while achieving performance comparable to fully supervised methods.

Sheng Liu, Yuanzhi Liang, Sidan Du | 2026-03-09 | cs

The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models

This paper introduces the Cultural Reference Transformation (CRT) metric to evaluate how diffusion models navigate the tension between memorization and generalization in culturally iconic contexts, revealing that model behavior depends on distinct recognition and realization mechanisms influenced by factors like data frequency, textual uniqueness, and reference popularity.

Maria-Teresa De Rosa Palmini, Eva Cetinic | 2026-03-09 | cs.AI

Co-Layout: LLM-driven Co-optimization for Interior Layout

This paper presents Co-Layout, a novel framework that integrates large language models with grid-based integer programming and a coarse-to-fine optimization strategy to jointly optimize room layouts and furniture placement, significantly outperforming existing two-stage pipelines in both solution quality and computational efficiency.

Chucheng Xiang, Ruchao Bao, Biyin Feng, Wenzheng Wu, Zhongyuan Liu, Yirui Guan, Ligang Liu | 2026-03-09 | cs.CL

SPARK: Jailbreaking T2V Models by Synergistically Prompting Auditory and Recontextualized Knowledge

This paper introduces SPARK, a jailbreak framework that exploits cross-modal associations in text-to-video models by combining neutral scene anchors, latent auditory triggers, and stylistic modulators to generate semantically unsafe videos that bypass safety guardrails while maintaining a benign appearance.

Zonghao Ying, Moyang Chen, Nizhang Li, Zhiqiang Wang, Wenxin Zhang, Quanchen Zou, Zonglei Jing, Aishan Liu, Xianglong Liu | 2026-03-09 | cs

MRIQT: Physics-Aware Diffusion Model for Image Quality Transfer in Neonatal Ultra-Low-Field MRI

The paper introduces MRIQT, a physics-aware 3D conditional diffusion model that significantly enhances the image quality and diagnostic fidelity of portable ultra-low-field neonatal MRI by translating noisy scans into high-fidelity images comparable to high-field MRI.

Malek Al Abed, Sebiha Demir, Anne Groteklaes, Elodie Germani, Shahrooz Faghihroohi, Hemmen Sabir, Shadi Albarqouni | 2026-03-09 | cs

FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI

The paper introduces FunnyNodules, a fully parameterized synthetic dataset of lung nodule-like shapes with controllable visual attributes and known decision rules, designed to systematically evaluate and benchmark explainable AI models by verifying whether they learn correct attribute-target relations and align their attention with relevant diagnostic features.

Luisa Gallée, Yiheng Xiong, Meinrad Beer, Michael Götz | 2026-03-09 | cs

FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle

The paper introduces FireScope, a novel VLM-based framework and accompanying FireScope-Bench dataset that leverage chain-of-thought reasoning to significantly improve the generalization, interpretability, and accuracy of cross-continental wildfire risk prediction by integrating visual, climatic, and geographic factors.

Mario Markov (INSAIT, Sofia University "St. Kliment Ohridski"), Stefan Maria Ailuro (INSAIT, Sofia University "St. Kliment Ohridski"), Luc Van Gool (INSAIT, Sofia University "St. Kliment Ohridski"), Konrad Schindler (ETH Zurich), Danda Pani Paudel (INSAIT, Sofia University "St. Kliment Ohridski") | 2026-03-09 | cs.LG
