SSL-SLR: Self-Supervised Representation Learning for Sign Language Recognition

This paper proposes SSL-SLR, a self-supervised learning framework for sign language recognition that addresses the limitations of standard contrastive methods by introducing free-negative pairs and a novel data augmentation technique to better handle video redundancy and shared movements, thereby achieving significant accuracy improvements across various evaluation settings.

Ariel Basso Madjoukeng, Jérôme Fink, Pierre Poitier, Edith Belise Kenmogne, Benoit Frenay | 2026-03-09 | 💻 cs

Decision-Driven Semantic Object Exploration for Legged Robots via Confidence-Calibrated Perception and Topological Subgoal Selection

This paper presents a vision-based framework for legged robots that enables robust decision-driven semantic exploration by integrating confidence-calibrated perception, controlled-growth topological memory, and utility-driven subgoal selection to overcome the limitations of conventional geometry-centric navigation in open-world environments.

Guoyang Zhao, Yudong Li, Weiqing Qi, Kai Zhang, Bonan Liu, Kai Chen, Haoang Li, Jun Ma | 2026-03-09 | 💻 cs

DeCLIP: Decoupled Prompting for CLIP-based Multi-Label Class-Incremental Learning

DeCLIP is a replay-free, parameter-efficient framework for Multi-Label Class-Incremental Learning that decouples CLIP representations through class-specific prompting and Adaptive Similarity Tempering to effectively mitigate catastrophic forgetting and reduce false-positive rates without violating CLIP's single image-text alignment paradigm.

Kaile Du, Zihan Ye, Junzhou Xie, Yixi Shen, Yuyang Li, Fuyuan Hu, Ling Shao, Guangcan Liu, Joost van de Weijer, Fan Lyu | 2026-03-09 | 💻 cs

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

The paper introduces LikePhys, a training-free evaluation method using likelihood preferences to assess intuitive physics understanding in video diffusion models, demonstrating that current models show improving capabilities in physical reasoning as they scale despite challenges with complex dynamics.

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini | 2026-03-09 | 🤖 cs.AI

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

The paper introduces 3DThinker, a novel framework that enables vision-language models to perform 3D spatial reasoning from limited views by aligning their internal representations with a 3D foundation model and refining the reasoning process through outcome-based optimization, all without requiring explicit 3D prior inputs or labeled 3D training data.

Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan, Xiang An, Yan Feng, Peng Pei, Xunliang Cai, Ruqi Huang | 2026-03-09 | 🤖 cs.AI

AURASeg: Attention-guided Upsampling with Residual-Assistive Boundary Refinement for Onboard Robot Drivable-Area Segmentation

This paper introduces AURASeg, an attention-guided segmentation framework featuring a Residual Boundary Refinement Module and an Attention Progressive Upsampling Decoder to enhance drivable-area boundary precision and multi-scale feature representation for onboard robot navigation, demonstrating superior performance on multiple datasets and successful deployment on a Jetson Nano.

Narendhiran Vijayakumar, Sridevi. M | 2026-03-09 | 💻 cs

Culture in Action: Evaluating Text-to-Image Models through Social Activities

This paper introduces CULTIVate, a comprehensive benchmark and evaluation framework designed to assess the cultural faithfulness of text-to-image models in depicting social activities across 16 countries, revealing significant performance disparities between Global North and South regions and demonstrating that its proposed metrics align more closely with human judgment than existing standards.

Sina Malakouti, Boqing Gong, Adriana Kovashka | 2026-03-09 | 💻 cs

The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models

This paper introduces the Cultural Reference Transformation (CRT) metric to evaluate how diffusion models navigate the tension between memorization and generalization in culturally iconic contexts, revealing that model behavior depends on distinct recognition and realization mechanisms influenced by factors like data frequency, textual uniqueness, and reference popularity.

Maria-Teresa De Rosa Palmini, Eva Cetinic | 2026-03-09 | 🤖 cs.AI

Co-Layout: LLM-driven Co-optimization for Interior Layout

This paper presents Co-Layout, a novel framework that integrates large language models with grid-based integer programming and a coarse-to-fine optimization strategy to jointly optimize room layouts and furniture placement, significantly outperforming existing two-stage pipelines in both solution quality and computational efficiency.

Chucheng Xiang, Ruchao Bao, Biyin Feng, Wenzheng Wu, Zhongyuan Liu, Yirui Guan, Ligang Liu | 2026-03-09 | 💬 cs.CL

SPARK: Jailbreaking T2V Models by Synergistically Prompting Auditory and Recontextualized Knowledge

This paper introduces SPARK, a jailbreak framework that exploits cross-modal associations in text-to-video models by combining neutral scene anchors, latent auditory triggers, and stylistic modulators to generate semantically unsafe videos that bypass safety guardrails while maintaining a benign appearance.

Zonghao Ying, Moyang Chen, Nizhang Li, Zhiqiang Wang, Wenxin Zhang, Quanchen Zou, Zonglei Jing, Aishan Liu, Xianglong Liu | 2026-03-09 | 💻 cs

FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI

The paper introduces FunnyNodules, a fully parameterized synthetic dataset of lung nodule-like shapes with controllable visual attributes and known decision rules, designed to systematically evaluate and benchmark explainable AI models by verifying whether they learn correct attribute-target relations and align their attention with relevant diagnostic features.

Luisa Gallée, Yiheng Xiong, Meinrad Beer, Michael Götz | 2026-03-09 | 💻 cs

FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle

The paper introduces FireScope, a novel VLM-based framework and accompanying FireScope-Bench dataset that leverage chain-of-thought reasoning to significantly improve the generalization, interpretability, and accuracy of cross-continental wildfire risk prediction by integrating visual, climatic, and geographic factors.

Mario Markov (INSAIT, Sofia University "St. Kliment Ohridski"), Stefan Maria Ailuro (INSAIT, Sofia University "St. Kliment Ohridski"), Luc Van Gool (INSAIT, Sofia University "St. Kliment Ohridski"), Konrad Schindler (ETH Zurich), Danda Pani Paudel (INSAIT, Sofia University "St. Kliment Ohridski") | 2026-03-09 | 🤖 cs.LG