Family Matters: A Systematic Study of Spatial vs. Frequency Masking for Continual Test-Time Adaptation

This paper presents a systematic study isolating the impact of masking families in continual test-time adaptation, revealing that spatial masking generally outperforms frequency masking on patch-tokenized architectures by preserving structural coherence, while the optimal choice ultimately depends on the alignment between the specific architecture and task.

Chandler Timm C. Doloriel, Yunbei Zhang, Yeonguk Yu + 6 more | 2026-03-03 | 💻 cs

β-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment

This paper introduces β-CLIP, a multi-granular text-conditioned contrastive learning framework that employs cross-attention and a novel β-Contextualized Contrastive Alignment Loss to achieve state-of-the-art dense vision-language alignment by hierarchically matching textual descriptions of varying lengths to corresponding visual regions.

Fatimah Zohra, Chen Zhao, Hani Itani + 1 more | 2026-03-03 | 💻 cs

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

CRISP is a novel method that recovers physically plausible, simulation-ready human motion and scene geometry from monocular video by fitting planar primitives to point clouds, leveraging contact modeling for occluded regions, and validating interactions through reinforcement learning, thereby significantly reducing motion tracking failures and accelerating real-to-sim applications.

Zihan Wang, Jiashun Wang, Jeff Tan + 4 more | 2026-03-03 | 💻 cs

AI-Powered Dermatological Diagnosis: From Interpretable Models to Clinical Implementation: A Comprehensive Framework for Accessible and Trustworthy Skin Disease Detection

This research proposes a comprehensive, interpretable multi-modal AI framework that integrates deep learning image analysis with family history data to enhance the accuracy and personalization of dermatological diagnosis, with plans for prospective clinical trials to validate its real-world implementation.

Satya Narayana Panda, Vaishnavi Kukkala, Spandana Iyer | 2026-03-03 | 🤖 cs.AI

GeoTeacher: Geometry-Guided Semi-Supervised 3D Object Detection

The paper proposes GeoTeacher, a semi-supervised 3D object detection framework that enhances student model performance on limited labeled data by employing a keypoint-based geometric relation supervision module and a distance-decay voxel-wise data augmentation strategy to better capture and understand object geometries, achieving state-of-the-art results on the ONCE and Waymo datasets.

Jingyu Li, Xiaolong Zhao, Zhe Liu + 2 more | 2026-03-03 | 💻 cs

Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization

This paper introduces CEM, a model-agnostic, plug-and-play module that uses a dynamic programming algorithm guided by cumulative error minimization to optimize caching strategies, thereby significantly enhancing the generation fidelity of accelerated Diffusion Transformer models without incurring additional computational overhead.

Tong Shao, Yusen Fu, Guoying Sun + 3 more | 2026-03-03 | 💻 cs

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Vision-DeepResearch introduces a novel multimodal deep-research paradigm that leverages multi-turn, multi-entity, and multi-scale visual and textual search, trained via cold-start supervision and reinforcement learning, to significantly outperform existing models and strong closed-source foundation models in solving complex, noise-heavy real-world questions.

Wenxuan Huang, Yu Zeng, Qiuchen Wang + 13 more | 2026-03-03 | 🤖 cs.AI

Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning

This paper introduces CaCoVID, a reinforcement learning-based token compression framework for video large language models that optimizes token selection by explicitly maximizing their contribution to correct predictions rather than relying on attention scores, thereby significantly reducing computational overhead while maintaining performance.

Yinchao Ma, Qiang Zhou, Zhibin Wang + 4 more | 2026-03-03 | 🤖 cs.AI