Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models

This paper introduces FAMDA, a simple yet effective unsupervised domain adaptation framework that leverages Vision Foundation Models as teachers within a self-training paradigm to generate high-quality pseudo-labels, enabling the training of highly efficient student networks that achieve state-of-the-art performance in multi-task dense prediction for resource-constrained robotics applications.

Beomseok Kang, Niluthpol Chowdhury Mithun, Mikhail Sizintsev, Han-Pang Chiu, Supun Samarasekera · 2026-03-10 · 💻 cs

QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification

QuantSparse is a unified framework that effectively combines model quantization and attention sparsification for video diffusion transformers by introducing Multi-Scale Salient Attention Distillation and Second-Order Sparse Attention Reparameterization to mitigate information loss, thereby achieving significant storage reduction and inference acceleration while substantially outperforming existing baselines in generation quality.

Weilun Feng, Chuanguang Yang, Haotong Qin, Mingqiang Wu, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu · 2026-03-10 · 💻 cs

PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement

This paper introduces PHASE-Net, a lightweight and theoretically grounded remote photoplethysmography model that leverages hemodynamic principles to derive a causal Temporal Convolutional Network, enhanced by novel spatial mixing and filtering modules to achieve state-of-the-art accuracy and efficiency in non-contact physiological monitoring under challenging conditions.

Bo Zhao, Dan Guo, Junzhe Cao, Yong Xu, Bochao Zou, Tao Tan, Yue Sun, Zitong Yu · 2026-03-10 · 💻 cs

LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology

This paper introduces LMOD+, a large-scale multimodal ophthalmology benchmark dataset and evaluation framework featuring 32,633 annotated instances across 12 conditions and 5 imaging modalities, designed to advance and systematically assess the capabilities of multimodal large language models in vision-threatening disease diagnosis, staging, and bias detection.

Zhenyue Qin, Yang Liu, Yu Yin, Jinyu Ding, Haoran Zhang, Anran Li, Dylan Campbell, Xuansheng Wu, Ke Zou, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen · 2026-03-10 · 💻 cs

Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!

The paper introduces REVEL, a new task for streaming, fine-grained interactive video manipulation on any object at any time, and proposes DragStream, a training-free method that resolves latent distribution drift and context interference in autoregressive video diffusion models through adaptive distribution self-rectification and spatial-frequency selective optimization.

Junbao Zhou, Yuan Zhou, Kesen Zhao, Qingshan Xu, Beier Zhu, Richang Hong, Hanwang Zhang · 2026-03-10 · 💻 cs

Unsupervised Deep Generative Models for Anomaly Detection in Neuroimaging: A Systematic Scoping Review

This systematic scoping review synthesizes thirty-three studies on unsupervised deep generative models for neuroimaging anomaly detection, highlighting their potential for pathology-agnostic localization in data-scarce settings while identifying key challenges such as methodological heterogeneity and limited external validation.

Youwan Mahé, Elise Bannier, Stéphanie Leplaideur, Elisa Fromont, Francesca Galassi · 2026-03-10 · 💻 cs

Taming Modality Entanglement in Continual Audio-Visual Segmentation

This paper introduces the Continual Audio-Visual Segmentation (CAVS) task and proposes a Collision-based Multi-modal Rehearsal (CMR) framework that effectively addresses multi-modal semantic drift and co-occurrence confusion through novel sample selection and frequency adjustment strategies, significantly outperforming existing single-modal continual learning methods.

Yuyang Hong, Qi Yang, Tao Zhang, Zili Wang, Zhaojin Fu, Kun Ding, Bin Fan, Shiming Xiang · 2026-03-10 · 💻 cs

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

This paper introduces Dream4Drive, a novel synthetic data generation framework that leverages 3D-aware guidance and a fine-tuned driving world model to create diverse, multi-view corner cases, enhancing downstream perception tasks in autonomous driving with performance gains that are not negated by increased training epochs.

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang · 2026-03-10 · 💻 cs

Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach

This paper proposes a forensic method called "diffusion snap-back reconstruction," which detects AI-generated images by analyzing how perceptual similarity metrics change when an image is perturbed and reconstructed by a diffusion model, achieving high accuracy (AUROC of 0.993) and robustness against common distortions without relying on traditional pixel-level artifacts.

Mohd Ruhul Ameen, Akif Islam · 2026-03-10 · 💻 cs

Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

This paper introduces "Jr. AI Scientist," an autonomous system that mimics a novice researcher's workflow to generate novel, scientifically valuable papers building on real academic works, and evaluates it through rigorous automated and human assessments to identify both its capabilities and the significant risks and limitations of current AI-driven scientific exploration.

Atsuyuki Miyai, Mashiro Toyooka, Takashi Otonari, Zaiying Zhao, Kiyoharu Aizawa · 2026-03-10 · 🤖 cs.LG

MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks

This paper introduces MUGSQA, a novel framework comprising a multi-uncertainty-based Gaussian Splatting quality assessment dataset, a unified multi-distance subjective evaluation method, and two benchmarks designed to rigorously assess the robustness of reconstruction methods and the performance of existing quality metrics under varying input conditions.

Tianang Chen, Jian Jin, Shilv Cai, Zhuangzi Li, Weisi Lin · 2026-03-10 · 💻 cs

Counting Through Occlusion: Framework for Open World Amodal Counting

This paper introduces CountOCC, a novel amodal counting framework that overcomes the limitations of existing methods under occlusion by hierarchically reconstructing complete object features through multimodal guidance and visual equivalence objectives, achieving state-of-the-art performance on newly established occlusion-augmented benchmarks.

Safaeid Hossain Arib, Rabeya Akter, Abdul Monaf Chowdhury, Md Jubair Ahmed Sourov, Md Mehedi Hasan · 2026-03-10 · 💻 cs