Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection

This paper introduces MonoSTL, a selective transfer learning framework for cross-modality distillation in monocular 3D object detection: it mitigates the negative transfer caused by modality gaps by pairing similar teacher and student architectures with novel depth-aware selective distillation modules, transferring LiDAR depth information to image-based networks and achieving state-of-the-art performance on the KITTI and NuScenes benchmarks.

Rui Ding, Meng Yang, Nanning Zheng · 2026-03-10 · cs

Classifying Novel 3D-Printed Objects without Retraining: Towards Post-Production Automation in Additive Manufacturing

This paper introduces the ThingiPrint dataset and a contrastive fine-tuning approach that enables the classification of novel 3D-printed objects using their CAD models without requiring model retraining, thereby addressing a critical bottleneck in automating industrial post-production workflows.

Fanis Mathioulakis, Gorjan Radevski, Silke GC Cleuren, Michel Janssens, Brecht Das, Koen Schauwaert, Tinne Tuytelaars · 2026-03-10 · cs

FedEU: Evidential Uncertainty-Driven Federated Fine-Tuning of Vision Foundation Models for Remote Sensing Image Segmentation

FedEU is a novel federated learning framework that enhances remote sensing image segmentation by integrating evidential uncertainty quantification and client-specific feature embeddings to guide adaptive global aggregation, thereby improving model robustness and reliability across heterogeneous distributed datasets.

Xiaokang Zhang, Xuran Xiong, Jianzhong Huang, Lefei Zhang · 2026-03-10 · cs

RobustSCI: Beyond Reconstruction to Restoration for Snapshot Compressive Imaging under Real-World Degradations

This paper introduces RobustSCI, a pioneering framework that shifts snapshot compressive imaging from simple reconstruction to robust restoration by proposing a novel network architecture and a large-scale benchmark to effectively recover pristine scenes from real-world degraded measurements caused by motion blur and low light.

Hao Wang, Yuanfan Li, Qi Zhou, Zhankuo Xu, Jiong Ni, Xin Yuan · 2026-03-10 · cs

RayD3D: Distilling Depth Knowledge Along the Ray for Robust Multi-View 3D Object Detection

The paper proposes RayD3D, a novel cross-modal distillation framework that transfers depth knowledge specifically along the camera-to-object ray to filter out irrelevant LiDAR information, thereby significantly enhancing the robustness of multi-view 3D object detection models against real-world data corruptions without increasing inference costs.

Rui Ding, Zhaonian Kuang, Zongwei Zhou, Meng Yang, Xinhu Zheng, Gang Hua · 2026-03-10 · cs
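The core geometric idea of restricting depth supervision to the camera-to-object line of sight can be illustrated with a minimal sketch. The function below (hypothetical names, not RayD3D's actual implementation) keeps only LiDAR points within a fixed radius of the ray from the camera toward an object center, discarding points behind the camera or far off the ray:

```python
import numpy as np

def points_near_ray(points, cam_origin, obj_center, radius=0.5):
    """Keep LiDAR points within `radius` metres of the camera-to-object ray.

    A minimal geometric sketch of ray-based filtering: depth knowledge is
    drawn only from points lying near the line of sight toward an object.
    """
    d = obj_center - cam_origin
    d = d / np.linalg.norm(d)                 # unit ray direction
    v = points - cam_origin                   # vectors from camera to points
    t = np.clip(v @ d, 0.0, None)             # projection length, in front only
    closest = cam_origin + t[:, None] * d     # nearest point on the ray
    dist = np.linalg.norm(points - closest, axis=1)
    return points[dist <= radius]
```

In a distillation setting, the surviving points would then supply the depth targets for the image branch, so off-ray LiDAR returns cannot contribute a misleading signal.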

DocCogito: Aligning Layout Cognition and Step-Level Grounded Reasoning for Document Understanding

DocCogito is a unified framework for document understanding that aligns global layout perception with structured, region-grounded reasoning through a lightweight layout tower and a deterministic Visual-Semantic Chain, achieving state-of-the-art performance on multiple benchmarks by enforcing systematic coupling between layout priors and evidence-based reasoning.

Yuchuan Wu, Minghan Zhuo, Teng Fu, Mengyang Zhao, Bin Li, Xiangyang Xue · 2026-03-10 · cs

A Unified View of Drifting and Score-Based Models

This paper establishes a unified theoretical framework demonstrating that drifting models, which optimize kernel-based mean-shift discrepancies, are mathematically equivalent to score-matching objectives on kernel-smoothed distributions, thereby precisely connecting them to diffusion models and clarifying their relationship with Distribution Matching Distillation.

Chieh-Hsin Lai, Bac Nguyen, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, Molei Tao · 2026-03-10 · cs.LG
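The elementary version of this connection is the classical mean-shift identity (Fukunaga and Hostetler; Comaniciu and Meer), which already equates a kernel-based mean-shift step with the score of a kernel-smoothed distribution. For a Gaussian KDE with bandwidth $h$:

```latex
\hat{p}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i),
\qquad
K_h(u) \propto \exp\!\left(-\frac{\|u\|^2}{2h^2}\right),
```

the score of the smoothed density satisfies

```latex
\nabla \log \hat{p}_h(x) = \frac{m_h(x) - x}{h^2},
\qquad
m_h(x) = \frac{\sum_i x_i \, K_h(x - x_i)}{\sum_i K_h(x - x_i)},
```

so the mean-shift vector $m_h(x) - x$ is a scaled score of the kernel-smoothed distribution. The paper's contribution, per the summary above, is to formalize this kind of equivalence at the level of training objectives, linking drifting models to score matching and diffusion.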

EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification

The paper proposes EvolveReason, a self-evolving reasoning paradigm that combines a human-like chain-of-thought framework, a forgery latent-space distribution capture module, and a reinforcement learning-based self-evolution strategy to enhance the accuracy, detail, and reliability of explainable deepfake facial image identification.

Binjia Zhou, Dawei Luo, Shuai Chen, Feng Xu, Seow, Haoyuan Li, Jiachi Wang, Jiawen Wang, Zunlei Feng, Yijun Bei · 2026-03-10 · cs

SketchGraphNet: A Memory-Efficient Hybrid Graph Transformer for Large-Scale Sketch Corpora Recognition

This paper introduces SketchGraphNet, a memory-efficient hybrid graph transformer that models free-hand sketches as structured graphs to achieve state-of-the-art recognition accuracy on the newly constructed 3.44-million-sample SketchGraph benchmark while significantly reducing computational resource requirements.

Shilong Chen, Mingyuan Li, Zhaoyang Wang, Zhonglin Ye, Haixing Zhao · 2026-03-10 · cs
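Representing a free-hand sketch as a structured graph can be sketched in a few lines. The construction below is an illustrative baseline (hypothetical, not SketchGraphNet's exact graph builder): nodes are stroke points, and edges connect consecutive points within each stroke, so stroke order and connectivity survive as graph structure:

```python
import numpy as np

def sketch_to_graph(strokes):
    """Convert a free-hand sketch (list of strokes, each a point sequence)
    into (node array, edge list): nodes are 2D points, edges link
    consecutive points within the same stroke."""
    nodes, edges = [], []
    for stroke in strokes:
        start = len(nodes)                      # index offset for this stroke
        nodes.extend(stroke)
        edges.extend((start + i, start + i + 1)
                     for i in range(len(stroke) - 1))
    return np.asarray(nodes, dtype=float), edges
```

A graph transformer would then attend over these nodes with the edge list as its sparse connectivity, rather than rasterizing the sketch into a dense image.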

Scale-Aware UAV-to-Satellite Cross-View Geo-Localization: A Semantic Geometric Approach

This paper proposes a semantic geometric framework that leverages small vehicles as metric anchors within a decoupled stereoscopic projection model to recover absolute scale from monocular UAV images, thereby enabling scale-adaptive satellite image cropping and significantly improving cross-view geo-localization robustness under real-world scale ambiguity.

Yibin Ye, Shuo Chen, Kun Wang, Xiaokai Song, Jisheng Dang, Qifeng Yu, Xichao Teng, Zhang Li · 2026-03-10 · cs

How Long Can Unified Multimodal Models Generate Images Reliably? Taming Long-Horizon Interleaved Image Generation via Context Curation

This paper introduces UniLongGen, a training-free inference strategy that improves long-horizon interleaved image generation by dynamically curating context to discard accumulated visual noise, thereby overcoming the reliability collapse caused by dense visual token interference in unified multimodal models.

Haoyu Chen, Qing Liu, Yuqian Zhou, He Zhang, Zhaowen Wang, Mengwei Ren, Jingjing Ren, Xiang Wang, Zhe Lin, Lei Zhu · 2026-03-10 · cs

CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization

The paper introduces CONSTANT, a novel one-shot handwriting generation framework that leverages Style-Aware Quantization and a latent patch-based contrastive objective within a diffusion model to overcome existing limitations in capturing diverse writer styles and generating high-quality, realistic handwritten images across multiple languages.

Anh-Duy Le, Van-Linh Pham, Thanh-Nam Vo, Xuan Toan Mai, Tuan-Anh Tran · 2026-03-10 · cs

DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration

DreamSAC is a framework that enhances extrapolative generalization in physics simulations by combining an unsupervised symmetry exploration strategy, which actively probes conservation laws via a Hamiltonian-based curiosity bonus, with a Hamiltonian-based world model that learns invariant physical states from raw observations through a novel contrastive objective.

Jinzhou Tang, Fan Feng, Minghao Fu, Wenjun Lin, Biwei Huang, Keze Wang · 2026-03-10 · cs.LG

ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction

ReconDrive is a fast, feed-forward framework that adapts the VGGT foundation model with hybrid prediction heads and static-dynamic composition to achieve high-fidelity, scalable 4D Gaussian Splatting for autonomous driving scenes, outperforming existing feed-forward methods while matching the quality of slower optimization-based approaches.

Haibao Yu, Kuntao Xiao, Jiahang Wang, Ruiyang Hao, Yuxin Huang, Guoran Hu, Haifang Qin, Bowen Jing, Yuntian Bo, Ping Luo · 2026-03-10 · cs