Prompt-Based Caption Generation for Single-Tooth Dental Images Using Vision-Language Models

This paper addresses the lack of specialized dental datasets by proposing a framework that uses Vision-Language Models with guided prompts to generate high-quality, holistic captions for single-tooth RGB images, thereby enabling more comprehensive dental image analysis.

Anastasiia Sukhanova, Aiden Taylor, Julian Myers, Zichun Wang, Kartha Veerya Jammuladinne, Satya Sri Rajiteswari Nimmagadda, Aniruddha Maiti, Ananya Jana (Tue, 10 Ma, cs)
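The summary above describes guiding a VLM with structured prompts to elicit holistic captions. A minimal sketch of what such prompt construction could look like; the attribute list and wording here are hypothetical, not the paper's actual prompts:

```python
# Illustrative guided-prompt builder for a VLM captioner.
# ATTRIBUTES is a made-up example list, not taken from the paper.
ATTRIBUTES = ["tooth type", "surface condition", "color and shade",
              "visible wear or damage", "restorations if any"]

def build_guided_prompt(attributes=ATTRIBUTES):
    """Compose one instruction asking the VLM for a single holistic caption."""
    bullet_list = "\n".join(f"- {a}" for a in attributes)
    return ("Describe this single-tooth RGB image in one holistic caption, "
            "covering:\n" + bullet_list)

print(build_guided_prompt())
```

The point of a fixed attribute checklist is that every generated caption covers the same clinically relevant axes, which keeps the resulting dataset consistent.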

UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration

The paper introduces UnSCAR, a scalable and controllable universal image restoration framework that utilizes a multi-branch mixture-of-experts architecture to overcome the limitations of catastrophic forgetting and performance degradation in existing all-in-one models when handling multiple real-world degradations.

Debabrata Mandal, Soumitri Chattopadhyay, Yujie Wang, Marc Niethammer, Praneeth Chakravarthula (Tue, 10 Ma, cs)
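The core idea of a multi-branch mixture-of-experts restorer is that per-degradation expert branches are blended by a learned gate, so no single network has to absorb all degradations. A toy sketch of the gating math, assuming scalar "pixels" and made-up expert functions (nothing here is UnSCAR's actual architecture):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_restore(pixel, expert_fns, gate_logits):
    """Blend per-degradation expert outputs by softmax gate weights."""
    weights = softmax(gate_logits)
    outs = [f(pixel) for f in expert_fns]
    return sum(w * o for w, o in zip(weights, outs))

# Toy experts: denoise (scale down), deblur (identity), low-light (scale up).
experts = [lambda x: 0.9 * x, lambda x: x, lambda x: 1.2 * x]
print(moe_restore(1.0, experts, [2.0, 0.0, 0.0]))
```

Because each expert keeps its own parameters, adding a branch for a new degradation does not overwrite what the others learned, which is how MoE designs sidestep catastrophic forgetting.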

Generalization in Online Reinforcement Learning for Mobile Agents

This paper addresses the underexplored challenge of generalization in online reinforcement learning for mobile GUI agents by introducing the AndroidWorld-Generalization benchmark and a scalable GRPO-based training system, demonstrating that while RL significantly improves zero-shot performance on unseen task instances, generalization to new templates and applications remains difficult and benefits from test-time few-shot adaptation.

Li Gu, Zihuan Jiang, Zhixiang Chi, Huan Liu, Ziqiang Wang, Yuanhao Yu, Glen Berseth, Yang Wang (Tue, 10 Ma, cs.LG)
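GRPO, mentioned in the summary above, scores each rollout relative to its own group of rollouts rather than against a learned value function. A minimal sketch of the group-relative advantage computation (the standard GRPO normalization, not this paper's full training system):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Normalize each reward by the group's mean and (population) std,
    yielding group-relative advantages as in GRPO-style training."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of the same task: two succeed (reward 1), two fail (reward 0).
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Normalizing within the group means successful rollouts are pushed up exactly as hard as failed ones are pushed down, regardless of the task's absolute difficulty.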

DogWeave: High-Fidelity 3D Canine Reconstruction from a Single Image via Normal Fusion and Conditional Inpainting

DogWeave is a novel framework that reconstructs high-fidelity 3D canine models from a single RGB image by refining parametric meshes into detailed SDF representations via diffusion-enhanced normal optimization and generating view-consistent textures through conditional inpainting, thereby overcoming challenges like self-occlusion and fur detail to outperform existing state-of-the-art methods.

Shufan Sun, Chenchen Wang, Zongfu Yu (Tue, 10 Ma, cs)

Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models

Med-Evo is a novel self-evolution framework for medical multimodal large language models that leverages label-free reinforcement learning, featuring Feature-driven Pseudo Labeling and Hard-Soft Reward mechanisms, to significantly enhance model performance on unlabeled test data without requiring additional annotated medical datasets.

Dunyuan Xu, Xikai Yang, Juzheng Miao, Yaoqian Li, Jinpeng Li, Pheng-Ann Heng (Tue, 10 Ma, cs)
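Feature-driven pseudo labeling, named in the summary above, generally means assigning labels to unlabeled samples from their position in feature space. A simplified nearest-centroid stand-in for that idea (the paper's actual mechanism and reward design are not reproduced here):

```python
def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def pseudo_label(feature, centroids):
    """Assign the class of the most similar centroid; the similarity
    score can then gate how strongly the pseudo label is trusted."""
    sims = [cosine(feature, c) for c in centroids]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]

centroids = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-class feature centroids
print(pseudo_label([0.9, 0.1], centroids))
```

Returning the similarity alongside the label is what makes a hard/soft split possible: confident assignments can be used as hard labels, ambiguous ones only as soft signals.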

SLNet: A Super-Lightweight Geometry-Adaptive Network for 3D Point Cloud Recognition

The paper introduces SLNet, a super-lightweight 3D point cloud recognition network utilizing Nonparametric Adaptive Point Embedding (NAPE) and Geometric Modulation Units (GMU) to achieve state-of-the-art accuracy on benchmarks like ModelNet40 and ScanObjectNN with significantly fewer parameters and computational costs compared to existing models.

Mohammad Saeid, Amir Salarpour, Pedram MohajerAnsari, Mert D. Pesé (Tue, 10 Ma, cs.LG)
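A nonparametric point embedding, as NAPE is described above, maps raw coordinates to features without any learned weights. A sketch in the spirit of fixed sinusoidal encodings over each coordinate; this is an illustrative guess at the flavor of such an embedding, not SLNet's exact formulation:

```python
import math

def trig_embed(point, num_freqs=4):
    """Embed a 3D point with fixed sin/cos features at doubling
    frequencies per coordinate; no trainable parameters involved."""
    feats = []
    for coord in point:
        for k in range(num_freqs):
            freq = 2.0 ** k
            feats.append(math.sin(freq * coord))
            feats.append(math.cos(freq * coord))
    return feats

emb = trig_embed((0.1, -0.2, 0.3))
print(len(emb))  # 3 coords * 4 freqs * 2 functions = 24 features
```

Because the embedding has zero parameters, it contributes nothing to model size, which is exactly the property a super-lightweight network wants from its input stage.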

Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection

This paper introduces MonoSTL, a selective transfer learning framework that addresses the negative transfer caused by modality gaps in cross-modality distillation for monocular 3D object detection by employing similar architectures and novel depth-aware selective distillation modules to effectively transfer LiDAR depth information to image-based networks, achieving state-of-the-art performance on KITTI and NuScenes benchmarks.

Rui Ding, Meng Yang, Nanning Zheng (Tue, 10 Ma, cs)
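Selective distillation, as described above, means the student only imitates the LiDAR teacher where the teacher's signal is trustworthy, rather than everywhere. A minimal sketch with a confidence-gated squared-error loss (the masking criterion and threshold here are hypothetical, not the paper's modules):

```python
def selective_distill_loss(student_depth, teacher_depth, conf, thresh=0.5):
    """Average squared student-teacher depth error, computed only at
    positions where teacher confidence clears the threshold."""
    total, count = 0.0, 0
    for s, t, c in zip(student_depth, teacher_depth, conf):
        if c >= thresh:  # select reliable teacher positions only
            total += (s - t) ** 2
            count += 1
    return total / count if count else 0.0

loss = selective_distill_loss([1.0, 2.0, 3.0], [1.5, 2.0, 9.0],
                              conf=[0.9, 0.8, 0.1])
print(loss)
```

The third position, where the teacher is unreliable, is excluded; forcing the student to match it anyway is exactly the negative transfer the selective scheme avoids.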

Classifying Novel 3D-Printed Objects without Retraining: Towards Post-Production Automation in Additive Manufacturing

This paper introduces the ThingiPrint dataset and a contrastive fine-tuning approach that enables the classification of novel 3D-printed objects using their CAD models without requiring model retraining, thereby addressing a critical bottleneck in automating industrial post-production workflows.

Fanis Mathioulakis, Gorjan Radevski, Silke GC Cleuren, Michel Janssens, Brecht Das, Koen Schauwaert, Tinne Tuytelaars (Tue, 10 Ma, cs)
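Classifying novel objects without retraining typically reduces to retrieval: embed the photographed part and the candidate CAD models in a shared space, then pick the closest match. A toy sketch of that inference step, assuming embeddings already exist (the class names and vectors below are invented):

```python
def classify_by_cad(query_emb, cad_embs):
    """Match a photo embedding against per-CAD-model embeddings by
    cosine similarity; new classes only require new CAD embeddings."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    scores = {name: cos(query_emb, e) for name, e in cad_embs.items()}
    return max(scores, key=scores.get)

cad = {"bracket": [1.0, 0.0], "gear": [0.0, 1.0]}  # hypothetical classes
print(classify_by_cad([0.8, 0.2], cad))
```

Supporting a new print job is then a matter of embedding its CAD model and adding one dictionary entry, with no gradient update anywhere.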

FedEU: Evidential Uncertainty-Driven Federated Fine-Tuning of Vision Foundation Models for Remote Sensing Image Segmentation

FedEU is a novel federated learning framework that enhances remote sensing image segmentation by integrating evidential uncertainty quantification and client-specific feature embeddings to guide adaptive global aggregation, thereby improving model robustness and reliability across heterogeneous distributed datasets.

Xiaokang Zhang, Xuran Xiong, Jianzhong Huang, Lefei Zhang (Tue, 10 Ma, cs)
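Evidential uncertainty quantification, referenced above, is commonly formulated with a Dirichlet distribution: per-class evidence maps to Dirichlet parameters, and total uncertainty is the number of classes over the total evidence mass. A sketch of that standard formulation (not FedEU-specific code):

```python
def evidential_uncertainty(evidence):
    """Standard evidential formulation: alpha_k = evidence_k + 1,
    probs = alpha / S, uncertainty u = K / S where S = sum(alpha)."""
    k = len(evidence)
    alphas = [e + 1.0 for e in evidence]
    s = sum(alphas)
    probs = [a / s for a in alphas]
    return probs, k / s

# Strong evidence for class 0 out of 3 classes -> low uncertainty.
probs, u = evidential_uncertainty([8.0, 0.0, 0.0])
print(round(u, 3))
```

In a federated setting, such a per-client uncertainty score is a natural weight for aggregation: clients whose predictions carry little evidence contribute less to the global model.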

RobustSCI: Beyond Reconstruction to Restoration for Snapshot Compressive Imaging under Real-World Degradations

This paper introduces RobustSCI, a pioneering framework that shifts snapshot compressive imaging from simple reconstruction to robust restoration by proposing a novel network architecture and a large-scale benchmark to effectively recover pristine scenes from real-world degraded measurements caused by motion blur and low light.

Hao Wang, Yuanfan Li, Qi Zhou, Zhankuo Xu, Jiong Ni, Xin Yuan (Tue, 10 Ma, cs)
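Snapshot compressive imaging, the setting above, compresses a video clip into one coded measurement: each frame is modulated by a binary mask and the modulated frames are summed. A flattened 1-D toy version of that forward model, to make the reconstruction-vs-restoration distinction concrete (real-world degradations corrupt this measurement before any solver sees it):

```python
def sci_measurement(frames, masks):
    """SCI forward model: a single coded snapshot y = sum_t mask_t * x_t,
    with frames and masks flattened to 1-D lists for simplicity."""
    n = len(frames[0])
    y = [0.0] * n
    for frame, mask in zip(frames, masks):
        for i in range(n):
            y[i] += mask[i] * frame[i]
    return y

frames = [[1.0, 2.0], [3.0, 4.0]]  # two toy 2-pixel frames
masks = [[1.0, 0.0], [0.0, 1.0]]   # complementary binary masks
print(sci_measurement(frames, masks))
```

Classic SCI methods invert this model assuming y is clean; RobustSCI's premise is that motion blur and low light corrupt y itself, so the inverse problem becomes restoration rather than pure reconstruction.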