Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume

This paper introduces UMPIRE, a training-free, efficient uncertainty quantification framework for Multimodal Large Language Models. It uses the model's internal modality features to compute incoherence-adjusted semantic volumes and, without relying on external tools, demonstrates superior performance in error detection and calibration across diverse modalities and challenging settings.

Gregory Kang Ruey Lau, Hieu Dao, Nicole Kan Hui Lin + 1 more · 2026-03-02 · 💬 cs.CL

Histopathology Image Normalization via Latent Manifold Compaction

This paper introduces Latent Manifold Compaction (LMC), an unsupervised framework that harmonizes histopathology images by compacting stain-induced latent manifolds to learn batch-invariant embeddings, thereby significantly improving cross-batch generalization and outperforming state-of-the-art normalization methods in downstream classification and detection tasks.

Xiaolong Zhang, Jianwei Zhang, Selim Sevim + 3 more · 2026-03-02 · 🤖 cs.LG

SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

This paper introduces SGIFormer, a novel 3D instance segmentation method that combines Semantic-guided Mix Query initialization with a Geometric-enhanced Interleaving Transformer decoder to overcome existing limitations in query initialization and scalability, achieving state-of-the-art performance on major benchmarks while balancing accuracy and efficiency.

Lei Yao, Yi Wang, Moyun Liu + 1 more · 2026-02-27 · 💻 cs

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

This paper proposes a framework that adapts Open Vocabulary Object Detection models to open-world settings. It introduces Pseudo Unknown Embedding and Multi-Scale Contrastive Anchor Learning to identify and incrementally learn novel objects, which addresses limitations in detecting far-out-of-distribution items and reduces misclassifications while maintaining state-of-the-art performance.

Zizhao Li, Zhengkang Xiang, Joseph West + 1 more · 2026-02-27 · 🤖 cs.AI

Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

This paper proposes a text-to-sketch-animation method that uses a pre-trained text-to-video diffusion model guided by a Score Distillation Sampling (SDS) loss. It adds length-area regularization for temporal consistency and an As-Rigid-As-Possible loss to preserve sketch topology, and it outperforms state-of-the-art approaches in both quantitative and qualitative evaluations.

Gaurav Rai, Ojaswa Sharma · 2026-02-27 · 💻 cs