SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

This paper introduces SGIFormer, a novel 3D instance segmentation method that combines Semantic-guided Mix Query initialization with a Geometric-enhanced Interleaving Transformer decoder to overcome existing limitations in query initialization and scalability, achieving state-of-the-art performance on major benchmarks while balancing accuracy and efficiency.

Lei Yao, Yi Wang, Moyun Liu + 1 more · 2026-02-27 · cs

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

This paper proposes a framework that enhances Open Vocabulary Object Detection models for open-world settings by introducing Pseudo Unknown Embedding and Multi-Scale Contrastive Anchor Learning to identify and incrementally learn novel objects, thereby addressing limitations in detecting far-out-of-distribution items and reducing misclassifications while maintaining state-of-the-art performance.

Zizhao Li, Zhengkang Xiang, Joseph West + 1 more · 2026-02-27 · cs.AI

Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

This paper proposes a novel text-to-sketch-animation method that leverages a pre-trained text-to-video diffusion model guided by SDS loss, while introducing length-area regularization for temporal consistency and As-Rigid-As-Possible loss to preserve sketch topology, thereby outperforming state-of-the-art approaches in both quantitative and qualitative evaluations.

Gaurav Rai, Ojaswa Sharma · 2026-02-27 · cs

Diffusion or Non-Diffusion Adversarial Defenses: Rethinking the Relation between Classifier and Adversarial Purifier

This paper challenges the prevailing reliance on diffusion models for adversarial defense by demonstrating that non-diffusion purifiers can achieve superior robustness, transferability, and cross-dataset generalization; notably, a purifier trained only on CIFAR-10 outperforms ImageNet-trained diffusion models when evaluated on ImageNet.

Yuan-Chih Chen, Chun-Shien Lu · 2026-02-27 · cs

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

The paper introduces ViT-Linearizer, a cross-architecture distillation framework that transfers the rich representations of quadratic-complexity Vision Transformers into efficient linear-time recurrent models (such as Mamba) via activation matching and masked prediction, achieving competitive ImageNet accuracy while significantly reducing inference costs for high-resolution tasks.

Guoyizhe Wei, Rama Chellappa · 2026-02-27 · cs.AI

Reflectance Prediction-based Knowledge Distillation for Robust 3D Object Detection in Compressed Point Clouds

This paper proposes a Reflectance Prediction-based Knowledge Distillation (RPKD) framework that enhances 3D object detection robustness in low-bitrate compressed point clouds: reflectance is discarded during transmission, reconstructed at the receiver via a geometry-based prediction module, and a cross-source distillation strategy transfers knowledge from raw to compressed data.

Hao Jing, Anhong Wang, Yifan Zhang + 2 more · 2026-02-27 · cs