From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

This paper proposes a framework that adapts Open Vocabulary Object Detection models to open-world settings. It introduces Pseudo Unknown Embedding and Multi-Scale Contrastive Anchor Learning to identify and incrementally learn novel objects, addressing limitations in detecting far-out-of-distribution objects and reducing misclassifications while maintaining state-of-the-art performance.

Zizhao Li, Zhengkang Xiang, Joseph West + 1 more · 2026-02-27 · 🤖 cs.AI

Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

This paper proposes a text-to-sketch-animation method that uses a pre-trained text-to-video diffusion model guided by Score Distillation Sampling (SDS) loss. It adds length-area regularization for temporal consistency and an As-Rigid-As-Possible loss to preserve sketch topology, and it outperforms state-of-the-art approaches in both quantitative and qualitative evaluations.

Gaurav Rai, Ojaswa Sharma · 2026-02-27 · 💻 cs

Diffusion or Non-Diffusion Adversarial Defenses: Rethinking the Relation between Classifier and Adversarial Purifier

This paper challenges the prevailing reliance on diffusion models for adversarial defense by demonstrating that non-diffusion purifiers can achieve superior robustness, transferability, and cross-dataset generalization. Notably, a non-diffusion purifier trained only on CIFAR-10 outperforms ImageNet-trained diffusion models when applied to ImageNet.

Yuan-Chih Chen, Chun-Shien Lu · 2026-02-27 · 💻 cs

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

The paper introduces ViT-Linearizer, a cross-architecture distillation framework that transfers the rich representations of quadratic-complexity Vision Transformers into efficient linear-time recurrent models (such as Mamba) via activation matching and masked prediction, achieving competitive ImageNet accuracy while significantly reducing inference costs for high-resolution tasks.

Guoyizhe Wei, Rama Chellappa · 2026-02-27 · 🤖 cs.AI

Reflectance Prediction-based Knowledge Distillation for Robust 3D Object Detection in Compressed Point Clouds

This paper proposes a Reflectance Prediction-based Knowledge Distillation (RPKD) framework that enhances 3D object detection robustness in low-bitrate compressed point clouds by discarding reflectance during transmission, reconstructing it via a geometry-based prediction module, and utilizing a cross-source distillation strategy to transfer knowledge from raw to compressed data.

Hao Jing, Anhong Wang, Yifan Zhang + 2 more · 2026-02-27 · 💻 cs

LinGuinE: Longitudinal Guidance Estimation for Volumetric Tumour Segmentation

LinGuinE is a training-free PyTorch framework that achieves state-of-the-art longitudinal volumetric tumour segmentation and lesion tracking across multiple datasets. It combines image registration with guided segmentation from a single radiologist prompt, enabling flexible, direction-agnostic analysis without requiring training on longitudinal data.

Nadine Garibli, Mayank Patwari, Bence Csiba + 2 more · 2026-02-27 · ⚡ eess