VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer
VisualAD is a language-free, zero-shot anomaly detection framework built on a frozen Vision Transformer backbone. It adds learnable normality and abnormality tokens, together with spatial-aware cross-attention and self-alignment modules, and achieves state-of-the-art performance across industrial and medical domains without relying on text encoders or cross-modal alignment.
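To make the role of the two learnable tokens concrete, the following is a minimal sketch of how patch features from a frozen backbone could be scored against a normality vector and an abnormality vector with no text encoder involved. The summary above does not specify the scoring rule, so the cosine-similarity softmax, the `temperature` value, the max-pooled image score, and all function names here are assumptions for illustration. The spatial-aware cross-attention and self-alignment modules are omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def anomaly_map(patch_feats, normal_token, abnormal_token, temperature=0.07):
    """Per-patch anomaly probability from two token vectors.

    Assumption: each patch is scored by a two-way softmax over its cosine
    similarity to the normality and abnormality tokens. In a trained model
    the tokens would be learned parameters and patch_feats would come from
    the frozen ViT; here they are plain lists of floats.
    """
    scores = []
    for feat in patch_feats:
        s_normal = cosine(feat, normal_token) / temperature
        s_abnormal = cosine(feat, abnormal_token) / temperature
        # Subtract the max before exponentiating for numerical stability.
        m = max(s_normal, s_abnormal)
        e_normal = math.exp(s_normal - m)
        e_abnormal = math.exp(s_abnormal - m)
        scores.append(e_abnormal / (e_normal + e_abnormal))
    return scores


def image_score(patch_scores):
    """Image-level score as the maximum patch score (a hypothetical choice)."""
    return max(patch_scores)
```

In this sketch, a patch whose feature points toward the abnormality token receives a score near 1, one aligned with the normality token receives a score near 0, and a patch equally similar to both receives 0.5.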