SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
This paper introduces SCAM, the largest and most diverse real-world dataset of typographic attack images, in which text placed in a scene misleads a model about the image content. Using SCAM, the paper demonstrates that state-of-the-art multimodal foundation models are significantly vulnerable to such attacks. It also provides empirical insights into how model architecture and training data influence robustness.