An Effective Data Augmentation Method by Asking Questions about Scene Text Images
This paper proposes a VQA-inspired data augmentation framework for scene and handwritten text recognition. The framework generates natural-language questions about character-level attributes, and the authors report significant improvements in transcription accuracy on benchmark datasets.
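To make the idea of character-level questions concrete, the following is a minimal sketch of how question-answer pairs might be derived from a ground-truth transcript. The function name `make_questions` and the three question templates (count, position, length) are hypothetical illustrations chosen for this example; the summary above does not specify the paper's actual question set or how the pairs are fed to the recognizer.

```python
from collections import Counter


def make_questions(transcript: str) -> list[tuple[str, str]]:
    """Build (question, answer) pairs from a ground-truth transcript.

    Assumption: the count, position, and length templates below are
    illustrative only and are not taken from the paper.
    """
    qa: list[tuple[str, str]] = []

    # Count questions: one per distinct character, in first-seen order.
    for ch, n in Counter(transcript).items():
        qa.append((f"How many times does '{ch}' appear in the text?", str(n)))

    # Position questions: one per character slot, 1-indexed.
    for i, ch in enumerate(transcript, start=1):
        qa.append((f"What is character number {i} in the text?", ch))

    # A single length question for the whole word.
    qa.append(("How many characters are in the text?", str(len(transcript))))
    return qa


if __name__ == "__main__":
    for question, answer in make_questions("hello"):
        print(f"{question} -> {answer}")
```

In a sketch like this, every labeled image yields several extra supervision signals from the same transcript at no additional annotation cost, which is the sense in which question generation can act as data augmentation.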