Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts
This paper introduces "Whisperer," a sample-efficient visual prompting framework that bootstraps frozen OCR models by using a four-stage behavioral cloning curriculum to learn diffusion-based preprocessors that enhance degraded text inputs, achieving an 8% absolute reduction in Character Error Rate without modifying the downstream model's weights.