ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
This paper introduces ORIC, a framework and benchmark for evaluating and improving object recognition in Large Vision-Language Models under contextual incongruity. It shows that such scenarios significantly degrade recognition performance and that targeted Visual Reinforcement Fine-Tuning can effectively mitigate these failures.