Multimodal Adaptive Retrieval Augmented Generation through Internal Representation Learning
The paper proposes Multimodal Adaptive RAG (MMA-RAG), a framework that dynamically decides whether to incorporate retrieved external knowledge by analyzing the model's internal visual and textual representations. This adaptive decision reduces hallucinations and improves performance on Visual Question Answering tasks.
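
One way to picture the adaptive decision is as a small gate that reads the model's internal features and chooses between answering directly and answering with retrieved context. The sketch below is illustrative only and is not taken from the paper: it assumes a simple logistic probe over concatenated visual and textual feature vectors, and every name in it (`retrieval_score`, `adaptive_answer`, `weights`, `bias`, `threshold`, and the two answer callbacks) is hypothetical. The paper's actual decision mechanism and training procedure may differ.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def retrieval_score(
    visual: Sequence[float],
    textual: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Score in (0, 1) estimating how much retrieval is expected to help.

    Assumption: a linear probe over the concatenated internal features.
    The weights would be learned elsewhere; here they are simply given.
    """
    features = list(visual) + list(textual)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def adaptive_answer(
    question: str,
    visual: Sequence[float],
    textual: Sequence[float],
    weights: Sequence[float],
    bias: float,
    answer_direct: Callable[[str], str],
    answer_with_retrieval: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> str:
    """Use retrieved knowledge only when the probe score reaches the threshold."""
    if retrieval_score(visual, textual, weights, bias) >= threshold:
        return answer_with_retrieval(question)
    return answer_direct(question)
```

In this sketch the two callbacks stand in for the model's own answer path and its retrieval-augmented path, so the gate can be tested without a real vision-language model.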