Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
This paper proposes Deferred Visual Ingestion (DVI), a framework that replaces the lossy pre-embedding of visual content with two steps: structure-based hierarchical indexing at ingestion time, and vision-language model (VLM) analysis deferred until query time. Existing Pre-Ingestion methods suffer from weak retrieval and lose visual detail. DVI avoids both limitations and achieves significantly higher accuracy on question answering over visual-dense engineering documents.
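The summary does not say how the hierarchical index is built or traversed, so the following is only a minimal Python sketch of the "index light, reason deep" idea under stated assumptions. It is not the authors' implementation. At ingestion, only structural metadata (node titles and page numbers) is recorded, and no visual content is embedded. At query time, the hierarchy is searched and only the matching page images are passed to a VLM. The names `Node`, `locate_pages`, and `answer`, the keyword-overlap matching, and the `vlm` callable are all hypothetical stand-ins. A real system would replace the keyword matching with its own navigation strategy and `fake_vlm` with an actual model call.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical entry in a structure-based index (chapter, section, drawing...)."""
    title: str
    pages: List[int] = field(default_factory=list)  # page images are NOT analyzed at index time
    children: List["Node"] = field(default_factory=list)


def _terms(text: str) -> set:
    # Lowercased word tokens; punctuation is dropped.
    return set(re.findall(r"\w+", text.lower()))


def locate_pages(root: Node, query: str) -> List[int]:
    """Walk the hierarchy and collect pages of nodes whose title shares a word with the query.

    Keyword overlap is an assumed placeholder for the paper's (unspecified) navigation step.
    """
    query_terms = _terms(query)
    hits: List[int] = []

    def visit(node: Node) -> None:
        if query_terms & _terms(node.title):
            hits.extend(node.pages)
        for child in node.children:
            visit(child)

    visit(root)
    return list(dict.fromkeys(hits))  # de-duplicate while preserving order


def answer(query: str, root: Node, page_images: Dict[int, bytes],
           vlm: Callable[[str, List[bytes]], str]) -> str:
    """Deferred step: only the located page images reach the (hypothetical) VLM callable."""
    pages = locate_pages(root, query)
    return vlm(query, [page_images[p] for p in pages])


# Usage with toy data and a stub VLM that just reports how many pages it received.
root = Node("Pump Station Manual", children=[
    Node("Electrical wiring diagram", pages=[12, 13]),
    Node("Piping layout", pages=[20]),
])
images = {p: f"<page {p}>".encode() for p in (12, 13, 20)}
fake_vlm = lambda q, imgs: f"{len(imgs)} page(s) analyzed"

print(answer("Which breaker feeds the wiring diagram?", root, images, fake_vlm))
```

The design choice this sketch illustrates is cost placement. Ingestion stays cheap because nothing visual is summarized or embedded up front. The expensive VLM call is spent only on the few pages that the structure index points to, and it sees the full page image rather than a lossy pre-computed description.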