IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time

Imagine you are a detective trying to solve a complex mystery. The clues aren't all in one place; they are scattered across different files in a massive library.

Clue A (in File 1) says: "The suspect drove a red car."
Clue B (in File 2) says: "The red car was last seen in Paris."
Clue C (in File 3) says: "The suspect was born in London."

To answer the question, "Where was the driver of the red car born?", you need to connect Clue A and Clue C. Clue B also mentions the red car, but it says nothing about anyone's birthplace, so it is a distraction.

The Problem with Old Methods (Naive RAG)

Traditional AI search engines work like a librarian who grabs the first few books that look like they match your question.

If you ask, "Where was the driver born?", the librarian might grab File 1 (talking about the car) and File 3 (talking about the birth), but they might miss the connection between them.
Or, they might grab File 1 and File 2, but forget File 3.
Because the AI has to guess the connection while it's trying to answer, it often gets confused or gives a wrong answer (like saying the driver's name instead of their birthplace).

To fix this, other advanced methods try to build a giant map (Graph) of all the connections between books while you are asking the question. But building a map on the fly is slow, expensive, and requires the librarian to run back and forth between shelves multiple times.

The Solution: IndexRAG (The "Pre-Made Bridge" Method)

The authors of the IndexRAG paper had a simple idea: Why not build the bridges between the files before you even ask the question?

They call this "Index-Time Reasoning." Instead of waiting for you to ask a question to figure out how the files connect, they do the hard work while the library is being organized.

How it works (The Analogy):

The "Bridge Builder" (Offline Indexing):
Imagine a super-smart robot librarian who reads every single file in the library before any customers arrive.
- It sees that File 1 mentions "Henry Edwards" (the director).
- It sees that File 2 mentions "Henry Edwards" (the actor).
- It sees that File 3 mentions "Henry Edwards" (born in Weston-super-Mare).
- Instead of just leaving them as separate files, the robot writes a new, special note called a "Bridging Fact."
- The Bridging Fact says: "The director of the film Aylwin (from File 1) was born in Weston-super-Mare (from File 3)."
This new note is a standalone clue that contains the answer to the multi-step puzzle. It's like building a physical bridge between two islands so you don't have to swim between them later.

The "Search" (Online Inference):
Now, when you ask your question, the librarian doesn't need to build a map or swim between islands.
- You ask: "Where was the director of Aylwin born?"
- The librarian searches the library. Because they pre-made the "Bridging Fact," they find that special note immediately.
- The note says: "Weston-super-Mare."
- Boom. The answer is found in one single step, instantly.

Why is this a big deal?

Speed: It's like taking a shortcut. You don't have to stop and think about how to connect the dots; the dots are already connected for you.
Cost: It's cheaper. You only need to ask the AI (the librarian) to give you the answer once, instead of asking it to search, think, search again, and think again.
Accuracy: Because the "Bridging Facts" are written specifically to answer these types of questions, the AI is much less likely to get confused or hallucinate (make things up).

The Result

The paper tested this on three difficult "mystery" datasets.

Old methods (Naive RAG) often got stuck because they couldn't find the hidden connections.
Graph methods (building maps on the fly) were accurate but slow and expensive.
IndexRAG was fast, cheap, and the most accurate on average among the methods that ask the AI for an answer only once. It solved the puzzles better than those methods without needing to do any extra work while you were waiting for the answer.

In short: IndexRAG is like pre-packing a lunch for a long hike. Instead of trying to cook a meal while you're walking (which is messy and slow), you prepare the perfect meal beforehand. When you get hungry, you just eat and keep moving.

1. Problem Statement

Multi-hop Question Answering (QA) requires reasoning across multiple documents to synthesize an answer (e.g., "Who directed the film Aylwin?" $\rightarrow$ "Henry Edwards" $\rightarrow$ "Where was Henry Edwards born?").

Limitations of Naive RAG: Standard Retrieval-Augmented Generation (RAG) retrieves passages independently. If the answer requires connecting information from two separate documents (Document A: Film Director; Document B: Director's Birthplace), the retrieval system often fails to retrieve the second document because the query does not semantically match it directly.
Limitations of Existing Solutions:
- Graph-based RAG (e.g., HippoRAG, GraphRAG): These methods construct knowledge graphs to link entities, but they require expensive online processing (entity extraction, graph traversal, multiple LLM calls) during inference, which increases latency and cost.
- Iterative Methods (e.g., IRCoT): Decompose queries into multiple rounds of retrieval and generation. While effective, this results in high inference latency and multiple LLM calls.

Core Challenge: How to enable cross-document reasoning with single-pass retrieval and a single LLM call at inference time, without sacrificing accuracy.

2. Methodology: IndexRAG

IndexRAG proposes shifting the burden of cross-document reasoning from online inference to offline indexing. The core insight is that connections between documents are determined by content, not specific queries, allowing these connections to be precomputed.

The pipeline consists of two phases:

A. Offline Indexing (Two Stages)

Stage 1: Atomic Knowledge Unit (AKU) & Entity Extraction
- For each document, an LLM extracts Atomic Knowledge Units (AKUs) (structured as question-answer pairs) and associated Entities.
- AKUs serve as the minimal retrievable units, replacing raw text chunks to ensure denser, more query-aligned retrieval.
- Entities are extracted to identify potential "bridge" points between documents.
Stage 2: Bridging Fact Generation
- Bridge Entity Identification: The system aggregates entities across all documents. Entities appearing in $\ge 2$ documents (but below a frequency threshold $\tau$ to avoid generic terms) are identified as Bridge Entities.
- Fact Generation: For each bridge entity, the system retrieves relevant facts from all documents mentioning that entity. An LLM is prompted to generate Bridging Facts.
  - Example: If Doc A says "Aylwin is directed by Henry Edwards" and Doc B says "Henry Edwards was born in Weston-super-Mare," the system generates a new fact: "The director of the film Aylwin was born in Weston-super-Mare."
- Storage: Both original AKUs and the newly generated Bridging Facts are encoded into a unified flat vector store.
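
The bridge-entity step in Stage 2 can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `AKU` dataclass, the function names, the lowercase normalization of entity names, and the example value `tau=10` are all assumptions made here. The only parts taken from the description above are the rule itself (an entity must appear in at least 2 documents and below the threshold $\tau$) and the Aylwin example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AKU:
    """Hypothetical container for one Atomic Knowledge Unit (a QA pair)."""
    doc_id: str
    question: str
    answer: str
    entities: set[str] = field(default_factory=set)


def find_bridge_entities(akus: list[AKU], tau: int) -> dict[str, set[str]]:
    """Map each bridge entity to the set of documents that mention it.

    An entity qualifies when it appears in at least 2 documents and in
    fewer than `tau` documents. Lowercasing names is an assumption.
    """
    docs_by_entity: dict[str, set[str]] = defaultdict(set)
    for aku in akus:
        for entity in aku.entities:
            docs_by_entity[entity.lower()].add(aku.doc_id)
    return {
        entity: docs
        for entity, docs in docs_by_entity.items()
        if 2 <= len(docs) < tau
    }


def facts_for_entity(akus: list[AKU], entity: str) -> list[str]:
    """Collect the AKUs mentioning `entity`, formatted as LLM prompt input."""
    return [
        f"[{aku.doc_id}] Q: {aku.question} A: {aku.answer}"
        for aku in akus
        if entity in {e.lower() for e in aku.entities}
    ]


akus = [
    AKU("doc_a", "Who directed the film Aylwin?", "Henry Edwards",
        {"Aylwin", "Henry Edwards"}),
    AKU("doc_b", "Where was Henry Edwards born?", "Weston-super-Mare",
        {"Henry Edwards", "Weston-super-Mare"}),
]

bridges = find_bridge_entities(akus, tau=10)
for entity in bridges:
    # These lines would be handed to an LLM that writes the Bridging Fact.
    print(entity, facts_for_entity(akus, entity))
```

In this toy corpus only "henry edwards" spans two documents, so it is the single bridge entity. The collected facts are what a prompt would pass to the LLM to produce the Bridging Fact; the LLM call itself is omitted here.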

B. Online Inference

Single-Pass Retrieval: The user query is embedded and retrieved against the unified vector store.
Balanced Context Selection: Since Bridging Facts are shorter than AKUs, they might dominate the top-$k$ results. IndexRAG employs a Balanced Context Selection mechanism:
- It greedily selects retrieved items.
- It ensures a maximum limit ($k_b$) of Bridging Facts are included to prevent them from crowding out information-dense AKUs.
Generation: The selected context (a mix of AKUs and Bridging Facts) is fed to the LLM for a single-pass answer generation.
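
One way to picture Balanced Context Selection is the greedy loop below. It is a sketch under stated assumptions: the `Retrieved` record, its field names, and the choice to walk candidates in descending score order and skip surplus Bridging Facts are illustrative guesses. The summary above only states that selection is greedy and that at most $k_b$ Bridging Facts are kept.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    """Hypothetical record for one item returned by vector search."""
    text: str
    score: float
    is_bridging_fact: bool


def balanced_select(candidates: list[Retrieved], k: int, k_b: int) -> list[Retrieved]:
    """Greedily keep the top-scoring items, allowing at most k_b bridging facts."""
    selected: list[Retrieved] = []
    bridging_used = 0
    for item in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(selected) == k:
            break
        if item.is_bridging_fact:
            if bridging_used == k_b:
                continue  # cap reached: leave the slot for an AKU
            bridging_used += 1
        selected.append(item)
    return selected


candidates = [
    Retrieved("bridge 1", 0.90, True),
    Retrieved("bridge 2", 0.88, True),
    Retrieved("bridge 3", 0.85, True),
    Retrieved("aku 1", 0.80, False),
    Retrieved("aku 2", 0.70, False),
    Retrieved("aku 3", 0.60, False),
]

context = balanced_select(candidates, k=4, k_b=2)
print([item.text for item in context])
```

With `k=4` and `k_b=2`, the third Bridging Fact is skipped even though it outscores every AKU, so the final context holds two Bridging Facts and two AKUs.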

3. Key Contributions

Index-Time Reasoning Paradigm: A novel approach that precomputes cross-document reasoning connections during indexing, eliminating the need for graph traversal or iterative loops at inference time.
Bridging Facts: Introduction of a new retrieval unit type that explicitly encodes multi-hop reasoning chains, making implicit cross-document connections directly retrievable via standard vector search.
Training-Free & Agnostic Framework: The method requires no fine-tuning of embedding models or LLMs. It is agnostic to the underlying retrieval strategy and can be combined with iterative methods (like IRCoT).
Efficiency: Achieves cross-document reasoning with only one retrieval pass and one LLM call during inference.
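
To see why a Bridging Fact can be directly retrievable, here is a toy ranking over a small flat store. It is only an illustration: word-overlap (Jaccard) similarity stands in for the embedding similarity a real system would use, and the stopword list, store keys, and function names are assumptions introduced here. The three texts come from the Aylwin example.

```python
import re

# Small hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {"the", "of", "was", "is", "by", "in", "where", "a"}


def tokens(text: str) -> set[str]:
    """Lowercase content words, keeping hyphenated names as one token."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, a crude stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


store = {
    "aku_doc_a": "Aylwin is directed by Henry Edwards.",
    "aku_doc_b": "Henry Edwards was born in Weston-super-Mare.",
    "bridging_fact": "The director of the film Aylwin was born in Weston-super-Mare.",
}
query = "Where was the director of Aylwin born?"

# Single pass: score every stored item against the query and sort.
ranked = sorted(store, key=lambda name: jaccard(query, store[name]), reverse=True)
for name in ranked:
    print(f"{name}: {jaccard(query, store[name]):.2f}")
```

Each source passage shares only one content word with the query, while the Bridging Fact shares three, so it ranks first in a single pass. Real embedding scores will differ, but the intuition is the same: the precomputed fact is phrased the way the multi-hop question is.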

4. Experimental Results

The authors evaluated IndexRAG on three standard multi-hop QA benchmarks: HotpotQA, 2WikiMultiHopQA, and MuSiQue.

Performance vs. Single-Call Baselines:
- IndexRAG achieved the highest average F1 score (51.7) among all methods requiring a single LLM call.
- It outperformed Naive RAG by +4.6 F1 points on average.
- It surpassed FastGraphRAG (+2.3) and RAPTOR (+4.7).
- On the challenging MuSiQue dataset, IndexRAG improved F1 from 29.9 (Naive RAG) to 34.4.
Performance vs. Multi-Call Baselines:
- When combined with IRCoT (Interleaving Retrieval with Chain-of-Thought), IndexRAG achieved an average F1 of 55.0, outperforming the graph-based HippoRAG (54.1) and standalone IRCoT (41.6).
- This suggests that Bridging Facts provide context that iterative reasoning alone does not capture.
Efficiency (Latency & Cost):
- Latency: IndexRAG (0.30s on MuSiQue) is nearly as fast as Naive RAG (0.29s) and 8.5x faster than FastGraphRAG (2.55s).
- LLM Calls: IndexRAG uses 1 call at inference, whereas HippoRAG requires 2 calls (for entity extraction + generation) and IRCoT requires multiple calls.
- Trade-off: IndexRAG shifts the computational cost to the offline indexing phase, resulting in minimal online overhead.
Ablation Studies:
- Recall vs. Performance: Adding Bridging Facts slightly reduced Recall@10 (because they compete with original passages for slots) but significantly increased Exact Match (EM). This suggests that the reasoning encoded in Bridging Facts is more valuable than the raw text they replace.
- Question Types: Bridging Facts showed the most significant gains on Compositional and Inference questions (requiring synthesis of evidence) but less on "Bridge Comparison" questions (requiring parallel independent lookups).
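
The derived figures in this section can be cross-checked from the raw numbers quoted above. The snippet below is a quick arithmetic check using only values reported in this summary; the variable names are made up for illustration.

```python
# Values quoted in this section (latency in seconds on MuSiQue, F1 in points).
indexrag_latency = 0.30
fastgraphrag_latency = 2.55
musique_f1_naive = 29.9
musique_f1_indexrag = 34.4
avg_f1_indexrag_ircot = 55.0
avg_f1_hipporag = 54.1
avg_f1_ircot = 41.6

speedup = round(fastgraphrag_latency / indexrag_latency, 1)
musique_gain = round(musique_f1_indexrag - musique_f1_naive, 1)
margin_over_hipporag = round(avg_f1_indexrag_ircot - avg_f1_hipporag, 1)
margin_over_ircot = round(avg_f1_indexrag_ircot - avg_f1_ircot, 1)

print(f"Speedup over FastGraphRAG: {speedup}x")
print(f"MuSiQue F1 gain over Naive RAG: +{musique_gain}")
print(f"IndexRAG + IRCoT vs. HippoRAG: +{margin_over_hipporag}")
print(f"IndexRAG + IRCoT vs. IRCoT alone: +{margin_over_ircot}")
```

The speedup works out to the stated 8.5x. The MuSiQue gain is 4.5 F1 points, and the combined system leads HippoRAG by 0.9 points and standalone IRCoT by 13.4 points on average.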

5. Significance

Paradigm Shift: IndexRAG challenges the prevailing trend of complex, multi-step online reasoning by demonstrating that offline pre-computation can effectively solve multi-hop problems.
Scalability: By relying on flat vector search rather than graph traversal, IndexRAG offers a highly scalable solution for large-scale RAG systems where inference latency and cost are critical constraints.
Simplicity: It achieves the highest average F1 among single-call methods without complex graph construction algorithms or fine-tuning, making it easier to deploy in production environments.
Future Direction: The paper suggests that shifting reasoning complexity to the indexing phase is a viable and efficient strategy for next-generation RAG systems, particularly for domains requiring deep cross-document synthesis.
