DocSage: An Information Structuring Agent for Multi-Doc Multi-Entity Question Answering

DocSage is an end-to-end agentic framework that addresses the limitations of existing RAG systems in multi-document, multi-entity question answering by integrating dynamic schema discovery, error-aware structured extraction, and schema-aware relational reasoning to significantly improve cross-document evidence aggregation and accuracy.

Teng Lin, Yizhang Zhu, Zhengxuan Zhang, Yuyu Luo, Nan Tang

Published 2026-03-13

🧐 The Problem: The "Needle in a Haystack" Nightmare

Imagine you are a detective trying to solve a complex mystery. But instead of one notebook, you have 100 different notebooks scattered across a room. Each notebook contains a few clues, but the clues are messy, written in different handwriting, and sometimes contradict each other.

Your boss asks: "Who is the mastermind, and how much money did they steal?"

To answer this, you can't just read one page. You have to:

  1. Find the right pages across 100 different notebooks.
  2. Connect the dots between a name in Notebook A, a bank account in Notebook B, and a date in Notebook C.
  3. Ignore the irrelevant noise.

Current AI tools (like standard LLMs or RAG) struggle here.

  • A standard LLM tries to read all 100 notebooks at once. It gets overwhelmed, loses track of details buried in the middle of the long input, and starts guessing. It's like trying to drink from a firehose.
  • Standard RAG (Retrieval-Augmented Generation) is like a librarian who fetches books based on surface-level matches such as keywords. If you ask about "money," it might bring you a book about "money" but miss the specific page about the theft. Its retrieval is too "coarse-grained": it finds related passages, not the precise facts that need to be linked.
  • Graph-based AI tries to draw a giant map of connections. But with 100 messy books, the map becomes a tangled ball of yarn that takes forever to untangle.

🦉 The Solution: Meet DocSage

DocSage is a new AI agent designed specifically to solve this "Multi-Document, Multi-Entity" puzzle. Instead of just reading, it acts like a super-organized project manager who turns the messy pile of notebooks into a clean, structured filing system before trying to answer the question.

Think of DocSage as a Master Chef who doesn't just throw ingredients into a pot; they first chop, measure, and organize everything into labeled bowls.

DocSage works in three magical steps:

1. The "Active Detective" (Schema Discovery)

  • What it does: Before digging into the books, DocSage asks itself: "What exactly do I need to find the answer?"
  • The Analogy: Imagine you are looking for a specific person in a crowd. Instead of looking at everyone, you first decide: "I need to find someone wearing a red hat, holding a blue umbrella, and standing near the fountain."
  • How it works: DocSage reads a little bit, then asks itself clarifying questions like, "Wait, did I miss the connection between the CEO and the Bank?" It builds a custom "search map" (called a Schema) specifically for your question. It doesn't just guess; it actively hunts for the missing pieces of the puzzle.
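To make the idea of a question-specific schema more concrete, here is a minimal Python sketch of a draft-then-self-check loop. It is not DocSage's implementation: the paper's agent uses an LLM to pose its clarifying questions, whereas here that step is replaced by a hypothetical hard-coded check (`missing_links`) that asks whether every pair of entities the question relates has a join column. The names `Schema`, `draft_schema`, `discover_schema`, and the toy tables are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    # table name -> column names
    tables: dict[str, list[str]] = field(default_factory=dict)
    # join keys as (child_table, fk_column, parent_table)
    links: list[tuple[str, str, str]] = field(default_factory=list)


def draft_schema(question_entities: dict[str, list[str]]) -> Schema:
    """First pass: one table per entity type the question mentions."""
    return Schema(tables={t: list(cols) for t, cols in question_entities.items()})


def missing_links(schema: Schema, needed: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Hypothetical stand-in for the agent's clarifying questions:
    which (child, parent) entity pairs does the question relate without a join key?"""
    linked = {(child, parent) for child, _, parent in schema.links}
    return [pair for pair in needed if pair not in linked]


def discover_schema(question_entities, needed, max_rounds=3):
    schema = draft_schema(question_entities)
    for _ in range(max_rounds):
        gaps = missing_links(schema, needed)
        if not gaps:
            break  # every relation the question needs can now be joined
        for child, parent in gaps:
            fk = f"{parent}_id"
            schema.tables.setdefault(child, ["id"]).append(fk)
            schema.tables.setdefault(parent, ["id"])
            schema.links.append((child, fk, parent))
    return schema


# "Who is the mastermind, and how much money did they steal?"
schema = discover_schema(
    question_entities={"person": ["id", "name", "role"], "account": ["id", "bank", "balance"]},
    needed=[("account", "person")],  # an account must be traceable to its owner
)
print(schema.tables["account"])  # ['id', 'bank', 'balance', 'person_id']
print(schema.links)              # [('account', 'person_id', 'person')]
```

In this run the first draft has no way to connect a bank account to its owner; the self-check spots the gap ("did I miss the connection between the CEO and the Bank?") and adds a `person_id` column, which is the kind of join key a later SQL step relies on.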

2. The "Quality Control Inspector" (Structured Extraction)

  • What it does: It takes the messy text from the documents and turns it into neat, clean tables (like an Excel spreadsheet).
  • The Analogy: Imagine a factory line where workers pull parts out of a junk pile. Most workers just grab whatever looks right. DocSage has a Quality Control Inspector who checks every part.
    • If a part says "Age: 180," the Inspector says, "That's impossible! Fix it."
    • If a part says "Company: Apple" but the database doesn't have an Apple entry, the Inspector says, "Go back and find the real source."
  • The Result: The messy text is converted into a clean database where every row makes sense and connects logically. This catches many of the "hallucinations" (confident but made-up facts) that AI systems often produce.
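The inspector's two checks map naturally onto database-style constraints: a range check (no 180-year-old employees) and a referential check (every company must exist in a table of known companies). The sketch below assumes simple hand-written rules and a hypothetical `reextract` callback standing in for "go back and find the real source"; DocSage's actual error detection and correction is more involved, and the names `Record`, `validate`, and `extract_with_checks` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    age: int
    company: str
    source: str  # document the values were extracted from


def validate(rec: Record, known_companies: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    errors = []
    if not 0 <= rec.age <= 130:  # range check: "Age: 180" is impossible
        errors.append(f"age {rec.age} is out of range")
    if rec.company not in known_companies:  # referential check
        errors.append(f"company {rec.company!r} has no matching entry")
    return errors


def extract_with_checks(candidates, known_companies, reextract):
    """Accept valid records; give failing ones one re-extraction attempt."""
    accepted, rejected = [], []
    for rec in candidates:
        errors = validate(rec, known_companies)
        if not errors:
            accepted.append(rec)
            continue
        retry = reextract(rec, errors)  # hypothetical: re-read the source document
        if retry is not None and not validate(retry, known_companies):
            accepted.append(retry)
        else:
            rejected.append((rec, errors))
    return accepted, rejected


def fake_reextract(rec, errors):
    # Toy stand-in: pretend a second read of the source finds "Age: 18"
    # where the first pass misread "180". Other errors are not fixable here.
    if any(e.startswith("age") for e in errors):
        return Record(rec.name, 18, rec.company, rec.source)
    return None


candidates = [
    Record("Alice", 42, "Acme Corp", "doc_1"),
    Record("Bob", 180, "Acme Corp", "doc_2"),
    Record("Carol", 35, "Apple", "doc_3"),
]
accepted, rejected = extract_with_checks(candidates, {"Acme Corp"}, fake_reextract)
print([(r.name, r.age) for r in accepted])  # [('Alice', 42), ('Bob', 18)]
print(rejected[0][1])  # ["company 'Apple' has no matching entry"]
```

Records that still fail after one retry are set aside rather than silently stored, so errors surface instead of flowing into the final answer.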

3. The "SQL Detective" (Relational Reasoning)

  • What it does: Now that the data is in a clean table, DocSage doesn't "guess" the answer. It runs a precise database query (written in SQL, the standard language for querying databases) to join the tables and find the answer.
  • The Analogy: Instead of a detective wandering around a crime scene guessing, this is like a detective using a computer database to instantly link "Suspect A" to "Crime Scene B" and "Bank Account C."
  • The Benefit: Because the data is structured, the AI doesn't get confused by long texts. It can instantly see the path from A to B to C, even if they are in 50 different documents.
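Here is a toy version of this final step using Python's built-in `sqlite3` module. The tables, column names, and data are invented for illustration (they reuse the "Suspect A / Bank B" example above, and treat "the mastermind" as whoever moved the most money); they are not DocSage's schema. Each row keeps a hypothetical `source` column naming the document it came from, so the same joins that compute the answer can also return the citations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT, role TEXT, source TEXT);
CREATE TABLE account  (id INTEGER PRIMARY KEY, person_id INTEGER, bank TEXT, source TEXT);
CREATE TABLE transfer (id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL, day TEXT, source TEXT);
""")
# Toy rows: the name, the account, and the transfers come from three different documents.
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                 [(1, "Suspect A", "CEO", "doc_A"), (2, "Witness W", "Clerk", "doc_A")])
conn.execute("INSERT INTO account VALUES (10, 1, 'Bank B', 'doc_B')")
conn.executemany("INSERT INTO transfer VALUES (?, ?, ?, ?, ?)",
                 [(100, 10, 250000.0, "2024-01-05", "doc_C"),
                  (101, 10, 50000.0, "2024-02-10", "doc_C")])

# Answer: join person -> account -> transfer and total the money moved per person.
answer = conn.execute("""
    SELECT p.name, SUM(t.amount) AS total
    FROM person p
    JOIN account a  ON a.person_id = p.id
    JOIN transfer t ON t.account_id = a.id
    GROUP BY p.id
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

# Citations: the same join path, returning which document supplied each link.
evidence = conn.execute("""
    SELECT p.source, a.source, t.source, t.amount
    FROM person p
    JOIN account a  ON a.person_id = p.id
    JOIN transfer t ON t.account_id = a.id
    WHERE p.name = ?
    ORDER BY t.id
""", (answer[0],)).fetchall()

print(answer)    # ('Suspect A', 300000.0)
print(evidence)  # [('doc_A', 'doc_B', 'doc_C', 250000.0), ('doc_A', 'doc_B', 'doc_C', 50000.0)]
```

Because the answer comes from an explicit join rather than free-text generation, the path from person to account to transfer can be inspected step by step, and the `source` columns double as citations.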

🏆 Why is DocSage a Game Changer?

The paper tested DocSage against strong baselines, including leading LLMs such as GPT-4o, on two challenging benchmarks:

  1. MEBench: A test with many different people and relationships.
  2. Loong: A test built from many long documents, with inputs of up to roughly 250,000 tokens!

The Results:

  • Accuracy: DocSage outperformed the competing methods by a wide margin, improving accuracy by over 27%.
  • Scalability: As the number of documents and entities grew, other AI models got confused and their scores dropped. DocSage stayed strong.
  • Reliability: It didn't just guess; it could show exactly where in the documents it found the answer (like a citation).

🚀 The Big Takeaway

DocSage proves that "Structure" is the secret sauce.

When humans face a complex problem, we don't just stare at the chaos; we draw a diagram, make a list, or build a spreadsheet. DocSage does the same thing for AI. By forcing the AI to organize the information first and check for errors before answering, it handles questions that current AI systems struggle with.

It turns the "Needle in a Haystack" problem into a "Find the Needle in a Neatly Organized Box" problem.