MoDora: Tree-Based Semi-Structured Document Analysis System

MoDora is an LLM-powered system that addresses the challenges of analyzing semi-structured documents by employing a local-alignment aggregation strategy, a Component-Correlation Tree for hierarchical organization, and a question-type-aware retrieval mechanism to significantly improve accuracy in natural language question answering.

Bangrui Xu, Qihang Yao, Zirui Tang, Xuanhe Zhou, Yeye He, Shihan Yu, Qianqian Xu, Bin Wang, Guoliang Li, Conghui He, Fan Wu

Published 2026-03-02

Imagine you have a massive, messy library. But this isn't a normal library with books neatly stacked on shelves. This library is filled with semi-structured documents.

Think of these documents like a chaotic collage: they have paragraphs of text, but also tables, charts, sidebars, footnotes, and images all mixed together on the same page. Sometimes a table is on page 1, but the explanation for it is on page 5. Sometimes a chart is floating in the middle of a paragraph.

The Problem:
If you ask a human librarian, "What was the feather score for the winter experiment?" they can easily flip through the pages, find the paragraph about "winter," look at the table on the next page, and give you the answer.

But if you ask a computer (specifically, current AI models) this question, it often gets lost.

  • Old Method 1 (The Scanner): It reads the text but ignores the layout. It sees the word "winter" and the word "table" but doesn't know they are connected. It's like reading a recipe where the ingredients list is on page 1 and the cooking instructions are on page 10, but the book is torn up and shuffled.
  • Old Method 2 (The Image Viewer): It looks at the whole page as a picture. It might see the table, but it misses the tiny text saying "this experiment happened in winter." It's like looking at a map but ignoring the legend.
  • Old Method 3 (The Search Engine): It grabs random chunks of text that sound similar to your question but misses the big picture. It's like finding a sentence about "feathers" in a completely different chapter.

The Solution: MoDora
The authors of this paper built a new system called MoDora. Think of MoDora not as a scanner or a search engine, but as a super-organized architect who rebuilds the messy library into a perfect, logical treehouse.

Here is how MoDora works, step-by-step:

1. The "Local-Alignment" Strategy (Grouping the Clutter)

Imagine you walk into a room where someone has thrown all the furniture, books, and pictures on the floor.

  • What MoDora does: It doesn't just look at individual items. It says, "Okay, this title belongs with these three paragraphs. This chart belongs with that specific table."
  • The Analogy: It's like a detective gathering clues. It groups the "Title" with the "Story" and the "Chart" with its "Data" into neat, self-contained bundles called Components. It ignores the messy page numbers and footers for a moment to focus on the meaningful groups.
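To make the grouping idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes the page elements are already detected and listed in reading order, and it uses a simple "each title starts a new bundle" rule. The names `Element`, `Component`, `SKIPPED_KINDS`, and `group_into_components` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    kind: str   # e.g. "title", "paragraph", "table", "chart", "footer"
    text: str
    page: int


@dataclass
class Component:
    title: str
    members: List[Element] = field(default_factory=list)


# Assumption: layout noise that is set aside while grouping.
SKIPPED_KINDS = {"footer", "page_number"}


def group_into_components(elements):
    """Bundle each title with the elements that follow it in reading order."""
    components = []
    current = None
    for el in elements:
        if el.kind in SKIPPED_KINDS:
            continue  # ignore page numbers and footers for now
        if el.kind == "title":
            current = Component(title=el.text)
            components.append(current)
        else:
            if current is None:  # content that appears before any title
                current = Component(title="(untitled)")
                components.append(current)
            current.members.append(el)
    return components


# Toy input with placeholder text; the table sits on a different page
# than its title but still lands in the same bundle.
elements = [
    Element("title", "Winter Experiment", 1),
    Element("paragraph", "Text describing the winter experiment.", 1),
    Element("page_number", "1", 1),
    Element("table", "Table of feather scores.", 2),
    Element("title", "Discussion", 2),
    Element("paragraph", "Text discussing the results.", 2),
]
components = group_into_components(elements)
for c in components:
    print(c.title, [m.kind for m in c.members])
```

A real system would need a smarter rule than "follow the title", since related pieces can be far apart, but the output shape (self-contained bundles) is the point here.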

2. Building the "CCTree" (The Organized Treehouse)

Once the items are grouped, MoDora builds a Component-Correlation Tree (CCTree).

  • The Analogy: Imagine a family tree, but instead of people, it's made of document sections.
    • The Root is the main title of the document.
    • The Branches are the chapters (Introduction, Methods, Results).
    • The Leaves are the specific details (a paragraph, a table, a chart).
  • Why this helps: In a normal document, a table might be far away from the text explaining it. In MoDora's tree, the table is a "child" of the text that explains it. The system knows, "Ah, this chart is a child of the 'Experiment Design' branch." It preserves the hierarchy (who is the boss of what) and the layout (where things are on the page).
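A tree like this can be sketched in a few lines of Python. This is an illustrative data structure only, assuming each node stores a title, a kind, and optional layout information (page and bounding box). The `Node` class, its field names, and the `path_to` helper are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    title: str
    kind: str = "section"            # "root", "section", "paragraph", "table", "chart"
    page: Optional[int] = None       # layout: which page the item is on
    bbox: Optional[Tuple[float, float, float, float]] = None  # layout: position on the page
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def path_to(node, title, trail=()):
    """Return the chain of titles from the root down to the node named `title`."""
    trail = trail + (node.title,)
    if node.title == title:
        return list(trail)
    for child in node.children:
        found = path_to(child, title, trail)
        if found:
            return found
    return None


# Toy tree: the chart is on page 5, yet it hangs under the branch that explains it.
root = Node("Document title", kind="root")
design = root.add(Node("Experiment Design"))
design.add(Node("Setup paragraph", kind="paragraph", page=1))
design.add(Node("Design chart", kind="chart", page=5, bbox=(0.1, 0.2, 0.9, 0.6)))
root.add(Node("Results"))

print(path_to(root, "Design chart"))
# ['Document title', 'Experiment Design', 'Design chart']
```

Keeping `page` and `bbox` on each node is what lets the same tree answer both "what belongs to what" and "where is it on the page".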

3. The "Bottom-Up Summarization" (The Smart Summary)

Now, imagine you are the boss of this treehouse. You don't want to read every single leaf to know what's happening.

  • What MoDora does: It starts at the bottom (the leaves) and asks the AI, "What is the main point of this paragraph?" Then it moves up one level and asks, "What is the main point of this chapter, including the summary of the paragraph below?"
  • The Analogy: It's like a news network. The reporter at the scene (the leaf) sends a quick summary to the editor (the branch), who sends a summary to the anchor (the root). By the time the summaries reach the top, the system has a "map" of the whole document without needing to read every single word again.
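The bottom-up pass is essentially a post-order tree traversal: children are summarized first, then their summaries feed the parent. The sketch below assumes a hypothetical `Node` class and replaces the AI call with `stub_summarize`, a placeholder that just keeps the first eight words. In a real system that function would call a language model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    title: str
    content: str = ""
    children: List["Node"] = field(default_factory=list)
    summary: str = ""


def stub_summarize(text: str) -> str:
    """Placeholder for an LLM call: keep only the first eight words."""
    return " ".join(text.split()[:8])


def summarize_bottom_up(node, summarize=stub_summarize):
    """Summarize the children first, then summarize the node together with their summaries."""
    child_summaries = [summarize_bottom_up(c, summarize) for c in node.children]
    combined = " ".join([node.title, node.content, *child_summaries]).strip()
    node.summary = summarize(combined)
    return node.summary


# Toy tree with placeholder text.
leaf = Node("Table 1", "winter feather measurements table")
results = Node("Results", "Findings of the winter experiment", [leaf])
root = Node("Report", children=[results])

summarize_bottom_up(root)
print(leaf.summary)
print(results.summary)
print(root.summary)
```

Passing the summarizer in as a parameter keeps the traversal separate from whichever model produces the summaries.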

4. The "Question-Type-Aware" Detective (Finding the Answer)

When you ask a question, MoDora doesn't just search for keywords. It acts like a detective with a specific plan based on the type of question:

  • If you ask "Where is the logo?" (Location Question): MoDora looks at the "map" of the tree. It knows the logo is in the "Header" branch, specifically in the top-right corner of Page 1. It goes straight there.
  • If you ask "What did the experiment find?" (Semantic Question): MoDora uses the summaries stored in the CCTree to skip irrelevant branches. It says, "The 'Introduction' branch isn't relevant, but the 'Results' branch is." It then zooms in on the specific table or paragraph needed.
  • The Safety Net: If the AI isn't 100% sure, it uses a "Verifier" (another AI) to double-check the evidence before giving an answer, ensuring it doesn't make things up (hallucinate).
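The routing logic can be sketched as follows. Everything here is a simplification under stated assumptions: the question type is guessed from a few cue words, keyword overlap stands in for the AI's relevance judgment, and `verify` is a stub for the second AI. The names `classify`, `retrieve_by_location`, `retrieve_by_semantics`, `verify`, and `answer_candidates` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    title: str
    summary: str = ""
    page: Optional[int] = None
    region: Optional[str] = None     # e.g. "top-right"
    children: List["Node"] = field(default_factory=list)


LOCATION_CUES = ("where", "which page", "position", "located")  # assumed cue words
STOPWORDS = {"the", "is", "a", "of", "what", "did", "where", "and"}


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def classify(question):
    q = question.lower()
    return "location" if any(cue in q for cue in LOCATION_CUES) else "semantic"


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def retrieve_by_location(root, question):
    """Look for nodes that carry layout info and whose title matches the question."""
    q = words(question)
    return [n for n in walk(root) if n.page is not None and words(n.title) & q]


def retrieve_by_semantics(node, question):
    """Descend only into branches whose summary overlaps with the question."""
    if not node.children:
        return [node]
    q = words(question)
    hits = []
    for child in node.children:
        if words(child.summary) & q:   # stand-in for an LLM relevance check
            hits.extend(retrieve_by_semantics(child, question))
    return hits


def verify(evidence, question):
    """Stub verifier: keep evidence that shares a keyword with the question."""
    q = words(question)
    return [n for n in evidence if words(n.title + " " + n.summary) & q]


def answer_candidates(root, question):
    if classify(question) == "location":
        evidence = retrieve_by_location(root, question)
    else:
        evidence = retrieve_by_semantics(root, question)
    return verify(evidence, question)


root = Node("Report", children=[
    Node("Header", "document header with logo",
         children=[Node("Logo", "company logo", page=1, region="top-right")]),
    Node("Introduction", "background and motivation",
         children=[Node("Intro paragraph", "background")]),
    Node("Results", "findings of the experiment",
         children=[Node("Results table", "experiment measurements")]),
])

for question in ("Where is the logo?", "What did the experiment find?"):
    print(classify(question), [n.title for n in answer_candidates(root, question)])
```

In this toy run the location question goes straight to the node with page information, while the semantic question skips the "Header" and "Introduction" branches and only opens "Results".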

The Result

In the authors' tests, MoDora answered questions about these messy documents significantly more accurately than previous methods.

  • Old methods were like trying to find a needle in a haystack by guessing.
  • MoDora is like having a magnet that knows exactly where the needle is, because it understands the structure of the haystack.

In short: MoDora takes a chaotic, multi-page document, organizes it into a logical family tree, summarizes the relationships, and then uses that structure to find the exact answer you need, whether it's a number in a table or a location on a page. It turns a messy collage into a clear story.
