Decomposition-Driven Multi-Table Retrieval and Reasoning for Numerical Question Answering

This paper proposes DMRAL, a decomposition-driven framework for numerical multi-table question answering over large-scale table collections. It constructs a table relationship graph and combines table-aligned question decomposition, coverage-aware retrieval, and sub-question-guided reasoning, significantly outperforming existing methods.

Feng Luo, Hai Lan, Hui Luo, Zhifeng Bao, Xiaoli Wang, J. Shane Culpepper, Shazia Sadiq

Published Tue, 10 Ma

Imagine you are a detective trying to solve a complex mystery: "How many total citations do all female Nobel Physics laureates have after 2010?"

To solve this, you don't just need one piece of paper. You need to dig through a massive, messy library containing 73,000 to 100,000 different spreadsheets (tables). Some are about Nobel prizes, some about scientists' genders, and some about their citation counts. Many of these spreadsheets are missing titles, have typos, or are split into tiny, confusing fragments.

Existing methods for solving these puzzles are like detectives who only know how to work in a neat, organized police station with a perfect filing system. When they walk into your messy library, they get lost, pick the wrong files, or give up.

This paper introduces a new detective team called DMRAL. Here is how they solve the case, explained with simple analogies:

1. The Problem: The "Messy Library"

Most current AI tools are trained on small, perfect databases (like a single police station). But the real world is a giant, chaotic data lake.

  • The Scale: There are too many tables to read one by one.
  • The Mess: Some files are missing labels (metadata), and the connections between files are hidden.
  • The Complexity: To answer the question, you might need to join (connect) two files, union (stack) two similar files together, and then do math.
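To make join, union, and the final math concrete, here is a toy sketch using Python's built-in sqlite3. The tables, names, and numbers are entirely fictional and illustrative; they only mirror the shape of the Nobel example, not any real data or the paper's datasets.

```python
import sqlite3

# A tiny, fictional "data lake": the laureates table is split into two
# fragments (a common kind of mess), plus a separate citations table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nobel_physics_a (name TEXT, gender TEXT, year INT);
CREATE TABLE nobel_physics_b (name TEXT, gender TEXT, year INT);
CREATE TABLE citations (name TEXT, citation_count INT);
INSERT INTO nobel_physics_a VALUES ('Alice Example', 'female', 2018);
INSERT INTO nobel_physics_b VALUES ('Bea Sample', 'female', 2020);
INSERT INTO citations VALUES ('Alice Example', 14000), ('Bea Sample', 26000);
""")

# UNION stacks the two fragments, JOIN connects laureates to their
# citations, and SUM does the math -- the three operations named above.
total, = con.execute("""
SELECT SUM(c.citation_count)
FROM (SELECT * FROM nobel_physics_a
      UNION ALL
      SELECT * FROM nobel_physics_b) AS n
JOIN citations AS c ON c.name = n.name
WHERE n.gender = 'female' AND n.year > 2010
""").fetchone()
print(total)  # 40000
```

Notice that getting this right requires knowing, in advance, which fragments to stack and which table holds the citations, which is exactly the retrieval problem the rest of the system tackles.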

2. The Solution: The DMRAL Detective Team

The authors built a three-step system to handle this chaos.

Step A: The "Smart Question Breaker" (Table-Aligned Question Decomposer)

Instead of asking the AI, "Find the answer to this huge question," DMRAL breaks the question down into tiny, manageable clues, just like a detective breaking a big case into smaller leads.

  • The Old Way: The AI guesses the sub-questions blindly. It might ask, "Who are the laureates?" and "Who is female?" but fail to realize it needs to look at specific columns in specific tables.
  • The DMRAL Way: It looks at the library's structure first. It says, "Okay, to find 'female laureates,' I need to look at the 'Gender' column in the 'Nobel' table. To find 'citations,' I need the 'Citations' column." It aligns the clues with the actual shelves in the library before searching. This ensures no clue is missed and no time is wasted.
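The difference can be sketched as a data structure. In this illustrative Python snippet (the schema, field names, and sub-questions are hypothetical, not the paper's actual API), each sub-question carries the concrete table and columns it must read, and a sanity check rejects any clue that points at a shelf the library does not have.

```python
# Hypothetical schema of the tables the decomposer can see.
schema = {
    "nobel_laureates": ["name", "gender", "prize_year", "field"],
    "paper_citations": ["author_name", "citation_count"],
}

# Table-aligned decomposition: each clue names its table and columns,
# instead of being a free-floating string the retriever must guess at.
sub_questions = [
    {"q": "Which Physics laureates are female?",
     "table": "nobel_laureates", "columns": ["gender", "field"]},
    {"q": "Which of them won after 2010?",
     "table": "nobel_laureates", "columns": ["prize_year"]},
    {"q": "How many citations does each one have?",
     "table": "paper_citations", "columns": ["author_name", "citation_count"]},
]

# Sanity check: every referenced column must exist in the schema.
for sq in sub_questions:
    missing = [c for c in sq["columns"] if c not in schema[sq["table"]]]
    assert not missing, f"unaligned clue: {missing}"
print("all sub-questions aligned")
```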

Step B: The "Coverage Detective" (Coverage-Aware Retriever)

Once the clues are broken down, the team needs to find the right files.

  • The Old Way: They search for files that look somewhat related. They might grab a file about "Physics" but miss the specific file about "Post-2010," leading to an incomplete answer.
  • The DMRAL Way: They use a "Coverage Score." Imagine a checklist. If the AI picks a file, it checks: "Does this file cover the 'Gender' clue? Does it cover the 'Year' clue?"
    • If the checklist isn't full, the AI doesn't stop. It asks, "What's missing?" and goes back to find a complementary file to fill the gap. It keeps searching until the entire question is covered by the selected files.
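The checklist loop resembles a greedy set-cover procedure. This Python sketch is an illustration of that idea, not the paper's exact coverage score or candidate sets: keep picking the candidate table that fills the most still-missing clues until the checklist is full.

```python
# The clues the question needs covered (hypothetical example).
required_clues = {"gender", "prize_year", "citation_count"}

# Each candidate table advertises which clues its columns can answer.
candidates = {
    "nobel_people":   {"gender"},
    "nobel_awards":   {"prize_year"},
    "nobel_full":     {"gender", "prize_year"},
    "citation_stats": {"citation_count"},
}

selected, missing = [], set(required_clues)
while missing:
    # Coverage score here = how many missing clues this table fills.
    best = max(candidates, key=lambda t: len(candidates[t] & missing))
    gained = candidates[best] & missing
    if not gained:
        break  # nothing left helps; the question cannot be covered
    selected.append(best)
    missing -= gained

print(selected)  # ['nobel_full', 'citation_stats']
```

The key behavior is the complementarity: after picking `nobel_full` (which covers two clues at once), the loop does not stop, because `citation_count` is still missing, so it goes back and adds `citation_stats`.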

Step C: The "Step-by-Step Reasoner" (Sub-question Guided Reasoner)

Now that they have the right files, they need to do the math.

  • The Old Way: The AI tries to write one giant, complex SQL query all at once to solve the whole puzzle. If it makes one small mistake (like a typo in a formula), the whole answer is wrong.
  • The DMRAL Way: It builds the program like a ladder, one rung at a time.
    1. First, it writes a tiny program to find the female laureates.
    2. Then, it writes a second tiny program to get their citations.
    3. Finally, it combines them.
    • The Safety Net: After writing each step, it runs the code. If it crashes or gives an error, it immediately fixes the code before moving to the next step. This prevents small mistakes from ruining the final answer.
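The ladder-plus-safety-net idea can be sketched with sqlite3. This is an illustrative mock-up, not the paper's implementation: each rung is executed as soon as it is written, so an error surfaces at the rung that caused it rather than poisoning the final answer. In a real system, the `except` branch is where an LLM would be asked to repair the failing step.

```python
import sqlite3

# Fictional mini-database for the running example.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE laureates (name TEXT, gender TEXT, year INT);
CREATE TABLE citations (name TEXT, n INT);
INSERT INTO laureates VALUES ('Alice', 'female', 2018), ('Bob', 'male', 2019);
INSERT INTO citations VALUES ('Alice', 12000), ('Bob', 9000);
""")

steps = [
    # Rung 1: find the female laureates after 2010.
    ("CREATE VIEW step1 AS "
     "SELECT name FROM laureates WHERE gender='female' AND year > 2010"),
    # Rung 2: attach their citation counts.
    ("CREATE VIEW step2 AS "
     "SELECT c.n FROM step1 JOIN citations c ON c.name = step1.name"),
]

for i, sql in enumerate(steps, 1):
    try:
        con.execute(sql)  # run the rung immediately...
    except sqlite3.Error as e:
        # ...so a broken rung is caught and fixed before climbing higher.
        raise RuntimeError(f"step {i} failed: {e}")

# Rung 3: combine the pieces and do the math.
total, = con.execute("SELECT SUM(n) FROM step2").fetchone()
print(total)  # 12000
```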

3. The Results: Why It Matters

The authors tested this system on two massive datasets they created (called SpiderWild and BirdWild), which simulate real-world messiness.

  • Better Hunting: DMRAL found the right files 24% more often than the best existing methods.
  • Better Answers: Because it found the right files and built the program carefully, it got the correct final number 55% more often.

The Big Picture

Think of existing AI as a brilliant student who can solve math problems perfectly if the teacher gives them a clean, single textbook.

DMRAL is like a seasoned field agent. It knows how to navigate a chaotic, messy archive, knows how to break a big problem into small tasks, knows how to double-check its work, and knows how to stitch together information from dozens of different sources to get the right answer.

This is a huge step forward for letting computers help us analyze the massive, messy data that exists in the real world today.