DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering

This paper introduces DataFactory, a collaborative multi-agent framework that overcomes the context, hallucination, and reasoning limitations of existing TableQA systems by orchestrating specialized agents for structured and relational reasoning, thereby achieving significant accuracy improvements across multiple benchmarks.

Tong Wang, Chi Jin, Yongkang Chen, Huan Deng, Xiaohui Kuang, Gang Zhao

Published Wed, 11 Ma

Imagine you have a massive, messy warehouse full of data. Some of it is in neat, organized filing cabinets (structured tables), and some of it is a tangled web of sticky notes connected by strings (relationships between things).

You want to ask a question like, "Which sales team had the best quarter, and who are the key people connecting them to other departments?"

If you ask a standard AI (a "single-agent" model) to do this, it's like asking one overworked intern to do everything: open the cabinets, read the files, untangle the sticky notes, do the math, and write the report. The intern gets overwhelmed, forgets details, makes up facts (hallucinations), or gives up because the task is too big.

DataFactory is the solution. Instead of one intern, it builds a specialized factory team with three distinct roles working together under a smart manager.

Here is how it works, using simple analogies:

1. The Three-Team Factory

Instead of one brain trying to do everything, DataFactory splits the work into three specialized "departments":

  • The Data Leader (The Manager):
    • Role: This is the project manager. It doesn't do the heavy lifting itself. Instead, it listens to your question, breaks it down into smaller steps, and decides which team to call.
    • Superpower: It uses a "Think-Act-Observe" loop (called ReAct). If it asks a team for data and the answer is weird, it stops, thinks, and asks a different question. It's like a detective who checks their clues before moving to the next suspect.
  • The Database Team (The Accountants):
    • Role: These are the experts in the filing cabinets. They are great at math, sorting, counting, and finding exact numbers.
    • Superpower: They speak "SQL" (the language of databases). If you ask, "Who sold the most?", they instantly run a precise calculation to get the exact number. They are fast and accurate with hard facts.
  • The Knowledge Graph Team (The Detectives):
    • Role: These are the experts in the tangled web of sticky notes. They understand how things connect.
    • Superpower: They speak "Cypher" (the language of graphs). If you ask, "Who knows the people in the marketing team?", they can trace the invisible lines between people to find hidden connections that a simple list can't show.
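
To make the two specialist roles concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the `sales` table, its columns and rows, the `COLLABORATES_WITH` relationship, and the function names. The SQL runs against an in-memory SQLite database. The Cypher query is shown only as a string, and a plain dictionary stands in for a real graph database.

```python
import sqlite3

# Hypothetical sample data; none of these names or numbers come from the paper.
SALES_ROWS = [("Team A", 1_000_000), ("Team B", 750_000), ("Team C", 420_000)]

def top_sales_team(rows):
    """Database Team: answer 'Who sold the most?' with an exact SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (team TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    row = conn.execute(
        "SELECT team, SUM(amount) AS total FROM sales "
        "GROUP BY team ORDER BY total DESC LIMIT 1"
    ).fetchone()
    conn.close()
    return row

# What a Knowledge Graph Team might send to a graph database (illustrative only,
# not executed here).
CYPHER_QUERY = (
    "MATCH (t:Team {name: $name})-[:COLLABORATES_WITH]-(other:Team) "
    "RETURN other.name"
)

# Stand-in for the graph: an undirected adjacency map.
GRAPH = {
    "Team A": {"Design"},
    "Design": {"Team A", "Marketing"},
    "Marketing": {"Design"},
}

def collaborators(graph, name):
    """Knowledge Graph Team: list the direct neighbours of one node."""
    return sorted(graph.get(name, set()))

print(top_sales_team(SALES_ROWS))
print(collaborators(GRAPH, "Team A"))
```

The split mirrors the analogy: the aggregate question is answered by a query engine rather than by the model reading rows, and the "who is connected to whom" question is answered by following edges.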

2. How They Work Together (The Magic)

The real magic happens when these teams talk to each other.

  • The Problem with Old AI: Usually, a single AI tries to guess the answer by reading the whole table at once. If the table is too long, it gets confused.
  • The DataFactory Way:
    1. The Manager hears your question: "Find the top sales team and see who they collaborate with."
    2. The Manager asks the Accountants: "Who sold the most?" The Accountants run a precise query and say, "Team A sold $1 million."
    3. The Manager takes that result and asks the Detectives: "Now, show me all the connections Team A has with other teams."
    4. The Detectives trace the web and find, "Team A works closely with the Design team."
    5. The Manager combines these two facts into a clear, human-friendly final answer: "Team A was the top seller, and they collaborate closely with the Design team."
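
The hand-off above can be sketched as a small Think-Act-Observe loop in Python. This is a toy under stated assumptions, not the paper's implementation: the two `ask_*` functions return hard-coded answers, and the `think` function is a fixed rule table where a real Data Leader would presumably call a language model. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the two specialist teams (hard-coded toy answers).
def ask_database_team(question: str) -> str:
    return "Team A"

def ask_graph_team(question: str) -> str:
    return "Design"

TOOLS: Dict[str, Callable[[str], str]] = {
    "database": ask_database_team,
    "graph": ask_graph_team,
}

def think(observations: List[str]) -> Tuple[str, str]:
    """Toy planner: choose the next action from what has been observed so far.
    The rule table below is an assumption made for this sketch."""
    if len(observations) == 0:
        return "database", "Which team sold the most?"
    if len(observations) == 1:
        return "graph", f"Which teams does {observations[0]} collaborate with?"
    top, partner = observations
    return "finish", (
        f"{top} was the top seller, and they collaborate closely "
        f"with the {partner} team."
    )

def run_manager(max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action, payload = think(observations)  # Think: decide the next step
        if action == "finish":
            return payload
        result = TOOLS[action](payload)        # Act: call the chosen team
        observations.append(result)            # Observe: record the result
    return "No answer within the step budget."

print(run_manager())
```

The step budget is a design choice of this sketch: it keeps the loop from running forever if the planner never decides to finish.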

3. Why This is Better (The "No Hallucination" Rule)

One of the biggest problems with AI is that it sometimes "makes things up" to sound smart.

  • Old Way: The AI guesses, "Maybe Team A worked with the Design team?" (It's just guessing).
  • DataFactory Way: The AI checks the facts first. The Accountants verify the sales numbers. The Detectives verify the connections. If the data isn't there, the team admits, "We couldn't find that connection," instead of making one up.
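
The "admit it when the data isn't there" behaviour can be illustrated with a short Python sketch. It assumes a toy relationship store and hypothetical function names, and it only shows the general idea of answering from retrieved evidence or abstaining; the paper's actual mechanism may differ.

```python
from typing import Dict, Optional, Set

# Hypothetical relationship store; the names are illustrative only.
GRAPH: Dict[str, Set[str]] = {"Team A": {"Design"}}

def find_connection(graph: Dict[str, Set[str]], source: str, target: str) -> Optional[str]:
    """Return evidence for a direct link, or None when the data has no such link."""
    if target in graph.get(source, set()):
        return f"{source} -> {target}"
    return None

def answer(graph: Dict[str, Set[str]], source: str, target: str) -> str:
    """Answer only from retrieved evidence; abstain when nothing was found."""
    evidence = find_connection(graph, source, target)
    if evidence is None:
        return "We couldn't find that connection."
    return f"{source} works with {target} (evidence: {evidence})."

print(answer(GRAPH, "Team A", "Design"))
print(answer(GRAPH, "Team A", "Legal"))
```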

4. The Results

The paper tested this "Factory" against other AI methods on three benchmarks of difficult table questions.

  • The Result: The Factory team got 20% to 24% more correct answers than the single-intern AI.
  • The Secret Sauce: By splitting the work, the system didn't get confused. It could handle complex questions that required both math (Accountants) and relationship tracing (Detectives) at the same time.

In a Nutshell

DataFactory is like replacing a single, exhausted librarian who tries to memorize the whole library with a well-oiled team: a manager who directs traffic, a calculator who crunches numbers, and a detective who finds hidden links. By letting them talk to each other in plain English, they can solve complex data puzzles faster, more accurately, and without making things up.