DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering

Imagine you have a massive, messy warehouse full of data. Some of it is in neat, organized filing cabinets (structured tables), and some of it is a tangled web of sticky notes connected by strings (relationships between things).

You want to ask a question like, "Which sales team had the best quarter, and who are the key people connecting them to other departments?"

If you ask a standard AI (a "single-agent" model) to do this, it's like asking one overworked intern to do everything: open the cabinets, read the files, untangle the sticky notes, do the math, and write the report. The intern gets overwhelmed, forgets details, makes up facts (hallucinations), or gives up because the task is too big.

DataFactory is the solution. Instead of one intern, it builds a specialized factory team with three distinct roles working together under a smart manager.

Here is how it works, using simple analogies:

1. The Three-Team Factory

Instead of one brain trying to do everything, DataFactory splits the work into three specialized "departments":

The Data Leader (The Manager):
- Role: This is the project manager. It doesn't do the heavy lifting itself. Instead, it listens to your question, breaks it down into smaller steps, and decides which team to call.
- Superpower: It uses a "Think-Act-Observe" loop (called ReAct). If it asks a team for data and the answer is weird, it stops, thinks, and asks a different question. It's like a detective who checks their clues before moving to the next suspect.
The Database Team (The Accountants):
- Role: These are the experts in the filing cabinets. They are great at math, sorting, counting, and finding exact numbers.
- Superpower: They speak "SQL" (the language of databases). If you ask, "Who sold the most?", they instantly run a precise calculation to get the exact number. They are fast and accurate with hard facts.
The Knowledge Graph Team (The Detectives):
- Role: These are the experts in the tangled web of sticky notes. They understand how things connect.
- Superpower: They speak "Cypher" (the language of graphs). If you ask, "Who knows the people in the marketing team?", they can trace the invisible lines between people to find hidden connections that a simple list can't show.

2. How They Work Together (The Magic)

The real magic happens when these teams talk to each other.

The Problem with Old AI: Usually, an AI tries to guess the answer by reading the whole document at once. If the document is too long, it gets confused.
The DataFactory Way:
1. The Manager hears your question: "Find the top sales team and see who they collaborate with."
2. The Manager asks the Accountants: "Who sold the most?" The Accountants run a quick math check and say, "Team A sold $1 million."
3. The Manager takes that result and asks the Detectives: "Now, show me all the connections Team A has with other people."
4. The Detectives trace the web and report, "Team A works closely with the Design team."
5. The Manager combines these two facts into a clear, human-friendly answer: "Team A was the top seller, and they collaborate closely with the Design team."
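
For readers who like to see the hand-off in code, here is a minimal sketch of the steps above. It is only an illustration, not the paper's implementation: the `sales` table, the `collaborations` dictionary, and the function names `ask_accountants`, `ask_detectives`, and `data_leader` are all made up for this example, and a plain Python dictionary stands in for a real graph database.

```python
import sqlite3

# Hypothetical toy data: neither the table nor the graph comes from the paper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (team TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Team A", 600_000), ("Team A", 400_000), ("Team B", 750_000)],
)

# Stand-in for the knowledge graph: team -> set of collaborating teams.
collaborations = {
    "Team A": {"Design"},
    "Team B": {"Legal", "Design"},
}


def ask_accountants(db):
    """Database Team: aggregate with SQL and return (top team, total sales)."""
    return db.execute(
        "SELECT team, SUM(amount) AS total FROM sales "
        "GROUP BY team ORDER BY total DESC LIMIT 1"
    ).fetchone()


def ask_detectives(graph, team):
    """Knowledge Graph Team: follow the edges leaving one node.

    In a graph database this could be a Cypher query along the lines of
    MATCH (t:Team {name: $team})-[:COLLABORATES_WITH]-(o:Team) RETURN o.name
    (the label and relationship type are hypothetical).
    """
    return sorted(graph.get(team, set()))


def data_leader(db, graph):
    """Manager: feed the first team's result into the second team's question."""
    team, total = ask_accountants(db)
    partners = ask_detectives(graph, team)
    return (
        f"{team} was the top seller (${total:,.0f}) "
        f"and collaborates with: {', '.join(partners)}."
    )


answer = data_leader(conn, collaborations)
print(answer)
```

The point of the sketch is the order of operations: the second question cannot even be asked until the first answer ("Team A") is known, which is why a manager that passes results between specialists helps.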

3. Why This is Better (The "No Hallucination" Rule)

One of the biggest problems with AI is that it sometimes "makes things up" to sound smart.

Old Way: The AI guesses, "Maybe Team A worked with the Design team?" (It's just guessing).
DataFactory Way: The AI checks the facts first. The Accountants verify the sales numbers. The Detectives verify the connections. If the data isn't there, the team admits, "We couldn't find that connection," instead of making one up.
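
The "admit it instead of guessing" behavior can be pictured with a tiny sketch. This is an assumption-laden toy, not the paper's code: `verified_links` is a made-up set standing in for whatever the two teams actually retrieved, and `check_collaboration` is a hypothetical helper name.

```python
# Hypothetical verified facts, standing in for query results from the two teams.
verified_links = {("Team A", "Design")}


def check_collaboration(team: str, other: str) -> str:
    """Answer only from retrieved data; say so explicitly when nothing is found."""
    if (team, other) in verified_links or (other, team) in verified_links:
        return f"Yes: the data shows {team} collaborates with {other}."
    return f"We couldn't find a connection between {team} and {other}."


print(check_collaboration("Team A", "Design"))
print(check_collaboration("Team A", "Legal"))
```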

4. The Results

The paper tested this "Factory" against other AI methods on three different types of difficult puzzles.

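The Result: On two of the three benchmarks, the Factory team improved accuracy by roughly 20% to 24% over the baseline methods it was compared against. On the third, which is scored on written answers rather than right-or-wrong accuracy, it also improved.
The Secret Sauce: By splitting the work, the system didn't get confused. It could handle complex questions that required both math (Accountants) and relationship tracing (Detectives) within the same question.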

In a Nutshell

DataFactory is like replacing a single, exhausted librarian who tries to memorize the whole library with a well-oiled team: a manager who directs traffic, a calculator who crunches numbers, and a detective who finds hidden links. By letting them talk to each other in plain English, they can solve complex data puzzles more accurately and with fewer made-up facts.

Here is a detailed technical summary of the paper "DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering."

1. Problem Statement

Table Question Answering (TableQA) enables natural language interaction with structured tabular data. However, existing Large Language Model (LLM) approaches face three critical limitations:

Context Length Constraints: Direct prompting methods struggle with large tables, leading to information truncation.
Hallucinations: Models often generate answers not supported by the data, especially when relying solely on internal knowledge or insufficient context.
Reasoning Limitations: Single-agent architectures fail to handle complex multi-hop reasoning, semantic relationships, and cross-row synthesis effectively. They often lack the ability to dynamically switch between structured retrieval (SQL) and relational inference (Knowledge Graphs).

2. Methodology: The DataFactory Framework

The authors propose DataFactory, a collaborative multi-agent framework designed to overcome these limitations through specialized team coordination and automated knowledge transformation. The system operates on a tripartite architecture consisting of a Data Leader, a Database Team, and a Knowledge Graph (KG) Team.

A. Core Components

Data Leader (Orchestrator):
- Role: Acts as the central reasoning engine employing the ReAct (Reasoning and Acting) paradigm.
- Mechanism: Instead of rigid workflows, the Leader engages in natural language-based consultation with specialist teams. It decomposes complex user queries into subtasks (data exploration, strategy formulation, answer synthesis).
- Three-Stage Principle:
  1. Data Discovery: Explores available tables and graph structures before querying to avoid blind assumptions.
  2. Evidence-Based Planning: Samples data to verify entity existence and plans queries based on actual data patterns.
  3. Comprehensive Synthesis: Integrates results from both teams to generate coherent, multi-perspective answers.
Database Team (Structured Retrieval):
- Focus: Numerical computation, aggregation, filtering, and precise data retrieval.
- Agents: Includes Information Processing, Retrieval, Analysis, and Visualization agents.
- Innovation: Uses Context-Enhanced Text-to-SQL generation. It integrates historical QA pairs, schema definitions (DDL), and domain knowledge via Retrieval-Augmented Generation (RAG) to minimize hallucinations and ensure SQL correctness.
Knowledge Graph Team (Relational Reasoning):
- Focus: Semantic relationships, multi-hop reasoning, and entity association.
- Agents: Includes agents for processing, retrieval (Text-to-Cypher), analysis, and visualization.
- Innovation: Performs Automated Data-to-KG Transformation. It formalizes a mapping function $\mathcal{G} = \mathcal{F}(\mathcal{D}, \mathcal{S}, \mathcal{R})$, where tabular data ($\mathcal{D}$), schema ($\mathcal{S}$), and relationship rules ($\mathcal{R}$) are automatically converted into a Knowledge Graph ($\mathcal{G}$). This enables the discovery of implicit semantic links that SQL alone cannot capture.
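
To make the mapping $\mathcal{G} = \mathcal{F}(\mathcal{D}, \mathcal{S}, \mathcal{R})$ more concrete, the following is a minimal rule-based sketch. It rests on several assumptions that do not come from the paper: the schema is taken to be a column-to-entity-type mapping, each relationship rule is a `(source column, relation, target column)` triple, and the function name `table_to_graph` is hypothetical. The paper describes an LLM-assisted process; this sketch only shows the shape of the inputs and outputs.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]            # (entity type, value)
Edge = Tuple[Node, str, Node]     # (source node, relation, target node)


def table_to_graph(
    rows: List[Dict[str, str]],           # D: tabular data, one dict per row
    schema: Dict[str, str],               # S: column name -> entity type (assumed form)
    rules: List[Tuple[str, str, str]],    # R: (source column, relation, target column)
) -> Tuple[Set[Node], Set[Edge]]:
    """Build G = (nodes, edges) from D, S, and R.

    Every column named in a rule must also appear in the schema.
    """
    nodes: Set[Node] = set()
    edges: Set[Edge] = set()
    for row in rows:
        # Each non-empty cell in a mapped column becomes a typed node; storing
        # nodes in a set means a repeated value resolves to a single entity.
        for column, entity_type in schema.items():
            if row.get(column):
                nodes.add((entity_type, row[column]))
        # Each rule links two cells of the same row.
        for src_col, relation, dst_col in rules:
            if row.get(src_col) and row.get(dst_col):
                edges.add((
                    (schema[src_col], row[src_col]),
                    relation,
                    (schema[dst_col], row[dst_col]),
                ))
    return nodes, edges


# Illustrative data only.
rows = [
    {"employee": "Ana", "team": "Sales", "manager": "Li"},
    {"employee": "Ben", "team": "Sales", "manager": "Li"},
]
schema = {"employee": "Person", "team": "Team", "manager": "Person"}
rules = [("employee", "MEMBER_OF", "team"), ("employee", "REPORTS_TO", "manager")]

nodes, edges = table_to_graph(rows, schema, rules)
print(len(nodes), "nodes,", len(edges), "edges")
```

Even in this toy, the graph exposes something the flat rows do not state directly: Ana and Ben are connected through both a shared team node and a shared manager node, which is the kind of link a multi-hop graph query can follow.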

B. Workflow

The system operates in three phases:

Information Storage: Automated ingestion of tabular data into both a SQL database and a Knowledge Graph (Neo4j) using LLM-assisted schema understanding.
Knowledge Extraction: The Data Leader directs teams to generate SQL or Cypher queries using context-enhanced prompts (incorporating history, schema, and domain rules).
Insight Generation: The Leader synthesizes results from both teams, resolving conflicts via data provenance analysis, and generates natural language answers with visualizations.
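
The context-enhanced prompting used in the Knowledge Extraction phase can be sketched as follows. This is an illustrative assumption rather than the paper's method: retrieval here is a crude word-overlap ranking standing in for a real RAG retriever, and the prompt layout, section headings, and the names `tokens`, `retrieve_examples`, and `build_prompt` are all hypothetical.

```python
import re
from typing import List, Tuple


def tokens(text: str) -> set:
    """Lowercased word set, used for a crude similarity score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve_examples(question: str, history: List[Tuple[str, str]], k: int = 2):
    """Return the k past (question, SQL) pairs with the most word overlap."""
    q = tokens(question)
    ranked = sorted(history, key=lambda pair: len(q & tokens(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(question, ddl, domain_notes, history, k=2):
    """Assemble schema, domain knowledge, and retrieved examples into one prompt."""
    parts = ["### Schema (DDL)", ddl, "### Domain knowledge"]
    parts += [f"- {note}" for note in domain_notes]
    parts.append("### Similar past questions")
    for past_q, past_sql in retrieve_examples(question, history, k):
        parts.append(f"Q: {past_q}\nSQL: {past_sql}")
    parts += ["### New question", question, "SQL:"]
    return "\n".join(parts)


# Illustrative history of question/SQL pairs; not taken from the paper.
history = [
    ("Which team sold the most?",
     "SELECT team FROM sales GROUP BY team ORDER BY SUM(amount) DESC LIMIT 1"),
    ("How many employees are there?",
     "SELECT COUNT(*) FROM employees"),
    ("What is the total sales amount per team?",
     "SELECT team, SUM(amount) FROM sales GROUP BY team"),
]
prompt = build_prompt(
    "Which team had the lowest total sales?",
    "CREATE TABLE sales (team TEXT, amount REAL);",
    ["amount is in US dollars"],
    history,
)
print(prompt)
```

The same assembly would apply to Cypher generation by swapping the DDL for a description of node labels and relationship types.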

3. Key Contributions

Specialized Team Coordination: Moves beyond single-agent limitations by establishing dedicated Database and KG teams. This allows for systematic task decomposition where structured data processing and relational reasoning complement each other.
Automated Knowledge Integration: Introduces a formalized algorithm for transforming raw tabular data into semantic knowledge graphs, enabling consistent entity resolution and multi-hop reasoning without manual schema engineering.
Dynamic Reasoning Orchestration: Implements a ReAct-based Data Leader that uses natural language consultation rather than rigid workflows. This allows for adaptive strategy adjustment based on intermediate findings and query complexity.
Context Engineering: Reduces hallucinations by integrating historical QA patterns, DDL, and domain knowledge into the prompt engineering for both SQL and Cypher generation.
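
The ReAct-style orchestration described above can be pictured as a loop that alternates thinking, acting, and observing. The sketch below is a toy under stated assumptions: a hand-written `scripted_policy` stands in for the LLM, the two entries in `TOOLS` stand in for the specialist teams, and the data and the `max_steps` cap are invented for illustration. None of these names come from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data and tools standing in for the two specialist teams.
SALES = {"Team A": 1_000_000, "Team B": 750_000}
LINKS = {"Team A": ["Design"]}

TOOLS: Dict[str, Callable[[str], str]] = {
    "database": lambda _arg: max(SALES, key=SALES.get),
    "graph": lambda team: ", ".join(LINKS.get(team, [])) or "NOT_FOUND",
}

Step = Tuple[str, str, str]  # (action, argument, observation)


def scripted_policy(history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: pick (thought, action, argument) from past observations."""
    if not history:
        return ("I need the top team first.", "database", "")
    last_action, _last_arg, last_obs = history[-1]
    if last_action == "database":
        return (f"Now look up who {last_obs} works with.", "graph", last_obs)
    if last_obs == "NOT_FOUND":
        return ("No links in the data; report that.", "finish", "no collaborators found")
    top_team = history[0][2]
    return ("I have both facts.", "finish", f"{top_team} collaborates with {last_obs}")


def react_loop(policy, tools, max_steps: int = 6) -> str:
    """Alternate think -> act -> observe until the policy finishes or the budget runs out."""
    history: List[Step] = []
    for _ in range(max_steps):
        _thought, action, arg = policy(history)       # think
        if action == "finish":
            return arg
        observation = tools[action](arg)              # act
        history.append((action, arg, observation))    # observe
    return "step budget exhausted"


result = react_loop(scripted_policy, TOOLS)
print(result)
```

Because each decision depends on the previous observation, the loop can change course mid-run (for example, reporting that no collaborators were found) instead of following a fixed script. The step cap is a design choice in this sketch, not a documented parameter of the framework.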

4. Experimental Results

The framework was evaluated on three benchmark datasets (TabFact, WikiTableQuestions, FeTaQA) using 8 LLMs from 5 providers (including GPT-4o, Claude 4.0, DeepSeek-V3, and Qwen3 series).

Performance Gains:
- TabFact: Improved accuracy by 20.2% over baselines.
- WikiTableQuestions: Improved accuracy by 23.9% over baselines.
- FeTaQA: Showed significant improvements in ROUGE-2 scores (up to 17.1% improvement over single-team variants).
- Effect Size: Cohen's $d$ values were consistently greater than 1, indicating large effect sizes.
Ablation Studies (RQ4): Removing the Knowledge Graph Team caused significant performance drops, particularly on multi-hop tasks (e.g., 14.4% drop on WikiTQ, up to 17.1% on FeTaQA), indicating that relational reasoning contributes substantially to the results.
Model Scalability (RQ2): The framework adapts well across model sizes. Even smaller models (e.g., Qwen3-14B) achieved competitive results through specialized team collaboration, though larger models (Claude 4.0 Sonnet) performed best overall.
Collaboration Frequency (RQ5): An inverted U-shaped relationship was found between team interaction frequency and performance. Optimal performance occurred at 1–3 interactions; excessive collaboration (>6 calls) led to error accumulation and performance degradation.
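
For readers unfamiliar with the effect-size measure cited above: Cohen's $d$ is the difference between two group means divided by their pooled standard deviation, and values of 0.8 or more are conventionally labeled large. The sketch below computes it for made-up per-run accuracy scores; the numbers are purely illustrative and are not results from the paper, and the pooled-variance form shown is one common variant that the paper may or may not have used.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation built from sample variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Illustrative accuracy scores per run; these are NOT numbers from the paper.
framework_runs = [0.80, 0.82, 0.84]
baseline_runs = [0.60, 0.62, 0.64]

d = cohens_d(framework_runs, baseline_runs)
print(f"Cohen's d = {d:.2f}")
```

In this toy case the means differ by 0.20 while the pooled standard deviation is 0.02, so the gap is about ten standard deviations wide; a value above 1, as reported in the summary, means the gap exceeds one pooled standard deviation.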

5. Significance and Impact

Theoretical: The paper contributes to multi-agent system research by providing evidence that natural language consultation between specialized agents can outperform rigid workflow orchestration for complex reasoning. It bridges the gap between structured data processing and semantic knowledge representation.
Practical: The framework provides a production-ready platform (available at wisdomindata.netlify.app) that allows non-technical users to perform complex data analysis, multi-hop reasoning, and visualization without writing SQL or Cypher.
Robustness: By decoupling data ingestion, reasoning, and visualization, the system offers a scalable solution for enterprise data analysis, effectively reducing hallucinations and improving the interpretability of AI-driven insights.

In conclusion, DataFactory represents a significant leap in TableQA by treating data analysis as a collaborative, multi-disciplinary task rather than a single-model generation problem, successfully addressing the challenges of scale, reasoning depth, and reliability.

