Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL

Imagine you are trying to give a complex set of instructions to a very smart, but slightly forgetful, assistant who speaks a different language (SQL, the language of databases).

In a single-turn conversation, you say, "Show me all the dogs," and the assistant gets it right. But in a multi-turn conversation, things get messy. You might say:

"Show me all the dogs."
"Now, show me the ones that are brown."
"Actually, just the ones owned by students."

As the conversation gets longer, the assistant starts to get confused. It might forget which table in the database holds "student" info, or it might get lost trying to connect "brown" to the previous "dog" request. It's like trying to navigate a city while someone keeps changing the street signs and adding new neighborhoods without telling you.

This paper introduces Track-SQL, a new framework designed to be the "super-assistant" that never loses its place.

The Two Main Problems

The authors identified two big headaches for AI in these long conversations:

The "Where am I?" Problem (Schema Linking): Databases can contain many tables and columns. As you talk, the AI needs to know exactly which parts of the database are relevant right now. If it looks at the wrong "street" (table), it gives the wrong answer.
The "What did we just say?" Problem (Context Tracking): The AI needs to remember what you asked in the previous turns to understand your current question. If it forgets the context, it can't connect the dots.

The Solution: Track-SQL's "Dual-Extractive Modules"

Track-SQL doesn't just guess; it uses two specialized tools (modules) to clean up the mess before the main AI even starts writing the SQL code. Think of these as a Research Librarian and a Memory Keeper.

1. The Research Librarian (Semantic-enhanced Schema Extractor)

The Metaphor: Imagine you are looking for a specific book in a massive library. A normal AI might try to read every single book title. The Research Librarian, however, knows your current topic.
How it works: Before the AI tries to write the answer, this module scans the database and says, "Hey, the user is talking about 'students' and 'pets'. We don't need to look at the 'cars' or 'weather' tables. Let's just pull out the 'Students' and 'Pets' shelves."
The Magic: It also fixes confusing names. If a column is named "cont_id" (which is vague), the librarian uses a smart tool to realize it actually means "Continent ID" and clarifies it for the AI. This prevents the AI from getting lost in a sea of irrelevant data.

2. The Memory Keeper (Schema-aware Context Extractor)

The Metaphor: Imagine you are writing a story with a friend. You say, "And then the hero..." Your friend needs to know which hero you are talking about. The Memory Keeper looks back at your previous sentences to find the right character.
How it works: When you ask a new question, this module looks at your past questions and the SQL answers that were generated before. It finds the most relevant "base" answer from the past and says, "Okay, the user is building on this specific previous answer. Let's use that as a starting point."
The Magic: It filters out the noise. Instead of feeding the AI the entire history of the conversation (which might be confusing), it gives the AI a clean, relevant "cheat sheet" of what matters right now.

The Result: A Clearer Path

By using these two tools, Track-SQL acts like a filter. It strips away the confusing, irrelevant parts of the database and the conversation history, leaving the main AI with a clean, focused prompt.

Without Track-SQL: The AI is like a driver trying to drive through a city with foggy windows and changing road signs. It often crashes (makes mistakes).
With Track-SQL: The fog is cleared, the road signs are fixed, and the driver has a GPS that knows exactly where they've been and where they need to go.

Why It Matters

The researchers tested this on two major datasets, SParC and CoSQL, which are like "final exams" for AI in this field.

The Score: Track-SQL scored significantly higher than previous methods.
The Improvement: It improved execution accuracy by about 7% on SParC and about 9.5% on CoSQL. In the world of AI, that's a massive jump. It means the AI is much more reliable at handling complex, back-and-forth conversations about data.

In a Nutshell

Track-SQL is a system that teaches AI to organize its notes and remember its context before it tries to solve a problem. Instead of guessing in the dark, it shines a flashlight on the right parts of the database and the right parts of the conversation, leading to much smarter and more accurate answers.

Here is a detailed technical summary of the paper "Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL".

1. Problem Statement

While generative large language models (LLMs) have shown strong results in single-turn Text-to-SQL tasks, their performance degrades noticeably in multi-turn scenarios. The authors identify two primary bottlenecks in current approaches:

Dynamic Schema Linking: As dialogues progress, the relevant database schema (tables and columns) changes dynamically. Existing methods often struggle with redundant links (linking to irrelevant schema items as the schema graph grows) or semantic inconsistencies (e.g., ambiguous column names like "continent" in different tables having different meanings). Static schema linking methods fail to adapt to evolving user intents.
Context Information Filtering: In multi-turn interactions, users often omit information or reference prior turns (coreference). Existing models struggle to track the evolving context, leading to error propagation when they fail to retrieve the correct historical SQL or relevant schema elements from previous turns.

2. Methodology: The Track-SQL Framework

The authors propose Track-SQL, a framework designed to enhance generative LLMs by integrating two extractive modules before the final SQL generation step. The core philosophy is to perform dynamic schema linking and context filtering explicitly, providing the generative model with a streamlined, high-quality input.

The framework consists of three main stages:

A. Semantic-enhanced Schema Extractor (SESE)

This module addresses the schema linking challenge by filtering redundant schema items and resolving semantic ambiguities.

Semantic Enhancement: The system uses an LLM (GPT-3.5) to generate descriptive annotations for database columns and tables based on their names, types, and sample values. This enriches the schema with open-domain knowledge, bridging the semantic gap between user queries and database structures.
Gating Mechanism: A semantic enhancement layer uses an attention gating mechanism to aggregate the original schema embeddings with the generated annotation embeddings. This helps resolve ambiguities (e.g., distinguishing "continent name" from "continent ID").
All-Column Intent Detection (ACID): The extractor specifically detects implicit user intents where a user implies "all columns" (e.g., "Show all data"). It treats the wildcard * as a special column identifier to ensure the model retrieves all necessary data.
Output: It outputs a probability distribution over schema items, filtering out irrelevant tables/columns based on a threshold.
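
The gating and thresholding steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar gate, the function names `gated_fusion` and `filter_schema`, and the default threshold of 0.5 are all assumptions made for illustration, and the embeddings are plain Python lists standing in for learned vectors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(schema_emb, annot_emb, gate_weights, gate_bias=0.0):
    """Blend a schema-item embedding with its annotation embedding.

    A scalar gate g = sigmoid(w . [schema; annotation] + b) decides how much
    of the annotation to mix in. This scalar form is a simplification chosen
    for the sketch (hypothetical), not the paper's exact layer.
    """
    concat = list(schema_emb) + list(annot_emb)
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    return [g * a + (1.0 - g) * s for s, a in zip(schema_emb, annot_emb)]


def filter_schema(item_probs, threshold=0.5):
    """Keep schema items whose predicted relevance reaches the threshold.

    The wildcard "*" is scored like any other item, mirroring the idea of
    treating it as a special column identifier. The threshold value is an
    assumption.
    """
    return [item for item, p in item_probs.items() if p >= threshold]


fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_weights=[0.0] * 4)
kept = filter_schema({"pets.name": 0.9, "cars.id": 0.1, "pets.*": 0.7})
```

With all-zero gate weights the gate sits at 0.5, so the fused vector is the plain average of the two embeddings; in a trained model the weights would be learned so that vague names such as "cont_id" lean more heavily on their annotations.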

B. Schema-aware Context Extractor (SACE)

This module addresses the context tracking challenge by selecting the most relevant historical SQL to serve as a "base" for the current turn.

Dual-Metric Scoring: To select the best historical SQL ( $SQL_{base}$ ) from the dialogue history, SACE calculates a comprehensive relevance score ( $R_h$ ) based on two factors:
1. Semantic Similarity ( $S_{sim}$ ): Measured using SentenceBERT between the current question and historical questions.
2. Schema Overlap ( $P_{sim}$ ): Measured using Jensen-Shannon divergence between the schema item probability vectors of the current turn and historical turns (a lower divergence indicates greater overlap). This ensures the historical SQL shares the same underlying database entities.
Error Mitigation: By selecting a structurally similar and semantically relevant historical SQL, the model reduces the risk of error propagation common in methods that blindly copy previous turns.
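
The dual-metric selection can be sketched as follows. Several details here are assumptions rather than the paper's formulation: the question embeddings are small lists standing in for SentenceBERT vectors, the divergence is turned into a similarity as 1 minus the base-2 Jensen-Shannon divergence, the probability vectors are rescaled to sum to 1, and the two scores are combined by a weighted sum with a hypothetical weight `alpha`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result is in [0, 1].

    Inputs are rescaled to sum to 1 first (an assumption of this sketch).
    """
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("probability vectors must have positive mass")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_base_sql(cur_q_emb, cur_schema_probs, history, alpha=0.5):
    """Return (sql, score) for the best-scoring past turn, or None if no history."""
    best = None
    for turn in history:
        s_sim = cosine(cur_q_emb, turn["q_emb"])
        p_sim = 1.0 - js_divergence(cur_schema_probs, turn["schema_probs"])
        r_h = alpha * s_sim + (1.0 - alpha) * p_sim  # assumed combination rule
        if best is None or r_h > best[1]:
            best = (turn["sql"], r_h)
    return best


history = [
    {"q_emb": [0.0, 1.0], "schema_probs": [0.1, 0.9], "sql": "SELECT * FROM cars"},
    {"q_emb": [1.0, 0.0], "schema_probs": [0.9, 0.1], "sql": "SELECT * FROM dogs"},
]
choice = select_base_sql([1.0, 0.0], [0.9, 0.1], history)
```

In this toy history the second turn matches the current question in both wording and schema focus, so it is chosen as the base SQL even though it is not simply the most recent or the first turn.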

C. SQL Generation Fine-tuning

The filtered schema (from SESE) and the selected base SQL (from SACE) are concatenated with the multi-turn question history to form the input for the generative model.

Input Format: $Q_{\le m}$ (concatenated questions) + $E(S)$ (extracted schema) + $SQL_{base}$ (historical reference).
Training: The generative model (e.g., CodeLlama, DeepSeek, Mistral) is fine-tuned using Supervised Fine-Tuning (SFT) with LoRA to generate the target SQL ( $s_m$ ). This transforms the multi-turn problem into a constrained single-turn generation task with high-quality context.
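
To make the input format concrete, here is a small sketch of how the three pieces might be serialized into one prompt. The function name `build_generator_input`, the separators, and the field labels are hypothetical; the summary above does not specify the exact serialization used for fine-tuning.

```python
def build_generator_input(questions, schema, base_sql=None):
    """Serialize question history, extracted schema, and base SQL into one string.

    questions: list of user questions up to and including the current turn.
    schema: dict mapping each kept table name to its kept column names.
    base_sql: the historical SQL chosen as a reference, or None on the first turn.
    The layout below is an illustrative assumption.
    """
    q_part = " | ".join(questions)
    schema_part = " ; ".join(
        f"{table}({', '.join(columns)})" for table, columns in schema.items()
    )
    parts = [f"questions: {q_part}", f"schema: {schema_part}"]
    if base_sql:
        parts.append(f"base sql: {base_sql}")
    return "\n".join(parts)


prompt = build_generator_input(
    ["Show me all the dogs.", "Now, show me the ones that are brown."],
    {"dogs": ["*", "color"]},
    base_sql="SELECT * FROM dogs",
)
```

Each training example would then pair such a prompt with the target SQL for the current turn, which is what lets the generator treat the multi-turn task as a single-turn one.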

3. Key Contributions

Dual-Extractive Framework: The introduction of Track-SQL, which decouples schema linking and context tracking from the generative process, allowing for explicit optimization of these critical sub-tasks.
Semantic-Enhanced Schema Extractor (SESE): A novel approach combining LLM-generated schema annotations with a gating mechanism to handle semantic ambiguity and a specific module (ACID) to detect "all-column" intents.
Schema-aware Context Extractor (SACE): A retrieval mechanism that combines semantic similarity with schema overlap metrics to select the most appropriate historical SQL, improving coreference resolution.
State-of-the-Art Performance: The framework achieves leading results on authoritative benchmarks without relying on complex post-processing or beam search strategies.

4. Experimental Results

The authors evaluated Track-SQL on the SParC and CoSQL datasets using 7B-scale models (CodeLlama, DeepSeek, Mistral).

Performance Gains:
- On SParC, Track-SQL improved Execution Accuracy (EX) by 7.1% and Test Suite Accuracy (TS) by 7.35% over the baseline.
- On CoSQL, it improved EX by 9.55% and TS by 5.8%.
- It outperformed both In-Context Learning (ICL) methods (e.g., ACT-SQL, CoE-SQL) and other Fine-tuned baselines (e.g., RASAT, HIE-SQL).
Ablation Studies:
- Removing SESE caused the largest performance drop (approx. 6-7% in EX), highlighting the critical importance of precise schema linking.
- Removing SACE significantly impacted multi-turn accuracy (approx. 5-6% drop), confirming the necessity of effective context retrieval.
- ACID showed modest but consistent gains, particularly in handling wildcard queries.
Efficiency: The framework maintains low latency (approx. 1.35s end-to-end inference) and is computationally efficient, converging within ~1.5 hours for the generator and ~31 hours for the extractor on standard hardware.

5. Significance and Impact

Bridging the Gap: Track-SQL effectively bridges the performance gap between single-turn and multi-turn Text-to-SQL, a critical step toward deploying reliable conversational database assistants.
Explainability: By explicitly extracting schemas and selecting historical contexts, the system's decision-making process becomes more transparent compared to "black-box" end-to-end generation.
Generalizability: The framework is model-agnostic and demonstrated robust performance across different 7B parameter LLMs, suggesting it can be widely adopted to enhance existing generative models.
Practicality: The open-sourced implementation and efficient inference times make it a viable solution for real-world applications requiring interactive database querying.

In conclusion, the Track-SQL results on SParC and CoSQL suggest that explicitly modeling schema dynamics and context dependencies via extractive modules works better than relying solely on the generative capabilities of LLMs for complex, multi-turn database interactions.