EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

This paper introduces EvoSchema, a comprehensive benchmark built on a novel taxonomy of ten schema perturbation types, designed to evaluate and improve the robustness of text-to-SQL models against real-world database schema evolution. It finds that table-level changes significantly hurt performance and shows that training on diverse schema designs makes models more resilient.

Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li

Published Thu, 12 Ma

Imagine you have a very smart, highly trained personal assistant named SQL-Steve. Steve's job is to listen to your questions in plain English (like "Show me all customers who bought red shoes last week") and instantly write the computer code (SQL) needed to pull the answer out of a giant digital filing cabinet (the database).

Steve is great at his job, but he has a major weakness: he is rigid.

The Problem: The Filing Cabinet Moves

In the real world, companies constantly reorganize their filing cabinets.

  • Sometimes they rename a folder from "Customers" to "Clients."
  • Sometimes they split one big folder into two smaller ones (e.g., separating "Personal Info" from "Medical History").
  • Sometimes they throw away old folders entirely.

If the filing cabinet changes, but Steve was only trained on the old layout, he gets confused. He might look for a folder that no longer exists or try to open a drawer that has been moved. His performance crashes.
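To make the failure concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the column names, and the `run_query` helper are made up for illustration and do not come from the paper. It shows a query that works on the old layout and then breaks once the "Customers" folder is renamed to "Clients."

```python
import sqlite3


def run_query(conn, sql):
    """Return the rows, or an error string if the SQL no longer fits the schema."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# Hypothetical "old layout": a single customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")

# The SQL Steve learned to write for the old layout.
old_sql = "SELECT name FROM customers"
before = run_query(conn, old_sql)

# The company reorganizes: the folder is renamed.
conn.execute("ALTER TABLE customers RENAME TO clients")

after_old = run_query(conn, old_sql)                      # Steve's old habit
after_new = run_query(conn, "SELECT name FROM clients")   # what he should write now

print(before)
print(after_old)
print(after_new)
```

The question never changed, only the schema did, yet the old SQL now fails outright. That is the kind of breakage the benchmark is meant to expose.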

Most previous research tried to make Steve smarter by teaching him to handle tricky wording or simple name changes. But it didn't prepare him for the big, structural changes that happen in real life.

The Solution: EvoSchema (The "Chaos Simulator")

The authors of this paper created a new training ground called EvoSchema. Think of it as a simulation game where they intentionally break and rebuild the filing cabinet in 10 different ways to test Steve's adaptability.

They categorized these changes into two levels:

  1. Column-Level (The "Drawer" Level): Changing the labels on the drawers inside a folder (e.g., renaming "Phone Number" to "Contact Info" or splitting "Full Name" into "First Name" and "Last Name").
  2. Table-Level (The "Folder" Level): This is the big stuff. Merging two folders into one, splitting one folder into three, or deleting a whole section.
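The two levels can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the schema is modeled as a plain dictionary from table name to column list, and the function names (`rename_column`, `split_table`, `merge_tables`) are hypothetical stand-ins for three of the ten perturbation types.

```python
from copy import deepcopy
from typing import Dict, List

Schema = Dict[str, List[str]]  # table name -> column names (an assumed toy format)


def rename_column(schema: Schema, table: str, old: str, new: str) -> Schema:
    """Column-level change: relabel one drawer inside a folder."""
    out = deepcopy(schema)
    out[table] = [new if col == old else col for col in out[table]]
    return out


def split_table(schema: Schema, table: str, key: str,
                moved: List[str], new_table: str) -> Schema:
    """Table-level change: move some columns into a new table that shares a key."""
    out = deepcopy(schema)
    out[table] = [col for col in out[table] if col not in moved]
    out[new_table] = [key] + list(moved)
    return out


def merge_tables(schema: Schema, left: str, right: str, merged: str) -> Schema:
    """Table-level change: fold two tables into one, dropping duplicate columns."""
    out = deepcopy(schema)
    left_cols = out.pop(left)
    right_cols = out.pop(right)
    out[merged] = left_cols + [col for col in right_cols if col not in left_cols]
    return out


original = {"patients": ["id", "name", "phone", "diagnosis"]}
renamed = rename_column(original, "patients", "phone", "contact_info")
split = split_table(original, "patients", "id", ["diagnosis"], "medical_history")
remerged = merge_tables(split, "patients", "medical_history", "patients")
```

Notice that a column rename leaves the shape of the schema intact, while a split or merge changes which tables exist and how they join. That difference is one plausible reason table-level changes are harder for a model to absorb.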

The Big Discovery:
When they tested various AI models (both open-source and big corporate ones like GPT-4), they found something surprising: Steve is much more confused by Folder-level changes than Drawer-level changes.

  • If you just rename a drawer, Steve can usually figure it out.
  • If you merge two folders or split one into three, Steve often panics and fails completely.

The Fix: "Chaos Training"

So, how do you fix Steve? You don't just teach him the old layout; you teach him the new layout while keeping the old one in mind.

The authors introduced a new training method:

  • Old Way: Teach Steve: "Question A + Old Layout = Answer A."
  • New Way (EvoSchema): Teach Steve: "Question A + Old Layout = Answer A," AND "Question A + New Layout (with renamed/split folders) = Answer A."
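As a rough sketch of what this augmentation could look like in code, the snippet below pairs one question with both the old and a renamed layout. Everything here is an assumption made for illustration: the `Example` record, the `augment` helper, the one-line schema format, and the naive find-and-replace rewriting are not taken from the paper. It also assumes the target SQL is rewritten to match the new schema, and it covers only renames; splits and merges would need real query rewriting.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Example:
    question: str  # natural-language question (never changed)
    schema: str    # serialized schema shown to the model
    sql: str       # target SQL that is valid under that schema


def rename_identifier(text: str, old: str, new: str) -> str:
    """Naive whole-word replacement; a real pipeline would parse the SQL."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def augment(example: Example, renames: Dict[str, str]) -> List[Example]:
    """Return the original example plus one renamed-schema variant per rename."""
    variants = [example]
    for old, new in renames.items():
        variants.append(Example(
            question=example.question,
            schema=rename_identifier(example.schema, old, new),
            sql=rename_identifier(example.sql, old, new),
        ))
    return variants


base = Example(
    question="How many customers are there?",
    schema="customers(id, name)",
    sql="SELECT COUNT(*) FROM customers",
)
training_set = augment(base, {"customers": "clients"})
for ex in training_set:
    print(ex.schema, "->", ex.sql)
```

The question text stays identical across both examples, so the only signal that tells the model which table name to use is the schema it is shown.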

By forcing Steve to answer the same question using different database structures, he stops memorizing specific folder names. Instead, he learns the logic of how to find the answer, no matter how the filing cabinet is rearranged.

The Results

  • Robustness: Models trained with this "Chaos Training" became much tougher. When faced with a completely reorganized database, they didn't crash; they adapted.
  • The Gap: Models trained this way actually outperformed even the most expensive, closed-source AI models (like GPT-4) when the database structure changed.
  • The Lesson: To build a truly reliable AI, you can't just train it on a static, perfect world. You have to train it in a world that changes, breaks, and evolves.

In a Nutshell

This paper is about teaching AI to be flexible. Instead of building a robot that only works in a perfectly organized office, they built a training program that throws the office into chaos (moving desks, renaming rooms, merging departments) so the robot learns to find the answers no matter what the office looks like.

EvoSchema is the gym where these AI models go to get strong enough to handle messy, ever-changing real-world databases.