TabStruct: Measuring Structural Fidelity of Tabular Data

This paper introduces TabStruct, a comprehensive evaluation benchmark, together with a novel "global utility" metric that jointly assesses the structural fidelity and conventional performance of tabular data generators across 29 real-world datasets, without requiring ground-truth causal structures.

Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

Published 2026-03-06

Imagine you are a chef trying to teach a robot to cook a perfect steak. You give the robot a database of thousands of real steak recipes (the "real data"). The robot then tries to generate its own new recipes (the "synthetic data").

In the past, people checked if the robot was doing a good job by asking two simple questions:

  1. Does it taste like a steak? (Density Estimation: Does the new recipe look statistically similar to the old ones?)
  2. Can you use it to win a cooking contest? (ML Efficacy: If you use the robot's recipes to train a new chef, does that chef win awards?)
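The two classic checks can be sketched crudely in Python. This is an illustrative toy, not the benchmark's actual implementation: real evaluations use stronger distributional statistics and full ML pipelines, and the nearest-centroid classifier here is a deliberately simple stand-in.

```python
import numpy as np

def marginal_gap(real, synth):
    """'Does it taste like a steak?' -- crude density check: the largest
    per-column difference in mean or standard deviation between the real
    and synthetic tables (lower = more statistically similar)."""
    return max(np.abs(real.mean(0) - synth.mean(0)).max(),
               np.abs(real.std(0) - synth.std(0)).max())

def ml_efficacy(synth_X, synth_y, real_X, real_y):
    """'Can you win a contest with it?' -- train a tiny nearest-centroid
    classifier on the SYNTHETIC data, then report its accuracy on the
    REAL data. High accuracy means the synthetic data was good enough
    to train a useful model."""
    centroids = {c: synth_X[synth_y == c].mean(0) for c in np.unique(synth_y)}
    labels = np.array(list(centroids))
    preds = np.array([labels[np.argmin([np.linalg.norm(x - centroids[c])
                                        for c in labels])]
                      for x in real_X])
    return (preds == real_y).mean()
```

Both scores can be high even when the generator has broken the data's internal cause-and-effect structure, which is exactly the blind spot the paper targets.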

The Problem:
The authors of TabStruct realized there's a hidden flaw: a robot could cheat. It could memorize the flavor of the steak perfectly and help a chef win a contest, yet completely mess up the physics of cooking. For example, it might generate a recipe that says, "If you add more heat, the steak gets colder."

In the real world, data isn't just a list of numbers; it has a hidden "skeleton" or "causal structure." Just like gravity always pulls things down, variables in a dataset often have cause-and-effect relationships (e.g., "More rain causes more grass growth"). If a robot generates data that breaks these rules, it's useless for scientific discovery or understanding the real world, even if it looks good on a test score.

The Solution: TabStruct
The paper introduces a new way to test these data-generating robots called TabStruct. Think of it as a "Physics Test" for data.

Here is how they do it, using simple analogies:

1. The "Toy vs. Real World" Problem

Previously, researchers tested robots on "Toy Datasets"—simple, made-up worlds where they knew the exact rules (like a video game with known physics). But in the real world (like healthcare or finance), we don't have a "rulebook" to check against. We don't know the true causal structure of a disease or a stock market crash.

The Analogy: Imagine testing a self-driving car only in a simulator where you know exactly where every pothole is. It passes! But when you put it on a real road with unknown potholes, it crashes.

2. The New Metric: "Global Utility"

Since we can't check the "rulebook" for real-world data, the authors invented a clever trick called Global Utility.

The Analogy: Imagine you have a puzzle.

  • Old way: You check if the puzzle pieces look like the picture on the box (Density Estimation).
  • New way (Global Utility): You take every single piece of the puzzle, hide it, and ask the robot to guess what that piece is based on all the other pieces.
    • If the robot is good, it can guess the hidden piece perfectly because it understands how the pieces fit together (the causal structure).
    • If the robot is bad, it guesses randomly because it only memorized the picture, not the logic of how the pieces connect.

By testing every variable this way, they get a score that tells you: "Does this robot understand the deep logic of the data, or is it just faking it?"
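In code, the hide-one-piece game looks roughly like the sketch below. This is a minimal numpy version of the idea using linear least-squares predictors; TabStruct's actual metric will differ in the predictors and aggregation it uses, but the spirit is the same: fit a predictor for each hidden column on the synthetic table, then score it on the real table.

```python
import numpy as np

def global_utility_scores(synthetic, real):
    """For each column j: fit a linear predictor of column j from the
    remaining columns on the SYNTHETIC table, then measure how well it
    predicts column j on the REAL table (R^2 per column).

    High scores across all columns suggest the generator preserved the
    inter-feature structure; near-zero scores suggest it only matched
    the marginals. Illustrative sketch, not the paper's exact metric.
    """
    n_cols = synthetic.shape[1]
    scores = []
    for j in range(n_cols):
        rest = [c for c in range(n_cols) if c != j]
        # Fit on synthetic data: least squares with a bias term.
        Xs = np.c_[synthetic[:, rest], np.ones(len(synthetic))]
        w, *_ = np.linalg.lstsq(Xs, synthetic[:, j], rcond=None)
        # Evaluate on real data.
        Xr = np.c_[real[:, rest], np.ones(len(real))]
        yr = real[:, j]
        pred = Xr @ w
        ss_res = np.sum((yr - pred) ** 2)
        ss_tot = np.sum((yr - yr.mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return scores
```

A generator that shuffles each column independently would match every marginal perfectly yet score near zero here, because no column can be recovered from the others.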

3. The Big Discovery

The authors tested 13 different types of data-generating robots (from simple math tricks to complex AI) on 29 different real-world datasets.

The Surprise:

  • The "Cheaters": Some popular methods (like SMOTE) were great at making data that looked like the original and helped win prediction contests. But when you tested their "physics" (Global Utility), they failed miserably. They broke the causal rules.
  • The "Architects": Newer models based on Diffusion (a technique similar to how AI generates images by slowly adding and removing noise) turned out to be the best at preserving the true "skeleton" of the data. They didn't just mimic the surface; they understood the deep connections.

Why This Matters

If you are a doctor using AI to generate synthetic patient data to train a new diagnostic tool, you don't just want data that looks real. You want data that respects the laws of biology. If the AI believes "taking aspirin causes a fever," the tool trained on its data will learn the wrong lesson and hurt patients.

TabStruct gives us a way to check if the AI is respecting the laws of the universe (or the specific domain) before we trust it with real-world decisions. It shifts the focus from "Does it look good?" to "Does it make sense?"

In a nutshell:

  • Old Test: "Does the fake data look like the real data?"
  • TabStruct Test: "Does the fake data follow the same hidden rules and cause-and-effect relationships as the real data?"
  • Result: Many popular tools are great at faking the look but terrible at understanding the logic. Metrics like "Global Utility" are needed to find the ones that truly understand the data.
