Synthetic Tabular Generators Fail to Preserve Behavioral Fraud Patterns: A Benchmark on Temporal, Velocity, and Multi-Account Signals

This paper introduces "behavioral fidelity" as a critical new evaluation dimension for synthetic tabular data, demonstrating through a formal taxonomy and benchmarking that existing generators structurally fail to preserve essential temporal, velocity, and multi-account fraud patterns required for real-world detection systems.

Bhavana Sajja

Published 2026-04-16

Imagine you are a bank trying to catch a thief. You don't just look at what they bought; you look at how they bought it. Did they swipe their card 50 times in one minute? Did they buy a $5,000 TV and a $5 coffee in the same second? Did they use the same Wi-Fi router as 20 other people who just bought gift cards?

These "behavioral fingerprints" are how banks catch fraud.

Now, imagine you want to train your AI to catch these thieves, but you can't show it real bank data because of privacy laws (like GDPR). So, you ask a computer program to make up fake bank data that looks just like the real thing. This is called "Synthetic Data."

The Big Problem:
This paper argues that the current "fake data makers" are terrible at copying the behavior. They are great at copying the stats, but they fail to copy the story.

Here is the breakdown using simple analogies:

1. The "Statistical Mirror" vs. The "Behavioral Ghost"

Think of the current fake data generators as photocopiers.

  • What they do well (Statistical Fidelity): If you have a bag of 1,000 marbles (990 blue, 10 red), the photocopier makes a new bag with 990 blue and 10 red marbles. It gets the counts right. It also gets the average size of the marbles right.
  • What they fail at (Behavioral Fidelity): In the real world, the 10 red marbles (the fraudsters) are usually clumped together in a tight, frantic pile because they are acting fast. The fake data maker scatters those 10 red marbles randomly throughout the bag.
  • The Result: The AI trained on the fake data thinks fraudsters are calm and scattered. When it goes to the real world, it misses the frantic, clumped-up thieves because it was never taught to look for that specific "panic pattern."
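The "clumping" is easy to see in code. Here is a minimal sketch with toy timestamps (all numbers invented for illustration): a real burst of fraud packs its events seconds apart, while a generator that scatters events across the day produces much larger gaps.

```python
import random

random.seed(0)

# Toy timestamps in seconds. Real fraud events clump into a tight burst;
# a generator that ignores behavior tends to scatter them across the day.
real_fraud = [1000 + i * 2 for i in range(10)]               # 10 events, 2 s apart
synthetic_fraud = sorted(random.uniform(0, 86400) for _ in range(10))

def median_gap(timestamps):
    """Median inter-event time of a sorted list of timestamps."""
    gaps = sorted(b - a for a, b in zip(timestamps, timestamps[1:]))
    return gaps[len(gaps) // 2]

print(median_gap(real_fraud))       # 2.0 seconds -- a frantic burst
print(median_gap(synthetic_fraud))  # typically far larger -- scattered
```

Both datasets have the same count of "red marbles"; only the spacing differs, and that spacing is exactly what the detector needs to learn.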

2. The Three Layers of Testing

The authors created a new way to test these fake data makers, like a three-level video game:

  • Level 1: The Look (Statistical Fidelity): Does the fake data look like the real data on a spreadsheet? (e.g., Are the average transaction amounts the same?)
    • Verdict: The generators pass this easily.
  • Level 2: The Test Score (Downstream Utility): If we train a fraud detector on the fake data, does it get a good grade on a test?
    • Verdict: The generators pass this too! They get high scores.
  • Level 3: The Behavior (Behavioral Fidelity): Does the fake data actually act like a fraudster?
    • Verdict: CATASTROPHIC FAILURE. This is the paper's central finding: the generators fail badly here.

3. The Four "Behavioral Crimes" They Missed

The authors defined four specific ways fraudsters behave that the fake data completely destroys:

  • P1: The "Speed Burst" (Inter-Event Time): Real fraudsters act fast. They buy, buy, buy in seconds.
    • The Fake Data: The fake transactions are spaced out randomly, like a normal person shopping over a week. The "burst" is gone.
  • P2: The "Sprint and Vanish" (Burst Structure): Fraudsters often do a quick burst of activity and then disappear for days.
    • The Fake Data: The fake accounts are either always active or never active. They don't have that "sprint" rhythm.
  • P3: The "Secret Club" (Shared Infrastructure): Real fraud rings often share devices (like one laptop used by 50 different fake accounts).
    • The Fake Data: The generators give every single fake account its own unique, brand-new laptop. The "Secret Club" structure is completely erased.
    • Analogy: It's like a detective trying to find a gang of thieves, but the fake data shows every thief using a different, unconnected car. The detective can't see the connection.
  • P4: The "Speed Limit" (Velocity Rules): Banks have rules like "If you buy more than 3 items in 1 hour, we flag you."
    • The Fake Data: Because the fake data is so spread out, these rules almost never trigger. If you tune your alarm system based on this fake data, you will set the sensitivity way too low, and the real thieves will walk right past your alarm.
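These patterns are all things a bank can compute directly from a transaction log. Below is a toy sketch of P1, P3, and P4 on invented data (P2 is just P1's gap list viewed over a longer horizon). The field names and the "3 events in 60 seconds" rule are illustrative assumptions, not the paper's definitions.

```python
# Toy transaction log: (account, device, timestamp_seconds).
log = [
    ("acct1", "dev_A", 0), ("acct1", "dev_A", 10), ("acct1", "dev_A", 20),
    ("acct2", "dev_A", 30), ("acct3", "dev_A", 40),  # same device = fraud ring
    ("acct4", "dev_B", 5000),
]

# P1: inter-event times per account -- small gaps mean a "speed burst".
def gaps(timestamps):
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]

acct1_gaps = gaps([t for a, _, t in log if a == "acct1"])
print(acct1_gaps)  # [10, 10] -- buy, buy, buy in seconds

# P3: shared infrastructure -- how many distinct accounts touch each device?
accounts_per_device = {d: len({a for a, dd, _ in log if dd == d})
                       for _, d, _ in log}
print(accounts_per_device["dev_A"])  # 3 accounts on one "laptop"

# P4: velocity rule -- "3 or more events on one device within 60 seconds".
dev_a_times = sorted(t for _, d, t in log if d == "dev_A")
triggered = any(dev_a_times[i + 2] - dev_a_times[i] <= 60
                for i in range(len(dev_a_times) - 2))
print(triggered)  # True -- the rule fires on this burst
```

Run the same three computations on row-independent synthetic data and, per the paper's argument, the gaps balloon, every device maps to one account, and the velocity rule goes silent.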

4. The "Row-Independence" Trap

Why do these generators fail? The paper explains a fundamental flaw in how they work.

Imagine a writer trying to write a novel where every character is a bank customer.

  • Current Generators: They write one page at a time, completely forgetting the previous page. They write "Customer A bought a shoe." Then they write "Customer B bought a hat." They don't know that Customer A and B are actually the same person acting fast, or that they are using the same computer.
  • The Problem: Because they generate each row (transaction) independently, they physically cannot create the complex connections (like shared devices) or the time-based rhythms (like bursts) that define fraud.

Even the most advanced "Autoregressive" generator (which tries to look at the previous item in the same row) couldn't fix this. It's like trying to understand a conversation by only reading one sentence at a time without remembering the previous sentence.

5. The Conclusion: Don't Trust the Fake Data Yet

The paper concludes that we cannot currently use synthetic data to train fraud detection systems if those systems rely on timing, speed, or shared devices.

  • The Good News: The authors released their "Behavioral Fidelity" test kit as open source. Now, anyone can test if a new fake data generator actually preserves these "behavioral ghosts."
  • The Bad News: Until we invent a new type of AI that can remember the "story" of a customer across multiple transactions and devices, using fake data for fraud detection is like training a security guard with a map of a city that has no streets—only random dots.

In short: The current fake data looks like the real thing, but it doesn't act like the real thing. And in the world of fraud, action is everything.
