The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

This paper argues that current evaluations of attacks on PII removal techniques are flawed due to unmitigated data leakage and contamination, creating a paradox where trustworthy research requires access to private data that is inherently restricted from public scrutiny.

Sebastian Ochs, Ivan Habernal

Published Tue, 10 Ma

Imagine you have a very important, private diary. You want to share it with the world so researchers can learn from your experiences, but you don't want anyone to know who you are.

So, you take a red marker and cross out your name, your address, your phone number, and your birthday. You think, "Perfect! Now it's anonymous. My secrets are safe."
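In software terms, that red marker is a PII removal tool. Here is a deliberately naive sketch of the idea in Python; the patterns, names, and sample text are invented for illustration, and real anonymization tools rely on trained named-entity recognizers rather than hand-written rules:

```python
import re

# A toy "red marker": swap obvious PII patterns for placeholder tags.
# Everything here is illustrative; real systems use trained NER models.
PATTERNS = {
    "[PHONE]": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "[DATE]":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "[NAME]":  re.compile(r"\bAlice Smith\b"),  # stand-in for an NER model
}

def redact(text: str) -> str:
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

diary = "Alice Smith wrote this on 03/14/2023. Call me at 555-123-4567."
print(redact(diary))
# [NAME] wrote this on [DATE]. Call me at [PHONE].
```

The question the paper asks is not whether such markers exist, but whether we can trust any measurement of how often they miss.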

This paper is about a group of researchers who are trying to figure out if that red marker is actually strong enough to protect you. They have a big problem: They suspect that the tests used to prove the red marker is weak are actually rigged.

Here is the breakdown of their argument, using some simple analogies.

1. The "Rigged Test" Problem

Imagine a security guard (the researcher) trying to see if a new lock (the PII removal tool) is good. To test it, they hire a thief (the attack model) to try to break in.

The researchers in this paper looked at many recent "thief vs. lock" tests and found a major flaw: The thief was often cheating.

  • The "News Leak" Cheat: In some tests, the thief was given the diary and the local newspaper. The newspaper had already reported on the diary's owner. So, when the thief guessed the name, they weren't breaking the lock; they were just reading the newspaper. The test made the lock look weak, but it was actually the newspaper that was the problem.
  • The "Memory Leak" Cheat: In other tests, the thief was a super-smart AI that had already read the original diary before it was redacted. When the thief saw the redacted version, they didn't have to guess; they just remembered what they had read before. It's like asking someone to solve a math problem they already memorized the answer to. The test concluded the lock was weak, but really, the thief just had the answer key.

The Conclusion: The researchers argue that we have been overestimating how easy it is to hack these privacy tools because the "thieves" in the experiments had unfair advantages (like access to the original data or public news).

2. The "Impossible Lab" Problem

So, the researchers asked: "Okay, let's do a fair test. Let's make a new lock, give it to a thief who has NEVER seen the original diary, and see if they can break it."

But here is the catch: They can't do this test.

To test the lock properly, you need a "real" private diary that no one has ever seen before. But:

  • Real private data (like medical records or court files) is locked away behind strict privacy laws. Researchers usually can't get access, and even when they can, they can't share the data so others can verify their work.
  • Fake data (made by computers) sounds like a good idea, but the computers that generate it were themselves trained on the internet. If you ask one to write a "fake" private diary, it might accidentally reproduce real text, and real secrets, that it learned from. So the "fake" diary isn't actually fake; it's a remix of real people's information. (A sketch of how to probe for this follows the list.)
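How would you even check whether a "fake" diary copied something real? One crude probe, sketched below, is to search the generated text for long runs of words that appear verbatim in whatever public text you can collect. The texts here are toy stand-ins; serious memorization audits are far more involved:

```python
def longest_verbatim_run(generated: str, reference: str,
                         min_words: int = 10) -> str | None:
    """Return a run of at least min_words words appearing verbatim in both
    texts, or None. A crude regurgitation probe, not a real audit."""
    tokens = generated.split()
    for n in range(len(tokens), min_words - 1, -1):  # longest runs first
        for i in range(len(tokens) - n + 1):
            run = " ".join(tokens[i:i + n])
            if run in reference:
                return run
    return None

synthetic = "my neighbor Jane Doe of 12 Elm Street was admitted on Tuesday"
crawl = "... Jane Doe of 12 Elm Street was admitted on Tuesday, sources say"
print(longest_verbatim_run(synthetic, crawl))
# Jane Doe of 12 Elm Street was admitted on Tuesday
```

And even this only catches copies of text you happen to have; a secret regurgitated from data you cannot see slips straight through.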

The Paradox:

  • If you use real private data, the test can't be scrutinized: sharing the data so other researchers can verify your results would violate the very privacy laws that protect it.
  • If you use fake or public data, the test is flawed, because the "thief" may already know the answers.

The researchers call this a "conundrum." It's like trying to test a bomb-proof safe, but you aren't allowed to put a real bomb inside it, and if you use a fake bomb, the test doesn't count.

3. The "Leaky Bucket" Reality

To make their point concrete, the researchers ran a small, privacy-safe experiment. They took two types of data that were unlikely to be in an AI's training memory:

  1. Old Czech Court Announcements: These were posted online for a month and then deleted.
  2. New YouTube Travel Vlogs: Videos uploaded after the AI was trained.

They redacted the names and places, then asked a powerful AI to guess what was missing.

  • The Result: The AI guessed correctly about 19% of the time for the videos and 5% for the court papers.
  • Why? Not because the AI was a genius detective, but because the "red marker" missed some clues. For example, if the AI saw "I Love NY gift shop" left unmasked, it could guess the location was New York. Once it knew the location, it could guess the names of famous people associated with that place. (A sketch of this guess-and-score protocol follows this list.)
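Stripped of the analogy, the experiment is a fill-in-the-blank protocol: mask the true entities, ask a model to guess them, and score the guesses. A minimal sketch of that loop is below; the ask_model stub is a placeholder for a real LLM call, and the lenient string-match scoring is not the paper's actual setup, so it will not reproduce the 19% and 5% numbers:

```python
def ask_model(prompt: str) -> str:
    # Placeholder: wire a real LLM call in here. Returns a fixed guess so
    # the toy example below runs end to end.
    return "New York"

def reidentification_rate(examples: list[dict]) -> float:
    """examples: [{'redacted': text containing a [MASKED] span,
                   'gold': the entity that was removed}, ...]
    Returns the fraction of masked entities the model recovers."""
    hits = 0
    for ex in examples:
        prompt = ("One entity in the text below was replaced by [MASKED]. "
                  "What was it?\n\n" + ex["redacted"])
        if ex["gold"].lower() in ask_model(prompt).lower():  # lenient match
            hits += 1
    return hits / len(examples)

examples = [{"redacted": "We filmed at the I Love NY gift shop in [MASKED].",
             "gold": "New York"}]
print(reidentification_rate(examples))  # 1.0 for this single toy example
```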

This showed that even with "new" data, the tools aren't perfect. But the researchers emphasize that we can't know for sure if the tools are truly broken without being able to test them on massive amounts of real, private data that we aren't allowed to touch.

The Big Takeaway

The paper ends with a sad but honest conclusion:

We are currently stuck.
We cannot scientifically prove whether our privacy tools (redacting names from text) are safe or unsafe in the real world.

  • The tests we can do are flawed because they use data the AI might already know.
  • The tests we should do (on real private data) are blocked: getting the data is legally restricted, and publishing it for scrutiny would violate the privacy it is meant to protect.

The Solution?
The researchers suggest we need a new way of thinking. Instead of just trying to "break" the locks with hackers, we need to build a mathematical theory of privacy that accounts for how AI learns and how information flows. We need to define the rules of the game before we start playing, so we know exactly what "safe" means in a world where super-smart computers can read between the lines.
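The paper calls for such a theory without prescribing a specific one. To make "define the rules before playing" concrete, though, here is the best-known example of the genre, differential privacy, offered purely as an illustration rather than as the authors' proposal:

```latex
% Differential privacy (Dwork et al., 2006), shown only as an example of a
% formal, attack-agnostic privacy guarantee; the paper does not prescribe
% this particular framework. A randomized mechanism $M$ is
% $\varepsilon$-differentially private if, for every pair of datasets
% $D, D'$ differing in one person's record and every output set $S$:
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].
\]
% Small $\varepsilon$ means the output barely depends on any one person,
% regardless of what the attacker already knows: a guarantee stated up
% front, not inferred from whether one particular "thief" succeeds.
```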

Until then, we are flying blind, hoping our red markers are strong enough to keep our secrets safe.