STAMP: Selective Task-Aware Mechanism for Text Privacy

STAMP is a novel framework for text privacy that optimizes the privacy-utility trade-off by selectively allocating privacy budgets based on token importance and sensitivity, utilizing a polar mechanism to perturb embedding directions while preserving magnitude and semantic structure.

Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon

Published Fri, 13 Ma

Imagine you are sending a very important letter to a friend, but you have to hand it to a suspicious courier (the AI model) to deliver it. You want your friend to understand the message perfectly, but you don't want the courier to see your secret address, your bank account number, or your mother's maiden name.

Traditionally, people tried to solve this by scrubbing the whole letter. They would take a giant eraser and blur out every single word equally.

  • The Problem: If you blur out the word "Einstein" in a question about physics, your friend can't answer the question. If you blur out "apple" in a recipe, the recipe is ruined. You lose the utility (usefulness) of the letter just to protect the secrets.

This paper introduces STAMP, a smarter way to handle this. Think of STAMP as a High-Tech, Selective Redaction Pen that knows exactly what to hide and what to keep clear.

Here is how it works, broken down into three simple concepts:

1. The "Traffic Light" System (Selective Budgeting)

Instead of treating every word the same, STAMP looks at every word in your sentence and asks two questions:

  1. Is this word a secret? (e.g., "John Smith," "Credit Card #1234")
  2. Is this word important for the task? (e.g., "Einstein" for a physics question, "Delicious" for a food review).

It then sorts words into four groups, color-coded like an extended traffic light:

  • 🔴 Red (Secret + Unimportant): These are sensitive words that don't help the task (like a name in a weather report). STAMP gives these the maximum protection. It scrambles them heavily so the courier can't guess them at all.
  • 🟢 Green (Not Secret + Important): These are the words your friend needs to understand the message (like "rain" in a weather report). STAMP gives these almost no protection. They stay clear and crisp.
  • 🟡 Yellow (Secret + Important): These are tricky (like a name that is also the answer to a riddle). STAMP has to balance them, giving them a "medium" amount of scrambling.
  • ⚪ White (Not Secret + Unimportant): Words like "the," "and," or "very." These get a little bit of scrambling, but not much.

The Analogy: Imagine you are packing a suitcase for a trip. You don't wrap your entire suitcase in bubble wrap. You wrap your fragile, expensive vase (the secret) in thick bubble wrap, but you leave your t-shirt (the important info) loose so it's easy to grab. STAMP does exactly this with words.
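To make the four groups concrete, here is a minimal sketch of the sorting step in Python. Everything named here is an assumption for illustration: the `classify` and `allocate` helpers, the `EPSILON_BY_GROUP` table, and the epsilon values themselves are hypothetical and do not come from the paper. The sketch also takes the sets of secret and important words as given inputs, since this summary does not describe how STAMP detects them. The only thing it borrows from the description above is the ordering: Red gets the smallest budget (most scrambling), then Yellow, then White, then Green.

```python
# Hypothetical per-group privacy budgets (epsilon).
# Smaller epsilon means more scrambling. Values are illustrative only.
EPSILON_BY_GROUP = {
    "red": 1.0,      # secret + unimportant: maximum protection
    "yellow": 5.0,   # secret + important: medium scrambling
    "white": 10.0,   # not secret + unimportant: light scrambling
    "green": 50.0,   # not secret + important: almost no scrambling
}


def classify(is_secret: bool, is_important: bool) -> str:
    """Map the two yes/no questions onto one of the four groups."""
    if is_secret:
        return "yellow" if is_important else "red"
    return "green" if is_important else "white"


def allocate(tokens, secret_tokens, important_tokens):
    """Return (token, group, epsilon) for every token in the sentence."""
    plan = []
    for tok in tokens:
        group = classify(tok in secret_tokens, tok in important_tokens)
        plan.append((tok, group, EPSILON_BY_GROUP[group]))
    return plan


# A weather report where the name is a secret and "rain" carries the task.
plan = allocate(
    ["John", "expects", "rain", "today"],
    secret_tokens={"John"},
    important_tokens={"rain"},
)
for tok, group, eps in plan:
    print(f"{tok:>8}  {group:<6}  epsilon={eps}")
```

In this toy weather report, "John" lands in the Red group, "rain" in Green, and the remaining words in White, which mirrors the examples given for each color above.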

2. The "Spinning Top" Trick (The Polar Mechanism)

Once STAMP decides how much to scramble a word, it has to actually change the word without making it look like gibberish.

Most old methods tried to scramble words by adding "static noise" (like turning up the volume on a radio until it's just static). This often breaks the meaning.

STAMP uses a clever geometric trick called the Polar Mechanism.

  • The Analogy: Imagine every word is a spinning top standing on a table. The top has a height (how strong the word is) and a direction it is pointing (what the word means).
  • The Magic: STAMP only spins the top to change its direction. It leaves the height exactly the same.
  • Why this helps: In the world of AI, the "direction" of a word is what gives it meaning. By only spinning the direction slightly, the word stays in the same "neighborhood" of meaning. "Cat" might spin slightly to become "Kitten" or "Feline," but it won't accidentally turn into "Banana." This keeps the sentence readable while still hiding the exact original word.
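The "spin the top, keep the height" idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's mechanism: the function name `perturb_direction` and the `noise_scale` parameter are hypothetical, and plain Gaussian noise followed by renormalization is a stand-in for whatever distribution the Polar Mechanism actually samples from. It carries no formal privacy guarantee. It only shows the geometric point: the direction moves, the length does not.

```python
import math
import random


def norm(vec):
    """Euclidean length of a vector (the 'height' of the top)."""
    return math.sqrt(sum(x * x for x in vec))


def perturb_direction(vec, noise_scale, rng):
    """Nudge the direction of vec while keeping its length unchanged.

    Assumption: Gaussian noise on the unit vector, then renormalize.
    This is an illustrative stand-in, not the paper's sampling scheme.
    """
    length = norm(vec)
    if length == 0.0:
        return list(vec)  # a zero vector has no direction to spin
    unit = [x / length for x in vec]
    noisy = [u + rng.gauss(0.0, noise_scale) for u in unit]
    noisy_length = norm(noisy)
    if noisy_length == 0.0:
        return list(vec)  # degenerate draw: fall back to the original
    # Rescale so the output has exactly the original length.
    return [length * x / noisy_length for x in noisy]


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


rng = random.Random(0)
word = [3.0, 4.0, 0.0]                        # toy 3-dimensional "embedding"
light = perturb_direction(word, 0.05, rng)    # e.g. a Green word
heavy = perturb_direction(word, 1.0, rng)     # e.g. a Red word

print("lengths:", norm(word), norm(light), norm(heavy))
print("cosine to original:", cosine(word, light), cosine(word, heavy))
```

A small `noise_scale` keeps the vector pointing almost the same way, which corresponds to lightly scrambled words, while a large one lets it wander much further. In a full pipeline, the noise level would be derived from each word's privacy budget, and the perturbed vector would then be mapped back to a nearby word in the vocabulary.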

3. The Result: A Better Trade-Off

The paper tested this on three different tasks:

  • Answering Questions (SQuAD): Can the AI answer "Who developed relativity?" even if the name "Einstein" is hidden? Yes, because STAMP kept the context words clear.
  • Sentiment Analysis (Yelp): Can the AI tell if a restaurant review is positive or negative? Yes, because the words describing the food weren't scrambled.
  • News Classification (AG News): Can the AI tell if an article is about Sports or Politics? Yes.

The Bottom Line:
Old methods were like putting a blindfold on the whole team. STAMP is like putting a blindfold only on the players who are holding the secrets, while letting the players who need to see the ball keep their eyes open.

This allows you to send your data to the cloud (or an AI) with stronger protection for your secrets and better performance on the task you actually care about than you would get by scrambling every word equally. The trade-off between privacy and utility doesn't disappear, but STAMP spends the privacy budget where it matters most.