Imagine you are a chef trying to create the perfect, secret family recipe for a new dish. You have a small, private notebook of your grandmother's notes (the Private Data), but you don't want anyone to see it. Instead, you decide to use those notes only as a guide to pick the best ingredients from a massive, public supermarket (the Public Data).
You look at your grandmother's notes, find the flavors you love, and then go to the supermarket to buy only the specific tomatoes, spices, and herbs that match those flavors. You throw away the rest. Finally, you cook your dish using only the supermarket ingredients.
The assumption was: "Since I never cooked with my grandmother's actual notes, and I only used public supermarket ingredients, no one can figure out what was in my secret notebook."
This paper says: That assumption is wrong.
The researchers discovered that the very act of choosing those ingredients leaks secrets about your grandmother's notebook. Even if you never show the notebook, the specific combination of supermarket items you bought, the way you ranked them, and even the final taste of the dish can give away exactly what was in your private notes.
Here is how they broke it down, using simple analogies:
1. The Three Ways Secrets Leak
The researchers found that privacy leaks happen at three different stages of the "cooking" process:
Stage 1: The Shopping List (The Scores)
Before you even buy anything, you might write down a "score" for every item in the supermarket based on how well it matches your grandmother's notes.
- The Leak: If you publish these scores, an attacker can look at them and reverse-engineer your notes. It's like if you wrote, "This tomato is a 9/10 match for Grandma's recipe." An attacker can look at that 9/10 and say, "Aha! Grandma must have had a recipe that loves this specific type of tomato."
- The Analogy: It's like leaving a trail of breadcrumbs. If you say, "I picked the red apple because it's the closest match to my secret fruit," the attacker knows you have a secret red apple.
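In machine-learning terms, the "shopping list" is a vector of scores measuring how well each public example matches the private data. The following toy sketch (hypothetical 2-D data and a simple distance-based score, not the paper's actual method) shows why publishing exact scores is so dangerous: an attacker who knows the public data and the scoring rule can solve for the private point directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 5 public points in 2-D, one secret private point.
public = rng.normal(size=(5, 2))
private = np.array([1.0, 0.5])

# The curator publishes one score per public item: negative squared
# distance to the private point (higher score = better match).
scores = -np.sum((public - private) ** 2, axis=1)

# Attack: the exact scores pin down the private point. Subtracting pairs
# of squared distances yields linear equations in its coordinates.
d = -scores                            # squared distances
A = 2 * (public[1:] - public[0])       # linear system from pairwise differences
b = (np.sum(public[1:] ** 2, axis=1) - np.sum(public[0] ** 2)) - (d[1:] - d[0])
recovered, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(recovered, private))  # → True: the secret point is recovered
```

With exact scores and a known scoring rule, recovery is not approximate guessing but simple algebra; this is the "breadcrumb trail" in the analogy above.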
Stage 2: The Basket (The Selected Subset)
You take the items you bought home. You didn't buy the whole supermarket, just a specific basket of items.
- The Leak: Even if you hide the scores and only show the basket, an attacker can still guess what was in your notebook. If your basket contains 50 specific spices and no others, the attacker can deduce that your secret recipe must have required those exact 50 spices.
- The Analogy: Imagine you tell a friend, "I only bought the red, green, and yellow peppers." Your friend can guess, "You must be making a salad that needs those three colors specifically." The absence of other items is just as revealing as the presence of the ones you picked.
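Even with all scores hidden, the released subset supports a membership-style attack: the attacker replays the selection procedure under each hypothesis about the private data and checks which hypothesis reproduces the observed basket. A minimal sketch (hypothetical data and a simple nearest-neighbor selector, not the paper's exact attack):

```python
import numpy as np

rng = np.random.default_rng(1)
public = rng.normal(size=(100, 2))

def select(private_point, k=10):
    """Top-k public items closest to the private point (scores kept hidden)."""
    dist = np.sum((public - private_point) ** 2, axis=1)
    return frozenset(np.argsort(dist)[:k])

# Two candidate secrets the attacker is deciding between.
candidate_a = np.array([2.0, 2.0])
candidate_b = np.array([-2.0, -2.0])

observed = select(candidate_a)  # the released "basket": indices only, no scores

# Attack: replay the selection for each hypothesis and pick the one
# whose basket overlaps the observed basket the most.
overlap_a = len(observed & select(candidate_a))
overlap_b = len(observed & select(candidate_b))
guess = "a" if overlap_a > overlap_b else "b"
print(guess)  # → "a"
```

The basket alone, with no scores attached, is enough to distinguish the two hypotheses, which is exactly the "three peppers" intuition above.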
Stage 3: The Final Dish (The Trained Model)
You cook the meal. The final dish is the "Model."
- The Leak: This is the sneakiest part. The researchers showed that if an attacker is clever, they can "poison" the supermarket with a few fake items before you shop.
- The Analogy: Imagine the attacker sneaks a few jars of "Ghost Pepper" into the supermarket, but they are labeled with a secret code. If your grandmother's notes made you pick those specific jars, the final dish will taste like Ghost Pepper. If the dish doesn't taste like Ghost Pepper, the attacker knows your grandmother's notes didn't include that flavor. By tasting the final dish, the attacker can guess exactly what was in your private notebook.
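The poisoning idea can be sketched in code. In this hypothetical toy version (invented data and a simple nearest-neighbor selector, not the paper's construction), the attacker plants a "canary" item next to a record they suspect is in the private set; whether the canary ends up selected (and thus flavors the final model) reveals the suspect's membership:

```python
import numpy as np

rng = np.random.default_rng(2)
public = rng.normal(size=(200, 2))

# Attacker plants a canary right next to the record they suspect is private.
suspect = np.array([5.0, 5.0])   # hypothetical target record
canary = suspect + 0.01          # the "Ghost Pepper" jar with a secret label
poisoned_pool = np.vstack([public, canary])
canary_idx = len(poisoned_pool) - 1

def select(pool, private_set, k=20):
    """Pick the k pool items nearest to any private record."""
    dist = np.min(
        np.sum((pool[:, None, :] - private_set[None, :, :]) ** 2, axis=2), axis=1
    )
    return set(np.argsort(dist)[:k])

private_with = np.vstack([rng.normal(size=(9, 2)), suspect])  # target present
private_without = rng.normal(size=(10, 2))                    # target absent

# If the suspect is private, its canary gets selected; "tasting the dish"
# (or inspecting the basket) then reveals the target's membership.
print(canary_idx in select(poisoned_pool, private_with))     # → True
print(canary_idx in select(poisoned_pool, private_without))  # → False
```

The attacker never sees the notebook; they only check whether their planted item made it through the curation.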
2. Two Different "Shopping" Strategies
The paper tested two common ways people do this curation:
Strategy A: The "Look-Alike" Method (Image-Based)
You pick items that look exactly like the ones in your notes.
- Result: Very Leaky. Because you are picking the single best match, it's very easy for an attacker to figure out exactly which note you were looking at. It's like saying, "I picked the shoe that fits my foot perfectly." The attacker knows exactly what your foot looks like.
Strategy B: The "Average" Method (TRAK)
You pick items that, on average, improve the recipe. You don't just pick the single best match; you look at how all the items work together.
- Result: Safer, but not safe. If you have a huge notebook (lots of data), this method hides your secrets well because the "average" smooths out the details. But if your notebook is small (which is common in sensitive fields like medicine or finance), the "average" is still too easy to reverse-engineer.
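The "averaging smooths out the details" claim can be made concrete as a sensitivity statement: how much do the selection scores move when one private record is removed? A toy sketch (hypothetical averaged-distance scores standing in for TRAK-style influence averaging, which is far more involved in practice):

```python
import numpy as np

rng = np.random.default_rng(3)
public = rng.normal(size=(50, 2))

def avg_scores(private_set):
    """Score each public item by its average distance to the private set."""
    d = np.linalg.norm(public[:, None] - private_set[None], axis=2)
    return -d.mean(axis=1)

def sensitivity(private_set):
    """How much the scores move when one private record is removed."""
    return np.max(np.abs(avg_scores(private_set) - avg_scores(private_set[1:])))

big = rng.normal(size=(1000, 2))   # large private "notebook"
small = big[:5]                    # small private "notebook"

# With n records, one record shifts an averaged score by roughly 1/(n-1)
# of its individual contribution, so a big notebook hides each record
# far better than a small one.
print(sensitivity(big) < sensitivity(small))  # → True
```

This mirrors the paper's finding as summarized above: averaging protects large private sets, but with only a handful of records each one still leaves a visible fingerprint in the scores.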
3. The Solution: The "Noise" Shield
The researchers also tested a defense called Differential Privacy.
- The Analogy: Imagine you are writing your shopping list, but you add a bit of random static to the paper. You write, "I need a tomato," but the paper is slightly smudged so it reads "I need a t-mato" or "I need a tomato, or maybe a potato."
- The Result: This noise makes it impossible for the attacker to be 100% sure what you picked. It protects the secret, but it might make your shopping list slightly less efficient (you might buy a slightly less perfect tomato). The paper shows that adding this "noise" effectively stops the leaks.
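The "noise shield" idea can be sketched with the standard Laplace mechanism applied to the selection scores before picking the top items (a generic DP-style sketch, not necessarily the paper's exact mechanism; the noise scale follows the usual sensitivity/epsilon form):

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_top_k(scores, k, epsilon, sensitivity=1.0):
    """Select top-k items after adding Laplace noise to each score.

    Sketch of a differentially private selection: smaller epsilon
    means more noise and therefore stronger privacy.
    """
    noise = rng.laplace(scale=sensitivity / epsilon, size=len(scores))
    return set(np.argsort(scores + noise)[-k:])

scores = np.array([0.9, 0.8, 0.1, 0.05, 0.02])

# With little noise (large epsilon), selection tracks the true scores;
# with heavy noise (small epsilon), the basket becomes unpredictable,
# so an attacker can no longer be sure which scores were really high.
print(noisy_top_k(scores, k=2, epsilon=100.0))  # almost surely {0, 1}
print(noisy_top_k(scores, k=2, epsilon=0.01))   # essentially random
```

The privacy/utility trade-off in the analogy is visible here: the noisier the scores, the more often a "slightly less perfect tomato" lands in the basket.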
The Big Takeaway
For a long time, people thought: "If I don't train my AI on private data, but only use private data to pick public data, I'm safe."
This paper proves that you are not safe. The process of selection itself is a privacy risk. Whether it's the scores you calculate, the list of items you choose, or the final model you build, all of them can act as a mirror reflecting your private secrets back to an attacker.
The Lesson: If you want to use private data to guide your AI, you can't just "curate" the data. You have to build privacy protections (like adding noise) directly into the curation process itself.