Challenges in Enabling Private Data Valuation

This paper investigates the fundamental tension between differential privacy and data valuation. It analyzes why standard privatization fails to preserve fine-grained influence rankings, and proposes design principles for building privacy-preserving valuation methods that remain effective under rigorous privacy guarantees.

Yiwei Fu, Tianhao Wang, Varun Chandrasekaran

Published 2026-03-03

The Big Idea: The "Credit Score" Dilemma

Imagine you and a group of friends build a giant, incredibly smart robot together. You all contributed different things: you brought the blueprints, your friend brought the batteries, another brought the code, and someone else brought the raw materials.

Now, the robot is amazing. But who deserves the most credit? Who is the "star" of the team?

Data Valuation is the process of trying to answer that question. It's a mathematical way to say, "How much did your specific piece of data help train this AI?" This is becoming huge because companies want to buy and sell data, and they need to know how much it's worth.

The Problem: To figure out who deserves credit, you have to look at the data very closely. But in doing so, you might accidentally reveal secrets about the people who provided that data.

This paper asks a tough question: Can we give credit to data without spying on the people who gave it?

The authors say: It's incredibly hard, and maybe impossible with the tools we have right now. Here is why, broken down into four main stories.


1. The "Magnifying Glass" Problem (Influence Functions)

The Analogy: Imagine you are trying to see how much a single grain of sand affects a sandcastle. To do this, you use a super-powerful magnifying glass (a mathematical tool called the inverse Hessian).

The Issue:

  • The Good: This magnifying glass is great at finding the "special" grains of sand that hold the castle together.
  • The Bad: Because the glass is so powerful, if there is even one weird, jagged rock in the sand, the magnifying glass makes it look like a giant boulder. It blows the importance of that one grain out of proportion.
  • The Privacy Risk: If you try to hide the secret of who provided that "giant boulder" grain by adding "noise" (static) to the answer, the static becomes so loud that it drowns out the signal for everyone else. You end up with a result that is either too scary to release (because it reveals the jagged rock) or so fuzzy that it's useless.

The Takeaway: Trying to measure the exact impact of one person's data is like trying to whisper a secret in a hurricane. The math needed to be precise is the same math that makes the secret impossible to hide.
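To make the magnifying-glass effect concrete, here is a toy sketch in plain NumPy (this is not the paper's actual setup; the data, the tiny linear model, and all numbers are invented for illustration). One mislabeled point earns an influence score that dwarfs everyone else's, and differentially private noise must be calibrated to that worst case:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-feature linear regression: y = 2x + small noise,
# plus one low-leverage, wildly mislabeled point ("the jagged rock").
X = rng.normal(size=(50, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)
X[49], y[49] = 0.1, 50.0

w = np.linalg.lstsq(X, y, rcond=None)[0]    # fitted weights
H = X.T @ X / len(X)                        # Hessian of the mean squared loss
H_inv = np.linalg.inv(H)

x_test, y_test = np.array([1.0]), 2.0
grad_test = (x_test @ w - y_test) * x_test  # test-loss gradient

residuals = X @ w - y
grads = residuals[:, None] * X              # per-example training gradients
influence = -grads @ H_inv @ grad_test      # classic influence-function score

# The single bad point dominates every honest point's score, so DP noise
# big enough to hide it is also big enough to drown the honest signal.
print(abs(influence[49]) > np.abs(influence[:49]).max())
```

The noise scale is set by the boulder, not by the typical grain: that is why the ordinary points' scores come back as static.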

2. The "Team Roster" Problem (Shapley Values)

The Analogy: Imagine you want to know how much each player contributed to a soccer team's win. The "Shapley Value" method says: "Let's try every possible combination of players. If we take Player A off the team, does the score drop? If we put them back, does it go up?"

The Issue:

  • The Good: This is the fairest way to judge everyone.
  • The Bad: The number of possible teams doubles with every player you add; with just 30 players, there are over a billion combinations. To get a precise answer, you have to test a huge share of them.
  • The Privacy Risk: In the world of privacy, we have to add "noise" to protect the team. But every combination we test is another question asked of the private data, and the total "noise" required to hide whether one specific player was on the team gets huge.
  • The Paradox: If you try to hide the player's contribution, you have to add so much static that you can no longer tell who the best players are. The "fairness" of the math destroys the "privacy" of the players.
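Here is what "test every roster" looks like in code: a standard Monte Carlo (permutation-sampling) estimate of Shapley values on a toy utility (purely illustrative; the utility function and player qualities are made up). Count how many times `utility` touches the data; under differential privacy, every one of those calls would need its own dose of noise, and the privacy cost composes across all of them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Toy "utility": a coalition's score is the number of clean points it holds;
# point 0 is mislabeled and subtracts from the score.
quality = np.ones(n)
quality[0] = -1.0

def utility(coalition):
    return quality[list(coalition)].sum() if coalition else 0.0

# Monte Carlo Shapley: average each point's marginal contribution
# over many random orderings of the players.
shapley = np.zeros(n)
n_perms = 2000
for _ in range(n_perms):
    perm = rng.permutation(n)
    coalition, prev = set(), 0.0
    for i in perm:
        coalition.add(i)
        score = utility(coalition)       # one more query on private data
        shapley[i] += score - prev
        prev = score
shapley /= n_perms

print(shapley.round(2))   # the mislabeled point scores low, clean points high
```

That loop makes n_perms × n = 16,000 utility calls for just eight players; noising each call to protect one player's membership is what makes the final estimates collapse into static.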

3. The "Movie Reel" Problem (Trajectory Methods)

The Analogy: Instead of looking at the final robot, imagine we watch the movie of how the robot was built, frame by frame. We see exactly which tools were used at which second.

The Issue:

  • The Good: This is very accurate. We can see exactly when a specific piece of data was used.
  • The Bad: To protect privacy, the "movie" itself needs to be blurry (this is called Differential Privacy).
  • The Privacy Risk: If the movie is blurry enough to protect the data, the "credits" we assign at the end become fuzzy. We can't tell if the robot was built by a genius or a novice.
  • The Catch: If we try to keep the movie sharp so we can give accurate credits, we accidentally reveal the private data used to build the robot. You can't have a sharp movie and a secret cast.
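One well-known method in this trajectory family is TracIn-style attribution: replay training and credit each example with how well its gradient aligned with the test gradient at every step it was used. The sketch below uses a toy 1-D model invented for illustration (not the paper's setup); the comments flag where DP-SGD's per-step clipping and noising would smear every frame of the movie:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D linear model trained by SGD; we replay the trajectory and credit
# each example with grad(example) . grad(test) at every step it was used.
X = rng.normal(size=20)
y = 3.0 * X
x_test, y_test = 1.0, 3.0

w, lr = 0.0, 0.05
credit = np.zeros(20)
for step in range(200):
    i = step % 20                           # which example this step uses
    g_i = (w * X[i] - y[i]) * X[i]          # training-example gradient
    g_test = (w * x_test - y_test) * x_test # test-loss gradient, same frame
    credit[i] += lr * g_i * g_test          # per-step influence on test loss
    w -= lr * g_i                           # the actual SGD update

# Positive credit = the example pushed the test loss down at that step.
# Under DP-SGD, g_i would be clipped and noised at EVERY step, and that
# per-step noise accumulates directly inside `credit`.
print(credit.sum() > 0)
```

The credits are a sum over hundreds of frames, so even modest per-frame blur compounds into badly fuzzy totals.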

4. The "Proxy" Problem (Surrogate Models)

The Analogy: Instead of building the real robot, we build a cheap, fake version (a "surrogate") that acts like the real one. We use the fake one to guess who deserves credit.

The Issue:

  • The Good: It's fast and cheap.
  • The Bad: The fake robot is built using the real data. So, the fake robot still "remembers" the secrets of the real data.
  • The Privacy Risk: Even though we are only looking at the fake robot, the way it was built leaks information about the real people. It's like trying to hide a fingerprint by looking at a wax mold of it; the mold still has the unique ridges.
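A tiny demonstration that the wax mold keeps the ridges (toy data, all invented for illustration): fit a cheap least-squares "surrogate" with and without one person's record, and watch its parameters shift. Anything computed from the surrogate inherits that shift:

```python
import numpy as np

rng = np.random.default_rng(3)

# A cheap "surrogate": a plain least-squares fit standing in for the big model.
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
X[0], y[0] = 1.0, 25.0                   # one person's unusual record

def fit_surrogate(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

w_full = fit_surrogate(X, y)
w_minus = fit_surrogate(X[1:], y[1:])    # the same fit without person 0

# The surrogate's parameters move when one record is removed: whatever we
# publish about the surrogate still carries that person's fingerprint.
print(np.linalg.norm(w_full - w_minus))
```

The gap between `w_full` and `w_minus` is exactly the kind of signal a membership or reconstruction attack feeds on.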

The Final Verdict: A Structural Contradiction

The authors conclude that this isn't just a technical bug we can fix with a patch. It is a fundamental contradiction.

  • Valuation wants to know: "How much did this specific person matter?" (It needs to be sensitive to individuals).
  • Privacy wants to say: "No one should be able to tell if this specific person mattered." (It needs to be insensitive to individuals).
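The tension above fits in a few lines of code (all numbers hypothetical). Differential privacy says the noise scale must be sensitivity / epsilon, where sensitivity is how far the output can move when one person's data changes; but a valuation score is designed to move when one person's data changes, so the noise lands on the same scale as the signal:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical valuation scores for five data owners (invented numbers).
true_scores = np.array([0.9, 0.7, 0.5, 0.3, 0.1])

# Laplace mechanism: noise scale = sensitivity / epsilon. Sensitivity is how
# much one person can swing the output -- but valuation scores are BUILT to
# swing when one person's data changes.
sensitivity = 1.0   # one owner can move their own score by its full range
epsilon = 1.0       # a commonly quoted privacy budget

noisy = true_scores + rng.laplace(scale=sensitivity / epsilon, size=5)

# Noise on the same scale as the signal: the released ranking is near-random.
print(np.argsort(-true_scores), np.argsort(-noisy))
```

Shrinking the noise means raising epsilon, i.e., buying accuracy by selling privacy, which is the dilemma in one line.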

The Conclusion:
You cannot easily have both. If you try to force them together with current methods, you either get:

  1. Privacy: But the data valuation is useless (all the answers are just noise).
  2. Valuation: But you have leaked private secrets about the data owners.

What's Next?
The paper suggests we need to invent entirely new ways of thinking. We can't just "add noise" to old methods. We need to design systems where the "credit" is calculated in a way that never requires looking at the individual data in the first place, or we need to accept that we can only give credit for groups of people, not individuals.

In short: We are trying to weigh a feather on a scale that is designed to ignore the weight of a feather. Until we build a new kind of scale, we can't do both perfectly.
