This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.
Imagine a high-stakes game of 20 Questions, but played by two experts trying to solve a mystery together, while a third person watches.
The Setup: The Bank and the FinTech
Imagine a Bank (the "Active Party") and a FinTech Company (the "Passive Party").
- The Bank knows who got a loan and whether they paid it back (the "Label"). They also know basic info like age and income.
- The FinTech knows the same people, but they hold the secret sauce: detailed shopping habits, deposit history, and loan types. They don't know who paid back the loan.
To build a super-smart credit checker, they team up in Vertical Federated Learning (VFL). They train a model together without the Bank seeing the FinTech's secret data, and without the FinTech seeing the Bank's labels. A neutral Coordinator helps them mix their math.
The New Threat: The "Agnostic" Spy
Usually, hackers (or curious banks) try to steal secrets by eavesdropping on the final answers the model gives out. If the model says, "There is an 80% chance this person is a good borrower," the hacker can work backward to guess the secret shopping habits.
But this paper introduces a new, sneakier attack called the Agnostic Inference Attack.
The Analogy: The "Fake Detective"
Imagine the Bank doesn't need to eavesdrop on the final answer. Instead, the Bank builds its own private detective (called the Adversary Model or AM) using only the data it already has (age, income, and who paid back loans).
- The Guess: The Bank's private detective looks at a new customer and guesses, "I think this person has an 80% chance of being a good borrower."
- The Trap: Even though the Bank didn't get the real answer from the joint model, its own detective's guess is close enough.
- The Leak: The Bank uses this "good enough" guess to run a math trick. Because the Bank knows the math rules of the joint model, it can reverse-engineer the FinTech's secret shopping habits from that guess.
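The three steps above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's actual algorithm: it assumes the joint model is a plain logistic regression whose passive-party weights the Bank can see (the "interpretability" setting), and every variable name and number here is invented. The Bank turns its estimated probability into a logit, subtracts its own contribution, and takes the minimum-norm solution for the FinTech's features.

```python
import numpy as np

# Hypothetical joint logistic-regression scorer:
#   z = w_a . x_a + w_p . x_p ;  p = sigmoid(z)
w_a = np.array([0.8, -0.5])             # Bank's weights (age, income)
w_p = np.array([1.2, -0.7, 0.4])        # FinTech's weights (shopping, deposits, loans)

x_a = np.array([0.3, 1.1])              # Bank's own features for one customer
x_p_true = np.array([0.9, 0.2, -0.5])   # FinTech's secret features

p_true = 1 / (1 + np.exp(-(w_a @ x_a + w_p @ x_p_true)))

# The Bank's "fake detective" supplies an estimate of p. Here we feed it the
# exact value to isolate the inversion step; in the paper it is only a guess.
p_hat = p_true

# Invert: recover the total logit, strip off the Bank's share, then take the
# least-norm solution of the single equation w_p . x = r.
z_hat = np.log(p_hat / (1 - p_hat))
r = z_hat - w_a @ x_a                   # FinTech's share of the logit
x_p_guess = w_p * r / (w_p @ w_p)

print("true secret:", x_p_true)
print("guess      :", x_p_guess)
```

One equation with three unknowns is underdetermined, so the guess matches the secret's contribution to the score rather than the secret itself; the paper's attack adds structure (many samples, feature correlations) to sharpen this.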
Why is it scary?
- No Eavesdropping Needed: The Bank doesn't need to steal the final score. It just needs its own data.
- Training Data at Risk: Most inference attacks only target new samples at prediction time. This attack can also recover secrets about the training data (the historical records), which were thought to be safe.
- The "Refined" Spy: If the Bank gets some real answers from the joint model (even just a few), it can train its private detective to be even better. This is called the Refined Adversary Model (RAM). It's like a detective who gets a few tips from the real case file and becomes a genius at guessing the rest.
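A minimal sketch of the detective and its refinement, under stated assumptions: both models are toy logistic regressions fit by gradient descent, the Bank's data is synthetic, and the "few real answers" are simulated as well-calibrated joint-model scores. None of this code comes from the paper; it only illustrates the AM-then-RAM idea.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Toy data: the Bank's features (age, income) and repayment labels,
# generated from a hidden "true" relationship.
n = 500
X_bank = rng.normal(size=(n, 2))
true_w = np.array([1.0, -0.6])
y = (sigmoid(X_bank @ true_w) > rng.uniform(size=n)).astype(float)

def fit(X, y, w=None, steps=2000, lr=0.1):
    """Plain logistic regression by gradient descent; y may be soft labels."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# AM: trained only on what the Bank already has (its features + the labels).
w_am = fit(X_bank, y)

# RAM: the same detective, nudged further on a handful of *real* joint-model
# scores (simulated here as the true probabilities for 20 customers).
few = 20
real_scores = sigmoid(X_bank[:few] @ true_w)
w_ram = fit(X_bank[:few], real_scores, w=w_am, steps=500, lr=0.05)

print("AM weights :", w_am)
print("RAM weights:", w_ram)
```

The refinement step is just continued training with soft targets: a few genuine probabilities calibrate the detective without the Bank ever seeing the FinTech's features.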
The Defense: The "Distorted Mirror"
The paper asks: How do we stop this without breaking the partnership?
If the FinTech simply hides its part of the model (a black box), the Bank can't interpret why the model made a decision. In banking, you can't just say "The computer said no." You need to explain, "We said no because of high debt."
So, the authors propose Privacy-Preserving Schemes (PPS).
The Analogy: The Distorted Mirror
Instead of hiding the FinTech's data, they give the Bank a distorted mirror.
- The FinTech takes its secret math parameters (the "weights" of the shopping habits) and twists them using a secret code (a rotation).
- The Bank sees the twisted parameters. It can still use them to make predictions, so the model works perfectly.
- But, if the Bank tries to reverse-engineer the shopping habits from these twisted numbers, the math falls apart. The "reflection" is too warped to figure out the original face.
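The paper's actual schemes are more elaborate, but the core "distorted mirror" idea can be sketched as an orthogonal rotation of the revealed weights: the published parameters keep the same scale (so they still look like usable model weights), yet the same inversion trick from before now produces a guess that is consistent only with the distorted model, not the real one. All values here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

w_p = np.array([1.2, -0.7, 0.4])        # FinTech's true weights
x_p_true = np.array([0.9, 0.2, -0.5])   # the secret features
r = w_p @ x_p_true                      # passive logit the Bank has estimated

# The "mirror": a random rotation (orthogonal matrix) from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
w_pub = Q @ w_p                         # distorted weights shown to the Bank

# Rotations preserve length, so the published weights have the same scale
# as the real ones and still look like plausible parameters.
assert np.isclose(np.linalg.norm(w_pub), np.linalg.norm(w_p))

# The Bank runs the same least-norm inversion, but with the wrong weights.
x_p_guess = w_pub * r / (w_pub @ w_pub)

# The guess fits the distorted mirror, not the real model:
print("consistent with distorted weights:", np.isclose(w_pub @ x_p_guess, r))
print("consistent with real weights     :", np.isclose(w_p @ x_p_guess, r))
```

Because the FinTech keeps the true weights for actual scoring, predictions are unaffected; only the Bank's reverse-engineering is derailed. Dialing the rotation's size is exactly the privacy-vs-interpretability knob the next section describes.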
The Trade-Off (The "Goldilocks" Zone)
The FinTech can control how much the mirror is distorted.
- Too little distortion: The Bank can still guess the secrets (Privacy is low).
- Too much distortion: The Bank can't explain the decisions anymore (Interpretability is low).
- Just right: The Bank gets a model that works well and can explain decisions, but the secrets remain safe.
The Results: What the Experiments Showed
The authors tested this on real-world datasets (such as credit-card records and handwritten digits).
- The Attack Works: Even without the real answers, the Bank's "Fake Detective" could guess the secrets surprisingly well, especially when the two parties' features (like age and shopping habits) are correlated.
- The Defense Works: By applying the "Distorted Mirror" technique, they could make the Bank's guesses terrible (high error) while keeping the model's accuracy high.
- The Sweet Spot: They found that a tiny bit of distortion creates a huge wall of privacy, allowing the Bank to still understand the model's logic.
The Bottom Line
This paper reveals that in collaborative AI, just hiding the final answer isn't enough. A smart partner can build their own version of the model to steal secrets. The solution isn't to stop sharing, but to share a slightly "scrambled" version of the math that keeps the model smart and explainable, but the secrets locked away.