Identifying Adversary Characteristics from an Observed Attack

This paper proposes a domain-agnostic framework to identify the most probable characteristics of an attacker from an observed data-manipulation attack, demonstrating that such identification enables more effective exogenous mitigation and improves the performance of learning-based defenses.

Soyon Choi, Scott Alfeld, Meiyi Ma

Published 2026-03-09

Imagine you are the security guard at a high-tech art gallery. One day, you notice a thief has managed to sneak in and subtly alter a painting so that the gallery's AI camera thinks it's a picture of a banana instead of a masterpiece.

Most security experts would immediately try to build a better camera or a stronger lock to stop the next theft. But this paper asks a different, more detective-like question: "Who exactly is this thief, and how do they think?"

Here is a simple breakdown of the paper's ideas, using everyday analogies.

The Core Problem: The "Black Box" Thief

In the world of Artificial Intelligence (AI), hackers don't usually break in with crowbars. Instead, they add tiny, invisible "noise" to data (like changing a few pixels in a photo) to trick the AI into making a mistake.

The problem is that defenders (the gallery owners) usually guess what the hacker is like. They assume, "Okay, the hacker is smart, has a supercomputer, and wants to break the system." But in reality, the hacker might be lazy, have a weak computer, or just want to cause a specific, silly error.

If you guess wrong about the hacker, your defense will fail. It's like trying to stop a pickpocket by building a fortress wall; you're solving the wrong problem.

The Big Discovery: The "Many Faces" Mystery

The authors of this paper discovered a tricky mathematical truth: Just seeing the attack isn't enough to know who did it.

The Analogy: Imagine you find a broken window in your house.

  • Hypothesis A: A baseball player hit it with a ball.
  • Hypothesis B: A storm blew a branch through it.
  • Hypothesis C: A prankster threw a rock.

All three scenarios result in the exact same broken window. Without seeing the person or the object, you can't be 100% sure who did it. In math terms, the authors call this "non-identifiable." Many different types of attackers could produce the exact same attack.

The Solution: The "Sherlock Holmes" Framework

Since we can't know the truth for sure, the paper proposes a framework to find the most likely suspect.

Think of it like a detective using a "Wanted Poster" (what they already believe about criminals) combined with the "Crime Scene Evidence" (the actual attack they just saw).

  1. The Prior Belief (The Wanted Poster): The defender starts with a guess. "I think most hackers in this area use simple tools and want to cause chaos." This is their starting point.
  2. The Observation (The Evidence): The defender sees the specific way the window was broken.
  3. The Reverse Engineering: The framework works backward. It asks: "If I assume the hacker has these specific skills and goals, would they have broken the window exactly this way?"

The framework then searches for the combination of Knowledge (what the hacker knows), Capability (what tools they have), and Objective (what they want) that best explains the attack, weighting each candidate profile by both the prior belief and how well it reproduces the observed evidence.
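The steps above amount to a Bayesian-style inversion: score each candidate attacker profile by the prior, and by how closely an attack generated under that profile matches the one actually observed. Here is a minimal toy sketch of that idea. The generative model, the `capability`/`objective` parameterization, and all the numbers are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

# Defender's linear model y = w.x; the attacker perturbs one input x so the
# prediction moves toward a private target. The attacker profile here is
# (capability, objective): a perturbation budget and a target output.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, 0.8])

def craft_attack(capability, objective):
    """Best-response perturbation: shift the prediction toward `objective`
    along w, with the shift clipped to the attacker's budget."""
    direction = w / np.linalg.norm(w)        # steepest change in w.x
    gap = objective - w @ x                  # how far the prediction must move
    step = np.clip(gap / np.linalg.norm(w), -capability, capability)
    return x + step * direction

# The attack the defender observes (true profile hidden from the defender).
x_adv = craft_attack(capability=0.4, objective=2.0)

# Invert the attack: grid-search the most probable profile, combining a
# prior over profiles with how well each candidate reproduces x_adv.
def log_prior(cap, obj):
    # "Wanted poster": small budgets and modest targets are more likely.
    return -cap - 0.1 * obj**2

best, best_score = None, -np.inf
for cap in np.linspace(0.1, 1.0, 10):
    for obj in np.linspace(-3.0, 3.0, 25):
        residual = np.linalg.norm(craft_attack(cap, obj) - x_adv)
        score = log_prior(cap, obj) - 50.0 * residual**2  # log-likelihood proxy
        if score > best_score:
            best, best_score = (cap, obj), score

print(best)  # most probable (capability, objective) given prior + evidence
```

Notice that several profiles reproduce the observed perturbation exactly (any objective beyond the clipping point yields the same attack), so the prior is what breaks the tie: a concrete instance of the non-identifiability discussed above.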

How It Works in Practice

The paper tested this idea on three different types of "galleries" (AI models):

  1. Linear Regression: A simple, straight-line prediction. (Like a basic calculator).
  2. Logistic Regression: A slightly more complex classifier. (Like a traffic light deciding red or green).
  3. Neural Networks: A deep, complex brain. (Like a human expert).
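To ground the "gallery" analogy, here is the forward problem for the simplest of these models, a toy construction rather than the paper's experiment: an attacker with a target output computes the smallest perturbation that makes a linear regressor predict it. For a linear model this has a closed form, which is part of why inversion works so well there.

```python
import numpy as np

# Linear regressor y = w.x + b; the attacker wants the prediction y_target.
# The minimum-norm perturbation achieving it is closed-form:
#   delta = (y_target - (w.x + b)) * w / ||w||^2
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 1.0])

y_target = 3.0
delta = (y_target - (w @ x + b)) * w / (w @ w)
x_adv = x + delta

print(w @ x_adv + b)  # hits y_target exactly (up to float error)
```

Because the mapping from (objective, budget) to perturbation is this transparent, the defender can invert it almost perfectly; a deep network offers no such closed form, which matches the slightly lower precision reported for neural networks below.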

The Results:

  • For the simple models, the framework was incredibly accurate (over 99% error reduction). It could almost perfectly guess the hacker's profile.
  • For the complex models, it was still very good, though slightly less precise. This is because complex systems are harder to reverse-engineer, much like trying to guess a master chef's recipe just by tasting one bite of soup.

Why Does This Matter?

Why bother guessing who the hacker is instead of just fixing the hole? The paper gives two great reasons:

  1. Exogenous Mitigation (The "Catch the Criminal" Approach): If you know the hacker is a specific person with specific tools, you can stop them outside the computer system. You can track them down, ban their IP address, or change the physical security. You don't just patch the software; you neutralize the threat.
  2. Tailored Defense (The "Custom Shield" Approach): If you know the hacker is using a specific type of tool, you can build a shield specifically designed to stop that tool, rather than a generic shield that stops everything but slows down your system.
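As one illustration of the "custom shield" idea, again my own sketch rather than the paper's defense: once the defender has estimated the attacker's perturbation budget, they can harden a model with adversarial training at exactly that budget instead of a pessimistic worst-case one. The FGSM-style attack, the budget `eps`, and the toy data are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary data: two Gaussian blobs.
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

eps = 0.3   # attacker budget inferred by the framework (assumed value)
w = np.zeros(2)
b = 0.0
lr = 0.1

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(300):
    # Worst-case input within the inferred L-inf budget (FGSM direction):
    # move each point against its own label along the loss gradient wrt x.
    grad_x = np.outer(sigmoid(X @ w + b) - y, w)   # d(loss)/dx per sample
    X_adv = X + eps * np.sign(grad_x)
    # Standard logistic-regression gradient step on the perturbed batch.
    p = sigmoid(X_adv @ w + b)
    w -= lr * X_adv.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

# Accuracy against the assumed attacker, i.e. under the inferred budget.
grad_x = np.outer(sigmoid(X @ w + b) - y, w)
p_adv = sigmoid((X + eps * np.sign(grad_x)) @ w + b)
acc = np.mean((p_adv > 0.5) == y)
print(acc)
```

Training at the inferred budget, rather than the largest budget any attacker could have, is the "custom shield": the model stays robust to the profiled adversary without paying the accuracy cost of over-defending.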

The Bottom Line

This paper is about shifting the mindset from "How do I stop this attack?" to "Who is attacking me, and how do they think?"

By treating an attack as a puzzle to be solved rather than just a bug to be fixed, defenders can use "reverse engineering" to learn about their enemies. Even if they can't know the attacker's identity with 100% certainty, knowing the most probable profile allows them to build smarter, more effective defenses.

In short: Don't just fix the lock; figure out who has the key, and then take it away from them.