The Big Picture: The "Copycat" Problem
Imagine you are a master forger trying to create a fake painting that looks so real, it fools not just one art critic, but any critic in the world.
In the world of Artificial Intelligence (AI), this is called a Targeted Transfer Attack.
- The Goal: You trick an AI into thinking a picture of a "cat" is actually a "dog."
- The Challenge: You only have access to your own AI (the "Surrogate"). You don't know how the target AI (the "Black Box") works. You hope the trick you learned on your AI will work on theirs, too.
The Problem: Current methods are like a student who memorizes the exact answers to a practice test. They get 100% on the practice test (your AI), but if the real test (the target AI) asks the same question in a slightly different way, the student fails. The "trick" relies too heavily on the specific quirks of the practice test.
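The setup above can be sketched with toy linear "models" standing in for real networks (everything here — the two weight matrices, the step size, the iteration count — is an illustrative assumption, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "models": linear classifiers with different weights.
# W_surrogate is the attacker's own model; W_target is the black box,
# which the attacker never sees.
W_surrogate = rng.normal(size=(3, 4))                    # 3 classes, 4 features
W_target = W_surrogate + 0.3 * rng.normal(size=(3, 4))   # related but different

def predict(W, x):
    return int(np.argmax(W @ x))

x = rng.normal(size=4)   # the "cat" input, as a feature vector
target_class = 2         # the attacker wants both models to say class 2 ("dog")

# Iteratively nudge x to raise the surrogate's score for the target class.
# For a linear model, the gradient of that score w.r.t. x is just the
# target class's weight row.
x_adv = x.copy()
for _ in range(50):
    grad = W_surrogate[target_class]
    x_adv = x_adv + 0.05 * np.sign(grad)   # FGSM-style signed step

fools_surrogate = predict(W_surrogate, x_adv) == target_class
fools_target = predict(W_target, x_adv) == target_class
```

The gap between `fools_surrogate` (essentially guaranteed) and `fools_target` (hit or miss) is exactly the transfer problem the paper addresses: the perturbation is optimized against one model's parameters but judged by another's.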
The Discovery: The "Shortcut" Addiction
The researchers discovered why these attacks fail to transfer.
They found that existing attack methods rely on a tiny, specific group of "super-parameters" inside the AI model. Think of these parameters as secret shortcuts in a maze.
- The attack finds a path that uses these specific shortcuts to win.
- The Issue: The target AI might have a different maze layout. It doesn't have those exact shortcuts. So, when the attack tries to use them, it hits a dead end.
The attack is "over-reliant" on a few lucky breaks that only exist in the model it was trained on.
The Solution: RaPA (Random Parameter Pruning Attack)
To fix this, the authors created RaPA. Here is how it works, using a metaphor:
The Analogy: The "Blindfolded Chef"
Imagine you are teaching a chef (the AI) to cook a dish that will fool a food critic.
- Old Method: The chef learns the recipe by tasting every single ingredient perfectly. If the critic changes the brand of salt, the dish tastes wrong.
- RaPA Method: Every time the chef practices, you blindfold them and randomly remove a few ingredients from the pantry.
- Iteration 1: You hide the salt. The chef has to learn to make the dish without it.
- Iteration 2: You hide the pepper. The chef learns to compensate.
- Iteration 3: You hide the garlic.
By forcing the chef to practice with random missing ingredients, they stop relying on any single ingredient. They learn the true essence of the dish.
In Technical Terms:
RaPA randomly "prunes" (temporarily turns off) a small percentage of the AI's internal connections (parameters) while the adversarial example is being generated.
- A fresh random selection of parameters is pruned at every single step of the process.
- This forces the attack to spread its "weight" across all parts of the model, rather than leaning on a few "shortcut" parameters.
- The result is an attack that is robust enough to work even if the target AI has a completely different internal structure.
Why It's a Game Changer
The paper highlights three major wins for RaPA:
It's a "Magic Wand" (Training-Free):
Most advanced attacks require you to re-train the AI model, which takes days and massive computing power. RaPA requires zero retraining. You just apply the "blindfold" technique while generating the attack. It's like upgrading a car's engine without ever opening the hood.
It Bridges the Gap (CNN to Transformer):
There are two main types of AI architectures: CNNs (like ResNet, good at seeing patterns) and Transformers (like ViT, good at understanding context). Usually, an attack that works on a CNN fails miserably on a Transformer.
- The Result: RaPA smashed this barrier. It improved the success rate by 11.7% when moving from CNNs to Transformers. That's a massive jump in the AI security world.
It Gets Better with More Power:
If you give RaPA more time to think (more iterations), it gets significantly stronger. It scales up beautifully, whereas other methods hit a ceiling.
The Bottom Line
RaPA is a new way to hack AI that stops the hacker from "cheating" by memorizing the specific model they are attacking. Instead, it forces the attack to be generalizable.
By randomly turning off parts of the model during the attack creation, RaPA ensures the "fake" example is so fundamentally convincing that it works on any AI, regardless of how that AI was built. It's the difference between memorizing a specific lock's key and learning how to pick any lock.