Here is an explanation of the paper "Contract And Conquer" using simple language and creative analogies.
The Big Picture: The "Black Box" Problem
Imagine you have a Black Box (a complex AI model, like a self-driving car's brain or a medical diagnostic tool). You can put a picture in, and it tells you what it sees. But you can't see inside the box; you don't know how it thinks, and you can't see its internal gears.
Security experts want to know: "Is this box safe?" To test this, they try to trick the box by making tiny, almost invisible changes to the input (like adding a few pixels of noise to a stop sign so the AI thinks it's a speed limit sign). These tricked inputs are called Adversarial Examples.
The Problem: Most current methods to find these tricks are like throwing darts in the dark. They might hit the bullseye (find a trick), but they can't prove they will eventually hit it. If they miss, they just say, "I tried my best," without knowing if a trick actually exists or if they just weren't looking hard enough.
The Solution: "Contract and Conquer" (CAC)
The authors propose a new method called Contract and Conquer (CAC). Think of this as a smart, systematic way to hunt for the trick, rather than just throwing darts blindly.
Here is how it works, broken down into three simple steps:
1. The "Shadow Puppet" (Knowledge Distillation)
Since you can't see inside the Black Box, you build a Shadow Puppet (a smaller, simpler model) that tries to copy the Black Box's behavior.
- How? You show the Shadow Puppet thousands of pictures and ask the Black Box what it thinks. You teach the Shadow Puppet to mimic the Black Box's answers.
- The Goal: Now, instead of trying to trick the mysterious Black Box directly, you try to trick your own Shadow Puppet. Since you can see inside the Shadow Puppet, you know exactly how to break it.
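The distillation step can be sketched as a toy loop. This is only an illustrative sketch, not the paper's implementation: `black_box` below is a hypothetical query-only stand-in (a hidden linear classifier), and the Shadow Puppet is a same-shaped linear model trained on the black box's soft answers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box: we may only query its output probabilities;
# its weights (_HIDDEN_W) are assumed invisible to the attacker.
_HIDDEN_W = rng.normal(size=(2, 3))

def _softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def black_box(x):
    """Query-only access: inputs -> class probabilities."""
    return _softmax(x @ _HIDDEN_W)

# Shadow Puppet: same shape, but its weights W are fully visible to us.
W = np.zeros((2, 3))

# Distillation: query the black box on sample inputs, then nudge the
# Shadow Puppet toward those soft answers (gradient descent on cross-entropy).
X = rng.normal(size=(256, 2))
teacher = black_box(X)
for _ in range(500):
    student = _softmax(X @ W)
    grad = X.T @ (student - teacher) / len(X)  # softmax cross-entropy gradient
    W -= 0.5 * grad

# The trained Shadow Puppet now mimics the black box's decisions.
agreement = np.mean(_softmax(X @ W).argmax(axis=1) == teacher.argmax(axis=1))
```

Because the Shadow Puppet is fully visible, standard white-box attacks can now be run against it, which is exactly the next step.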
2. The "Shrinking Room" (Contraction)
This is the clever part.
- Imagine you are looking for a lost coin in a giant warehouse (the search space). You probe the Shadow Puppet and find a spot where it gets confused.
- You check if that spot also confuses the real Black Box.
- If yes: You found your adversarial example! You win.
- If no: The Shadow Puppet was fooled, but the Black Box wasn't. This means your "Shadow Puppet" isn't a perfect copy in that specific area.
- The Fix: Instead of giving up, you shrink the room. You take the spot where the Shadow Puppet failed and say, "Okay, the real answer must be very close to this spot." You cut the search area down to just the immediate neighborhood of that spot.
- You then teach the Shadow Puppet again, focusing intensely on this tiny, shrinking neighborhood.
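The shrinking-room loop above can be sketched in a few lines. Everything here is an illustrative assumption, not the paper's algorithm: `black_box_prob` is a hypothetical query-only model, the Shadow Puppet is a single direction vector, and the "teach the Shadow Puppet again" step is replaced by a cheap finite-difference estimate at the failed candidate.

```python
import numpy as np

# Hypothetical query-only black box: returns P(class 1);
# hidden_w is assumed invisible to the attacker.
hidden_w = np.array([1.0, -2.0])

def black_box_prob(x):
    return 1.0 / (1.0 + np.exp(-(x @ hidden_w)))

def black_box_label(x):
    return int(black_box_prob(x) > 0.5)

surrogate_w = np.array([2.0, 1.0])  # visible but deliberately wrong Shadow Puppet
x = np.array([0.5, 0.2])            # clean input; black-box label is 1
label = black_box_label(x)
radius = 1.0                        # size of the "room" we search in
adversarial = None

for _ in range(10):
    # 1) Attack the Shadow Puppet: step against its direction, inside the room.
    cand = x - radius * surrogate_w / np.linalg.norm(surrogate_w)
    # 2) Check whether the trick transfers to the real black box.
    if black_box_label(cand) != label:
        adversarial = cand          # success: the black box is fooled too
        break
    # 3) Mismatch: shrink the room around the candidate and "re-teach" the
    #    Shadow Puppet there -- a two-point finite-difference estimate of the
    #    black box's local gradient stands in for retraining.
    eps = 1e-3
    surrogate_w = np.array([
        (black_box_prob(cand + eps * e) - black_box_prob(cand - eps * e)) / (2 * eps)
        for e in np.eye(2)
    ])
    x, radius = cand, 0.5 * radius
```

In this toy run the first candidate fools only the Shadow Puppet; the local re-fit corrects the search direction, and a second, smaller step fools the black box as well.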
3. The "Guaranteed Win" (Convergence)
Because you keep shrinking the search area and improving your Shadow Puppet's knowledge of that tiny area, you are mathematically guaranteed to eventually find a spot that tricks the Black Box, provided such a spot exists.
- The Analogy: Imagine trying to find a specific key in a massive field. Instead of running around randomly, you keep narrowing your search to a smaller and smaller circle. The paper proves that if you keep shrinking the circle and learning the terrain better, you will find the key within a specific number of steps. You don't have to guess; you have a mathematical guarantee.
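The "specific number of steps" follows from simple geometry: a fixed shrink factor turns an open-ended search into a finite budget. The numbers below are made up for illustration; the paper's actual bound depends on its own constants.

```python
import math

# If every round shrinks the search radius by a factor gamma < 1, then
# reaching precision eps from an initial radius r0 takes at most
#     ceil(log(eps / r0) / log(gamma))
# rounds -- a fixed, computable budget rather than open-ended guessing.
r0, eps, gamma = 1.0, 1e-3, 0.5  # illustrative values, not from the paper
rounds = math.ceil(math.log(eps / r0) / math.log(gamma))
```

With these values, ten halvings shrink a radius of 1.0 below one thousandth, so the search terminates in at most ten rounds.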
Why Does This Matter?
- No More Guessing: Current methods are like "hoping" to find a weakness. CAC is like having a map that guarantees you will find the weakness if it exists.
- Better Safety: For critical systems (like hospitals or self-driving cars), we need to know for sure if a system is vulnerable. If CAC says "I can't find a trick," we can be much more confident the system is actually safe.
- Efficiency: The paper shows that CAC is actually faster and finds "better" tricks (ones that are harder to notice) than the current best methods, even on complex models like Vision Transformers.
Summary Metaphor
Think of the Black Box as a fortress with a hidden weakness.
- Old methods are like soldiers throwing rocks at the walls hoping one hits a weak spot. They might succeed, but if they miss, they can't tell whether the wall has no weak spot or they simply never hit it.
- Contract and Conquer is like a master locksmith. They build a fake door (the Shadow Puppet) that looks exactly like the real one. They practice picking the fake door until they find the exact mechanism that opens it. If the real door doesn't open, they realize their fake door was slightly wrong, so they make a smaller, more precise fake door and try again. They keep making the fake door more precise and the target area smaller until they guarantee they can open the real door.
The paper proves that this process will always work, giving us a reliable way to test and secure our AI systems.