Decoupling Defense Strategies for Robust Image Watermarking

Imagine you have a precious secret message hidden inside a digital photo. This is called image watermarking. It's like hiding a tiny, invisible sticker inside a painting so that later, you can prove you own it or that it was made by an AI.

For a long time, artists (the watermark creators) and hackers (the attackers) have been in a cat-and-mouse game.

The Problem: Old methods tried to make the sticker super strong against everything at once, like trying to build a house that is simultaneously waterproof, fireproof, earthquake-proof, and bulletproof. The result? The house became so reinforced that it looked ugly (low image quality) and still had weak spots against new, fancy attacks like AI image generators or "adversarial" tricks that confuse the decoder.

The paper introduces AdvMark, a new strategy that solves this by decoupling (separating) the defense into two smart stages. Think of it not as building one super-hard fortress, but as a two-step security upgrade.

The Two-Stage Strategy

Stage 1: The "Safe Zone" Move (Fighting Adversarial Attacks)

The Analogy: Imagine you are trying to hide a secret note in a crowded room.

The Old Way: You tried to build a giant wall around the note to stop anyone from touching it. But building that wall made the room so cramped that people couldn't even see the note clearly (this is the "loss of clean accuracy").
The AdvMark Way: Instead of building a wall, you simply move the note to the center of the room, far away from the doors and windows where the troublemakers hang out.
- How it works: The system tweaks the encoder (the tool that hides the message) to push the watermarked image into a "safe zone" in the mathematical space. This zone is naturally hard for hackers to reach.
- The Result: The image looks perfect (high quality), and because the note is in the middle of the room, the "adversarial" hackers can't find a way to knock it over without moving the whole room.

Stage 2: The "Reinforced Shield" (Fighting Distortion & AI Regeneration)

The Analogy: Now that the note is safe in the center, you need to protect it from things like rain (JPEG compression), wind (noise), or someone trying to repaint the whole wall over it (AI regeneration).

The Problem: If you just strengthen the note now, you might accidentally push it back toward the dangerous doors you avoided in Stage 1.
The AdvMark Way: The system takes the image from Stage 1 and directly optimizes the pixels (the image itself) to be tough against rain and wind.
- The Secret Sauce: They use a special "constrained loss" (a rulebook). This rulebook says: "Make the image stronger against rain, BUT don't let it move more than a tiny inch away from where it was in Stage 1."
- The Result: You get an image that is tough against AI re-generators and compression, but it hasn't drifted back into the "danger zone" where the hackers can trick it.

Why is this a Big Deal?

The paper compares their method to the old "Joint Training" (trying to do everything at once) and shows massive improvements:

Better Quality: The images look much clearer. It's like the difference between a blurry, muddy photo and a crisp HD photo. On 128x128 MS-COCO images, for example, the reported PSNR (a standard image quality score, higher is better) is 37.0 for AdvMark versus 32.1 for MBRS.
Stronger Defense:
- Against Distortion (like JPEG compression): Up to 29% better.
- Against AI Regeneration (where an AI tries to redraw the image to erase the watermark): Up to 33% better.
- Against Adversarial Attacks (tricks designed to fool the decoder): Up to 46% better.

The "Early Stop" Trick

One clever detail is how they handle the optimization. Usually, when you try to make something stronger, you might accidentally ruin its beauty. AdvMark uses a "Quality-Aware Early Stop."

Analogy: Imagine you are polishing a diamond. You keep polishing it to make it shine, but you have a rule: "Stop immediately if the diamond starts to look cloudy." This ensures the final image is always beautiful, even while being fortified.

Summary

AdvMark is like a master locksmith who realizes that trying to lock every door with one giant key is a bad idea. Instead, they:

Move the treasure to the safest spot in the vault (Stage 1).
Reinforce the walls around that specific spot without moving the treasure (Stage 2).

The result is a watermark that is invisible to the naked eye, survives AI attempts to erase it, and survives standard image compression, all while keeping the photo looking perfect.

1. Problem Statement

Deep learning-based image watermarking is essential for tracing AI-generated content (AIGC) and protecting intellectual property. However, existing methods face two critical challenges when defending against a comprehensive suite of attacks (distortion, regeneration, and adversarial):

Challenge 1: Trade-off between Clean Accuracy and Robustness. Conventional methods use Joint Adversarial Training (JAT), which jointly optimizes the encoder and decoder via a noise layer to simulate attacks. This approach inevitably degrades the "clean accuracy" (the ability to extract watermarks from unattacked images) because adversarial training distorts the decision boundary.
Challenge 2: Limited Robustness via Simultaneous Training. Attempting to defend against all attack types (e.g., JPEG compression, Diffusion-based regeneration, and adversarial perturbations) simultaneously in a monolithic training process leads to inefficient optimization. The model fails to achieve high robustness against complex attacks like diffusion-based regeneration or advanced adversarial examples (e.g., WEvade) while maintaining high visual quality.

2. Methodology: AdvMark

The authors propose AdvMark, a novel two-stage fine-tuning framework that decouples defense strategies to overcome the limitations of JAT.

Stage 1: Adversarial Encoder Fine-Tuning (EAT)

Goal: Address adversarial attacks while preserving clean accuracy.
Strategy: Instead of jointly training the encoder and decoder, AdvMark primarily fine-tunes the encoder.
- Defender-Tailored Adversarial Attack: The authors construct adversarial examples by optimizing the perturbation to force the decoded message toward a random label (deviating from the ground truth) rather than a specific target, ensuring the model learns to resist general evasion.
- Conditional Decoder Update: The decoder is updated only conditionally. If the bit accuracy on the attacked image falls below a threshold ($\tau_1$), the decoder receives a single update. This prevents the "tampering" of the decision boundary that causes clean accuracy loss.
- Mechanism: The encoder learns to map watermarked images into a "non-attackable region" (the center of the decision space) rather than expanding the auxiliary boundary.
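
The control flow of Stage 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `stage1_step`, `random_target`, the callback arguments, and the order in which the encoder and decoder are updated are all assumptions made for the example. Only the bit-accuracy check against $\tau_1$ and the idea of a random attack target come from the description above.

```python
import random
from typing import Callable, List, Sequence, Tuple


def bit_accuracy(decoded: Sequence[int], message: Sequence[int]) -> float:
    """Fraction of message bits that the decoder recovered correctly."""
    if not message or len(decoded) != len(message):
        raise ValueError("decoded and message must be non-empty and equal in length")
    return sum(d == m for d, m in zip(decoded, message)) / len(message)


def random_target(message: Sequence[int], rng: random.Random) -> List[int]:
    """Draw a random bit string that differs from the true message.

    Hypothetical helper: it stands in for the random label that the
    adversarial perturbation is optimized toward.
    """
    if not message:
        raise ValueError("message must be non-empty")
    while True:
        target = [rng.randint(0, 1) for _ in message]
        if target != list(message):
            return target


def stage1_step(
    message: Sequence[int],
    decode_attacked: Callable[[], Sequence[int]],  # hypothetical: decoder output on the attacked image
    update_encoder: Callable[[], None],            # hypothetical: one encoder optimizer step
    update_decoder: Callable[[], None],            # hypothetical: one decoder optimizer step
    tau1: float,
) -> Tuple[float, bool]:
    """One fine-tuning step with a conditional decoder update.

    The encoder is always updated; the decoder receives a single update
    only when bit accuracy on the attacked image is below tau1.
    Returns the measured accuracy and whether the decoder was updated.
    """
    acc = bit_accuracy(decode_attacked(), message)
    update_encoder()
    needs_decoder_update = acc < tau1
    if needs_decoder_update:
        update_decoder()
    return acc, needs_decoder_update
```

In a real training loop the three callbacks would wrap the attack, the encoder optimizer, and the decoder optimizer; they are left abstract here so that the gating logic stays visible.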

Stage 2: Quality-Aware Direct Image Optimization

Goal: Address distortion (e.g., JPEG, noise) and regeneration (e.g., Stable Diffusion) attacks without sacrificing the adversarial robustness gained in Stage 1.
Strategy: Instead of retraining the neural network, AdvMark performs direct optimization on the encoded image ($x_{w2}$).
- Constrained Image Loss: A novel loss function is proposed to balance three objectives:
  1. Attack Loss: Minimize error against distortion and regeneration attacks.
  2. Clean Accuracy: Ensure the optimized image still decodes correctly.
  3. Robustness Preservation: A critical constraint term ($l(x_{w2}, x_{w1})$) limits the deviation between the Stage 2 optimized image and the Stage 1 adversarially robust image. The authors provide a theoretical guarantee (Theorem 1) proving that if the deviation is within a certain bound, the adversarial robustness is preserved.
- Quality-Aware Early Stop: To ensure visual quality, the optimization uses a modified PGD (Projected Gradient Descent) approach. Instead of a standard $\epsilon$-ball projection, it employs a quality-aware mapping that stops optimization if the PSNR drops below a specific budget ($p$), guaranteeing a lower bound on visual quality.
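
A rough sketch of the Stage 2 loop is given below. It is an illustration under stated assumptions rather than the paper's algorithm: the weighted-sum form of `constrained_loss` and its weights, the signed-gradient update, measuring PSNR against the cover image, and the names `optimize_with_early_stop` and `grad_fn` are all hypothetical. Images are flat lists of pixel values in [0, 1] to keep the example dependency-free.

```python
import math
from typing import Callable, List, Sequence


def psnr(a: Sequence[float], b: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and equal in size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def constrained_loss(attack: float, clean: float, deviation: float,
                     w_clean: float = 1.0, w_dev: float = 1.0) -> float:
    """Assumed weighted sum of the three objectives.

    attack:    decoding error under distortion/regeneration attacks
    clean:     decoding error on the unattacked image
    deviation: l(x_w2, x_w1), distance from the Stage 1 image
    """
    return attack + w_clean * clean + w_dev * deviation


def optimize_with_early_stop(
    x_w1: Sequence[float],
    cover: Sequence[float],
    grad_fn: Callable[[List[float]], Sequence[float]],  # hypothetical: gradient of the loss w.r.t. pixels
    step_size: float,
    max_steps: int,
    psnr_budget: float,
) -> List[float]:
    """Signed-gradient descent on pixels with a PSNR-based early stop.

    Starts from the Stage 1 image x_w1 and returns the last iterate whose
    PSNR against the cover image is still at or above psnr_budget.
    """
    x_w2 = list(x_w1)
    for _ in range(max_steps):
        grad = grad_fn(x_w2)
        candidate = [
            min(1.0, max(0.0, xi - step_size * ((g > 0) - (g < 0))))  # step, then clip to [0, 1]
            for xi, g in zip(x_w2, grad)
        ]
        if psnr(candidate, cover) < psnr_budget:
            break  # the next step would violate the quality budget
        x_w2 = candidate
    return x_w2
```

Because the loop rejects the first candidate that falls below the budget, the returned image never has lower PSNR than the budget unless the starting image already did, which mirrors the lower bound on visual quality described above.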

3. Key Contributions

Systematic Evaluation: The authors present what they describe as the first systematic evaluation of existing watermarking methods against the triad of distortion, regeneration, and adversarial attacks, identifying the specific failure modes of joint optimization.
AdvMark Framework: Introduction of a decoupled, two-stage defense strategy that separates adversarial defense (encoder-focused) from distortion/regeneration defense (image-focused).
Theoretical Guarantees: Derivation of a constrained image loss with a theoretical proof ensuring that direct image optimization does not compromise previously learned adversarial robustness.
Quality-Aware Optimization: Proposal of a metric-aware early-stop mechanism for image optimization, ensuring high visual fidelity (PSNR/SSIM/LPIPS) without manual tuning of perturbation budgets.

4. Experimental Results

The authors evaluated AdvMark against 9 baseline methods (including MBRS, HiDDeN, Stable Signature, VINE) across 10 different attacks on MS-COCO and DiffusionDB datasets.

Robustness Improvements:
- Distortion Attacks: Up to 29% accuracy improvement over SOTA.
- Regeneration Attacks: Up to 33% accuracy improvement.
- Adversarial Attacks: Up to 46% accuracy improvement (e.g., against WEvade and Black-Surrogate attacks).
- Clean Accuracy: Maintained near-perfect accuracy (1.00) on unattacked images, whereas JAT-based methods often dropped to ~0.94.
Visual Quality:
- AdvMark achieved the highest image quality among all methods.
- On MS-COCO (128x128), AdvMark achieved a PSNR of 37.0 (vs. 32.1 for MBRS) and an LPIPS of 0.01 (lower is better).
Ablation Studies:
- Removing Stage 1 caused a collapse in adversarial robustness (WEvade accuracy dropped from 0.98 to 0.50).
- Removing Stage 2 caused a drop in distortion robustness (JPEG accuracy dropped from 0.99 to 0.88).
- The ablation showed the constrained loss term to be essential for maintaining the balance between robustness and quality.
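
To put the PSNR figures quoted under Visual Quality in perspective, the gap can be converted into a ratio of mean squared errors. The calculation below assumes the standard PSNR definition in decibels with the same peak value MAX for both methods; the summary does not state the paper's exact convention, and PSNR averaged over a dataset only approximately maps to a single MSE ratio.

$$
\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}
\quad\Longrightarrow\quad
\frac{\mathrm{MSE}_{\mathrm{MBRS}}}{\mathrm{MSE}_{\mathrm{AdvMark}}}
= 10^{(37.0 - 32.1)/10} = 10^{0.49} \approx 3.1
$$

Under these assumptions, a 4.9 dB advantage corresponds to roughly one third of the pixel-level squared error.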

5. Significance

This work shifts the emphasis in robust watermarking from joint model training to decoupled defense strategies. By recognizing that different attack types call for different mitigation mechanisms (encoder modification vs. image perturbation), AdvMark substantially eases the long-standing trade-off between clean accuracy and robustness, keeping clean accuracy near 1.00 in the reported experiments.

The method is particularly significant for the era of Generative AI, where regeneration attacks (using diffusion models to strip watermarks) are becoming a major threat. AdvMark provides a theoretically grounded, computationally efficient solution that ensures watermarks survive both traditional transmission distortions and advanced AI-based erasure attempts, all while maintaining high visual fidelity suitable for real-world deployment.