RandMark: On Random Watermarking of Visual Foundation Models

Imagine you just built a magnificent, custom-made robot chef. You spent years gathering the best recipes, training it on millions of dishes, and tweaking its algorithms until it could chop vegetables and flip pancakes better than anyone else. This robot is your Visual Foundation Model (VFM). It's incredibly valuable.

Now, imagine you want to sell this robot's "brain" (the software) to other restaurants. But you're worried: what if a restaurant buys your robot, copies its brain, and then sells that copy to a third party without your permission? Or what if they tweak the brain slightly to make it faster, claiming it's a new invention?

You need a way to prove, "Hey, this robot brain is mine!" even if someone tries to hide it. This is where the paper RandMark comes in.

The Problem: Invisible Stolen Goods

Currently, if someone steals your robot's brain, it's hard to prove it's yours. Standard "fingerprints" (like checking the serial number) often get wiped out if the thief retrains the brain slightly (fine-tuning) or cuts out parts of it to make it smaller (pruning).

The Solution: The "Magic Ink" Watermark

The authors propose a new method called RandMark. Think of it not as a permanent tattoo on the robot's skin, but as a magic ink hidden inside its thoughts.

Here is how it works, step-by-step:

1. The Secret Recipe (The Watermark)

Instead of changing the robot's entire brain, the authors use a special "encoder" (a tiny helper program) to inject a secret binary message (a string of 0s and 1s, like a secret code) into the robot's internal processing.

The Analogy: Imagine you give your robot chef a specific, slightly blurry photo of a tomato. You tell the robot, "When you see this specific blurry tomato, think of the secret code '10101'."
The robot learns to associate that specific image with that secret code. This happens inside the robot's hidden layers of thought.

2. The Random Twist (The "Random" in RandMark)

The clever part is that they don't just use one photo. They use randomly distorted versions of that photo.

The Analogy: You show the robot the tomato photo, but sometimes it's upside down, sometimes it's zoomed in, sometimes it's slightly blurry. No matter how you twist the photo, the robot is trained to still whisper the secret code "10101" in its mind.
Because the robot has to work hard to recognize the code through all these random twists, the "memory" of the code becomes deeply embedded in its neural pathways.

3. The Test (The Decoder)

Later, if you suspect someone has stolen your robot, you run a test.

You show the suspect robot the same set of twisted tomato photos.
You use a "decoder" (a detective program) to listen to what the robot whispers.
If it's your robot: Even if the thief tweaked the robot's brain to be faster or changed it to do a different job (like chopping onions instead of tomatoes), the robot will still whisper the secret code "10101" most of the time.
If it's a stranger's robot: A robot that wasn't trained with your secret code will just be confused. It might guess random codes, or say nothing. It won't consistently whisper "10101."

Why is this better than old methods?

The paper compares RandMark to other "fingerprinting" methods using a few key metaphors:

The "Heavy Hand" vs. The "Gentle Touch": Some older methods change the robot's brain so drastically to embed their mark that the robot starts making mistakes (like burning the toast). RandMark is like a gentle touch; it hides the code so well that the robot still does its job almost as well as before.
The "Eraser" Test: Thieves often try to "prune" (cut out) parts of the code to remove the fingerprint.
- Old Method: If you cut out 20% of the robot's brain, the fingerprint disappears.
- RandMark: Because the code is woven into the robot's way of thinking about random images, even if you cut out 40% of the brain, the robot still remembers the secret code. It's like trying to erase a song from a person's memory by removing a few neurons; the melody is still there.

The Results

The researchers tested this on two very famous, powerful AI models (CLIP and DINOv2).

Success Rate: When they took their watermarked models and trained them on new tasks (like identifying food or products), the secret code was still detected in up to 100% of cases.
No False Alarms: When they tested completely different, innocent models that had nothing to do with their robot, the system correctly said, "No, this isn't yours." It didn't get confused.

The Bottom Line

RandMark is a way for AI creators to stamp their "copyright" directly into the way a model thinks. It's like teaching a model a secret handshake that it can't forget, even if someone tries to change its job, shrink its size, or retrain it. If the model can still do the secret handshake, you know it belongs to you.

The remainder of this article is a more detailed technical summary of the paper "RandMark: On Random Watermarking of Visual Foundation Models".

1. Problem Statement

Visual Foundation Models (VFMs), such as CLIP and DINOv2, are valuable assets due to the high cost of data collection and training. Owners distribute these models via licenses or subscriptions to protect their Intellectual Property Rights (IPR). However, unauthorized usage (e.g., integrating models into other services) remains a threat.

Existing protection methods face specific limitations when applied to VFMs:

Fingerprinting: Typically leaves the model unchanged and instead derives a unique identifier from its behavior. However, many existing fingerprinting methods are designed for classifiers and fail to distinguish between independent models and functional copies of VFMs after fine-tuning or pruning.
Traditional Watermarking: Many approaches modify model weights or rely on specific input-output triggers designed for classification tasks. These are often not robust enough for the diverse downstream capabilities of VFMs (e.g., feature extraction, segmentation) or fail when the model undergoes functional perturbations like fine-tuning or unstructured pruning.

The Core Challenge: How to embed a robust, verifiable watermark into a Visual Foundation Model that survives downstream fine-tuning and pruning, while ensuring low false-positive rates for independent models.

2. Methodology: RandMark

The authors propose RandMark, a novel watermarking framework that embeds digital signatures into the internal hidden representations of the model rather than just the final output or weights.

Key Components

Architecture:
- Encoder ($e$): A lightweight network that takes an input image $x$ and a binary message $m$ (the watermark) and produces a modified input representation.
- Source VFM ($f$): The foundation model being watermarked.
- Decoder ($d$): A lightweight network that extracts the binary message from the VFM's output embedding.
- Process: The system is trained jointly. The encoder injects the message into the representation of a "trigger" image (often perturbed with Gaussian noise $\epsilon$), and the decoder attempts to recover the message from the VFM's output.
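The encoder, model, and decoder roles described above can be illustrated with a deliberately tiny stand-in. This is a sketch under strong assumptions, not the authors' architecture: `encode`, `vfm`, and `decode` are hypothetical hand-written functions instead of trained networks, the "image" is a list of eight numbers, and the order in which noise and encoding are applied is a guess.

```python
import random

def encode(image, message, strength=1.0):
    """Toy encoder e(x, m): shift the first len(message) values up or down per bit."""
    out = list(image)
    for i, bit in enumerate(message):
        out[i] += strength if bit == 1 else -strength
    return out

def vfm(image):
    """Stand-in for the foundation model f: a fixed elementwise scaling."""
    return [2.0 * v for v in image]

def decode(embedding, n_bits):
    """Toy decoder d: read each bit from the sign of one embedding coordinate."""
    return [1 if embedding[i] > 0 else 0 for i in range(n_bits)]

rng = random.Random(0)
message = [1, 0, 1, 0, 1]
trigger = [rng.uniform(-0.1, 0.1) for _ in range(8)]        # low-contrast toy trigger image
noisy_trigger = [p + rng.gauss(0.0, 0.05) for p in trigger]  # x + epsilon (assumed order)

embedding = vfm(encode(noisy_trigger, message))
recovered = decode(embedding, len(message))
print("embedded :", message)
print("recovered:", recovered)
```

In the real method all three components are learned jointly; the point here is only the data flow from message to input to embedding and back to bits.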
Random Watermark Embedding:
- Unlike deterministic triggers, RandMark relies on randomness. The input image is transformed with random noise ($x + \epsilon$), making the extracted watermark a random variable.
- The verification process involves applying randomized input transformations to a set of trigger images and analyzing the statistical distribution of the extracted messages.
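To make "the extracted watermark is a random variable" concrete, the following sketch draws many noisy versions of one toy trigger and records how many bits come out wrong each time. The extractor `extract_bits`, the noise level, and the trigger construction are all illustrative assumptions and do not come from the paper.

```python
import random

def extract_bits(trigger, noise_std, rng):
    """Hypothetical extractor: the sign of each noisy coordinate yields one bit."""
    return [1 if v + rng.gauss(0.0, noise_std) > 0 else 0 for v in trigger]

rng = random.Random(1)
message = [1, 0, 1, 1, 0, 0, 1, 0]
# Toy trigger whose coordinates already carry the message with a margin of 1.0.
trigger = [1.0 if b else -1.0 for b in message]

K = 200  # number of random draws of the noise
error_counts = []
for _ in range(K):
    extracted = extract_bits(trigger, noise_std=0.8, rng=rng)
    error_counts.append(sum(a != b for a, b in zip(message, extracted)))

mean_errors = sum(error_counts) / K
print(f"mean bit errors over {K} draws: {mean_errors:.2f} of {len(message)}")
print("distinct error counts seen:", sorted(set(error_counts)))
```

Because each draw can give a different error count, verification has to reason about the distribution of extracted messages rather than about a single extraction.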
Training Objective (Loss Function):
The model is trained to minimize two terms:
- Representation Preservation: Ensure the watermarked model's features ($\tilde{f}(x)$) do not deviate significantly from the original model's features ($f(x)$).
- Message Recovery: Minimize the distance between the embedded message $m$ and the extracted message $m'$ across $K$ random transformations.
  $L = \|f(x) - \tilde{f}(x)\|_2 + \frac{\lambda}{K} \sum_{j=1}^K \|m - m'_j\|_2$
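As a worked check of the formula above, the loss can be evaluated by hand on small vectors. The function name `randmark_loss` and the example numbers are made up for illustration; only the formula itself is taken from the summary.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def randmark_loss(f_x, f_tilde_x, message, extracted, lam):
    """L = ||f(x) - f~(x)||_2 + (lam / K) * sum_j ||m - m'_j||_2"""
    K = len(extracted)
    preservation = l2(f_x, f_tilde_x)                      # feature drift term
    recovery = sum(l2(message, m_j) for m_j in extracted)  # summed over K transformations
    return preservation + (lam / K) * recovery

f_x = [0.0, 3.0]
f_tilde_x = [4.0, 0.0]                           # distance to f_x is 5
message = [1.0, 0.0, 1.0]
extracted = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]   # distances 0 and 1, so K = 2

loss = randmark_loss(f_x, f_tilde_x, message, extracted, lam=2.0)
print(loss)  # 5 + (2 / 2) * (0 + 1) = 6.0
```

A larger $\lambda$ weights message recovery more heavily relative to keeping the features close to the original model.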
Verification & Decision Rule:
- Metric: The system calculates the Hamming distance (number of bit errors) between the original message and the extracted message over $K$ trials.
- Thresholding: A model is deemed watermarked if the average bit error rate is below a threshold $\tau$.
- Statistical Bounds: The authors derive theoretical upper bounds for:
  - False Positive (FP): Detecting a watermark in an independent model ($g \perp f$).
  - False Negative (FN): Failing to detect a watermark in a functional copy ($f' \sim f$) after fine-tuning or pruning.
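The decision rule above can be sketched in a few lines, together with a back-of-the-envelope false-positive estimate. The paper's actual bounds are not reproduced in this summary; the estimate below rests on a much simpler assumption, namely that an independent model emits every bit as an independent fair coin flip, which gives a binomial tail. The function names and the example bit strings are hypothetical.

```python
from math import comb

def is_watermarked(message, extracted_messages, tau):
    """Flag the suspect if the mean bit error rate over all K trials is below tau."""
    n, K = len(message), len(extracted_messages)
    errors = sum(a != b for m_j in extracted_messages for a, b in zip(message, m_j))
    return errors / (n * K) < tau

def coin_flip_false_positive(n_bits, K, tau):
    """P(error rate < tau) if every extracted bit were an independent fair coin."""
    total = n_bits * K
    hits = sum(comb(total, e) for e in range(total + 1) if e / total < tau)
    return hits / 2 ** total

message = [1, 0, 1, 0, 1, 1, 0, 0]
copy_outputs = [[1, 0, 1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 1, 1, 0, 1]]   # 1 error in 16 bits
other_outputs = [[0, 1, 1, 0, 0, 1, 1, 0], [1, 1, 0, 1, 1, 0, 0, 1]]  # 9 errors in 16 bits

copy_flagged = is_watermarked(message, copy_outputs, tau=0.25)
other_flagged = is_watermarked(message, other_outputs, tau=0.25)
fp = coin_flip_false_positive(n_bits=8, K=2, tau=0.25)
print(copy_flagged, other_flagged, round(fp, 4))
```

Under this coin-flip assumption, lengthening the message or increasing $K$ shrinks the false-positive probability quickly, while lowering $\tau$ trades false positives for false negatives.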

3. Key Contributions

Novel Methodology: Introduced RandMark, which the authors present as the first watermarking approach specifically designed for Visual Foundation Models; it embeds signatures into hidden representations via a set of trigger images.
Theoretical Guarantees: Provided theoretical derivations for the upper bounds of false positive and false negative detection probabilities, which limit how often the test can confuse functional copies with independent models.
Robustness to Perturbations: Demonstrated that the watermark survives:
- Fine-tuning: Adapting the model for downstream tasks (classification and segmentation).
- Unstructured Pruning: Removing up to 40% of model weights.
Model Agnostic: The method requires no significant changes to the backbone architecture and works with different VFM architectures (e.g., CLIP, DINOv2).
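Unstructured pruning, one of the perturbations listed above, removes individual weights rather than whole neurons or layers. A common variant zeroes the weights with the smallest absolute value. This summary does not state which pruning criterion the authors used, so the magnitude rule and the helper `prune_by_magnitude` below are assumptions chosen for illustration.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of entries with the smallest absolute value.

    `weights` is a list of rows. Ties at the cutoff are all pruned, so slightly
    more than `fraction` may be removed when magnitudes repeat.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * fraction)
    if n_prune == 0:
        return [list(row) for row in weights]
    cutoff = flat[n_prune - 1]
    return [[0.0 if abs(w) <= cutoff else w for w in row] for row in weights]

weights = [
    [0.9, -0.1, 0.4, -0.7, 0.05],
    [0.3, -0.8, 0.2, 0.6, -0.5],
]
pruned = prune_by_magnitude(weights, fraction=0.4)  # the 40% setting from the text
zeroed = sum(w == 0.0 for row in pruned for w in row)
print(pruned)
print(f"{zeroed} of 10 weights removed")
```

A watermark that survives this operation cannot depend on a handful of small weights, which fits the claim that the signal is spread across the model's representations.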

4. Experimental Results

The authors evaluated RandMark on CLIP and DINOv2 using datasets for e-commerce classification and food segmentation.

Detection Rates:
- Positive Suspects (Functional Copies): RandMark achieved near-perfect detection rates (up to 100%) for models fine-tuned on classification/segmentation and models subjected to 20% and 40% pruning.
- Negative Suspects (Independent Models): The method yielded a 0% false detection rate for independent models (e.g., a different VFM architecture).
Comparison with Baselines:
- Compared against ADV-TRA and IPGuard (fingerprinting methods) and Randomized Smoothing (weight-based watermarking).
- Result: RandMark significantly outperformed the baselines. For example, ADV-TRA and IPGuard failed to identify fine-tuned or pruned copies (0% detection), while RandMark maintained high detection rates.
- Task Performance: Unlike weight-smoothing baselines which degraded downstream task performance (e.g., segmentation accuracy dropped significantly), RandMark preserved high task performance while maintaining watermark integrity.
Covariance Analysis: The authors introduced a covariance metric between decoded messages. Independent models showed near-zero covariance, while watermarked functional copies showed positive covariance, providing an additional layer of verification.
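The exact definition of the covariance metric is not given in this summary. One plausible reading, assumed here, is the sample covariance between the bits decoded from the owner's model and the bits decoded from a suspect model on the same inputs. The bit strings below are hand-picked to show the two regimes and are not experimental data.

```python
def bit_covariance(bits_a, bits_b):
    """Population covariance between two equal-length 0/1 sequences."""
    n = len(bits_a)
    mean_a = sum(bits_a) / n
    mean_b = sum(bits_b) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in zip(bits_a, bits_b)) / n

owner_bits     = [1, 0, 1, 0, 1, 1, 0, 0]  # decoded from the watermarked source model
copy_bits      = [1, 0, 1, 0, 1, 1, 0, 1]  # hypothetical functional copy, one bit flipped
unrelated_bits = [1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical independent model

cov_copy = bit_covariance(owner_bits, copy_bits)
cov_unrelated = bit_covariance(owner_bits, unrelated_bits)
print(cov_copy, cov_unrelated)
```

Bits that tend to agree push the covariance above zero, whereas unrelated bits average out toward zero, matching the qualitative pattern the summary reports.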

5. Significance and Conclusion

RandMark addresses a critical gap in the security of Visual Foundation Models. By leveraging random watermark embedding in hidden representations, it offers a robust solution that:

Protects IP: Allows owners to verify ownership even after the model has been adapted for specific commercial tasks.
Minimizes False Accusations: Theoretical and experimental results confirm a very low probability of falsely accusing independent models of being copies.
Preserves Utility: Unlike some watermarking techniques that degrade model performance, RandMark maintains the model's effectiveness for downstream tasks.

This work offers a practical approach to IP protection for large-scale, transferable foundation models, aiming to secure the value of these assets without compromising their utility.