Imagine you run a massive library (a Retrieval System) where every book has a unique "fingerprint" (an embedding) that helps the librarian find it quickly.
The Problem: The "Library Renovation" Dilemma
One day, you decide to upgrade your librarian with a new, super-smart AI. This new AI is much better at understanding books and can find them faster. However, there's a catch: it writes fingerprints in its own new "language", so they no longer match the fingerprints already on file. That leaves you two options:
- The Old Way (Backfilling): To use the new AI, you have to re-fingerprint every single book in the library using the new system. If your library has 10 million books, this takes weeks of work and costs a fortune in computing power. It's like hiring a whole new team to re-shelve every book just because you bought a new shelf.
- The "Backward-Compatible" Way (BCL): To save time, researchers developed a method where the new AI learns to speak the same "language" as the old one. This way, the old fingerprints still work, and you don't need to re-shelve anything.
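In code terms, the "backward-compatible" idea usually boils down to training the new model's embeddings against the old model's class centers, so old and new fingerprints share one language. Here is a minimal numpy sketch of such a compatibility loss; the function name, shapes, and cosine-softmax form are illustrative assumptions, not this paper's exact formulation:

```python
import numpy as np

def compatibility_loss(new_embeddings, old_prototypes, labels):
    """Cross-entropy of the NEW model's embeddings against the OLD
    model's class prototypes. Minimizing this keeps new-model queries
    retrievable against the old, un-refreshed gallery, so no
    backfilling is needed. Shapes: (N, D), (C, D), (N,)."""
    # Cosine similarity between each new embedding and every old prototype.
    e = new_embeddings / np.linalg.norm(new_embeddings, axis=1, keepdims=True)
    p = old_prototypes / np.linalg.norm(old_prototypes, axis=1, keepdims=True)
    logits = e @ p.T                                     # (N, C)
    # Numerically stable softmax cross-entropy with the true class.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

A lower loss means the new model's embedding for each book lands near that book's *old* class center, which is exactly what keeps the old index usable.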
But here's the twist: The old AI had some blind spots. It couldn't tell the difference between two very similar books (let's call them "Book A" and "Book B"). In the old system, their fingerprints were practically identical.
If you force the new AI to strictly copy the old AI's language (to stay compatible), the new AI is forced to keep "Book A" and "Book B" looking identical too. The new AI loses its ability to distinguish them, even though it could tell them apart if it were free to do so. It's like forcing a master chef to cook a bland meal just because the old kitchen only had one spice.
The Solution: "Prototype Perturbation"
The authors of this paper propose a clever trick called Prototype Perturbation.
Think of the "fingerprint" of a whole category of books (e.g., "Mystery Novels") as a Prototype (a central meeting point for all mystery books).
In the old system, the meeting point for "Mystery Novels" was accidentally squished right next to the meeting point for "Thrillers." They were too close to tell apart.
The authors' idea is simple: Before the new AI learns from the old one, we gently nudge the old meeting points apart.
- The Nudge (Perturbation): We take the old "Mystery" meeting point and push it slightly away from the "Thriller" meeting point. The push is scaled by similarity: the closer two meeting points are, the harder we push them apart.
- The Pseudo-Old World: We create a "fake" or "pseudo" version of the old library where these meeting points are already spaced out nicely.
- The New AI's Job: The new AI is trained to match this improved version of the old library. Because the meeting points are already separated, the new AI learns to distinguish "Mystery" from "Thriller" effectively, while still remaining compatible with the old system.
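The three steps above can be sketched in a few lines of numpy. This toy version nudges each old class center away from its single most-similar neighbor, scaled by how close they are; the `step` knob and the single-neighbor rule are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def make_pseudo_old_prototypes(old_protos, step=0.1):
    """Build 'pseudo-old' prototypes: each class center is nudged away
    from its most-similar neighbor, harder when they are closer, before
    the new model is trained to match them."""
    p = old_protos / np.linalg.norm(old_protos, axis=1, keepdims=True)
    sims = p @ p.T
    np.fill_diagonal(sims, -np.inf)    # ignore self-similarity
    nearest = sims.argmax(axis=1)      # each prototype's most-confusable neighbor
    # Push each prototype along the direction away from that neighbor,
    # scaled by their cosine similarity (closer => harder push).
    w = np.clip(sims[np.arange(len(p)), nearest], 0.0, None)[:, None]
    pseudo = p + step * w * (p - p[nearest])
    return pseudo / np.linalg.norm(pseudo, axis=1, keepdims=True)
```

The new model is then trained against these spaced-out pseudo-prototypes instead of the originals.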
Two Ways to Do the Nudge
The paper offers two ways to calculate exactly how hard to push these points:
Neighbor-Driven (NDPP): The "Local Neighborhood" Approach
- Imagine you are at a party. You look at the people standing immediately next to you. If someone is standing too close to your group, you gently step away from them.
- This method looks at a prototype's immediate neighbors and pushes it away from them with a single, closed-form calculation. It's fast and efficient, like a quick reflex.
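A toy numpy version of this local, reflex-style rule takes one closed-form step away from the k most similar neighbors at once (a hedged sketch; `k`, `step`, and the clipping are assumptions, not the authors' exact NDPP formula):

```python
import numpy as np

def ndpp_perturb(protos, k=2, step=0.1):
    """Neighbor-driven perturbation sketch: one closed-form step that
    pushes each prototype away from its k most similar neighbors,
    weighting each push by cosine similarity."""
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = p @ p.T
    np.fill_diagonal(sims, -np.inf)              # never a neighbor of itself
    out = p.copy()
    for i in range(len(p)):
        nbrs = np.argsort(sims[i])[-k:]          # k most similar prototypes
        w = np.clip(sims[i, nbrs], 0.0, None)    # only repel genuinely close ones
        out[i] = p[i] + step * (w[:, None] * (p[i] - p[nbrs])).sum(axis=0)
    return out / np.linalg.norm(out, axis=1, keepdims=True)
```

Because there is no iterative optimization, the cost is one pass over the prototypes, which is why it behaves like a quick reflex.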
Optimization-Driven (ODPP): The "Grand Strategy" Approach
- This is like a chess grandmaster looking at the whole board. Instead of just looking at neighbors, it calculates the best possible move for every prototype simultaneously to ensure the whole library is perfectly organized.
- It's more computationally expensive (takes more brainpower/time) but can find a better solution when the library is huge and chaotic.
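A toy "grand strategy" version can be written as gradient descent that adjusts every prototype jointly: a repulsion term spreads them apart while a fidelity term keeps them near the originals, so compatibility survives. The objective and all the knobs here are illustrative assumptions, not the paper's actual ODPP objective:

```python
import numpy as np

def odpp_perturb(protos, steps=200, lr=0.02, lam=5.0):
    """Optimization-driven perturbation sketch: gradient descent on
        sum_{i != j} sim(i, j)^2  +  lam * ||P - P_old||^2
    so all prototypes move at once, trading separation against staying
    close to where they started."""
    p_old = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    p = p_old.copy()
    for _ in range(steps):
        q = p / np.linalg.norm(p, axis=1, keepdims=True)
        sims = q @ q.T
        np.fill_diagonal(sims, 0.0)
        grad_rep = 4.0 * sims @ q            # repulsion (normalization Jacobian ignored)
        grad_fid = 2.0 * lam * (p - p_old)   # stay near the original prototypes
        p = p - lr * (grad_rep + grad_fid)
    return p / np.linalg.norm(p, axis=1, keepdims=True)
```

Compared to the one-shot neighbor rule, this costs `steps` passes over all prototype pairs, but the joint view can untangle crowded regions that a purely local push would miss.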
Why This Matters
By using this "nudging" technique, the new AI gets the best of both worlds:
- Compatibility: It still understands the old library's fingerprints (no need for a massive re-shelving project).
- Discrimination: It doesn't get stuck in the old AI's blind spots. It learns to tell the difference between similar items, making the search results much more accurate.
In a nutshell: The paper solves the problem of upgrading a system without breaking the old one by teaching the new system to "pretend" the old system was already slightly better organized, allowing the new system to learn to be even smarter without losing its connection to the past.