SDFed: Bridging Local–Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning

Imagine a group of chefs from different parts of the world trying to create a single, perfect "Master Recipe Book" together. They want to collaborate, but they have two big problems:

Privacy: They can't send their secret family recipes (their raw data) to a central kitchen.
Differences: One chef is an expert in spicy curries (lots of data, specific style), another is a master of delicate pastries (less data, different style), and they all have different kitchen equipment (computing power).

In the world of Artificial Intelligence, this is called Federated Learning. The "chefs" are computers (clients), and the "Master Recipe Book" is a massive AI model (like CLIP) that understands both images and text.

The Old Way: The "One-Size-Fits-All" Problem

Previously, when these AI chefs tried to collaborate, they were forced to use the exact same recipe card (called a "Prompt").

The Issue: If the "Master Recipe" was too long and complex, the chef with the small, old kitchen couldn't handle it. If it was too short, the expert chef couldn't express their specific style.
The Conflict: The global recipe might say "Add salt," but the local chef knows their specific dish needs "Lemon zest." If they just average the recipes, the final dish tastes weird to everyone. The global knowledge clashed with local needs.

The New Solution: SDFed

The paper introduces SDFed, a smart new way for these AI chefs to collaborate. Think of it as a "Smart Recipe Exchange" with three clever tricks:

1. The "Universal Base + Custom Add-Ons" (Heterogeneous Framework)

Instead of forcing everyone to use the same recipe card, SDFed gives everyone two things:

A Fixed "Global Base": A short, standard set of instructions that everyone agrees on (like "Preheat the oven"). This is easy to share and combine.
A Variable "Local Add-On": A custom section of the recipe that can be as long or short as the specific chef needs. The curry chef can write a 50-step guide for spices, while the pastry chef writes a 5-step guide for folding dough.
Why it works: It respects that everyone has different needs and resources, while still keeping a common language to talk to each other.

2. The "Conflict Filter" (Subspace Refinement)

Sometimes, the local chef's custom instructions might accidentally contradict the global base (e.g., the global base says "Bake at 350°F," but the local add-on says "Bake at 500°F").

The Trick: SDFed uses a mathematical "filter" (called Subspace Refinement). Imagine looking at the local recipe and asking, "Which parts of this are just repeating what the global recipe already says?"
The Action: It strips away the repetitive or conflicting parts of the local recipe, keeping only the unique and new information. This prevents the local chef from accidentally "un-learning" the global rules.

3. The "Goldilocks Zone" (Divergence Control)

Now we have a risk: What if the local chef gets too focused on their own style and ignores the global team entirely? Or what if they get too influenced by the global team and lose their unique flavor?

The Trick: SDFed uses a "Goldilocks" strategy.
- Retention: It ensures the local chef keeps their unique "secret sauce" (Information Retention).
- Separation: It makes sure the local recipe stays distinct enough from the global one so it doesn't just become a copy (Divergence Control).
The Result: The local recipe is unique and helpful, but it still fits perfectly into the Master Recipe Book.

The Result?

When the authors tested this system, it was like a team of chefs finally creating a menu that worked for everyone:

Better Taste: The AI got more accurate at recognizing images (like distinguishing between different types of flowers or food), beating strong baselines by up to 3.44% in accuracy.
Faster Cooking: It learned quickly, even when some chefs had very little data (the "low-shot" scenario).
No Burned Dishes: It worked great even when the chefs had very different equipment or data styles.

In a Nutshell

SDFed is like a smart translator and mediator for AI teams. It lets everyone speak their own "dialect" (custom length prompts) while ensuring they all understand the main language (global prompt). It filters out the noise, keeps the unique flavor, and ensures that the final group decision is better than any single person could have made alone.

Here is a detailed technical summary of the paper "SDFed: Bridging Local–Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning."

1. Problem Statement

The paper addresses the challenges of adapting Vision-Language Pretrained Models (VLPMs, e.g., CLIP) in Federated Learning (FL) settings, specifically under client heterogeneity.

Context: While VLPMs offer strong transferable representations, standard federated optimization is hindered by high communication costs (transmitting full model weights) and limited local data on clients. Federated Prompt Learning mitigates this by freezing the backbone and training only lightweight prompt parameters.
The Gap: Existing federated prompt learning methods (e.g., those built on FedAvg-style aggregation) enforce a unified prompt structure and length across all clients.
Key Challenges:
1. Structural Rigidity: Real-world clients have heterogeneous data distributions (non-IID) and varying system resources (model architectures, computational capacity). A fixed-length prompt cannot optimally serve diverse local complexities.
2. Local-Global Conflict: A single global prompt may introduce knowledge that conflicts with locally optimal features, impairing local convergence and personalization. Conversely, excessive local optimization can hinder cross-client knowledge transfer.
3. Aggregability: Allowing variable-length local prompts creates a challenge for the server to aggregate parameters into a unified global model.
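
To make the communication saving mentioned under Context concrete, here is a rough back-of-the-envelope sketch. The backbone size (about 150 million parameters, roughly the scale of a CLIP ViT-B/16 model), the 16-token prompt, the 512-dimensional embedding, and 32-bit floats are all illustrative assumptions, not figures from the paper.

```python
# Rough per-round upload cost: full backbone vs. prompt-only.
# All constants below are illustrative assumptions, not values from the paper.
BACKBONE_PARAMS = 150_000_000  # assumed order of magnitude for a CLIP-sized backbone
PROMPT_TOKENS = 16             # assumed number of learnable prompt tokens
EMBED_DIM = 512                # assumed token embedding width
BYTES_PER_PARAM = 4            # float32


def payload_bytes(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Bytes needed to transmit `num_params` parameters once."""
    return num_params * bytes_per_param


prompt_params = PROMPT_TOKENS * EMBED_DIM
full_upload_mb = payload_bytes(BACKBONE_PARAMS) / 1e6
prompt_upload_kb = payload_bytes(prompt_params) / 1e3
reduction_factor = BACKBONE_PARAMS / prompt_params

print(f"full model upload : {full_upload_mb:.0f} MB")
print(f"prompt-only upload: {prompt_upload_kb:.1f} KB")
print(f"reduction factor  : {reduction_factor:,.0f}x")
```

Under these assumptions the prompt is several orders of magnitude smaller than the backbone, which is why freezing the backbone makes frequent client-server exchange practical.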

2. Methodology: SDFed Framework

The authors propose SDFed, a framework designed to bridge local-global discrepancies through three core modules:

A. Prompt-Driven Federated Heterogeneous Framework

Dual-Prompt Architecture: Each client maintains two prompts:
- A Fixed-Length Global Prompt ($G_s$): Shared across all clients for efficient aggregation and consistency.
- A Variable-Length Local Prompt ($G_c$): Unique to each client to capture specific data characteristics and resource constraints.
Mechanism: Clients jointly update both prompts locally. Only the global prompt is uploaded to the server for aggregation (weighted by dataset size), while the local prompt remains private.
Structural Alignment: To enable joint learning, the framework uses shared tokens (start, end, class) and zero-padding to align sequences within a unified Transformer space, despite length differences.
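
The aggregation and alignment steps described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `aggregate_global_prompts` and `pad_local_prompt`, the array shapes, and the choice to pad at the end of the sequence are assumptions made for the example.

```python
import numpy as np


def aggregate_global_prompts(client_prompts, client_sizes):
    """Dataset-size-weighted average of the clients' global prompts.

    client_prompts: list of arrays, each of shape (global_len, embed_dim).
    client_sizes:   list of local dataset sizes, one per client.
    """
    weights = np.asarray(client_sizes, dtype=np.float64)
    weights = weights / weights.sum()              # normalise to sum to 1
    stacked = np.stack(client_prompts)             # (num_clients, global_len, embed_dim)
    return np.tensordot(weights, stacked, axes=1)  # (global_len, embed_dim)


def pad_local_prompt(local_prompt, max_len):
    """Zero-pad a variable-length local prompt to `max_len` tokens (assumed layout)."""
    length, dim = local_prompt.shape
    if length > max_len:
        raise ValueError("local prompt is longer than max_len")
    padded = np.zeros((max_len, dim), dtype=local_prompt.dtype)
    padded[:length] = local_prompt                 # real tokens first, zeros after
    return padded


# Two hypothetical clients: one with 100 samples, one with 300.
uploaded = [np.full((4, 8), 1.0), np.full((4, 8), 3.0)]
global_prompt = aggregate_global_prompts(uploaded, client_sizes=[100, 300])

# A 5-token local prompt aligned to a 16-token slot; it never leaves the client.
aligned_local = pad_local_prompt(np.ones((5, 8)), max_len=16)
```

Because only the fixed-length global prompt is averaged, the server never has to reconcile prompts of different lengths; the variable-length part stays on the client.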

B. Subspace Refinement for Local Prompts (SRLP)

To prevent the local prompt from learning redundant or conflicting information dominated by the global prompt:

SVD Projection: The server performs Singular Value Decomposition (SVD) on the global prompt to identify dominant global semantic directions.
Null-Space Projection: The local prompt is projected onto the null space (or low-energy subspace) of the global prompt. This filters out components aligned with dominant global directions, forcing the local prompt to learn client-specific variations.
Mathematical Formulation: $\tilde{G}_c = G_c R$, where $R$ is an orthogonal projector constructed from the right-singular vectors of the global prompt corresponding to smaller singular values.
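
A minimal NumPy sketch of this projection is given below. It builds $R$ as the identity minus the projector onto the top right-singular directions of the global prompt, which is equivalent to projecting onto the span of the remaining (smaller and zero singular value) directions. The function names and the `num_dominant` cutoff are hypothetical; how the paper chooses the number of dominant directions is not stated here.

```python
import numpy as np


def low_energy_projector(global_prompt, num_dominant):
    """Projector R that removes the top `num_dominant` right-singular directions.

    global_prompt: array of shape (global_len, embed_dim).
    num_dominant:  assumed cutoff for how many directions count as dominant.
    """
    # Rows of vt are right-singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(global_prompt, full_matrices=False)
    dominant = vt[:num_dominant].T                 # (embed_dim, num_dominant)
    dim = global_prompt.shape[1]
    return np.eye(dim) - dominant @ dominant.T     # symmetric and idempotent


def refine_local_prompt(local_prompt, projector):
    """Apply G_c_tilde = G_c R to every token of the local prompt."""
    return local_prompt @ projector


rng = np.random.default_rng(0)
global_prompt = rng.normal(size=(4, 16))           # fixed-length global prompt
local_prompt = rng.normal(size=(10, 16))           # longer, client-specific prompt

R = low_energy_projector(global_prompt, num_dominant=2)
refined = refine_local_prompt(local_prompt, R)
```

Since $R$ acts on the embedding dimension, the same projector applies to a local prompt of any length, which is what lets the refinement coexist with variable-length prompts.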

C. Information Retention and Divergence Control (IRDC)

To balance the need for local specificity with the preservation of key semantics:

Stretch Term ($L_{str}$): Minimizes the Mean Squared Error (MSE) between the original local prompt and its projected version. This ensures that while conflicting components are removed, the core semantic meaning of the local prompt is preserved.
Separate Term ($L_{sep}$): Enforces a margin constraint (using ReLU) to maintain a minimum Euclidean distance between the local prompt and the global prompt. This prevents the local prompt from collapsing into the global one, ensuring personalization.
Unified Objective: The total loss combines Cross-Entropy (for both prompts), the Stretch term, and the Separate term.
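
The two regularizers and their combination can be sketched as below. This is an illustration under stated assumptions rather than the paper's exact loss: the `margin` value, the weights `lam_str` and `lam_sep`, and the assumption that the local and global prompts have already been aligned to the same shape (for example by zero-padding) are all choices made for the example.

```python
import numpy as np


def stretch_loss(local_prompt, refined_prompt):
    """MSE between the original local prompt and its projected version."""
    return float(np.mean((local_prompt - refined_prompt) ** 2))


def separate_loss(local_prompt, global_prompt, margin):
    """ReLU hinge: zero once the prompts are at least `margin` apart.

    Assumes both prompts share the same shape; the distance is the
    Euclidean (Frobenius) norm of their difference.
    """
    distance = np.linalg.norm(local_prompt - global_prompt)
    return float(max(0.0, margin - distance))


def total_loss(ce_global, ce_local, l_str, l_sep, lam_str=1.0, lam_sep=1.0):
    """Cross-entropy for both prompts plus the weighted regularizers (assumed weights)."""
    return ce_global + ce_local + lam_str * l_str + lam_sep * l_sep


# Toy values: the two prompts differ in a single entry, so their distance is 3.
local = np.zeros((4, 8))
global_ = np.zeros((4, 8))
global_[0, 0] = 3.0

too_close = separate_loss(local, global_, margin=5.0)   # penalised: inside the margin
far_enough = separate_loss(local, global_, margin=1.0)  # no penalty: outside the margin
```

The hinge form means the separation term stops contributing gradient once the margin is met, so it pushes the prompts apart only while they are too similar.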

3. Key Contributions

Framework Innovation: Proposed SDFed, the first federated prompt learning framework that explicitly supports heterogeneous prompt lengths while maintaining global aggregability.
Algorithmic Advances:
- Developed Subspace Refinement to mathematically decouple local and global representations, reducing interference.
- Introduced Information Retention and Divergence Control to optimize the trade-off between preserving local information and maintaining distinctiveness from the global model.
Theoretical Guarantees: Provided a convergence analysis proving that the algorithm converges to a first-order stationary point under standard assumptions (L-smoothness, bounded gradient variance, and bounded client heterogeneity).
Privacy Preservation: Demonstrated that the method introduces no additional privacy leakage beyond standard FedAvg, as all refinement and projection operations occur locally.
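
For readers unfamiliar with the terms in the convergence claim, one common way such assumptions and guarantees are written in the federated optimization literature is shown below. The notation ($F_i$ for client $i$'s objective, $F$ for their average over $N$ clients, $g_i$ for a stochastic gradient, and the constants $L$, $\sigma$, $\delta$) is assumed for illustration and is not necessarily the paper's exact statement or rate.

$$
\|\nabla F_i(x) - \nabla F_i(y)\| \le L \|x - y\|, \qquad
\mathbb{E}\|g_i(x) - \nabla F_i(x)\|^2 \le \sigma^2, \qquad
\frac{1}{N}\sum_{i=1}^{N} \|\nabla F_i(x) - \nabla F(x)\|^2 \le \delta^2
$$

$$
\min_{0 \le t < T} \mathbb{E}\|\nabla F(x_t)\|^2 \to 0 \quad \text{as } T \to \infty
$$

The first line states smoothness, bounded variance, and bounded heterogeneity; the second says that the expected gradient norm vanishes along the iterates, which is what convergence to a first-order stationary point means for a non-convex objective.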

4. Experimental Results

The authors evaluated SDFed on multiple datasets (Flowers102, DTD, Food101, OxfordPets, Caltech101, Office31, OfficeHome, CIFAR-10, Tiny-ImageNet) under various heterogeneity settings.

Single-Domain Performance: SDFed achieved State-of-the-Art (SOTA) results, outperforming strong baselines (CoOp, Powder, FedOTP, GPT-FL, UOPP) by up to 3.44% in accuracy with lower variance.
Multi-Domain & Model Heterogeneity: In settings with different domains (OfficeHome) and different backbone models (ViT vs. ResNet), SDFed consistently outperformed the next-best methods, demonstrating robustness to both data and system heterogeneity.
Few-Shot Learning: SDFed showed superior robustness in low-shot scenarios (1-4 shots), maintaining high accuracy where baselines degraded significantly.
Ablation Studies:
- Removing the Subspace Refinement (SRLP) or Divergence Control (IRDC) led to significant performance drops, confirming the necessity of both conflict resolution and information retention.
- The computational overhead of the subspace refinement was negligible (<1% of total training time).
Prompt Length Sensitivity: Experiments confirmed that allowing variable prompt lengths (4–64) tailored to specific clients yields better performance than fixed-length approaches.

5. Significance

Bridging the Gap: SDFed addresses the tension between global generalization (needed for federated aggregation) and local personalization (needed for heterogeneous data).
Efficiency: By keeping the backbone frozen and only optimizing lightweight prompts with variable lengths, it offers a highly parameter-efficient solution for privacy-sensitive, resource-constrained environments.
Scalability: The method is compatible with existing privacy defenses (e.g., Secure Aggregation, Differential Privacy) and scales well to large numbers of clients (tested up to 100 clients).
Practical Impact: It provides a viable path for deploying powerful VLPMs in real-world scenarios where data is siloed, non-IID, and clients have varying computational capabilities.