CA-HFP: Curvature-Aware Heterogeneous Federated Pruning with Model Reconstruction

Imagine a massive group project where hundreds of students (devices) are trying to write a single, perfect encyclopedia together. However, there's a catch: they can't share their private notes (data) with each other, and they all have very different resources. Some students have supercomputers and fast internet; others have old calculators and dial-up connections. Some are experts in history, while others only know about cooking.

This is the world of Federated Learning. The goal is to build one great encyclopedia without anyone ever seeing anyone else's private notes.

The problem? If everyone tries to send their whole encyclopedia draft back and forth, the internet gets clogged, and the students with old calculators crash. If they just send small, random chunks, the final book ends up messy and full of contradictions.

Enter CA-HFP (Curvature-Aware Heterogeneous Federated Pruning). Think of it as a smart, adaptive project manager that solves these problems using three clever tricks.

1. The "Customized Backpack" (Personalized Pruning)

In a normal project, everyone is asked to carry the same heavy backpack full of every single fact. But in CA-HFP, the manager looks at each student's backpack capacity.

Student A (a powerful phone) gets a backpack with 80% of the facts.
Student B (a tiny sensor) gets a backpack with only 40% of the facts.

But here's the genius part: How do they decide which facts to keep?
Old methods just said, "Keep the biggest facts" or "Keep the facts you used most recently." CA-HFP uses a "Curvature-Aware" compass. Imagine the facts are on a hilly landscape.

Some facts are on a flat plain (not very important).
Some are on a steep cliff (very important, changing them ruins the whole map).
CA-HFP calculates the "steepness" (curvature) of the hill for every fact. It tells the students: "Keep the facts that are on the steep cliffs because if you drop those, the whole map collapses. You can safely throw away the ones on the flat plains."

This ensures that even though students carry different amounts of information, they are all carrying the most critical pieces of the puzzle.

2. The "Magic Translator" (Model Reconstruction)

Now, imagine the students finish their work and send their backpacks back to the manager to combine them.

Student A sent a backpack with 80% of the facts.
Student B sent a backpack with 40% of the facts.

If the manager tries to mix them directly, it's like trying to glue together two puzzle pieces that are different shapes. It won't fit! This is the "structural mismatch" problem.

CA-HFP introduces a Magic Translator (the Reconstruction step). Before mixing the backpacks, the manager takes Student B's sparse backpack and "fills in the blanks" using the current master copy of the encyclopedia.

If Student B didn't send a fact about "Ancient Rome," the manager looks at the master copy, sees what the current best guess for "Ancient Rome" is, and temporarily fills that spot in Student B's backpack.
Now, every backpack looks like it has the same shape and size, even though the content inside is still unique to that student.

This allows the manager to mix them all together perfectly without the pieces clashing.

3. The "Fairness Guarantee" (Convergence)

The paper also proves mathematically that this system won't go crazy. Because the students have different data (some know cooking, some know history) and different backpack sizes, the final encyclopedia could become biased or unstable.

CA-HFP calculates a "safety margin." It knows exactly how much the "steepness" of the hills (curvature) and the "missing facts" (pruning) will shake the table. By adjusting how much each student works and how they mix their notes, it guarantees that the group will eventually agree on a stable, high-quality encyclopedia, even if the students are very different.

The Result?

In the real world, this means:

Faster Internet: Students send much less data because they only send their "backpacks" (sparse models), not the whole encyclopedia.
Less Battery Drain: Students with weak phones don't have to do heavy lifting; they only process the specific facts they are allowed to keep.
Smarter Results: Despite the differences, the final model is as accurate as (or more accurate than) it would be if everyone had sent everything.

In short: CA-HFP is like a smart team leader who gives everyone a custom-sized backpack, tells them exactly which items are too heavy to drop, and uses a magic trick to make sure everyone's different backpacks can be mixed together perfectly to build one amazing result.

1. Problem Statement

Federated Learning (FL) faces two forms of heterogeneity when deployed on edge devices (e.g., IoT sensors, mobile phones), plus a dilemma introduced by pruning, the usual remedy:

System Heterogeneity: Devices vary significantly in computation power, memory, and network bandwidth. Requiring all clients to train full models leads to stragglers and unstable participation.
Statistical Heterogeneity (Non-IID): Client data are not independent and identically distributed, so local distributions differ from one client to the next. This causes divergent local updates, degrading global convergence and generalization.
The Pruning Dilemma: While model pruning reduces communication and computation costs, existing pruning-based FL methods struggle to balance communication efficiency with aggregation robustness. Standard pruning often introduces aggregation bias, leading to unstable convergence, especially under severe Non-IID conditions.

2. Methodology: CA-HFP Framework

The authors propose Curvature-Aware Heterogeneous Federated Pruning (CA-HFP), a framework that allows clients to perform personalized, structured pruning while ensuring the global model remains aggregatable. The process involves three main stages per communication round:

A. Personalized Structured Pruning

Instead of a uniform pruning ratio, each client $k$ is assigned a specific pruning ratio ( $\rho_k$ ) based on its resource constraints.

Curvature-Aware Significance Score: To determine which parameters to prune, the framework uses a metric derived from a second-order Taylor expansion of the loss function. The importance score $s_{i,t}$ for parameter $i$ is defined as:
$s_{i,t} = \nabla_i F(w_t) w_{i,t} + h_{i,t} w_{i,t}^2$
where $\nabla F$ is the gradient, $w$ is the weight, and $h$ is the diagonal Hessian (curvature).
- Rationale: This score accounts for weight magnitude, gradient direction, and curvature. In Non-IID settings where gradients may diminish, the curvature term ensures that parameters critical for the local loss landscape are preserved, preventing the amplification of pruning errors over rounds.
Mask Generation: A binary mask $m_{k,t}$ is generated based on these scores. Low-scoring parameters are pruned (set to 0), creating a sparse sub-model.
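
The scoring and masking steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the absolute value of the score is used for ranking (the summary does not say how negative scores are treated), and pruning is done per parameter rather than over structured groups such as filters or channels.

```python
def significance_scores(weights, grads, hess_diag):
    """Score each parameter as |g_i * w_i + h_i * w_i**2|.

    Taking the absolute value is an assumption made for this sketch.
    """
    return [abs(g * w + h * w * w)
            for w, g, h in zip(weights, grads, hess_diag)]


def make_mask(scores, prune_ratio):
    """Return a 0/1 mask that drops the lowest-scoring fraction of parameters."""
    n_prune = int(prune_ratio * len(scores))
    # Indices ordered from least to most important.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


# Toy values (made up for illustration).
weights = [0.9, -0.05, 0.4, 0.01, -0.7]
grads = [0.1, 0.2, 0.0, 0.3, -0.1]
hess_diag = [0.5, 0.1, 2.0, 0.1, 0.2]

scores = significance_scores(weights, grads, hess_diag)
mask = make_mask(scores, prune_ratio=0.4)      # mask == [1, 0, 1, 0, 1]
pruned = [w * m for w, m in zip(weights, mask)]
```

Note that the third parameter has a zero gradient, so a purely gradient-based score would rank it last; its large curvature term is what keeps it in the sub-model, which is the behavior the rationale above describes.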

B. Local Training

Clients perform $E$ steps of local Stochastic Gradient Descent (SGD) on their pruned sub-models. The updates are restricted only to the active (unpruned) parameters.
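
A minimal sketch of masked local training follows, assuming a toy quadratic loss and a full-batch gradient in place of a stochastic one. The names `masked_sgd`, `toy_grad`, and `target` are hypothetical and exist only for this example; the point is that the mask multiplies every update, so pruned coordinates stay at zero for all $E$ steps.

```python
def masked_sgd(weights, mask, grad_fn, lr, num_steps):
    """Run num_steps gradient steps, updating only unpruned coordinates."""
    # Apply the mask first so pruned coordinates start at zero.
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for _ in range(num_steps):
        g = grad_fn(w)
        # Multiplying the step by the mask freezes pruned coordinates.
        w = [wi - lr * gi * mi for wi, gi, mi in zip(w, g, mask)]
    return w


# Toy loss 0.5 * sum((w - target)**2); its gradient is w - target.
target = [1.0, 2.0, 3.0]


def toy_grad(w):
    return [wi - ti for wi, ti in zip(w, target)]


local = masked_sgd([0.0, 0.5, 0.0], mask=[1, 0, 1],
                   grad_fn=toy_grad, lr=0.5, num_steps=3)
# local == [0.875, 0.0, 2.625]: the pruned middle coordinate never moves.
```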

C. Server-Side Model Reconstruction & Aggregation

This is the core innovation to handle structural mismatch.

The Problem: Since clients prune different parameters, their sub-models have different structures and cannot be directly averaged (FedAvg).
The Solution: Before aggregation, the server reconstructs each client's sparse sub-model back to the full dimensionality of the global model.
- Pruned entries in the client's update are filled with the current values from the global model $w_t$ .
- This ensures all updates exist in the same parameter space, allowing for standard weighted synchronous aggregation:
  $w_{t+1} = \sum_{k} p_k \tilde{w}_{t}^{(k,E)}$
- This mechanism preserves the "potential" of pruned parameters (by keeping them in the global model) while utilizing the "current importance" of active parameters.
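
The reconstruction and aggregation steps can be illustrated with a small sketch. It assumes each client returns its trained weights together with its binary mask and that the weights $p_k$ sum to one; the function names and the toy numbers are hypothetical.

```python
def reconstruct(client_weights, mask, global_weights):
    """Fill pruned entries (mask == 0) with the current global values."""
    return [cw if m else gw
            for cw, m, gw in zip(client_weights, mask, global_weights)]


def aggregate(reconstructed, client_probs):
    """Weighted average of full-size client models: sum_k p_k * w_k."""
    dim = len(reconstructed[0])
    return [sum(p * w[i] for p, w in zip(client_probs, reconstructed))
            for i in range(dim)]


global_w = [1.0, 1.0, 1.0, 1.0]
# Each client: (trained sparse weights, mask). Values are made up.
client_a = ([2.0, 0.0, 3.0, 0.0], [1, 0, 1, 0])
client_b = ([0.0, 4.0, 5.0, 0.0], [0, 1, 1, 0])

rec = [reconstruct(w, m, global_w) for w, m in (client_a, client_b)]
new_global = aggregate(rec, [0.5, 0.5])    # [1.5, 2.5, 4.0, 1.0]

# For contrast: averaging the raw sparse vectors treats pruned entries as zeros.
naive = aggregate([client_a[0], client_b[0]], [0.5, 0.5])  # [1.0, 2.0, 4.0, 0.0]
```

In the naive average, the last coordinate collapses to zero even though no client trained it, and coordinates kept by only one client are pulled toward zero. With reconstruction, untouched coordinates retain their global value and partially covered ones are averaged against it.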

3. Theoretical Contributions

The paper provides a rigorous convergence analysis for FL with personalized pruning:

Convergence Bound: The authors derive a bound for federated optimization with $E$ local steps that explicitly accounts for:
1. Local computation steps ( $E$ ).
2. Data heterogeneity ( $\zeta^2$ ).
3. Pruning-induced perturbations (noise term $e_t$ ).
Key Insight: The analysis reveals a trade-off. While moderate local steps ( $E$ ) help absorb pruning bias, excessive steps can amplify noise. The derived bound proves that CA-HFP converges to a neighborhood of a stationary point, provided the pruning noise is minimized via the curvature-aware criterion.

4. Experimental Results

The framework was evaluated on FMNIST, CIFAR-10, and CIFAR-100 using VGG16 and ResNet56 architectures under varying degrees of Non-IID data (controlled by Dirichlet parameter $\alpha$ ) and system heterogeneity (device ranks).
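
For readers unfamiliar with the Dirichlet parameter, the sketch below shows one common way such label-skewed splits are generated in FL benchmarks: for each class, client shares are drawn from a symmetric Dirichlet($\alpha$) distribution, so a small $\alpha$ concentrates each class on a few clients. Whether the paper uses exactly this protocol is an assumption, and `dirichlet_partition` is a hypothetical helper.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]

    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Normalized Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start, cum = 0, 0.0
        for k in range(num_clients):
            cum += draws[k] / total
            # The last client takes whatever remains, so no index is lost.
            end = len(idxs) if k == num_clients - 1 else round(cum * len(idxs))
            clients[k].extend(idxs[start:end])
            start = end
    return clients


# Toy dataset: 10 classes with 20 samples each.
labels = [c for c in range(10) for _ in range(20)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.1)
```

Lowering `alpha` toward 0.1 typically leaves each client with only a few classes, while a large `alpha` approaches an even, IID-like split.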

Accuracy: CA-HFP consistently outperformed state-of-the-art baselines (FedAvg, FedProx, PruneFL, FedMP, DapperFL).
- Under severe Non-IID conditions ( $\alpha=0.1$ ), CA-HFP matched full-model accuracy (FedAvg) while using only ~25% of the parameters.
- On CIFAR-100, it maintained stable performance across all heterogeneity levels where other pruning methods failed.
Efficiency:
- Communication: Reduced transmission load by up to 90% compared to full-model training.
- Computation: Significantly reduced FLOPs per client, enabling training on highly constrained devices (Rank 3: 90% pruning).
Convergence: CA-HFP converged faster and achieved higher final accuracy than baselines, demonstrating that the curvature-aware pruning and reconstruction effectively mitigate aggregation bias.
Ablation Studies:
- Reconstruction: Removing the reconstruction step caused a significant drop in accuracy (e.g., from 73.12% to 62.84% on CIFAR-10 under $\alpha=0.1$ ), indicating that reconstruction is needed to handle structural mismatch.
- Curvature: Using only weight/gradient metrics (without curvature) resulted in lower accuracy, confirming the importance of second-order information in Non-IID settings.

5. Significance

Bridging the Gap: CA-HFP addresses resource-constrained edge computing and statistical heterogeneity together. The results indicate that aggressive pruning does not have to come at the cost of model accuracy or convergence stability.
Novel Metric: The introduction of a curvature-aware pruning criterion offers a new theoretical direction for FL, moving beyond simple magnitude-based pruning to account for the local loss landscape.
Practical Deployment: The lightweight server-side reconstruction mechanism makes the framework practical for real-world deployment where clients have diverse hardware capabilities and data distributions, enabling robust, efficient, and private collaborative learning.
