An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks

This paper proposes an efficient unsupervised federated learning framework that leverages shared features from complementary IoT datasets and explainable AI techniques to overcome data heterogeneity and significantly improve anomaly detection accuracy in decentralized networks while preserving privacy.

Mohsen Tajgardan, Atena Shiranzaei, Mahdi Rabbani, Reza Khoshkangini, Mahtab Jamali

Published 2026-03-02

Imagine a massive neighborhood where everyone owns a different kind of smart device—some have high-tech security cameras, others have simple motion sensors, and some have smart thermostats. All these devices generate data about what's happening in their homes.

The goal is to teach a "smart guard" to spot intruders (anomalies) or identify which device is acting up, without anyone having to send their private home video feeds or sensor logs to a central police station. This is where Federated Learning comes in. Instead of sending the data, the devices send only their "lessons learned" (mathematical updates) to a central teacher.

However, there's a big problem: Heterogeneity.

  • Device A speaks a language with 48 words (features).
  • Device B speaks a language with 46 words.
  • Device C speaks a language with 78 words.

If you try to teach them all from the same textbook, it's a mess. The "teacher" (central server) gets confused because the inputs don't match. Most existing systems either force everyone to use the same (limited) vocabulary or throw away the unique words that make each device special.

The Paper's Solution: The "Common Ground" Club

This paper proposes a clever new way to run this neighborhood club. Here is how it works, broken down into simple steps:

1. The "Shared Vocabulary" Strategy

Imagine three neighbors trying to solve a mystery.

  • Neighbor 1 has a list of 48 clues.
  • Neighbor 2 has a list of 46 clues.
  • Neighbor 3 has a list of 78 clues.

They all have some clues in common (like "time of day," "IP address," or "packet size"). The authors' method says: "Let's only share the lessons learned from the clues we all have in common."

They ignore the unique clues that don't match for now. They take the "common ground" lessons, mix them together at the central school, and create a Super-Teacher Model.
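The "common ground" idea can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: it assumes each client exposes per-feature weights keyed by feature name, and the feature names below are made up for the example.

```python
import numpy as np

def aggregate_shared(client_weights):
    """Average model weights only over the features every client has.

    client_weights: list of dicts mapping feature name -> weight.
    Returns the averaged weight per shared feature (the "Super-Teacher").
    """
    shared = set(client_weights[0])
    for cw in client_weights[1:]:
        shared &= set(cw)                 # keep only the common vocabulary
    return {f: float(np.mean([cw[f] for cw in client_weights]))
            for f in sorted(shared)}

# Three clients with different vocabularies contribute only their overlap.
clients = [
    {"packet_size": 0.9, "ip_entropy": 0.2, "cam_fps": 0.5},
    {"packet_size": 0.7, "ip_entropy": 0.4},
    {"packet_size": 0.5, "ip_entropy": 0.6, "thermostat_temp": 0.1},
]
global_model = aggregate_shared(clients)
# global_model covers only 'packet_size' and 'ip_entropy'
```

Unique features like `cam_fps` never leave their client; only the shared ones are averaged into the global model.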

2. The "Custom Tailoring" Step

Once the Super-Teacher is created, it goes back to each neighbor.

  • The Super-Teacher says, "Here is what I learned about the common clues."
  • Each neighbor then tailors this knowledge to fit their own unique list of clues. They take the global wisdom and combine it with their own specific data to fine-tune their local detective skills.

This is like a master chef teaching a group of cooks. The chef teaches them the universal rules of seasoning (the shared features). Then, each cook takes those rules and applies them to their own specific cuisine (their unique device data), resulting in a better dish than if they had tried to cook alone.
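The tailoring step can be sketched the same way. Again, this is an illustrative sketch under assumed names, not the paper's method: shared features inherit the global weights, while client-only features start from small local values and would then be fine-tuned on local data.

```python
import numpy as np

def personalize(global_shared, local_features, seed=0):
    """Build a client-specific model: shared features inherit the global
    ("Super-Teacher") weights; client-only features start from small
    random values, ready to be fine-tuned on the client's own data."""
    rng = np.random.default_rng(seed)
    return {
        f: global_shared.get(f, float(rng.normal(scale=0.01)))
        for f in local_features
    }

# Hypothetical global model learned on the shared vocabulary.
global_shared = {"packet_size": 0.7, "ip_entropy": 0.4}

# A camera client also tracks 'cam_fps', which no other device has.
local_model = personalize(global_shared,
                          ["packet_size", "ip_entropy", "cam_fps"])
# Shared weights come from the global model; 'cam_fps' stays local.
```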

3. The "Silent Detective" (Unsupervised Learning)

Usually, to teach a computer to spot a burglar, you need to show it thousands of photos labeled "Burglar" and "Not a Burglar." But in the real world, you rarely have those labels. You just have a pile of data and you don't know what's normal and what's weird.

This system uses Deep Autoencoders. Think of this as a "compression machine."

  • It tries to squish all the data into a tiny, secret summary (a latent space).
  • If the data is normal, the machine can easily un-squish it and reconstruct it perfectly.
  • If the data is weird (an attack), the machine struggles to reconstruct it. The "error" tells the system: "Hey, this looks suspicious!"

Because it doesn't need labels, it's perfect for the real world where we don't know what attacks look like yet.
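The squish/un-squish idea can be demonstrated with a linear autoencoder (PCA) standing in for the paper's deep autoencoder; the data and threshold are toy values, but the scoring rule is the same: high reconstruction error means "suspicious."

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "normal" traffic: 8 features that really live on a 2-D latent structure.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 8))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

# Linear stand-in for the deep autoencoder: project onto the top-2
# principal directions (the "tiny secret summary") and back.
mu = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
V = Vt[:2]

def reconstruction_error(x):
    z = (x - mu) @ V.T                    # encode: squish to the latent space
    x_hat = z @ V + mu                    # decode: un-squish
    return float(np.mean((x - x_hat) ** 2))

# Calibrate a threshold on normal data only -- no attack labels needed.
errors = np.array([reconstruction_error(x) for x in normal])
threshold = np.percentile(errors, 99)

attack = rng.normal(size=8) * 5.0         # far from the normal subspace
is_suspicious = reconstruction_error(attack) > threshold
```

Normal points reconstruct almost perfectly, so their error stays under the threshold; the attack point lies off the learned subspace and its error blows up.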

4. The "Grouping Game" (K-Means Clustering)

Once the data is compressed into those tiny summaries, the system plays a game of "Grouping." It puts similar-looking summaries into the same pile.

  • One pile might be "Normal Traffic."
  • Another pile might be "Suspicious Traffic."
  • Another pile might be "Device Type A."

Since the computer doesn't know which pile is which (it's unsupervised), the paper uses a trick called Label Alignment: after clustering, each pile is matched to the class ("Attack" or "Normal") it overlaps with most, so the final evaluation scores are meaningful.
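The label-alignment trick can be sketched as a majority-vote mapping. This is one common way to do it (the paper may use a different matching scheme), and the tiny arrays below are made-up evaluation data:

```python
import numpy as np

def align_labels(cluster_ids, true_labels):
    """Map each cluster to the true class it overlaps most, so arbitrary
    cluster numbers can be scored like real predictions."""
    mapping = {}
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        vals, counts = np.unique(members, return_counts=True)
        mapping[int(c)] = vals[np.argmax(counts)]   # majority vote
    return np.array([mapping[int(c)] for c in cluster_ids])

# The clusterer arbitrarily called attacks "cluster 0" and normal "cluster 1".
clusters = np.array([0, 0, 1, 1, 1, 0])
truth    = np.array(["attack", "attack", "normal", "normal", "normal", "attack"])
aligned  = align_labels(clusters, truth)
accuracy = float(np.mean(aligned == truth))   # perfect after alignment
```

Note that the true labels are used only at evaluation time, to score the clusters; the clustering itself never sees them.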

5. The "Why?" Question (Explainability)

Finally, the system uses a tool called SHAP (SHapley Additive exPlanations, like a magnifying glass) to explain why it made a decision.

  • "I flagged this as an attack because the 'packet size' was huge, and the 'time between messages' was too fast."

This makes the system trustworthy because humans can see the logic behind the alarm.
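The flavor of such an explanation can be shown with a crude occlusion-style attribution: replace each feature with its "normal" baseline and see how much the anomaly score drops. This is a simplified stand-in, not the actual SHAP library (which averages over all feature subsets via Shapley values), and the toy score and feature values are invented for the example:

```python
import numpy as np

def attribute(score, x, baseline):
    """Per-feature attribution: how much does the anomaly score drop
    if feature i is replaced by its normal baseline value?
    (A crude stand-in for SHAP's Shapley-value averaging.)"""
    contributions = {}
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = baseline[i]
        contributions[i] = score(x) - score(x_masked)
    return contributions

# Toy anomaly score: squared distance from the normal baseline.
baseline = np.zeros(3)
score = lambda v: float(np.sum((v - baseline) ** 2))

x = np.array([0.1, 5.0, 0.2])        # feature 1 ("packet size") is huge
contrib = attribute(score, x, baseline)
top = max(contrib, key=contrib.get)   # the feature that explains the alarm
```

Here feature 1 dominates the score, so the system can report "flagged because 'packet size' was huge," which is exactly the kind of human-readable reason the paper is after.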

The Results: Why It Matters

The researchers tested this on real-world data from three different years (2022, 2023, and 2024).

  • The Old Way (Baseline): When devices tried to learn alone or with a rigid system, they missed a lot of attacks.
  • The New Way: By sharing the "common clues" and keeping the "unique clues" local, the new system got significantly better at spotting intruders.
    • On the newest, most complex dataset (2024), it improved accuracy by about 15%.

The Big Takeaway

This paper shows that you don't have to force everyone to be the same to work together. By finding the common ground between different devices and respecting their unique differences, you can build a smarter, more private, and more accurate security system for the Internet of Things. It's like a choir where everyone sings a different part, but they all harmonize on the chorus, creating a beautiful song that no single voice could produce alone.
