Here is an explanation of the paper using simple language and creative analogies.
The Big Problem: A "Bad Apple" in the Data Barrel
Imagine you are training a very smart robot to act as a power transformer detective. Its job is to listen to the electrical hum of a transformer and instantly tell you exactly where a tiny short-circuit (a "turn-to-turn" fault) is happening inside.
To teach this robot, you feed it thousands of recordings of electrical signals. You want the robot to learn the difference between a healthy hum and a sick one.
The Catch: In the real world, the sensors (the robot's "ears") sometimes break or get interference from electromagnetic noise (like a radio station jamming a signal). This creates "poisoned" data—recordings that look like a fault but are actually just sensor glitches.
If you train your robot on these bad recordings, it gets confused. It might start thinking a sensor glitch is a real fire hazard, or it might miss a real fault.
The Old Solution: If you realize your robot was trained on bad data, the traditional way to fix it is to delete the bad data and start from scratch. You have to re-teach the robot everything from Day 1.
- The Downside: This takes forever and uses a massive amount of computer power. It's like burning down your entire library just because one book has a typo, then rewriting every single book from memory.
The New Solution: The "SISA" Framework
This paper proposes a smarter way called SISA (Sharded, Isolated, Sliced, and Aggregated). Think of it as changing how you organize your training library.
Instead of one giant brain learning everything, SISA builds a team of smaller experts.
1. Sharding (The Team of Specialists)
Imagine you have a huge pile of training files. Instead of giving the whole pile to one student, you split the pile into 4 separate boxes (Shards).
- You hire 4 different students (Sub-models).
- Student A only studies Box 1.
- Student B only studies Box 2.
- And so on.
- The Magic: Because they only study their own box, what Student A learns doesn't mess up what Student B learns. They are "isolated."
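The "team of specialists" idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the dataset is a toy list of (signal, label) pairs, the round-robin split and the per-label-mean "sub-model" are stand-ins for real training.

```python
def make_shards(dataset, num_shards):
    """Deal records round-robin into isolated shards (the 'boxes')."""
    shards = [[] for _ in range(num_shards)]
    for i, record in enumerate(dataset):
        shards[i % num_shards].append(record)
    return shards

def train_submodel(shard):
    """Toy sub-model (a 'student'): the mean signal seen per label."""
    sums, counts = {}, {}
    for signal, label in shard:
        sums[label] = sums.get(label, 0.0) + signal
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

dataset = [(0.1, "healthy"), (0.9, "fault"), (0.2, "healthy"), (0.8, "fault")]
shards = make_shards(dataset, 2)
submodels = [train_submodel(s) for s in shards]
```

Each sub-model only ever sees its own shard, so changing one shard later cannot affect what the other sub-models learned.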
2. Slicing (The Chapters)
Inside each box, you further organize the files into chapters (Slices). Each student studies chapter by chapter and saves a snapshot of their notes after finishing each one. If a bad file later turns up in, say, Chapter 3, the student can reopen the snapshot taken at the end of Chapter 2 and re-study only from that point, instead of restarting the whole box.
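Slicing can be sketched as incremental training with a checkpoint saved after each slice. This is a toy illustration (the "model state" is just running sums per label, an assumption for brevity), not the paper's training loop.

```python
import copy

def train_incrementally(slices):
    """Train slice by slice, checkpointing the model state after each slice."""
    state = {"sums": {}, "counts": {}}
    checkpoints = [copy.deepcopy(state)]  # checkpoint before any slice
    for chapter in slices:
        for signal, label in chapter:
            state["sums"][label] = state["sums"].get(label, 0.0) + signal
            state["counts"][label] = state["counts"].get(label, 0) + 1
        checkpoints.append(copy.deepcopy(state))  # snapshot after this slice
    return state, checkpoints

slices = [[(0.1, "healthy")], [(0.9, "fault")]]
state, checkpoints = train_incrementally(slices)
```

If bad data is found in slice k, training restarts from `checkpoints[k]` rather than from the empty state, which is what keeps retraining cheap.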
3. Aggregation (The Panel of Judges)
When a real transformer makes a noise, all 4 students listen to it. They each vote on what the problem is. The final answer is the average of their votes. This ensures the final decision is accurate, even if one student is slightly unsure.
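The panel-of-judges step can be sketched as an ensemble vote. As a simplification, each toy sub-model here predicts the label whose stored mean is nearest the incoming signal, and a majority vote stands in for the averaging described above; the exact aggregation rule is an assumption, not the paper's.

```python
from collections import Counter

def predict(submodel, signal):
    """One student's vote: the label whose mean signal is closest."""
    return min(submodel, key=lambda label: abs(submodel[label] - signal))

def aggregate(submodels, signal):
    """The panel's verdict: a majority vote over all sub-models."""
    votes = [predict(m, signal) for m in submodels]
    return Counter(votes).most_common(1)[0][0]

# Toy sub-models in the {label: mean_signal} format sketched earlier.
submodels = [{"healthy": 0.15, "fault": 0.85},
             {"healthy": 0.20, "fault": 0.80}]
```

Because the verdict pools several independent votes, one uncertain sub-model rarely flips the final answer.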
How "Unlearning" Works (The Magic Trick)
Now, imagine you discover that Box 1 contains some "poisoned" sensor data (bad recordings).
- The Old Way (Full Retraining): You fire all 4 students, throw away all 4 boxes of notes, and hire 4 new students to re-read the entire library from scratch.
- The SISA Way (Machine Unlearning): You only fire Student A. You throw away only Box 1. You hire a new student to re-learn just Box 1.
- Students B, C, and D? They keep their notes exactly as they are. They don't need to relearn anything.
- Once the new Student A is trained, you put them back on the team. The "Panel of Judges" is back in business, and the bad data is gone.
Why This Matters (The Results)
The researchers tested this on simulated power transformer faults. Here is what they found:
- Speed: When they had to fix the "poisoned" data, the SISA method was 2 to 4 times faster than starting over. It's like fixing a single typo in a document versus rewriting the whole book.
- Accuracy: The team of specialists (SISA) was almost as good at diagnosing faults as the single giant brain (Full Retraining).
- Note: If you split the data into too many small boxes (like 4 or more), the students didn't have enough examples to learn from, and they got a bit confused. So, there is a "Goldilocks" zone—not too few, not too many boxes.
- Real-World Fit: This is perfect for power grids. If a sensor fails in a wind farm, you don't want to shut down the whole diagnostic system for days to retrain it. You want to fix the specific error in minutes.
The Bottom Line
This paper introduces a modular approach to AI training. Instead of building a giant, fragile monolith that requires a total rebuild when one piece is broken, they built a Lego set.
If one Lego piece is defective, you just snap that one piece out and replace it. The rest of the structure stays strong, stable, and ready to work immediately. This makes AI much more practical for keeping our power grids safe and reliable.