Knowledge-Guided Machine Learning: Illustrating the use of Explainable Boosting Machines to Identify Overshooting Tops in Satellite Imagery

This paper demonstrates how to develop a fully interpretable machine learning model for detecting overshooting tops in satellite imagery. It combines Explainable Boosting Machines with knowledge-guided feature extraction and human-in-the-loop refinement, prioritizing transparency and domain expertise over peak accuracy.

Nathan Mitchell, Lander Ver Hoef, Imme Ebert-Uphoff, Kristina Moen, Kyle Hilburn, Yoonjin Lee, Emily J. King

Published 2026-03-02

Imagine you are trying to teach a computer to spot a very specific type of storm cloud, called an Overshooting Top (OT), in satellite photos. These are like giant, bubbling "heads" of clouds that punch through the sky, often signaling dangerous weather like tornadoes or hail.

For a long time, scientists have used "Black Box" AI (like deep neural networks) to do this. Think of a Black Box AI as a genius chef who can cook a perfect meal but refuses to tell you the recipe. You know the food tastes good, but if the chef suddenly decides to put salt in the dessert because they saw a picture of salt once, you have no idea why. In weather forecasting, this is dangerous. If the AI makes a mistake, we don't know why it happened, and we can't easily fix it.

This paper introduces a different kind of AI called an Explainable Boosting Machine (EBM). Instead of a mysterious genius chef, think of an EBM as a transparent, step-by-step recipe book.

Here is the story of how they used this new tool, explained simply:

1. The Problem: The "Clever Hans" Mistake

In the past, AI models sometimes learned "tricks" instead of real science. This is called the "Clever Hans" effect (named after a horse that seemed to do math but was actually just reading the trainer's body language).

  • The Trick: An AI might learn to spot a horse in a photo not by looking at the horse, but by looking for a "source tag" in the corner of the image that says "Horse."
  • The Risk: If you show the AI a horse without the tag, it fails. If you show it a picture of a pole with a tag, it thinks it's a horse.
  • The Goal: The authors wanted to build an AI that couldn't pull these tricks. They wanted a model that humans could look at and say, "Yes, that makes sense," or "No, that's wrong, let's fix it."

2. The Solution: The "Recipe Book" (EBM)

The authors used an Explainable Boosting Machine (EBM).

  • How it works: Instead of one giant, confusing brain, the EBM is like a team of small, simple experts.
    • Expert 1 looks at how bright the cloud is.
    • Expert 2 looks at how bumpy (textured) the cloud is.
    • Expert 3 looks at how cold the cloud is.
  • The Magic: Each expert writes down a simple rule on a piece of paper (a graph). You can literally see the graph: "If the cloud is this bright, add 5 points to the storm score. If it's that bright, subtract 2 points."
  • The Result: You can see exactly how the AI is thinking. If the AI thinks a dark shadow is a storm, you can look at the "Brightness Expert's" graph, see the mistake, and simply erase that part of the graph and draw a new line. You don't have to retrain the whole AI; you just edit the recipe.
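The "team of experts" idea above can be sketched in a few lines of code. This is a hypothetical illustration of how an EBM-style additive model scores a single cloud pixel, not the paper's actual model: each "expert" is a shape function (a lookup from a feature value to a score contribution), and the final score is just the sum. All bin edges and contribution values below are made up for illustration.

```python
# Hypothetical sketch of EBM-style additive scoring for one cloud pixel.
# Each "expert" is a shape function: a lookup from a feature value to a
# score contribution, readable as a graph. All numbers are illustrative.
from bisect import bisect_right

def make_shape_function(bin_edges, contributions):
    """Return a function mapping a feature value to a score contribution."""
    def shape(value):
        return contributions[bisect_right(bin_edges, value)]
    return shape

# One simple "expert" per feature (illustrative bins and scores):
brightness_expert  = make_shape_function([0.3, 0.7], [-2.0, 0.5, 5.0])
texture_expert     = make_shape_function([0.2, 0.6], [-1.0, 1.0, 3.0])
temperature_expert = make_shape_function([200, 220], [4.0, 1.0, -3.0])  # Kelvin

def ot_score(brightness, texture, temperature_k, intercept=-1.0):
    """Total score = intercept + each expert's contribution, summed."""
    return (intercept
            + brightness_expert(brightness)
            + texture_expert(texture)
            + temperature_expert(temperature_k))

# A bright, bumpy, very cold pixel scores high; a dark, smooth, warm one scores low:
print(ot_score(brightness=0.9, texture=0.8, temperature_k=195))  # → 11.0
print(ot_score(brightness=0.1, texture=0.1, temperature_k=240))  # → -7.0
```

Because each expert's contribution is a separate, inspectable table, you can read off exactly why a pixel scored the way it did; that is the whole point of the "recipe book" design.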

3. The Process: From Raw Data to "Scalar" Clues

Satellite images are huge grids of pixels. EBMs can't read pictures directly; they need simple numbers (scalars). The authors had to translate the picture into three simple clues for their "experts":

  1. Brightness: How shiny is the cloud? (OTs are usually very bright in the sun).
  2. Texture: Is the cloud smooth like a pancake or bumpy like a popcorn kernel? (OTs are very bumpy). They used a math tool called a Gray-Level Co-occurrence Matrix (GLCM) to measure this bumpiness.
  3. Temperature: How cold is the top of the cloud? (OTs are very high and very cold).
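The GLCM "bumpiness" idea can be sketched with a toy version of one common GLCM statistic, contrast: count how often pixel value i sits next to value j, then weight each pair by how different the two values are. This is a minimal stdlib sketch on tiny made-up 4-level images, not the paper's actual feature pipeline.

```python
# Minimal sketch of a GLCM (gray-level co-occurrence matrix) texture measure.
# We count horizontal-neighbour pixel pairs (i, j), then compute "contrast":
# big jumps between neighbours mean a bumpy texture. Toy data, not real imagery.
from collections import Counter

def glcm_contrast(image):
    """Contrast of the horizontal-neighbour GLCM of a 2-D list of ints."""
    pairs = Counter()
    for row in image:
        for a, b in zip(row, row[1:]):
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    # Contrast = sum of (i - j)^2 weighted by how often the pair occurs.
    return sum(((i - j) ** 2) * n for (i, j), n in pairs.items()) / total

smooth = [[1, 1, 1, 1]] * 4   # pancake-flat: every neighbour is identical
bumpy  = [[0, 3, 0, 3]] * 4   # popcorn-like: neighbours jump by 3 levels

print(glcm_contrast(smooth))  # → 0.0
print(glcm_contrast(bumpy))   # → 9.0
```

A single number like this is exactly the kind of "scalar clue" an EBM can consume, which is why the image has to be translated into such features first.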

4. The Human Touch: "Editing the Recipe"

This is the most important part of the paper.

  • Step 1: They trained the EBM using labels derived from radar data (which shows where heavy rain is actually falling).
  • Step 2: They looked at the "Recipe Book" (the graphs). They noticed something weird: The AI was getting confused by shadows. It thought dark shadows were storms.
  • Step 3: The Edit. A human scientist looked at the graph, said, "Wait, shadows aren't storms," and manually flattened that part of the graph.
  • Step 4: They did this without retraining the whole model. They just changed the rules.
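The editing step above can be sketched as a direct overwrite of part of a learned shape function. This is a hypothetical illustration of the concept, with a made-up brightness graph stored as a simple list of per-bin contributions; the paper's actual model and numbers differ.

```python
# Hypothetical sketch of "editing the recipe": a human flattens part of a
# learned shape function by hand, with no retraining. The graph here is a
# list of per-bin score contributions, darkest bin first. Values are made up.

# Suppose the learned brightness graph wrongly gives dark pixels (shadows)
# a positive storm score in its first two bins:
brightness_contributions = [2.5, 1.0, 0.0, 3.0, 5.0]

def flatten_bins(contributions, start, stop, value=0.0):
    """Return a copy with contributions[start:stop] overwritten by a constant."""
    edited = list(contributions)
    edited[start:stop] = [value] * (stop - start)
    return edited

# The scientist erases the spurious "shadows are storms" bump:
edited = flatten_bins(brightness_contributions, 0, 2)
print(edited)  # → [0.0, 0.0, 0.0, 3.0, 5.0]
```

Because the rest of the model is untouched, every other expert keeps its learned behavior; only the one rule a human judged to be wrong is changed.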

5. The Result: Good Enough, But Safe

The final model wasn't the most accurate AI in the world (it wasn't as "smart" as the Black Box models). However, it was honest.

  • It didn't make up fake reasons for its decisions.
  • It allowed humans to step in and fix its logic.
  • It successfully found the storm clouds, even if it sometimes got confused by other cold, bumpy clouds (like the "failure mode" they discussed).

The Big Takeaway

The authors are saying: "Don't just build a black box that works; build a glass box that we understand."

In high-stakes fields like weather forecasting, knowing why a computer made a prediction is just as important as the prediction itself. If a model says a tornado is coming, we need to know if it's looking at the right things (bumpy, cold clouds) or if it's just looking at a shadow. With EBMs, we can look under the hood, see the engine, and tune it ourselves.

In short: They built a weather AI that doesn't just guess; it explains its reasoning, lets humans correct its mistakes, and follows a recipe we can all read.
