Weakly Supervised Cloud Detection Combining Spectral Features and Multi-Scale Deep Network

This paper proposes SpecMCD, a weakly supervised cloud detection method that combines spectral features with a multi-scale, scene-level deep network and a progressive training framework. The result is highly accurate pixel-level cloud masks, with significant performance improvements over existing methods on Gaofen-1 multispectral images.

Shaocong Zhu, Zhiwei Li, Xinghua Li, Huanfeng Shen

Published 2026-03-06

Imagine you are looking at a beautiful landscape photo taken from a satellite. It's a crisp, clear view of the Earth. But suddenly, a layer of fog rolls in, or a fluffy white cloud drifts across the frame. To a computer trying to analyze the ground below, that cloud is a nuisance—it hides the roads, the forests, and the cities.

To fix this, computers need to draw a "mask" over the clouds, effectively saying, "Ignore this part; it's just sky." This is called Cloud Detection.

However, doing this is tricky. Thick, puffy clouds are easy to spot (they are bright white). But thin clouds and haze are like ghosts; they are faint, semi-transparent, and look very similar to bright snow or sun-baked sand. Traditional computer programs often miss these ghosts or get confused by the bright ground.

This paper introduces a new, clever method called SpecMCD to solve this problem. Here is how it works, explained in simple terms:

1. The Problem: The "One-Size-Fits-All" Mistake

Imagine trying to find a lost toy in a giant warehouse.

  • If you look at the whole warehouse from a helicopter (a large-scale view), you might see the big piles of boxes (thick clouds) but miss the tiny toy hidden in the corner (thin clouds).
  • If you get down on your knees and look at a small square foot of the floor (a small-scale view), you might find the toy, but you'll miss the fact that there are huge piles of boxes right next to it.

Previous methods usually picked just one "view" (either the helicopter or the knee-level). This paper says, "Why not use both?"

2. The Solution: The "Progressive Training" Framework

The authors built a smart AI that learns in stages, like a student getting better at a test over time.

  • Step 1: The AI starts by looking at big chunks of the image to learn where the big, obvious clouds are.
  • Step 2: It gradually starts looking at smaller, finer details to learn how to spot the faint, thin clouds.
  • The Result: The AI becomes a master at seeing both the giant storm clouds and the wispy morning mist.
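The coarse-to-fine idea can be sketched as a simple training schedule. This is only an illustration of the staged-learning concept, not the paper's actual training code; the function name, patch sizes, and stage lengths are all assumptions made up for this example.

```python
def patch_size_schedule(epoch, total_epochs, sizes=(512, 256, 128)):
    """Return the training patch size for a given epoch.

    Early epochs use large patches (coarse, scene-level context, big
    obvious clouds); later epochs shrink to small patches (fine detail,
    thin clouds). Illustrative only; SpecMCD's real schedule may differ.
    """
    stage_len = total_epochs / len(sizes)            # epochs per stage
    stage = min(int(epoch // stage_len), len(sizes) - 1)
    return sizes[stage]

# A 30-epoch run walks through the stages 512 -> 256 -> 128
schedule = [patch_size_schedule(e, 30) for e in range(30)]
```

Each stage reuses the weights learned at the previous, coarser stage, which is what makes the training "progressive" rather than three independent models.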

3. The "Spectral Feature" Trick: The Color Detective

Even with a good eye, thin clouds are hard to distinguish from bright snow.

  • The Analogy: Imagine you are trying to tell the difference between a white sheet and a white cloud. If you only look at them with your eyes, it's hard. But if you use a special pair of glasses that only sees blue light, the cloud might glow differently than the sheet.
  • The Method: The AI uses a mathematical trick (called Singular Value Decomposition) to create a "Cloud Thickness Map." It looks at how blue and green light bounce off the surface. This helps the AI realize, "Ah, this bright white spot is actually a thin cloud because of how it reflects blue light," rather than just guessing.
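To make the SVD idea concrete, here is a minimal sketch of deriving a relative "thickness" score from the blue and green bands. The function name and the exact normalization are assumptions for illustration; the paper's actual cloud thickness map formulation may differ.

```python
import numpy as np

def cloud_thickness_map(blue, green):
    """Sketch of an SVD-derived cloud-thickness map (illustrative;
    SpecMCD's exact formulation may differ).

    Stacks the blue and green bands into a (pixels x 2) matrix and
    projects each pixel onto the first singular vector, which captures
    the shared brightness that cloud cover adds to both bands.
    """
    h, w = blue.shape
    X = np.stack([blue.ravel(), green.ravel()], axis=1).astype(float)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    comp = U[:, 0] * S[0]                  # projection onto first component
    # SVD signs are arbitrary; flip so "thicker" means brighter
    if np.corrcoef(comp, X.mean(axis=1))[0, 1] < 0:
        comp = -comp
    thickness = comp.reshape(h, w)
    thickness -= thickness.min()           # rescale to a [0, 1] score
    if thickness.max() > 0:
        thickness /= thickness.max()
    return thickness
```

The key point is that the score comes from how the two bands co-vary, not from any single band's brightness, which is what helps separate thin cloud from bright ground.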

4. The "Fusion" Strategy: Mixing the Best of Both Worlds

The AI now has two different maps:

  1. A map that is great at finding dense, thick clouds (but might miss the edges).
  2. A map that is great at finding large, thin cloud areas (but might get confused by bright ground).

Instead of picking one, the AI acts like a chef mixing ingredients. It looks at the "edges" of the clouds.

  • If the edge is sharp and clear (like a thick cloud), it trusts the "thick cloud" map.
  • If the edge is fuzzy and spread out (like a thin cloud), it trusts the "thin cloud" map.
  • It blends the two into a single master map.
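One simple way to realize this edge-aware blending is to weight the two maps by local gradient magnitude. The weighting rule below is an assumption chosen for illustration; the paper's fusion strategy is its own.

```python
import numpy as np

def fuse_maps(thick_map, thin_map):
    """Illustrative edge-aware fusion of two cloud-probability maps
    (SpecMCD's exact weighting scheme may differ).

    Where edges are sharp (high local gradient), trust the thick-cloud
    map; where they are diffuse, lean on the thin-cloud map.
    """
    gy, gx = np.gradient(thick_map)
    sharpness = np.hypot(gx, gy)              # local edge strength
    if sharpness.max() > 0:
        sharpness = sharpness / sharpness.max()   # weight in [0, 1]
    return sharpness * thick_map + (1 - sharpness) * thin_map
```

Because the weight varies per pixel, the fused map can follow crisp thick-cloud boundaries in one region while preserving broad thin-cloud areas in another.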

5. The "Adaptive Threshold": No More Guessing

Usually, to turn a "cloud probability map" (a blurry image showing where clouds might be) into a "binary mask" (a sharp black-and-white image), humans have to set a rule: "If the cloudiness score is above 50, call it a cloud."

  • The Old Way: Humans guess the number 50. Sometimes it's too high (misses thin clouds), sometimes too low (calls snow a cloud).
  • The SpecMCD Way: The AI looks at the specific image and says, "For this picture, the magic number is 42." It adjusts the rule automatically for every single photo, making it far less likely to miss a ghostly cloud.
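A classic example of such a per-image cutoff is Otsu's method, which picks the threshold that best separates the histogram into two classes. To be clear, SpecMCD uses its own adaptive rule; Otsu is shown here only to illustrate the idea of letting each image choose its own number.

```python
import numpy as np

def otsu_threshold(prob_map, bins=256):
    """Pick a per-image threshold by maximizing between-class variance
    (Otsu's method). Illustrates the idea of an adaptive, per-image
    cutoff; it is not SpecMCD's actual thresholding rule.
    """
    hist, edges = np.histogram(prob_map.ravel(), bins=bins, range=(0.0, 1.0))
    p = hist.astype(float) / hist.sum()        # probability per bin
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                          # weight of the "clear" class
    m = np.cumsum(p * centers)                 # cumulative mean
    mT = m[-1]                                 # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mT * w0 - m) ** 2 / (w0 * (1 - w0))
    var_between = np.nan_to_num(var_between)   # zero out empty classes
    return centers[np.argmax(var_between)]
```

On a bimodal probability map (one cluster of clear pixels, one of cloudy pixels), the returned cutoff lands between the two clusters instead of at a fixed 0.5.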

Why Does This Matter?

The authors tested their method against other top-tier systems.

  • The Result: Their method found 7.8% more clouds than the next best weakly supervised method.
  • The Impact: This means satellite images can be cleaned up much better. Whether you are mapping forests, tracking weather, or planning a city, you won't have to throw away as many photos because of "bad weather."

In a Nutshell

Think of SpecMCD as a detective who doesn't just have one magnifying glass. They have a telescope to see the big picture, a microscope to see the tiny details, a special light to see through the fog, and a smart brain that knows exactly how to combine all these tools to find the hidden ghosts in the sky.

This allows us to see the Earth's surface clearly, even when the sky is trying to hide it.