Imagine you are a doctor trying to diagnose a skin condition, like a mole or a rash. You've studied thousands of photos of these conditions, but almost all of them were taken on people with light skin, under bright studio lights, and with expensive cameras. Now, you try to use your knowledge to diagnose a patient with dark skin, photographed in a dimly lit room with a smartphone. The diagnosis might go wrong. Why? Because both computers and human eyes get confused when the lighting, the camera, and the skin tone are all mixed together.
This paper introduces a clever new tool to untangle that mess. Think of it as a "Skin Color Translator" that helps computers understand skin conditions fairly, no matter who the patient is or how the photo was taken.
Here is how it works, broken down into simple concepts:
1. The Problem: The "Smoothie" of Skin Photos
Imagine a skin photo is like a fruit smoothie.
- The fruit is the actual skin condition (the mole or rash).
- The milk is the person's natural skin tone.
- The lighting and camera are like the sugar and ice cream added to the mix.
Current AI models are bad at separating these ingredients. If they only learned from "strawberry milkshakes" (light skin, bright light), they get confused when they see a "blueberry smoothie" (dark skin, dim light). They think the color of the smoothie is the fruit itself.
2. The Solution: The "Magic Blender"
The authors built a system that acts like a magic blender. It takes a photo and separates the ingredients back into their original parts:
- The Shape (Geometry): It keeps the outline of the mole and the skin texture.
- The Color (Skin Tone & Lighting): It extracts the "flavor" (the specific skin color and lighting conditions) into a separate, organized list of numbers (a "latent space").
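This encode-swap-decode idea can be sketched in a toy form. The snippet below is purely illustrative: the real system is a learned neural network, and the `encode`/`decode` functions here are hypothetical stand-ins that treat average brightness as "geometry" and average channel color as the "color latent."

```python
import numpy as np

# Toy sketch of the disentangle-and-recombine idea. The paper's model
# is a trained neural network; these hand-written functions only
# illustrate the interface: split an image into geometry + color,
# then recombine geometry from one image with color from another.

def encode(img):
    """Split a (H, W, 3) image in [0, 1] into crude proxies for the
    two latent parts."""
    geometry = img.mean(axis=-1, keepdims=True)  # brightness pattern
    color = img.mean(axis=(0, 1))                # average RGB "tone"
    return geometry, color

def decode(geometry, color):
    """Recombine: tint the geometry channel with the color latent."""
    return np.clip(geometry * (color / color.mean()), 0.0, 1.0)

# Light-skin image and a darker, warmer-toned image.
light = np.full((2, 2, 3), 0.8)
dark = np.full((2, 2, 3), [0.4, 0.25, 0.2])

geom, _ = encode(light)          # keep the light image's structure
_, tone = encode(dark)           # borrow the dark image's color
recolored = decode(geom, tone)   # same shape, different skin tone
```

The point of the sketch is only the data flow: once geometry and color live in separate representations, swapping the color part changes the skin tone while the spatial structure is untouched.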
The Secret Sauce: The "Randomized Decolorizer"
Usually, to separate color from shape, you just turn an image black and white. But the authors realized that simple black-and-white conversion lets the AI cheat: the brightness of the grayscale image still reveals how dark the skin is, so skin-tone information leaks through the back door.
Instead, they invented a randomized decolorizer. Imagine taking a photo and running it through a filter that scrambles the colors in a different random way for every image, but consistently within each image. This forces the AI to stop relying on simple "dark vs. light" clues and actually learn what the true skin color is, without cheating.
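Here is a minimal sketch of what a randomized decolorizer could look like, assuming it works by projecting each image onto randomly weighted grayscale axes with a random tone curve. The paper's exact recipe may differ; this only shows why randomization breaks the "brightness = skin tone" shortcut.

```python
import numpy as np

def randomized_decolorize(img, rng):
    """Collapse a (H, W, 3) float image in [0, 1] to a single channel
    using per-image random mixing weights and a random gamma, so the
    output brightness no longer tracks skin tone the same way across
    images. Illustrative sketch, not the paper's exact method."""
    w = rng.dirichlet(np.ones(3))   # random convex RGB weights, sum to 1
    gray = img @ w                  # (H, W) randomized "luminance"
    # A random monotonic tone curve scrambles absolute brightness cues
    # while preserving edges and texture ordering within the image.
    gamma = rng.uniform(0.5, 2.0)
    return gray ** gamma

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))         # stand-in for a skin photo
out = randomized_decolorize(img, rng)
```

Because the weights and gamma are redrawn per image, two photos of the same skin tone decolorize differently, so the model cannot use overall darkness as a proxy for skin color.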
3. The "Safety Net": Fixing the Mistakes
Sometimes, when you change the "flavor" of the smoothie, you accidentally change the fruit too. For example, if you change the skin tone, the AI might accidentally turn a black ink mark (like a pen dot used by a doctor) into a weird color, or make a scar look different.
To fix this, they added a Geometry-Aligned Post-Processing Step. Think of this as a spell-checker for photos. After the AI changes the skin color, this step looks at the original photo. If it sees a spot that didn't change correctly (like a scar or a pen mark), it says, "Wait, that doesn't belong to the skin tone; put it back to how it was." This ensures the medical details stay perfect while the skin tone changes.
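A crude version of this "put it back" logic can be sketched as follows. The assumption here is that the correction compares local high-frequency detail (a proxy for geometry) between the original and the recolored image, and restores original pixels wherever the detail drifted too far; the paper's actual alignment step is more sophisticated.

```python
import numpy as np

def box_blur(x, k=1):
    """Average each pixel over a (2k+1) x (2k+1) window via padded shifts."""
    p = np.pad(x, k, mode="edge")
    acc = np.zeros_like(x)
    h, w = x.shape
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / (2 * k + 1) ** 2

def restore_non_skin_details(original, recolored, threshold=0.2):
    """Geometry-aligned correction sketch: wherever the recolored
    image's local structure drifted from the original's (e.g. a pen
    mark or scar changed), copy the original pixels back.
    Illustrative only; not the paper's exact procedure."""
    to_gray = lambda img: img.mean(axis=-1)
    # High-pass "detail" maps: grayscale minus its local average.
    detail_orig = to_gray(original) - box_blur(to_gray(original))
    detail_reco = to_gray(recolored) - box_blur(to_gray(recolored))
    mask = np.abs(detail_orig - detail_reco) > threshold
    out = recolored.copy()
    out[mask] = original[mask]      # restore pixels whose geometry broke
    return out

rng = np.random.default_rng(1)
original = rng.random((8, 8, 3))
recolored = original.copy()         # trivial case: nothing drifted
fixed = restore_non_skin_details(original, recolored)
```

Comparing high-pass detail rather than raw color lets the skin tone change freely (a low-frequency shift) while flagging only spots whose fine structure, like ink marks, was corrupted.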
4. What Can We Do With This?
Once the AI has learned to separate these ingredients, it can do three amazing things:
- The "What If" Machine (Counterfactuals): You can ask, "What would this mole look like on a person with darker skin?" or "What if this photo had been taken under yellow light instead of white light?" The AI can generate a realistic answer. This is huge for medical education, letting students practice on a wide variety of skin types without needing real patients.
- The "Fairness" Filter (Data Augmentation): If a hospital only has photos of light-skinned patients, this tool can generate fake (but realistic) photos of dark-skinned patients with the same conditions. This creates a balanced training set, teaching the AI to be fair to everyone.
- The "Standardizer" (Normalization): It can take a photo taken in a dark bathroom and "normalize" it to look like it was taken in a perfect clinic. This helps different hospitals compare their data without getting confused by their different cameras.
The Big Picture
The ultimate goal of this research is Health Equity.
Right now, skin cancer detection tools often fail on people with darker skin because they weren't trained on enough diverse data. This paper provides a way to synthesize diverse training data and standardize images automatically. It's like giving every doctor a universal translator that ensures a diagnosis is based on the disease, not the skin tone or the lighting.
In short: They built a tool that teaches computers to see the condition, not just the color, ensuring that everyone gets a fair and accurate diagnosis.