Multi-criterion uncertainty estimation improves skin cancer distribution shift detection and malignancy prediction

This paper introduces Supervised Autoencoders for Generalization Estimates (SAGE), a multi-criterion uncertainty estimation method that detects distribution shifts in skin lesion images across diverse global datasets. By filtering out problematic images before clinical deployment, SAGE improves the reliability of malignancy prediction models.

Schreyer, W. M., Samathan, R., Berry, E., Thompson, R. F.

Published 2026-02-27

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a master chef who has spent years perfecting a recipe for a specific type of soup using only the freshest, most perfectly uniform vegetables from a single, high-end farm (let's call this the HAM10000 Farm). You are so good at this that you can predict exactly how the soup will taste with 99% accuracy.

Now, imagine you want to open a soup kitchen for the whole world. But instead of your perfect farm, you start getting vegetables from:

  • A dusty roadside stand in Argentina.
  • A rainy garden in Brazil.
  • A chaotic street market in the US.

The vegetables are real, but they are different sizes, covered in dirt, sometimes wrapped in plastic, and some are even different types of vegetables entirely (like a pumpkin instead of a carrot). If you try to cook your "perfect farm" soup with these new ingredients, the result might be a disaster. The soup could taste weird, or worse, you might accidentally serve something toxic because you didn't realize the ingredients were different.

This is exactly the problem doctors and AI face with skin cancer detection.

The Problem: The "Perfect Farm" Trap

For years, AI models have been trained on "perfect" datasets of skin images (like the HAM10000 dataset). These images are clear, well-lit, and taken with special medical cameras (dermoscopes). The AI learns to spot cancer on these perfect images very well.

But in the real world, photos are messy. They are taken with regular smartphones, in bad lighting, with hair covering the spot, or with rulers and markers in the picture. When the AI sees these "messy" photos, it often gets confused. It might think a harmless mole is cancer, or miss a real cancer because the photo looked "weird" to the computer. This is called Distribution Shift: the data the AI sees in the real world doesn't match the data it studied in school.

The Solution: The "SAGE" Quality Inspector

The authors of this paper created a new tool called SAGE (Supervised Autoencoders for Generalization Estimates). Think of SAGE not as a doctor who diagnoses the disease, but as a strict quality control inspector standing at the door of the hospital.

Here is how SAGE works, using a simple analogy:

  1. The Three-Point Check: When a new photo arrives, SAGE doesn't just look at the picture; it runs three quick tests to see if the photo "feels" like the photos the AI studied in school:

    • The "Shape" Test (Reconstruction): SAGE tries to redraw the image from memory. If it can't redraw it well, the image is weird (maybe it's blurry or has a ruler in it).
    • The "Neighbor" Test (Distance): SAGE checks if this image is hanging out with its "friends" (the training data). If the image is standing alone in a corner of the room, it's an outsider.
    • The "Confidence" Test: SAGE asks the AI, "Are you sure about this?" If the AI is shaking in its boots, SAGE flags it.
  2. The "SAGE Score": SAGE combines these three tests into a single score.

    • Low Score: "This photo looks just like the ones we studied. It's safe to let the AI diagnose it."
    • High Score: "Whoa! This photo has a ruler in it, the lighting is weird, or it's a type of lesion we've never seen. Stop! Do not let the AI make a diagnosis on this one."
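The three checks above get folded into one number. This summary doesn't spell out the paper's exact combination rule, so the sketch below assumes a simple weighted sum of the three signals (reconstruction error, distance to training neighbors, and inverted classifier confidence). The function name, weights, and numbers are illustrative, not the authors' implementation.

```python
import numpy as np

def sage_score(recon_error, neighbor_distance, confidence, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combination of three uncertainty signals.

    recon_error:       how badly the autoencoder redraws the image (higher = weirder)
    neighbor_distance: how far the image sits from its training "friends"
    confidence:        the classifier's top softmax probability (higher = more sure)

    Confidence is inverted so that, for every signal, bigger means
    "less like the training data". A higher total score means: flag the image.
    """
    signals = np.array([recon_error, neighbor_distance, 1.0 - confidence])
    return float(np.dot(np.array(weights), signals))

# A clean dermoscopic image: redraws well, sits near training data, model is sure.
familiar = sage_score(recon_error=0.05, neighbor_distance=0.1, confidence=0.95)

# A smartphone photo with a ruler: poor reconstruction, isolated, model unsure.
strange = sage_score(recon_error=0.7, neighbor_distance=0.9, confidence=0.4)
```

The weights would let you tune how much each test matters; equal weights are just the simplest starting point.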

What They Found

The researchers tested this system on photos from five different countries (Argentina, Brazil, Austria, Turkey, and the US). Here is what they discovered:

  • The "Messy" Photos: Photos taken with regular smartphones often had high SAGE scores because of things like camera flashes, hair, or rulers. The AI was much less reliable on these.
  • The "Dark Skin" Gap: The AI struggled more with darker skin tones, partly because the training data didn't have enough dark skin, and partly because the lighting on dark skin often creates "weird" artifacts that confuse the AI. SAGE successfully flagged these difficult images.
  • The "New Disease" Problem: The AI was terrible at spotting rare skin cancers it had never seen before (like T-cell lymphoma). Interestingly, the AI was too confident about these new diseases, thinking they were common ones. SAGE, however, correctly flagged them as "Out of Distribution" (strangers) because they didn't look like the training data.

The Result: A Safer Kitchen

By using SAGE to filter out the "bad" photos before the AI makes a diagnosis, the researchers showed that the AI became more accurate.

  • Before SAGE: The AI tried to diagnose everything, including the messy photos, and made mistakes.
  • After SAGE: SAGE threw out the confusing photos. The AI only looked at the "good" photos, and its accuracy went up.
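This before/after comparison is an instance of selective prediction: pick a score threshold, let the classifier answer only on images below it, and measure accuracy on what remains. The sketch below uses made-up toy numbers; the threshold and function name are assumptions for illustration, not values from the paper.

```python
import numpy as np

def selective_accuracy(sage_scores, was_correct, threshold):
    """Accuracy on the images the AI is allowed to diagnose
    (score below threshold), plus coverage (fraction of images kept)."""
    keep = sage_scores < threshold
    if not keep.any():
        return float("nan"), 0.0
    return float(was_correct[keep].mean()), float(keep.mean())

# Toy data: the two high-score ("weird") photos are the ones the AI got wrong.
scores = np.array([0.1, 0.2, 0.3, 0.8, 0.9])
correct = np.array([1, 1, 1, 0, 0])

overall = correct.mean()  # diagnose everything: 3 of 5 right (60%)
filtered, coverage = selective_accuracy(scores, correct, threshold=0.5)
```

The trade-off is coverage: the stricter the threshold, the fewer images the AI handles, and the rejected ones go back to a human expert rather than being guessed at.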

Why This Matters

This paper is like giving the AI a pair of glasses that helps it realize when it is out of its depth.

Instead of blindly trusting an AI to diagnose skin cancer from a random smartphone photo, SAGE acts as a safety net. It tells doctors: "Hey, this photo is too weird for the AI to handle safely. Please, a human doctor should look at this one."

This is crucial for health equity. It ensures that the AI doesn't accidentally fail patients with darker skin or those in rural areas who use smartphones, by catching the moments when the AI is likely to be wrong and handing the job back to a human expert.

In short: SAGE is the bouncer at the club who checks the ID. If the photo doesn't match the "training club" rules, SAGE says, "No entry for the AI," preventing a medical disaster.
