Learning Credal Ensembles via Distributionally Robust Optimization

This paper introduces CreDRO, a distributionally robust optimization framework for learning credal ensembles. It defines epistemic uncertainty as disagreement among models trained under varying relaxations of the i.i.d. assumption, thereby capturing meaningful distribution shifts and outperforming existing methods in out-of-distribution detection and selective classification.

Kaizheng Wang, Ghifari Adam Faza, Fabio Cuzzolin, Siu Lun Chau, David Moens, Hans Hallez

Published 2026-02-27

The Big Picture: Why Do We Need This?

Imagine you are a doctor looking at an X-ray. You are 99% sure the patient has a broken bone. But then, you realize the X-ray machine is old and blurry, and you've never seen this specific type of fracture before.

In the world of Artificial Intelligence (AI), this is called Uncertainty. There are two types:

  1. Noise (Aleatoric Uncertainty): The X-ray is just blurry. Even a perfect doctor couldn't be 100% sure. This is unavoidable.
  2. Ignorance (Epistemic Uncertainty): The doctor doesn't know because they haven't seen this specific case before. This is the dangerous kind. If the AI is confident but wrong, it could make a life-threatening mistake.

Current AI methods are good at measuring the "blurry" noise, but they are terrible at measuring "ignorance." They often act confident simply because they trained on a lot of data, even when that data doesn't match the real world.

The Problem with Current Methods: The "Dice Roll" Approach

Most state-of-the-art AI models try to measure ignorance by training the same model multiple times with slightly different random starting points (like rolling dice to decide where to start).

The Analogy: Imagine you are trying to guess the weather in a new city.

  • Current Method: You ask 10 friends to guess the weather. But, you tell them to close their eyes and spin around before looking out the window.
  • The Result: Your friends give different answers. You say, "Wow, there is high uncertainty!"
  • The Flaw: The uncertainty isn't because the weather is weird; it's because your friends were dizzy from spinning! The AI is measuring its own confusion about how to start training, not its confusion about the real world.

The Solution: CreDRO (The "Stress-Test" Approach)

The authors propose a new method called CreDRO. Instead of spinning their friends around, they put them in different, slightly stressful environments to see how they react.

The Analogy:
Imagine you are training a team of pilots.

  • Old Way: You have them fly the same route 10 times, but you change the wind direction randomly each time just to see how they handle it.
  • CreDRO Way: You tell Pilot A, "Fly assuming the wind is calm." You tell Pilot B, "Fly assuming the wind is a light breeze." You tell Pilot C, "Fly assuming a hurricane is coming."
  • The Result: If all pilots agree the plane is safe, you are confident. But if Pilot A says "Safe" and Pilot C says "Crash imminent," you know there is a real problem. You don't know which scenario is true, so you admit you are uncertain.

How CreDRO Works (The Technical Magic)

The paper uses a technique called Distributionally Robust Optimization (DRO).

  1. The "What If" Game: During training, the AI doesn't just look at the data as it is. It asks, "What if the data I'm seeing is slightly different from the data I'll see in the real world?"
  2. The Stress Test: It creates a "worst-case scenario" for the training data. It forces the AI to learn to handle data that is slightly "off" or "shifted."
  3. The Ensemble: It trains a group of models (an ensemble). Each model is trained to handle a different level of "off-ness."
    • Model 1: Handles data that is 5% different.
    • Model 2: Handles data that is 10% different.
    • Model 3: Handles data that is 20% different.
  4. The "Box" of Answers: When the AI makes a prediction, instead of giving one single number (e.g., "80% chance of rain"), it gives a range (e.g., "Between 40% and 90% chance").
    • If the range is small (40% to 42%), the AI is confident.
    • If the range is huge (10% to 90%), the AI is admitting, "I don't know what's going on here."
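The four steps above can be sketched in code. This is a toy illustration, not the paper's actual algorithm: it uses a simple CVaR-style reweighting (averaging the loss over only the hardest fraction of examples) as a stand-in for the DRO "worst-case scenario" objective, with a smaller `alpha` playing the role of a larger "off-ness" level. All function names and hyperparameters here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary classification data with label noise
X = rng.normal(size=200)
y = (X + 0.3 * rng.normal(size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_stress_model(X, y, alpha, steps=500, lr=0.5):
    """Logistic regression trained on the average loss of the hardest
    alpha-fraction of examples (a CVaR-style stand-in for DRO).

    alpha=1.0 is ordinary training on all data; smaller alpha means a
    harsher "stress test" -- the model must do well even on a
    worst-case reweighting of the training set."""
    w, b = 0.0, 0.0
    n = len(X)
    k = max(1, int(np.ceil(alpha * n)))
    for _ in range(steps):
        p = sigmoid(w * X + b)
        losses = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
        worst = np.argsort(losses)[-k:]      # hardest k examples
        grad = p[worst] - y[worst]           # d(loss)/d(logit)
        w -= lr * np.mean(grad * X[worst])
        b -= lr * np.mean(grad)
    return w, b

# Step 3: one model per stress level (the ensemble)
models = [train_stress_model(X, y, alpha) for alpha in (1.0, 0.5, 0.2)]

def credal_predict(x):
    """Step 4: return the 'box' of answers -- the interval spanned by
    the ensemble's predicted probabilities."""
    probs = [sigmoid(w * x + b) for w, b in models]
    return min(probs), max(probs)

lower, upper = credal_predict(0.1)
print(f"P(class 1) is between {lower:.2f} and {upper:.2f}")
```

The width `upper - lower` is the epistemic uncertainty: a narrow interval means the differently stressed models agree, a wide one means they don't.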

Why Is This Better?

The paper tested CreDRO against the best existing methods in two main areas:

  1. Spotting the "Weird" Stuff (Out-of-Distribution Detection):

    • Scenario: You train a model on pictures of cats and dogs. Then you show it a picture of a toaster.
    • Old AI: Might confidently say, "That's a very strange cat!"
    • CreDRO: Says, "I have no idea what that is. My confidence range is huge. Please don't trust me."
    • Result: CreDRO was much better at spotting that the input was weird and didn't belong in its training data.
  2. Medical Safety (Selective Classification):

    • Scenario: A doctor uses AI to diagnose cancer.
    • CreDRO: If the AI is unsure, it can say, "I reject this case. Please have a human look at it."
    • Result: By letting the AI "reject" the hard cases, the overall accuracy of the system went up because the AI only made predictions when it was actually sure.
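The selective-classification idea above is simple enough to show directly: predict only when the credal interval is narrow, and defer to a human otherwise. The threshold value here is an arbitrary illustration, not a number from the paper.

```python
def decide(lower, upper, width_threshold=0.3):
    """Abstain when the credal interval [lower, upper] is too wide
    (high epistemic uncertainty); otherwise predict from its midpoint."""
    if upper - lower > width_threshold:
        return "reject"  # hand the case to a human expert
    return "positive" if (lower + upper) / 2 >= 0.5 else "negative"

print(decide(0.82, 0.88))  # narrow and high  -> "positive"
print(decide(0.40, 0.42))  # narrow but low   -> "negative"
print(decide(0.10, 0.90))  # wide             -> "reject"
```

The same width check doubles as an out-of-distribution flag: a toaster shown to a cats-vs-dogs ensemble should produce a wide interval and land in the "reject" bucket rather than being called a strange cat.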

The Takeaway

CreDRO is like a safety net for AI. Instead of asking, "How confused are you because you started training differently?" it asks, "How confused are you because the real world might be different from your training data?"

By training the AI to expect the unexpected, it learns to admit when it doesn't know the answer. This makes AI safer, more reliable, and much more trustworthy for critical jobs like medicine, self-driving cars, and finance.

In short: CreDRO stops the AI from bluffing. If it's unsure, it raises its hand and says, "I need help," rather than guessing and hoping for the best.
