COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics

The paper introduces COMPASS, a robust framework that generates efficient and valid conformal prediction intervals for medical segmentation metrics by calibrating directly in the model's feature space rather than treating the segmentation-to-metric pipeline as a black box.

Matt Y. Cheung, Ashok Veeraraghavan, Guha Balakrishnan

Published 2026-03-03

Imagine you are a doctor looking at an X-ray or a microscope slide. You use a smart computer program (an AI) to draw a line around a tumor or a specific organ. This is called segmentation.

But here's the thing: Doctors don't just care about the line itself. They care about the numbers that come from that line.

  • "How big is this tumor?" (Area/Volume)
  • "Is it growing?"

If the AI draws the line slightly too wide, the calculated size might be 10% too big. If it's too narrow, the size is too small. In medicine, getting that number right is critical for deciding on surgery or medication.

The problem is: How much can we trust that number?

The Old Way: Guessing in the Dark

Traditionally, if you want to know how uncertain an AI is, you treat the whole process like a "black box." You feed an image in, get a size out, and say, "I'm 95% sure the size is between 10 and 20 millimeters."

But because this approach ignores how the AI actually works, the "black box" range is often overly cautious. To be safe, the computer makes the range huge (e.g., "It's between 5 and 50 millimeters"). It's technically valid (it covers the truth), but it's useless in practice. A doctor can't make a decision based on a range that wide.

The New Solution: COMPASS

The paper introduces COMPASS (Conformal Metric Perturbation Along Sensitive Subspaces). Think of COMPASS as a smart, surgical navigator that understands how the AI thinks, rather than just guessing the result.

Here is how it works, using a simple analogy:

1. The "Knob" Analogy

Imagine the AI is a giant radio with thousands of tiny knobs (these are the "features" inside the computer's brain).

  • The Old Way: You just turn the volume up and down randomly to see how the sound changes. It's messy and inefficient.
  • The COMPASS Way: COMPASS works out which small combination of knobs controls the "size" of the tumor. It leaves the thousands of other knobs alone and gently turns only that combination to see how much the size changes.

2. Finding the "Sensitive" Direction

COMPASS looks inside the AI and asks: "If I nudge the internal features slightly, which direction changes the tumor size the most?"

It finds a "sensitive direction" (a specific combination of knobs). It then says:

"Okay, if I wiggle this specific direction by a tiny bit, the size changes by 1mm. If I wiggle it a medium bit, it changes by 5mm. If I wiggle it a lot, it changes by 20mm."

Because it understands the mechanism (the knobs), it can calculate a tight, precise range (e.g., "The size is between 12 and 14mm, with 95% confidence") instead of a huge, useless guess.
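The knob-turning idea can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the linear "decoder" W, the feature vector z, the soft-area metric, and the step sizes are all made up for the example. The weights are kept nonnegative so that the toy metric only ever grows along the sensitive direction, which means the two endpoints always bracket the original estimate.

```python
import numpy as np

def soft_area(z, W):
    """Toy metric: sum of sigmoid(W @ z), a soft 'area' of the predicted mask."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(W @ z)))))

def sensitive_direction(z, W):
    """Unit-length gradient of the soft area with respect to the features z."""
    p = 1.0 / (1.0 + np.exp(-(W @ z)))
    grad = W.T @ (p * (1.0 - p))      # chain rule through the sigmoid
    return grad / np.linalg.norm(grad)

def metric_interval(z, W, beta):
    """Turn the 'knob' by -beta and +beta and read off the resulting areas."""
    u = sensitive_direction(z, W)
    return soft_area(z - beta * u, W), soft_area(z + beta * u, W)

rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(64, 8)))  # hypothetical decoder: 8 features -> 64 pixels
z = rng.normal(size=8)                # hypothetical features for one image

point = soft_area(z, W)
intervals = [metric_interval(z, W, beta) for beta in (0.1, 0.5, 2.0)]
for beta, (lo, hi) in zip((0.1, 0.5, 2.0), intervals):
    print(f"beta={beta}: area in [{lo:.2f}, {hi:.2f}] (estimate {point:.2f})")
```

A small wiggle gives a narrow range and a large wiggle gives a wide one. How large the wiggle should be is exactly what the calibration step has to decide.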

3. The "Safety Net" (Conformal Prediction)

The paper uses a statistical trick called Conformal Prediction. Think of this as a safety net.

  • The computer tests itself on a bunch of known examples first (calibration).
  • It learns: "When I wiggle the knobs this much, my range contains the true answer about 95% of the time."
  • When a new patient comes in, it applies that exact amount of wiggle.
  • The Result: It guarantees that the true size falls inside the range at the chosen confidence level (e.g., 95% of the time), but because it used the "smart knob" method, the range is much smaller and more helpful than the old "black box" method.
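The calibration steps above can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the paper's code: each example is reduced to a predicted size, a hypothetical "sensitivity" (millimeters of change per unit of wiggle), and a true size, all simulated. The score for each calibration example is the smallest wiggle that would have reached the truth, and the finite-sample quantile rule is the standard one from split conformal prediction.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the score to pick
    if k > n:
        return float("inf")               # too few calibration examples for this alpha
    return float(np.sort(scores)[k - 1])

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical data: predicted size, sensitivity, and true size (all in mm)."""
    y_hat = rng.uniform(10.0, 20.0, size=n)
    sens = rng.uniform(0.5, 2.0, size=n)
    y = y_hat + sens * rng.normal(size=n)
    return y_hat, sens, y

alpha = 0.05                              # target: 95% coverage

# Calibration: how much wiggle would each known example have needed?
y_hat, sens, y = simulate(1000)
scores = np.abs(y - y_hat) / sens
beta_hat = conformal_quantile(scores, alpha)

# New patients: apply that same amount of wiggle and check the coverage.
y_hat_t, sens_t, y_t = simulate(5000)
covered = np.abs(y_t - y_hat_t) <= beta_hat * sens_t
coverage = float(covered.mean())
print(f"wiggle size: {beta_hat:.2f}, empirical coverage: {coverage:.3f}")
```

The guarantee is an average one: across many new patients, about 95% of the ranges contain the true size. It says nothing certain about any single patient.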

Why is this a big deal?

  1. It's Tighter: The paper shows that COMPASS gives ranges that are much narrower (more precise) than previous methods, while still keeping the promised coverage (e.g., containing the truth 95% of the time).
  2. It Handles "Drift": Sometimes, the data changes (e.g., a new hospital uses a different camera). Methods calibrated on the old data can lose their coverage guarantee. COMPASS can adjust its "safety net" to account for these changes, keeping the ranges trustworthy even when the data looks different.
  3. It's Practical: It doesn't require rebuilding the AI. It just adds a smart layer on top that understands the AI's internal "feelings" about the image.
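This summary does not spell out how the "safety net" is adjusted for drift. One standard technique for this situation is weighted conformal prediction, where each calibration example gets an importance weight saying how much it resembles the new data. The sketch below shows that generic technique as an assumption about the flavor of the adjustment, not as the paper's exact procedure; the scores and weights are made up.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the scores under importance weights.

    The new example's own weight is treated as mass at +infinity,
    which is what keeps the result conservative.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cum = np.cumsum(w) / (w.sum() + test_weight)
    idx = int(np.searchsorted(cum, 1 - alpha, side="left"))
    return float(s[idx]) if idx < len(s) else float("inf")

scores = np.arange(1.0, 11.0)             # hypothetical calibration scores 1..10

# No drift: every example counts equally.
q_equal = weighted_conformal_quantile(scores, np.ones(10), 1.0, alpha=0.2)

# Drift: the new hospital looks more like the hard (high-score) examples.
drift_weights = np.array([1.0] * 5 + [3.0] * 5)
q_shift = weighted_conformal_quantile(scores, drift_weights, 3.0, alpha=0.2)

print(q_equal, q_shift)
```

When the new data resembles the harder calibration cases, the threshold moves up and the ranges widen to compensate. With equal weights the rule reduces to the usual unweighted quantile.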

Summary

COMPASS is like giving a doctor a precision ruler instead of a fuzzy tape measure.

Instead of saying, "The tumor is somewhere between the size of a grape and a watermelon," COMPASS says, "The tumor is between the size of a grape and a cherry, and ranges like this one are statistically guaranteed to contain the truth at the chosen confidence level." This allows doctors to make life-saving decisions with much more confidence.