Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting

This paper proposes a robust framework for conformal prediction under label corruption. It analyzes the resilience of privileged conformal prediction to weight-estimation errors and introduces a novel uncertain imputation method; both components can be integrated into a triply robust system that ensures valid uncertainty quantification.

Shai Feldman, Stephen Bates, Yaniv Romano

Published 2026-02-27

The Big Picture: Predicting the Future with Broken Data

Imagine you are a weather forecaster. Your job is to predict tomorrow's weather and give people a "confidence range" (e.g., "It will be between 60°F and 70°F"). You want to be right 90% of the time.
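The "confidence range with 90% coverage" idea is exactly what split conformal prediction delivers on clean data. Here is a minimal sketch on toy data; the linear model, noise levels, and all variable names are assumptions for illustration, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration data: y = 2x + noise. Pretend f(x) = 2x was fit
# on a separate training split.
x_cal = rng.uniform(0, 10, 500)
y_cal = 2 * x_cal + rng.normal(0, 1, 500)

def predict(x):
    return 2 * x

# Split conformal: calibrate an interval half-width on held-out residuals.
alpha = 0.1                                        # target 90% coverage
scores = np.abs(y_cal - predict(x_cal))            # nonconformity scores
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))  # conformal quantile rank
q = np.sort(scores)[k - 1]

# Prediction interval for a new point: [f(x) - q, f(x) + q].
x_new = 5.0
interval = (predict(x_new) - q, predict(x_new) + q)
```

On fresh data drawn from the same distribution, intervals of this width cover the true label about 90% of the time; the paper's question is what happens when the calibration labels themselves are corrupted.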

Usually, you learn from past data. But in this paper, the authors are dealing with a messy situation: The past data is corrupted. Some of the historical weather records are missing, and some have the wrong temperature written down.

If you try to make predictions using this broken data, your confidence ranges will be wrong. You might say "60–70°F" when it's actually going to be a blizzard at 20°F. This is dangerous in high-stakes fields like medicine or finance.

The authors propose three new ways to fix this broken data so your predictions remain reliable.


The Problem: The "Missing Ingredient" Dilemma

To understand their solution, we need to introduce a special ingredient called Privileged Information (PI).

  • The Scenario: Imagine you are training a doctor to diagnose a disease.
  • The Data: You have patient records (symptoms, age, etc.) and the diagnosis.
  • The Corruption: Some records are missing the diagnosis.
  • The Privileged Information (PI): During training, the doctor had access to a secret, high-tech MRI scan that perfectly predicted the disease. But at the time of testing (when a new patient comes in), that MRI scan is unavailable (maybe it's too expensive or the patient forgot to bring it).

The challenge is: How do you use that secret MRI scan to fix the missing diagnoses in your training data, even though you can't see the MRI scan for new patients?


Solution 1: The "Weighted Scale" (Privileged Conformal Prediction - PCP)

The Analogy: Imagine you are weighing apples to estimate the average weight of a bushel. But, some apples are rotten (corrupted labels). You also know that the rotten apples came mostly from a specific tree (the Privileged Information).

How it works:
The authors suggest using a weighted scale. You give "more weight" to the healthy apples and "less weight" to the rotten ones to balance the scale.

  • The Catch: To do this perfectly, you need to know exactly how likely an apple is to be rotten based on which tree it came from.
  • The Paper's Discovery: The authors found that even if you guess the weights wrong (your scale is slightly off), the prediction might still be okay! It's like if you guess the apple is 10% rotten when it's actually 15% rotten; your final average might still be close enough to be safe.
  • The Limitation: If your guess is too wrong, the scale tips, and your prediction fails.
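The weighted scale above amounts to taking a *weighted* quantile of the nonconformity scores, with inverse-probability weights derived from the privileged information. A toy sketch of that idea follows; the corruption model, the perturbed weight estimates, and all names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Toy setup: privileged info z drives the chance that a label is clean.
z = rng.uniform(0, 1, n)
p_clean = 0.5 + 0.5 * z                 # true P(label clean | z)
clean = rng.uniform(0, 1, n) < p_clean

# Nonconformity scores are observable only for the clean points.
scores = np.abs(rng.normal(0, 1, n))[clean]

# Inverse-probability weights: points unlikely to survive count more.
# In practice p_clean must be estimated; the perturbation below mimics
# the estimation error the paper shows this approach can tolerate.
p_est = np.clip(p_clean[clean] + rng.normal(0, 0.05, clean.sum()), 0.05, 1.0)
w = 1.0 / p_est

def weighted_quantile(s, w, alpha):
    # Smallest score whose cumulative normalized weight reaches 1 - alpha.
    order = np.argsort(s)
    cw = np.cumsum(w[order]) / w.sum()
    return s[order][np.searchsorted(cw, 1 - alpha)]

q = weighted_quantile(scores, w, alpha=0.1)
```

The point of the paper's analysis is that `q` degrades gracefully when `p_est` is slightly off, but can fail if the weight estimates are badly wrong.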

Solution 2: The "Guess-and-Check" with a Safety Net (Uncertain Imputation - UI)

The Analogy: Instead of trying to weigh the rotten apples, imagine you just replace the rotten apples with a "fake" apple that looks like a healthy one, but you add a giant, fuzzy cloud around it to represent uncertainty.

How it works:

  1. The Guess: You use the Privileged Information (the secret MRI) to guess what the missing diagnosis should have been.
  2. The Safety Net: You don't just write down the guess. You say, "I think the diagnosis is X, but I'm not 100% sure." So, you add a "cloud of error" around X. This cloud represents all the possible things the diagnosis could be.
  3. The Result: When you make your final prediction, you include this whole cloud. Because you made the cloud big enough to cover all possibilities, your prediction is guaranteed to be correct, even if your initial guess was slightly off.

Why it's cool: This method doesn't need to know the exact "weights" of the corruption. It just needs to be good at guessing the answer using the secret info, and then admitting it might be wrong by adding a safety buffer.
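The guess-plus-cloud step can be sketched as imputing each missing label from the privileged information and then adding noise drawn from the imputer's own residuals, so the imputed labels carry the imputer's uncertainty instead of a single point guess. Everything below (the proxy `z`, the 30% missingness, the model `f(x) = 2x`) is a toy assumption, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Toy data: z = y + small noise is a near-perfect privileged proxy,
# available at calibration time but not at test time.
x = rng.uniform(0, 10, n)
y = 2 * x + rng.normal(0, 1, n)
z = y + rng.normal(0, 0.3, n)
missing = rng.uniform(0, 1, n) < 0.3    # 30% of labels are corrupted

# Uncertain imputation (sketch): guess y from z, then add a residual
# sampled from the imputer's errors on the clean points -- the "cloud".
resid = y[~missing] - z[~missing]       # imputer errors, seen on clean data
y_imp = y.copy()
y_imp[missing] = z[missing] + rng.choice(resid, missing.sum())

# Calibrate as usual on the partly imputed labels.
alpha = 0.1
scores = np.abs(y_imp - 2 * x)          # assume the model f(x) = 2x
k = int(np.ceil((n + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]
```

Because the imputed labels are sampled rather than point estimates, the calibrated width `q` inflates just enough to absorb the imputer's uncertainty.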

Solution 3: The "Triple-Redundancy" System (Triply Robust)

The Analogy: Imagine you are building a bridge. You want it to be safe even if one of your support beams breaks.

  • Beam 1: A standard prediction (works if the data is perfect).
  • Beam 2: The "Weighted Scale" (works if you can estimate the corruption rates).
  • Beam 3: The "Guess-and-Check with Cloud" (works if you can guess the answer using the secret info).

How it works:
The authors combine all three methods into one super-method called TriplyRobust.

  • They take the prediction from Beam 1, Beam 2, and Beam 3.
  • They combine them into one giant prediction set (the union of all three).
  • The Magic: As long as at least one of these three methods is doing its job correctly, the final result is guaranteed to be safe. It's like having three different navigators; even if two get lost, as long as one knows the way, you arrive safely.
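For interval-valued predictions, the union step can be as simple as taking the smallest interval containing all three candidates; if any one component interval has valid coverage, so does the union. The three example intervals below are made up for illustration.

```python
def union_interval(intervals):
    """Smallest interval containing every candidate interval.

    A conservative stand-in for the union of the three prediction sets.
    """
    lo = min(lo for lo, _ in intervals)
    hi = max(hi for _, hi in intervals)
    return lo, hi

# Hypothetical intervals from the three methods for one test point:
standard = (60.0, 70.0)   # Beam 1: plain conformal
weighted = (58.0, 69.0)   # Beam 2: PCP (weighted scale)
imputed  = (61.0, 72.0)   # Beam 3: UI (guess + cloud)

print(union_interval([standard, weighted, imputed]))  # → (58.0, 72.0)
```

The price of this robustness is width: the union is at least as wide as its widest valid component, which is exactly the "gets a little wider and safer" trade-off described below.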

Why This Matters

In the real world, data is rarely perfect.

  • Medical records often have missing diagnoses.
  • Financial data might have errors or hidden biases.
  • AI training often relies on data that was labeled by humans who made mistakes.

This paper gives us a toolkit to build AI that says, "I'm not 100% sure because the data is messy, but here is a range that I promise (with 90% certainty) contains the truth."

Summary of the "Magic"

  1. PCP: Tries to fix the data by weighing the good parts more than the bad parts. It's surprisingly sturdy even if the weights aren't perfect.
  2. UI: Replaces bad data with a "best guess" but wraps it in a giant safety blanket of uncertainty.
  3. TriplyRobust: Combines everything. If any one of the three strategies works, the whole system works.

It's like having a backup plan for your backup plan, ensuring that even when the data is broken, your AI doesn't crash—it just gets a little wider and safer.
