Imagine you are a weather forecaster. You want to tell people, "There is a 90% chance of rain." But here's the catch: you only have data from 10 days in your history book to make this prediction.
Because your data is so scarce, your forecast might be wildly unstable. One day you might say "90% chance," and the next, you might accidentally say "100% chance" or "50% chance," even though the real weather hasn't changed. You are flying blind, and your confidence intervals (your prediction sets) are either too wide (useless) or too narrow (dangerous).
This is the problem the paper "Semi-Supervised Conformal Prediction" solves.
The Problem: The "Empty Calibration Room"
In machine learning, there's a technique called Conformal Prediction. Think of it as a "quality control inspector" for AI. Before the AI makes a final guess, the inspector checks a "calibration room" filled with examples where the answers are known.
- The Goal: The inspector needs to find a "threshold" (a cutoff score) to decide how many possibilities to list as the answer.
- The Issue: In the real world, we often have tons of unlabeled data (photos without tags) but very little labeled data (photos with tags). If the inspector only has 20 labeled photos to calibrate the system, the results are shaky. The "coverage" (how often the true answer lands inside the AI's list) bounces around unpredictably.
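To make the inspector's job concrete, here is a minimal sketch of standard split conformal calibration (the textbook version, not the paper's method): score each labeled example by how little probability the model gave the true answer, then pick a cutoff so that about 90% of examples fall below it. The function names and the toy data are illustrative.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration: find a score cutoff from labeled examples.

    cal_probs: (n, k) model probabilities for n calibration examples.
    cal_labels: (n,) true class indices.
    alpha: target miscoverage (0.1 -> aim for 90% coverage).
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, threshold):
    """All classes whose score falls at or below the calibrated cutoff."""
    return np.where(1.0 - probs <= threshold)[0]

# Tiny calibration set: with only 20 examples the threshold is noisy,
# which is exactly the instability the paper is attacking.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=20)
cal_labels = rng.integers(0, 5, size=20)
tau = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
```

Re-run this with a different random seed and `tau` can shift noticeably; that jitter is the "shaky inspector" problem.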
The Solution: The "Semi-Supervised" Trick
The authors propose a new method called SemiCP. Instead of leaving the unlabeled data in the corner, they bring it into the calibration room.
But there's a problem: The unlabeled data doesn't have the "true answer" (the label) needed to calculate the score. It's like trying to grade a test where the answer key is missing.
The Magic Ingredient: Nearest Neighbor Matching (NNM)
This is where the paper's secret sauce comes in. They invent a clever way to estimate the score for the unlabeled data without knowing the true answer.
The Analogy: The "Look-Alike" Strategy
Imagine you are trying to guess how difficult a new, unlabeled math problem is.
- The Naive Approach: You assume the AI's own best guess is the true answer and score the problem against it. This usually fails because the AI's favorite guess makes the scores look systematically better than they really are.
- The SemiCP Approach (NNM):
  - You look at the new problem and say, "This looks a lot like Problem #42 from my old textbook."
  - You check Problem #42. You know the real answer to #42, and you know what the AI thought the answer was.
  - You calculate the "bias" (the error) of the AI on Problem #42.
  - You apply that same error correction to the new problem.
In the paper, they call this Nearest Neighbor Matching. They find the labeled example that looks most similar to the unlabeled one (based on how the AI "feels" about the answer) and borrow its error history to correct the new one.
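The "look-alike" step can be sketched in a few lines. This is a simplified illustration of the matching idea, not the paper's exact procedure: match each unlabeled example to the labeled example with the most similar probability vector, and borrow that neighbor's known nonconformity score. The function name and the L2 distance choice are assumptions made for this sketch.

```python
import numpy as np

def impute_scores_nnm(unlab_probs, lab_probs, lab_labels):
    """Estimate scores for unlabeled data via nearest-neighbor matching.

    For each unlabeled example, find the labeled example whose probability
    vector looks most alike, and borrow the score computed from that
    neighbor's known label. (Simplified sketch of the NNM idea.)
    """
    n_lab = len(lab_labels)
    # Real scores, computable only where the true label is known.
    lab_scores = 1.0 - lab_probs[np.arange(n_lab), lab_labels]
    # Pairwise L2 distances between unlabeled and labeled probability vectors.
    dists = np.linalg.norm(unlab_probs[:, None, :] - lab_probs[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)   # index of each point's "look-alike"
    return lab_scores[nearest]       # borrow the neighbor's error
```

The key design choice is that similarity is measured in the model's output space (how the AI "feels" about the answer), so two examples the model treats alike are assumed to have alike errors.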
Why This is a Game Changer
By using this "Look-Alike" strategy, the system can effectively treat thousands of unlabeled examples as if they were labeled.
- Stability: The "calibration room" is now huge. The inspector isn't guessing based on 20 examples anymore; they are using 20 labeled + 4,000 unlabeled examples. The results stop bouncing around.
- Efficiency: Because the system is more confident and stable, it doesn't need to list 50 possible answers to be safe. It can narrow it down to just 2 or 3, making the AI much more useful.
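Pooling the two sources into one threshold might look like the sketch below. This is illustrative only; the paper may combine or weight the two pools differently.

```python
import numpy as np

def pooled_threshold(lab_scores, pseudo_scores, alpha=0.1):
    """One cutoff from real labeled scores plus NNM-imputed pseudo-scores.

    Thousands of pseudo-scores pooled with a handful of real ones make the
    quantile far less jumpy than a 20-example estimate.
    """
    scores = np.concatenate([lab_scores, pseudo_scores])
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")
```

With a larger pool, the estimated quantile concentrates, so the cutoff (and hence the prediction-set size) stops swinging between runs.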
The Results
The authors tested this on famous image datasets (like identifying animals in photos).
- Before: With only 20 labeled examples, the AI's confidence was all over the place. Sometimes it was too sure, sometimes too unsure.
- After (SemiCP): By adding 4,000 unlabeled photos and using the "Look-Alike" trick, the AI's confidence became rock-solid. They reduced the error in their confidence levels by 77%.
Summary
Think of SemiCP as a way to teach a student to take a test using a "cheat sheet" made of similar past exams. Even though the student hasn't seen the answers to the new questions, they can look at similar old questions, see where they made mistakes, and adjust their answers accordingly.
This allows AI to be safer, more reliable, and more efficient, even when we don't have enough labeled data to train it perfectly. It turns a "guessing game" into a "calculated prediction."