Conformal Prediction for Long-Tailed Classification

This paper proposes new conformal prediction methods, including a prevalence-adjusted softmax score and an interpolation procedure, to effectively balance prediction set size and class-conditional coverage for long-tailed classification tasks where rare classes are often omitted.

Tiffany Ding, Jean-Baptiste Fermanian, Joseph Salmon

Published 2026-03-02

Imagine you are an amateur botanist trying to identify a strange plant you found in your backyard. You snap a photo and upload it to an AI app.

The Problem:
The AI doesn't just guess one plant; it gives you a list of possibilities (a "prediction set"). This is helpful because the AI might be unsure.

  • If the list has 1,000 plants, it's useless. You'd spend all day checking them.
  • If the list has 1 plant, it's risky. If the AI is wrong, you miss the answer entirely.

Now, imagine the world of plants is long-tailed. This means there are a few super-common plants (like dandelions) that the AI has seen millions of times, and thousands of rare, endangered plants that it has seen only a handful of times.

The Current Dilemma:
Existing AI tools force you to choose between two bad options:

  1. The "Safe but Useless" List: The AI guarantees it won't miss the rare plants, but to do so, it dumps every single plant in the world into your list. You can't use it.
  2. The "Small but Dangerous" List: The AI gives you a short list of 2 or 3 plants. It's easy to check, but it often misses the rare, endangered species because it has seen so few examples of them.
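
For readers who want to see the dilemma in code, here is a minimal sketch of the two existing options using split conformal prediction. It assumes the usual score of 1 minus the softmax probability and a held-out calibration set; the function names and the toy numbers are illustrative, not taken from the paper. "Standard" uses one threshold for all classes, while "classwise" computes one threshold per class. When a class has too few calibration examples, its classwise threshold becomes infinite, so that class lands in every list.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    if n == 0 or k > n:
        return np.inf  # too little data: the class can never be excluded
    return np.sort(scores)[k - 1]

def true_class_scores(cal_probs, cal_labels):
    # Score = 1 - softmax probability of the true class (lower is better).
    return 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]

def standard_threshold(cal_probs, cal_labels, alpha):
    """One threshold shared by every class."""
    return conformal_quantile(true_class_scores(cal_probs, cal_labels), alpha)

def classwise_thresholds(cal_probs, cal_labels, alpha):
    """One threshold per class, each from that class's own examples."""
    scores = true_class_scores(cal_probs, cal_labels)
    num_classes = cal_probs.shape[1]
    return np.array([
        conformal_quantile(scores[cal_labels == y], alpha)
        for y in range(num_classes)
    ])

def prediction_set(probs, thresholds):
    """Classes whose score is at or below the (scalar or per-class) threshold."""
    return np.flatnonzero(1.0 - probs <= thresholds)

# Toy calibration data: class 2 is so rare it never appears.
cal_probs = np.array([[0.9, 0.1, 0.0],
                      [0.8, 0.2, 0.0],
                      [0.7, 0.2, 0.1],
                      [0.3, 0.6, 0.1]])
cal_labels = np.array([0, 0, 0, 1])

q_std = standard_threshold(cal_probs, cal_labels, alpha=0.5)
q_cw = classwise_thresholds(cal_probs, cal_labels, alpha=0.5)
```

In this toy setup the classwise threshold for class 2 is infinite. With thousands of rare classes, that is how the "Safe but Useless" list ends up so long.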

The Solution: A New Way to Balance the Scale
The authors of this paper propose two clever tricks to get the best of both worlds: a short, manageable list that still catches the rare plants.

Trick #1: The "Popularity Discount" (Prevalence-Adjusted Softmax)

Imagine the AI is a judge at a talent show.

  • Old Way: The judge gives a high score to a famous pop star (common plant) and a low score to an unknown indie band (rare plant). The judge's list is dominated by the pop stars.
  • New Way (PAS): The judge realizes, "Wait, I've seen the pop star a million times, so I'm not impressed. But I've barely seen this indie band, so if they show up, they must be special!"
  • The Metaphor: The AI applies a "Popularity Discount." It lowers the score of common plants (because they are easy to guess) and boosts the score of rare plants (because they are hard to guess).
  • The Result: When the AI makes its list, it doesn't just pick the "most likely" plants; it picks the plants that are "surprisingly likely" given how rare they are. This keeps the list short but ensures the rare plants aren't ignored.
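
To make the "Popularity Discount" concrete, here is a rough sketch. It assumes the adjustment works by dividing each class's softmax probability by that class's share of the training data, which is how we read "prevalence-adjusted"; the paper's exact formula may differ. The function names, the sign convention (lower score means more plausible), and the small `eps` guard against division by zero are our own choices for illustration.

```python
import numpy as np

def class_prevalence(train_labels, num_classes):
    """Fraction of training examples that belong to each class."""
    counts = np.bincount(train_labels, minlength=num_classes)
    return counts / counts.sum()

def pas_score(probs, prevalence, eps=1e-12):
    """Hypothetical prevalence-adjusted score: softmax divided by prevalence.

    Negated so that a lower score means a more plausible class, which lets
    it plug into the usual "score <= threshold" conformal rule.
    eps is an assumed guard for classes absent from the training labels.
    """
    return -probs / np.maximum(prevalence, eps)

# Toy example: class 0 is very common, class 2 is very rare.
probs = np.array([0.90, 0.08, 0.02])       # model's softmax output
prevalence = np.array([0.90, 0.09, 0.01])  # share of training data

scores = pas_score(probs, prevalence)
ranking = np.argsort(scores)  # most plausible class first
```

Here the plain softmax ranks the common class first, but the adjusted score ranks the rare class first: the model gave it twice the probability its rarity alone would suggest. The adjusted scores would then be calibrated with a single shared threshold, just like any other conformal score.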

Trick #2: The "Dimmer Switch" (INTERP-Q)

Imagine you have two lights:

  • Light A (Standard): Very bright, but only shines on the common plants.
  • Light B (Classwise): A floodlight that shines on everything, including the rare plants, but it's so bright it blinds you with too many options.

The New Method:
Instead of choosing one light, the authors built a dimmer switch.

  • You can slide the switch to mix the two lights.
  • Slide it a little toward the floodlight? You get a slightly longer list, but now the rare plants are visible.
  • Slide it back? The list gets shorter again.
  • The Benefit: You (the user) get to decide exactly how much "rare plant safety" you want versus how "short" you want your list to be. It's a smooth dial, not a binary on/off switch.
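
The dimmer switch can be sketched as a weighted average of the two kinds of thresholds. This is a minimal illustration assuming a straight linear blend: the dial name `tau`, its direction (0 means standard, 1 means classwise), and the `cap` used to replace infinite classwise thresholds are assumptions made here, not details confirmed by the paper.

```python
import numpy as np

def interp_q(q_standard, q_classwise, tau, cap=1.0):
    """Blend a shared threshold with per-class thresholds.

    tau = 0 gives the standard threshold for every class;
    tau = 1 gives the classwise thresholds.
    cap is an assumed stand-in for infinite classwise thresholds
    (1.0 is the largest possible value of a 1 - softmax score).
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be between 0 and 1")
    q_cw = np.minimum(np.asarray(q_classwise, dtype=float), cap)
    return (1.0 - tau) * q_standard + tau * q_cw

# Toy thresholds: the rare third class has an infinite classwise threshold.
q_standard = 0.3
q_classwise = np.array([0.2, 0.4, np.inf])

q_off = interp_q(q_standard, q_classwise, tau=0.0)   # standard only
q_half = interp_q(q_standard, q_classwise, tau=0.5)  # halfway on the dial
q_full = interp_q(q_standard, q_classwise, tau=1.0)  # classwise only
```

Sliding `tau` upward loosens the threshold for the rare class, so it appears in more lists, while every class's threshold moves smoothly rather than jumping between the two extremes.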

Why Does This Matter?

This isn't just about plants. It applies to:

  • Medicine: Finding a rare, aggressive cancer (the "rare plant") is more important than classifying a common cold (the "dandelion"). We don't want the AI to ignore the cancer just because it's rare.
  • AI Safety: If rare classes are ignored when AI systems are trained, the AI can gradually "forget" them and get worse over time (a phenomenon sometimes called "model collapse").

The Bottom Line

The paper teaches us how to build AI that doesn't just play it safe with the common stuff. By adjusting how the AI "sees" rarity, we can create prediction lists that are short enough to be useful but inclusive enough to catch the rare, important things we care about most. It's like having a flashlight that is bright enough to see the rare gems in the dark, without blinding you with the whole room.
