Inference conditional on selection: a review

This paper reviews selective inference techniques that provide valid statistical guarantees for questions derived from the data itself, such as identifying winners or clusters. It explains how classical methods break down in modern exploratory workflows, and demonstrates the alternatives through simulations and an application to single-cell RNA sequencing data.

Anna Neufeld, Ronan Perry, Daniela Witten

Published 2026-04-14

The Big Problem: The "Double-Dipping" Trap

Imagine you are a detective trying to solve a crime. You have a room full of suspects (data).

  1. The Old Way (Classical Statistics): You pick a suspect before you look at the evidence, based on a hunch, and then you run a test to see if they are guilty. This is fair.
  2. The Modern Problem (Double Dipping): In modern science, we often look at the whole room of suspects first, find the one who looks the most suspicious (the "winner"), and then run a test to see if they are guilty.

The Catch: If you pick the most suspicious person just because they look the most suspicious, you are almost guaranteed to be wrong about how "guilty" they actually are. You've used the same evidence to pick the suspect and to judge them. This is called double dipping.

In statistics, this leads to the "Winner's Curse." If you pick the candidate with the highest test score, that score is likely inflated by luck. If you then calculate a confidence interval (a range of where the true score likely is) using standard math, that range will be too narrow. You will be overconfident, and your "95% sure" claim will actually only be right 50% of the time.
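The winner's curse is easy to see in a small simulation (a sketch to illustrate the point, not an experiment from the paper): score 100 truly null candidates with standard normal noise, pick the highest scorer, and check how often a naive 95% confidence interval around the winner actually covers the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_reps = 100, 2000
true_effect = 0.0          # every candidate is truly null
covered = 0

for _ in range(n_reps):
    # each candidate's observed score: truth + standard normal noise
    scores = true_effect + rng.standard_normal(n_candidates)
    winner = scores.max()  # pick the "champion" after looking at all scores
    # naive 95% CI, ignoring that we picked the maximum
    lo, hi = winner - 1.96, winner + 1.96
    covered += (lo <= true_effect <= hi)

print(f"naive CI coverage for the winner: {covered / n_reps:.2%}")
```

With 100 candidates, the winner's score sits around 2.5 purely by luck, so the naive interval misses the true value of 0 the vast majority of the time, far short of the promised 95%.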

The Solution: Conditional Inference

The paper argues that we need a new way of thinking. Instead of asking, "Is this person guilty?" we should ask, "Given that we picked this specific person as the most suspicious, are they actually guilty?"

This is called Conditional Inference. It's like saying, "Okay, we already looked at the whole room and picked the guy with the messy hair. Now, let's re-evaluate his guilt only considering the fact that we picked him for having messy hair."

The paper reviews several "recipes" to fix this double-dipping problem without throwing away the data.


The Four Recipes for Fixing Double Dipping

The authors compare four main ways to solve this. Think of them as different ways to manage a team of investigators.

1. Full Conditional Selective Inference (The "Strict Judge")

  • How it works: You use all the data to pick the suspect. Then, you act like a strict judge who says, "I know you picked him because he had the messiest hair. I will now calculate his guilt only looking at the specific scenario where he had the messiest hair."
  • The Good: You use every single piece of evidence. You don't throw anything away.
  • The Bad: The math is often intractable. And even when it isn't, there is a deeper problem: if the "messy hair" wasn't much messier than everyone else's, the selection event itself uses up nearly all the evidence, and the confidence interval can become infinitely wide ("We have no idea"). It's like a judge saying, "Because the evidence is so ambiguous, I can't give you a verdict."
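In the simplest version of the winner's-curse setting, the strict judge's calculation is actually tractable: conditional on the winner beating the runner-up, the winner's null score follows a normal distribution truncated below at the runner-up's value. A minimal sketch of that classic calculation (an illustration of the idea, not the paper's general machinery):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
scores = rng.standard_normal(100)   # 100 truly null candidates
order = np.sort(scores)
winner, runner_up = order[-1], order[-2]

# Naive p-value: pretends the winner was chosen in advance
p_naive = norm.sf(winner)

# Selective p-value: conditions on "this score beat the runner-up",
# i.e. a normal truncated below at the runner-up's value
p_selective = norm.sf(winner) / norm.sf(runner_up)

print(f"winner = {winner:.2f}, runner-up = {runner_up:.2f}")
print(f"naive p = {p_naive:.4f}, selective p = {p_selective:.4f}")
```

Note how the selective p-value climbs toward 1 when the winner only barely beats the runner-up: the selection event has consumed nearly all the information, which is the same phenomenon that produces infinitely wide confidence intervals.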

2. Sample Splitting (The "Two-Team Approach")

  • How it works: You split your team of investigators into two groups: Team A and Team B.
    • Team A looks at the suspects and picks the winner.
    • Team B (who has never seen the suspects before) tests the winner.
  • The Good: It's easy. Team B has no bias because they didn't help pick the suspect.
  • The Bad: You throw away half your data. Team A's findings are discarded after the selection, and if Team B doesn't have enough information, they might give up and say, "I can't tell."
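Continuing the winner's-curse simulation (a sketch, assuming the same all-null Gaussian setup as before), sample splitting restores validity: Team A's measurements pick the winner, and Team B's independent measurements test it with the ordinary naive interval.

```python
import numpy as np

rng = np.random.default_rng(1)
n_candidates, n_reps = 100, 2000
covered = 0

for _ in range(n_reps):
    # two independent measurements per candidate (truth is 0 for all)
    score_a = rng.standard_normal(n_candidates)  # Team A's data
    score_b = rng.standard_normal(n_candidates)  # Team B's data
    winner = score_a.argmax()                    # Team A picks the winner
    # Team B tests the winner with untouched data: the naive CI is now valid
    lo, hi = score_b[winner] - 1.96, score_b[winner] + 1.96
    covered += (lo <= 0.0 <= hi)

print(f"sample-splitting CI coverage: {covered / n_reps:.2%}")
```

Coverage now lands close to the promised 95%, at the cost of spending half the measurements on selection alone.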

3. Data Thinning (The "Magic Filter")

  • How it works: Instead of cutting the team in half, you use a "magic filter" on the data. You take the original data and split it into two independent streams of information.
    • Stream A is used to pick the winner.
    • Stream B is used to test the winner.
    • Crucially, Stream B still contains some information about the winner, even though it's independent of Stream A.
  • The Good: You don't throw away data like in Sample Splitting. You get a verdict even when the data is tricky.
  • The Bad: It only works if the data follows specific distributional rules (for example, a normal or Poisson distribution). If your data doesn't fit a suitable family, the filter breaks.
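One concrete "magic filter" is Poisson thinning, a classic instance of data thinning: if a count X follows a Poisson(λ) distribution, randomly routing each unit of the count to stream A with probability ε yields two independent Poisson streams that still add back up to X. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, eps, n = 10.0, 0.5, 200_000

# original data: one Poisson count per unit (e.g., per gene)
x = rng.poisson(lam, size=n)

# the "magic filter" (Poisson thinning): send each unit of the count
# to stream A with probability eps, otherwise to stream B
x_a = rng.binomial(x, eps)
x_b = x - x_a

# the two streams are independent Poissons that sum back to the data
print("corr(A, B) ~", np.corrcoef(x_a, x_b)[0, 1])   # near 0
print("mean of A ~", x_a.mean())                     # near eps * lam
print("mean of B ~", x_b.mean())                     # near (1 - eps) * lam
```

Stream A can drive the selection (say, clustering) while stream B, independent yet still informative about the same λ, supplies an honest test.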

4. Randomized CSI (The "Controlled Chaos")

  • How it works: This is a mix of the above. You use the whole data to pick the winner, but you add a little bit of "noise" (random static) to the selection process.
    • Imagine adding static to a radio signal. You pick the station based on the noisy signal.
    • Then, you use the original clean signal to test the station, but you account for the fact that you picked it based on the noisy version.
  • The Good: It prevents the "infinite verdict" problem of the Strict Judge. It uses all the data but keeps the math manageable.
  • The Bad: You have to introduce artificial randomness, which can feel weird to scientists who want pure data.
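For Gaussian data there is a clean way to see why the added noise helps (a "data fission"-style sketch, not the paper's exact procedure): if you select using the noisy score X + ω, where ω ~ N(0, τ²), then the adjusted score X − ω/τ² is exactly independent of everything the selection saw, so it can be used for an honest test.

```python
import numpy as np

rng = np.random.default_rng(3)
n_candidates, n_reps, tau = 100, 2000, 1.0
covered = 0

for _ in range(n_reps):
    x = rng.standard_normal(n_candidates)        # observed scores, truth = 0
    w = tau * rng.standard_normal(n_candidates)  # injected noise
    u = x + w           # noisy scores: used only to pick the winner
    v = x - w / tau**2  # adjusted scores: independent of u (Gaussian trick)
    winner = u.argmax()
    # v[winner] ~ N(0, 1 + 1/tau^2), independent of the selection
    se = np.sqrt(1 + 1 / tau**2)
    lo, hi = v[winner] - 1.96 * se, v[winner] + 1.96 * se
    covered += (lo <= 0.0 <= hi)

print(f"randomized-selection CI coverage: {covered / n_reps:.2%}")
```

A larger τ makes the selection noisier but leaves a more precise test statistic (its variance is 1 + 1/τ²), so τ tunes the trade-off between picking well and testing well.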

Real-World Examples from the Paper

The authors tested these recipes on three real scenarios:

  1. The "Winner's Curse" (Example 1): Picking the best-performing drug from a list of 100.
    • Lesson: If you pick the winner and test it normally, you overestimate its success. You need to adjust for the fact that you picked the "champion."
  2. Regression Trees (Example 2): Using an algorithm to find subgroups of patients who respond well to a treatment.
    • Lesson: The algorithm finds the groups because of the data. If you then test those groups, you are double-dipping. The "Two-Team" (Sample Splitting) or "Magic Filter" (Data Thinning) approaches worked well here.
  3. Single-Cell RNA Sequencing (Example 3): Grouping cells into types (like "T-cells" vs. "B-cells") and then checking which genes are different.
    • Lesson: This is the hardest case. You can't easily split the cells in half, because the clusters found in one half don't automatically carry labels over to the other half.
    • Result: The "Magic Filter" (Data Thinning) and "Controlled Chaos" (Randomized CSI) worked best. The "Strict Judge" (Full CSI) was too rigid and couldn't handle the messy biological data.

The Bottom Line

Science has moved from "hypothesis-driven" (guessing first, then testing) to "data-driven" (exploring first, then testing). The old math doesn't work for this new way of doing science because it leads to false confidence.

The paper concludes that there is no single "perfect" tool.

  • If you want to use all your data and have a complex model, you might need the Strict Judge (Full CSI), but be prepared for wide, uncertain answers.
  • If you want simplicity and don't mind throwing away some data, Sample Splitting is great.
  • If you have clean, standard data, Data Thinning is a sweet spot.
  • If you want a balance of using all data and getting a definite answer, Randomized CSI is often the winner.

The Takeaway: Scientists need to stop "double dipping." They must choose a method that acknowledges they picked their question from the data, not before it. The paper provides a menu of options to help them do that without losing their minds over the math.
