Statistical uncertainty explains the poor agreement in polygenic scoring for type 2 diabetes

This paper demonstrates that the poor agreement among polygenic scores for type 2 diabetes is fully explained by statistical uncertainty, and proposes that incorporating individual-level uncertainty estimates improves risk prediction accuracy and the identification of high-risk individuals.

Mandla, R., Li, X., Shi, Z., Abramowitz, S., Lapinska, S., Penn Medicine Biobank, Levin, M. G., Damrauer, S. M., Pasaniuc, B.

Published 2026-02-27

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Problem: Too Many Different Answers

Imagine you go to a doctor to check your risk of getting Type 2 Diabetes. You hand them a DNA test. But instead of one clear answer, the doctor pulls out five different calculators (Polygenic Scores, or PGS) to predict your risk.

  • Calculator A says: "You are in the top 2% of risk! You are very likely to get diabetes."
  • Calculator B says: "You are in the bottom 50%. You are very safe."

This is the current problem in genetic medicine: different "calculators" often give completely different answers for the same person, leaving doctors and patients confused. If the tools disagree so much, how can we trust them to make life-changing decisions?

The Discovery: It's Not a Broken Calculator; It's "Static"

The researchers in this paper asked: Why do these calculators disagree?

They discovered that the disagreement isn't because the calculators are broken or because the DNA is wrong. It's because of statistical uncertainty, or what we might call "genetic static."

Think of a polygenic score like trying to listen to a radio station in a car while driving through a tunnel.

  • The Signal: Your true genetic risk.
  • The Static: The uncertainty in how the scientists calculated the score.

Because the math used to build these scores isn't perfect, there is a "fuzzy zone" around every person's score.

  • Person A has a score of 90, but the static is low. The "fuzzy zone" is tight. We are very confident they are high risk.
  • Person B also has a score of 90, but the static is high. The "fuzzy zone" is huge. They might actually be a 70 or a 110. We aren't sure.

The paper found that this "static" explains why the calculators disagree. When the static is high, Calculator A might say "High Risk" while Calculator B says "Low Risk" just because they are listening to the static differently.

The Solution: Don't Just Look at the Number; Check the "Confidence"

The researchers proposed a new way to use these scores. Instead of just looking at the number (the point estimate), we should look at the confidence interval (the size of the fuzzy zone).

They created a system to categorize people into three groups:

  1. High Confidence: The static is low. The calculators all agree. (e.g., "We are 99% sure this person is high risk.")
  2. Medium Confidence: There is some fuzziness. The calculators mostly agree, but not perfectly.
  3. Low Confidence: The static is huge. The calculators are shouting different things. We can't trust the result yet.
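Under stated assumptions (a per-person score interval and a "high-risk" cutoff, both hypothetical here), the three tiers above amount to a simple rule: check where the interval sits relative to the cutoff, and how wide it is.

```python
def confidence_tier(lo: float, hi: float, threshold: float) -> str:
    """Classify one person's risk call by where their score interval
    (lo, hi) falls relative to a high-risk threshold.

    - "high confidence":   the whole interval is on one side of the threshold
    - "medium confidence": the interval straddles the threshold, but narrowly
    - "low confidence":    the interval is wide and straddles the threshold
    """
    if lo > threshold or hi < threshold:
        return "high confidence"
    # Interval straddles the threshold: its width decides how cautious to be.
    return "medium confidence" if (hi - lo) < 1.0 else "low confidence"

# Hypothetical examples on a standardized score scale, threshold = 2.0:
print(confidence_tier(2.3, 2.9, 2.0))  # entire interval above the cutoff
print(confidence_tier(1.8, 2.4, 2.0))  # narrow straddle
print(confidence_tier(0.5, 3.5, 2.0))  # wide straddle
```

The width cutoff (1.0 here) and the threshold are arbitrary choices for the sketch; the paper's actual thresholds would come from its own calibration.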

The Results: Trusting the "High Confidence" Group

When the researchers looked at the people in the High Confidence group, they found something amazing:

  • Agreement: Almost all the different calculators agreed on these people.
  • Reality: These people were actually much more likely to develop Type 2 Diabetes in real life compared to people who just had a "high number" but low confidence.

The Analogy:
Imagine a weather forecast.

  • Scenario 1: A meteorologist says, "It will rain tomorrow," but their data is shaky and they are only 50% sure. You might ignore it.
  • Scenario 2: A meteorologist says, "It will rain tomorrow," and their data is rock solid with 99% certainty. You will definitely bring an umbrella.

This paper says: Stop treating all "High Risk" scores the same. Only act on the ones where the "meteorologist" is highly confident.

The Catch: The "Static" is Louder for Some People

The researchers also found a sad truth about fairness.

  • Most of the big genetic studies were done on people of European ancestry. The "radio signal" is clear for them, so the static is low, and the confidence is high.
  • For people of African or other ancestries, the "radio signal" is weaker because the studies didn't include enough people like them. The static is much louder.

This means that for people of non-European ancestry, it is much harder to get a "High Confidence" score. If we only treat the "High Confidence" people, we might accidentally leave out many high-risk people from diverse backgrounds, making health inequality worse.

The Bottom Line

  1. The Problem: Different genetic risk scores often disagree because of mathematical "noise" (uncertainty).
  2. The Fix: We should measure how "noisy" the score is. If the noise is low, we can trust the score. If the noise is high, we should be careful.
  3. The Benefit: People with "low noise" (high confidence) scores are the ones most likely to actually get the disease, making them the best candidates for early prevention.
  4. The Warning: We need to fix the "noise" for diverse populations so that everyone gets a fair, clear reading, not just those who look like the people in the original studies.

In short: It's not just about what the score says, but how sure we are that the score is right.
