A DNA foundation model predicts osteoporosis risk genes without proximity bias

This paper introduces Rosalind, a DNA foundation model that overcomes the proximity bias of traditional gene-mapping approaches by predicting, from sequence, which genes distal variants regulate. The authors use it to identify novel osteoporosis risk genes and present it as a scalable framework for translating genetic insights into drug discovery.

Regep, C., Kapourani, C.-A., Sofyali, E., Dobrowolska, A., Loukas, G., Anighoro, A., Canale, E., Gross, T., Licciardello, M., Gupta, R., Maciuca, S., Desai, T., Del Vecchio, A., Field, C., Gemayel, K., Javer, A., Zhang, Z., Tsujikawa, R., Inoue, F., Hessel, E., Taylor-King, J., Whittaker, J., Roblin, D., McIntyre, R., Edwards, L.

Published 2026-03-12

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Problem: The "Nearest Neighbor" Mistake

Imagine you are looking at a massive, crowded city (the human genome). You spot a strange graffiti tag (a genetic mutation) on a wall. You want to know: Which building is this graffiti actually affecting?

For decades, scientists have used a very simple rule to answer this: "It must be the building right next to the wall."

This is called the "Nearest Neighbor" rule. If a mutation is found near Gene A, scientists assumed Gene A was the culprit causing the disease.
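
The nearest-neighbor rule is simple enough to write in a few lines. The sketch below is a minimal illustration, assuming each gene is represented only by a hypothetical name, a chromosome, and a transcription start site (TSS) position; the gene names and coordinates are made up, and real pipelines differ in details such as using gene bodies instead of TSSs or how ties are broken.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gene:
    name: str   # hypothetical gene label
    chrom: str  # chromosome, e.g. "chr1"
    tss: int    # transcription start site position in base pairs


def nearest_gene(chrom: str, pos: int, genes: list[Gene]) -> Gene | None:
    """Nearest-neighbor rule: blame the gene whose TSS is closest to the variant."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    if not same_chrom:
        return None  # no candidate gene on this chromosome
    # Distance along the linear sequence is the only criterion; 3D folding is ignored.
    return min(same_chrom, key=lambda g: abs(g.tss - pos))


# Made-up coordinates: GENE_A is 10 kb from the variant, GENE_Z is 440 kb away.
genes = [Gene("GENE_A", "chr1", 1_000_000), Gene("GENE_Z", "chr1", 1_450_000)]
hit = nearest_gene("chr1", 1_010_000, genes)
print(hit.name if hit else None)  # GENE_A
```

Whatever GENE_Z is actually doing, this rule will never pick it for a variant sitting next to GENE_A, which is the proximity bias in a nutshell.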

The Problem: DNA isn't a straight line; it's a tangled ball of yarn packed inside a tiny cell. A mutation might sit right next to Gene A along the DNA strand, but because of how the DNA folds, it might actually be touching and controlling Gene Z, which is far away in the linear sequence. Relying only on distance is like assuming that someone shouting right next to you must be talking to you, when they might be using a megaphone to reach someone across the street.

This "proximity bias" can lead scientists to blame the wrong gene and miss the real causes of diseases like osteoporosis (brittle bones).

The Solution: Meet "Rosalind"

The authors of this paper built a DNA foundation model, a type of AI tool, called Rosalind. Think of Rosalind not as a rule-follower, but as a super-smart detective who understands the "language" of DNA.

  • How it works: Instead of just measuring distance, Rosalind reads the DNA sequence like a sentence. It understands the grammar, the punctuation, and the long-range connections. It knows that a word (mutation) at the beginning of a paragraph can change the meaning of a word at the very end, even if they are far apart.
  • Training: Rosalind was trained on a massive library of human data linking genetic variants to gene activity across tissues (from the GTEx project) to learn which mutations actually change how genes behave.

The Test: The "Bone Builder" Experiment

To prove Rosalind was better than the old "Nearest Neighbor" rule, the team used osteoporosis as a test case.

  1. The Setup: They took 1,103 known genetic "clues" linked to weak bones.
  2. The Prediction:
    • The Old Way (Nearest Neighbor) pointed to the genes closest to the clues.
    • Rosalind pointed to different genes, often ones far away from the clue.
  3. The Lab Test: They took human bone-building cells (osteoblasts) and used a molecular pair of scissors (CRISPR) to "turn off" the genes Rosalind predicted.
    • The Result: When they turned off the genes Rosalind picked (the "distal" ones), the bone cells stopped building bone properly.
    • The Surprise: When they turned off the "nearest neighbor" genes, the bone cells kept working fine!
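
The two strategies in the prediction step differ mainly in how candidate genes are ranked. Below is a minimal sketch, assuming a model has already produced an effect score for each nearby gene; the `predicted_effect` values, the 1 Mb search window, the gene names, and the function name are all hypothetical and are not Rosalind's actual interface or results.

```python
from __future__ import annotations


def score_based_gene(
    pos: int,
    gene_tss: dict[str, int],
    predicted_effect: dict[str, float],
    window: int = 1_000_000,
) -> str | None:
    """Pick the gene with the strongest predicted effect among genes whose TSS
    lies within `window` bp of the variant; distance only limits the search."""
    in_window = [
        name
        for name, tss in gene_tss.items()
        if abs(tss - pos) <= window and name in predicted_effect
    ]
    if not in_window:
        return None  # no scored gene close enough to consider
    # Rank by the magnitude of the (hypothetical) predicted effect, not by distance.
    return max(in_window, key=lambda name: abs(predicted_effect[name]))


# Hypothetical TSS coordinates and model scores, for illustration only.
gene_tss = {"GENE_A": 1_000_000, "GENE_Z": 1_450_000}
predicted_effect = {"GENE_A": 0.02, "GENE_Z": -0.35}

print(score_based_gene(1_010_000, gene_tss, predicted_effect))  # GENE_Z
```

For the same variant, a nearest-gene rule would return GENE_A (10 kb away), while ranking by predicted effect returns the more distant GENE_Z. Using absolute values treats a strong decrease in gene activity as just as important as a strong increase.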

The Analogy: Imagine you are trying to fix a broken car engine. The old rule says, "Check the part closest to the broken wire." Rosalind says, "No, check the fuel pump three feet away." The team went and checked the fuel pump, and that was the problem.

Why This Matters

  1. Better Drug Discovery: If you are trying to develop a drug to treat a disease, you need to target the right gene. If you target the wrong one (because you followed the "nearest neighbor" rule), the drug is likely to fail. Rosalind helps identify the right target, which can save time and money.
  2. New Insights: Rosalind highlighted genes involved in primary cilia (tiny antenna-like structures on cells that sense signals from their surroundings, including mechanical forces) as crucial for bone health, a connection the nearest-neighbor method missed.
  3. Scalability: This isn't just for bones. Because Rosalind learns the "language" of DNA rather than a disease-specific rule, the same approach could be applied to diabetes, heart disease, asthma, and other conditions to find their true causes.

The Bottom Line

This paper introduces a new AI tool that stops guessing based on distance and starts accounting for the complex, 3D reality of our DNA. By doing so, it helps scientists find the true "villains" behind diseases, which could lead to better medicines.

In short: We stopped looking at the person standing next to the crime scene and started looking at who actually pulled the trigger, even if they were hiding in the next room.
