Biologically informed genetic data transformations improve multi-omic comorbidity prediction in people with HIV

This study demonstrates that in people with HIV, biologically informed genetic data transformations—specifically polygenic risk scores and AlphaGenome-derived gene-level impact scores—significantly improve multi-omic prediction accuracy for coronary artery disease and chronic kidney disease compared to using raw SNP genotypes or principal components.

Ryan, B., Thorball, C. W., Ait Oumelloul, M., Kouyos, R., Tarr, P. E., Fellay, J.

Published 2026-03-10

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your body is a massive, complex orchestra. To understand why some people develop specific health problems like heart disease or kidney trouble, scientists try to listen to many sections of the orchestra at once: DNA, blood proteins, metabolites, and more. This approach is called multi-omics.

In this study, the researchers focused on people living with HIV. Even though antiretroviral therapy (ART) keeps them alive and healthy, they are still at higher risk for "comorbidities" like Coronary Artery Disease (CAD) and Chronic Kidney Disease (CKD).

The researchers wanted to build a "crystal ball" (a computer model) to predict who might get sick. They knew two things were important:

  1. The Genetic Scorecard (Genomics): Your DNA, which is like the sheet music written before the concert even starts.
  2. The Current Performance (Other Omics): Your blood proteins or metabolites, which are like the actual sound the orchestra is making right now.

The Problem: Too Much Sheet Music

The researchers ran into a huge problem. Your DNA sheet music is enormous—it has millions of notes (called SNPs). If you try to feed all those millions of raw notes into a computer along with the current performance data, the computer gets confused. It's like trying to read a library's worth of books while simultaneously listening to a symphony; the computer just can't find the signal in the noise.

Usually, scientists try to fix this by:

  • Throwing away most of the notes: Keeping only a random handful.
  • Grouping notes together: Using a technique called PCA (Principal Component Analysis) to summarize the music.
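
For readers who want to see what "grouping notes together" can look like in practice, here is a minimal sketch of PCA on a genotype matrix. The data are made up, the function name `top_principal_components` is hypothetical, and the study's actual pipeline is not described in this summary, so treat this as an illustration of the general technique only.

```python
import numpy as np

def top_principal_components(genotypes, n_components):
    """Project people onto the leading principal components of their SNPs.

    genotypes: array of shape (n_people, n_snps) holding allele counts 0, 1 or 2.
    Returns (scores, explained_ratio).
    """
    # Center each SNP column so PCA describes variation, not the average.
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, explained_ratio

# Made-up toy data: 6 people, 8 SNPs (real data would have millions of SNPs).
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(6, 8)).astype(float)

pcs, explained = top_principal_components(toy_genotypes, n_components=2)
print(pcs.shape)  # 8 SNP columns are summarized as 2 columns per person
```

Each person's eight SNP values are replaced by two summary numbers. The summary is driven purely by the largest patterns of variation in the genotypes, not by any knowledge of the disease being predicted.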

But the researchers suspected these methods were too blunt. They wanted to see if smarter ways of organizing the genetic data would help the computer predict the future better.

The Experiment: Four Ways to Read the Sheet Music

The team tested four different ways to translate the raw genetic data into something the computer could understand:

  1. Raw Notes (Raw SNPs): Feeding the computer the millions of raw genetic letters.
  2. The Summary (PCA): Grouping the notes into broad themes.
  3. The "Risk Score" (PRS): Instead of looking at every note, they used a pre-made "Risk Score" based on what we already know about heart and kidney disease from huge global studies. It's like having a cheat sheet that says, "These specific notes usually mean trouble."
  4. The "AI Translator" (AlphaGenome): They used a super-advanced AI (a "foundational DNA model") that reads the DNA and translates it into a summary of how specific genes might be impacted. It's like having a genius conductor who looks at the sheet music and instantly tells you, "This section is likely to be loud and chaotic."
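
As a rough illustration of the "Risk Score" idea, a polygenic risk score is commonly computed as a weighted sum: each variant's allele count is multiplied by an effect weight taken from earlier large studies, and the products are added up. The sketch below assumes that standard form. The weights, the genotypes, and the function name `polygenic_risk_score` are invented for illustration and are not taken from the preprint.

```python
def polygenic_risk_score(allele_counts, effect_weights):
    """Weighted sum of allele counts (0, 1 or 2 copies of the risk allele)."""
    if len(allele_counts) != len(effect_weights):
        raise ValueError("need exactly one weight per variant")
    return sum(count * weight for count, weight in zip(allele_counts, effect_weights))

# Hypothetical per-variant weights, standing in for published study results.
effect_weights = [0.12, -0.05, 0.30, 0.08]

# Hypothetical genotypes for two people at the same four variants.
person_a = [2, 0, 1, 1]
person_b = [0, 2, 0, 1]

score_a = polygenic_risk_score(person_a, effect_weights)
score_b = polygenic_risk_score(person_b, effect_weights)
print(score_a > score_b)  # person_a carries more of the risk-raising alleles
```

The point of the transformation is the change in size: many genetic columns per person collapse into a single number that already encodes prior knowledge about the disease.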

The Results: Quality Over Quantity

Here is what they found, using a simple analogy:

  • The "Raw Notes" and "Summary" approaches failed. When they tried to mix the raw genetic data or the simple summaries with the current blood data, the computer actually got worse at predicting who would get sick. It was like trying to mix a messy pile of sheet music with the live audio; the noise drowned out the useful signal.
  • The "Smart Translations" won. When they used the Risk Scores (PRS) or the AI Translator (AlphaGenome), the prediction accuracy went up.
    • For Kidney Disease, the best model combined the "AI Translator" genetic data with the blood metabolites. It was the most accurate crystal ball.
    • For Heart Disease, the "Risk Score" (PRS) was the star player.
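
This summary does not say which metric the authors used to measure prediction accuracy. One common choice for this kind of yes/no disease prediction is the area under the ROC curve (AUC), so the sketch below assumes it. AUC is the probability that a randomly chosen person who developed the disease received a higher predicted risk than a randomly chosen person who did not. All labels and scores are made up, and `auc` is a hypothetical helper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up outcomes (1 = developed the disease) and two made-up sets of predictions.
labels = [0, 0, 1, 1, 0, 1]
scores_model_a = [0.6, 0.3, 0.5, 0.4, 0.7, 0.8]
scores_model_b = [0.2, 0.65, 0.6, 0.7, 0.3, 0.9]

auc_a = auc(scores_model_a, labels)
auc_b = auc(scores_model_b, labels)
print(auc_b > auc_a)  # the model with the higher AUC ranks cases above non-cases more often
```

Comparing two models on the same people with the same metric is what allows a statement like "this version of the genetic data predicted better than that one."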

The Big Takeaway

The main lesson of this paper is: Don't just dump raw data into a computer.

If you want to predict disease using a person's DNA and their current blood work, you need to translate the DNA first. You need to turn those millions of confusing genetic letters into a few meaningful, biologically smart summaries (like a Risk Score or an AI-generated impact score).

In everyday terms:
Imagine you are trying to guess the weather.

  • Method A (Raw Data): You give the computer 10 million raw temperature readings from every single leaf on every tree in the world. The computer crashes.
  • Method B (Smart Translation): You give the computer a simple report that says, "The air pressure is dropping, and the humidity is high." The computer predicts rain far more reliably.

This study shows that for people with HIV, combining these "smart translations" of DNA with blood-based data predicts heart and kidney disease more accurately than combining it with raw genetic data or simple PCA summaries. It's a step toward personalized medicine that doesn't require a cohort of millions of people to work; it just requires the right way of looking at the data we already have.
