An in silico framework for evaluating PRS-guided prognostic enrichment in clinical trial design: Plain-Language Explanation

Original authors: Cai, R., Gillard, J., Yang, S., Gasparyan, S. B., Lu, Y., Tian, L., Vedin, O., Ashley, E. A., Rivas, M. A., O'Sullivan, J. W.

Published 2026-03-24

Preprint available on medRxiv.

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to test a new medicine to prevent heart attacks. The old way of doing this is like throwing a giant net into the ocean and hoping to catch a few fish. You recruit thousands of people, wait for years, and hope that enough of them actually get sick (have a heart attack) during the study so you can see if the medicine works.

The problem? Most people in your net are healthy. They aren't going to get sick anytime soon. So, you end up waiting a long time, spending a fortune, and often still not catching enough "fish" (events) to prove your medicine works. This is why clinical trials are so expensive and slow.

The New Idea: The "Smart Net"

This paper proposes a smarter way to design these trials using a "digital simulator" and a tool called a Polygenic Risk Score (PRS).

Think of your DNA like a weather forecast for your health. A PRS is like a "storm risk score." Some people have a genetic makeup that makes them very likely to get sick soon (a "stormy" forecast), while others are very unlikely to get sick (a "sunny" forecast).

The researchers suggest: Why not only recruit the people with the "stormy" forecasts?

If you only test the medicine on people who are genetically likely to get sick soon, you will see results much faster. You need fewer people, you spend less money, and you get your answer sooner. This is called Prognostic Enrichment.

How They Tested It (The "Time Machine" Simulation)

The researchers didn't want to wait years to see if this idea works in real life. Instead, they built a computer simulation (an in silico framework) using data from the UK Biobank, which contains health and DNA data from half a million people.

They used a clever trick: Natural Experiments.
They looked for people who naturally carry "protective" genetic mutations. Imagine these mutations are like a built-in shield.

The "Treatment" Group: People with the shield (the mutation).
The "Control" Group: People without the shield.

Since these people already have the "shield" naturally, the researchers could pretend the shield was a new drug they invented. They ran their simulation to see: If we only recruited the "stormy" people (high risk) for our trial, would we find the difference between the shield and no shield faster and cheaper?

The Results: It Depends on the Disease

They tested this idea on three different diseases: Heart Disease, Glaucoma (eye disease), and Inflammatory Bowel Disease (IBD).

Heart Disease (CAD): This was a huge success. By only recruiting the top 25% of people with the highest genetic risk, they could cut the number of people needed for the trial by 60%. It was like finding the fish in a small pond instead of the whole ocean.
IBD: This was even more dramatic. They could cut the required sample size by 78%.
Glaucoma: This one was tricky. Picking higher-risk people helped, but when the cutoff was too strict (the top 25%), there weren't enough people left to study to get a clear answer. It's like trying to find a needle in a haystack while only looking in a tiny piece of the haystack: you might miss the needle entirely because the sample is too small.

The Big Takeaway

This paper gives scientists a blueprint or a calculator for the future.

Before they spend millions of dollars starting a real clinical trial, they can now run this simulation to ask: "If we use genetic risk scores to pick our participants, how much money and time will we save?"

The Analogy of the "Goldilocks" Zone
The study shows that there isn't one perfect rule for everyone.

For some diseases, you want to be very picky (only the top 25% risk).
For others, being too picky leaves you with too few people.
The goal is to find the "Goldilocks" zone: High enough risk to see results fast, but low enough risk to have enough people to study.

In Summary
This research is like giving clinical trial designers a GPS. Instead of driving blind and hoping to find a destination (a successful trial), they can now use genetic data to plot the most efficient route, saving time and money and getting life-saving treatments to patients much faster.

1. Problem Statement

Clinical trials, particularly those for chronic diseases, face significant inefficiencies due to low event rates among participants. Traditional inclusion criteria often fail to capture high-risk individuals, necessitating large sample sizes and prolonged follow-up periods to achieve statistical power. This results in high costs, long timelines, and a risk of "null trials" where a potentially effective drug appears ineffective simply because the study was underpowered or too short. While genomic data (specifically Polygenic Risk Scores or PRS) offers a mechanism to stratify risk, current trial designs rarely leverage this information prospectively. There is a lack of a generalizable, data-driven framework to quantitatively evaluate how PRS-based enrichment would impact trial parameters (sample size, power, cost) before a trial begins.

2. Methodology

The authors developed an in silico framework using large-scale biobank data (UK Biobank) to simulate and evaluate PRS-guided prognostic enrichment.

Data Source: White British participants from the UK Biobank with linked genomic and electronic health record (EHR) data.
Natural Experiment Design: The study utilized naturally occurring protective genetic variants (loss-of-function variants) as proxies for therapeutic interventions.
- Treatment Arm: Carriers of the protective variant.
- Control Arm: Non-carriers.
- Rationale: These variants mimic the effect of pharmacological inhibition (e.g., PCSK9 inhibitors), allowing the study of "treatment" effects without an actual intervention.
Enrichment Strategy: Participants were stratified by Polygenic Risk Scores (PRS) into four groups:
1. Unenriched: Full population.
2. Top 75%: Upper 75% of PRS distribution.
3. Top 50%: Upper 50% of PRS distribution.
4. Top 25%: Upper 25% of PRS distribution.
Selection Criteria (Joint Information Score $\Psi$): The authors evaluated multiple disease-gene pairs and selected three based on a quantitative score ($\Psi$) that combined:
1. Significance of the protective variant effect in the full cohort.
2. Magnitude of PRS stratification (OR per SD).
3. Persistence of the variant effect within enriched strata (ensuring statistical power remains viable after restriction).
Analytical Approach:
- Disease Prevalence: Estimated via logistic regression (adjusting for age, sex, PCs) for carriers vs. non-carriers within each PRS stratum.
- Sample Size Calculation: Derived using standard two-proportions Z-tests to achieve 60–85% power.
- Empirical Power: Assessed via bootstrap simulations (1,000 replicates) with fixed sample sizes (100–1,000 per arm).
- Event Accrual: Simulated prospective follow-up (up to 10 years) to calculate Restricted Mean Survival Time (RMST) for disease-free cohorts.
- Cost Modeling: Estimated total trial costs including screening, PRS ascertainment, and execution.
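The sample-size and power steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the two-sided alpha of 0.05, equal arm sizes, and the prevalence values are all hypothetical. The power check also uses a parametric Monte Carlo simulation (drawing binary outcomes from fixed prevalences) in place of the paper's bootstrap resampling of biobank participants.

```python
import math
import random
from statistics import NormalDist


def required_n_per_arm(p_control, p_treated, power=0.80, alpha=0.05):
    """Per-arm sample size for a two-sided two-proportion z-test (equal arms)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    # Standard deviation terms under the null (pooled) and the alternative.
    null_sd = math.sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = math.sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p_control - p_treated)) ** 2
    return math.ceil(n)


def empirical_power(p_control, p_treated, n_per_arm, reps=1000, alpha=0.05, seed=0):
    """Share of simulated trials in which the pooled z-test rejects the null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Simulate event counts in each arm (assumption: independent Bernoulli outcomes).
        events_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        events_t = sum(rng.random() < p_treated for _ in range(n_per_arm))
        pooled = (events_c + events_t) / (2 * n_per_arm)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
        if se > 0 and abs(events_c - events_t) / n_per_arm / se > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical prevalences: same relative risk (0.5), different baseline risk.
    for label, p_c in [("unenriched", 0.02), ("enriched", 0.10)]:
        print(label, required_n_per_arm(p_c, p_c / 2))
```

With the relative effect held fixed, a higher baseline prevalence gives a smaller required sample size, which is the mechanism prognostic enrichment relies on. The simulation takes the prevalences as given, so it does not capture the Glaucoma-style loss of signal that arises when effects are estimated within a small stratum.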

3. Key Contributions

A Generalizable Framework: Established a systematic, prospective method to evaluate PRS enrichment strategies using existing population data, moving beyond retrospective post-hoc analyses.
Quantification of Trade-offs: Provided a quantitative assessment of the trade-off between enrichment intensity (higher risk, smaller sample) and statistical power (risk of losing signal due to small sample sizes in highly restricted strata).
Disease-Specific Insights: Demonstrated that optimal PRS thresholds are context-dependent, varying significantly by disease prevalence and genetic architecture.
Operational Modeling: Integrated cost analysis to show how screening costs interact with reduced trial execution costs under different enrichment scenarios.

4. Key Results

The framework was applied to three model gene-disease pairs:

CAD-PCSK9 (Coronary Artery Disease)
Glaucoma-ANGPTL7
IBD-IL23R (Inflammatory Bowel Disease)

General Findings:

Increased Prevalence: Restricting enrollment to higher PRS strata consistently increased disease prevalence in both arms, but the relative risk reduction (protective effect) remained detectable in CAD and IBD.
Sample Size Reduction:
- CAD-PCSK9: Restricting to the Top 25% PRS reduced the required per-arm sample size by ~60% (from ~6,100 to ~2,400) at 80% power.
- IBD-IL23R: Showed the most dramatic reduction, ~78% (from ~51,700 to ~11,100 per arm) due to the low baseline prevalence of IBD.
- Glaucoma-ANGPTL7: Showed a more modest reduction (~30%). Crucially, the Top 25% stratum did not yield better results than the Top 50% because the sample size became too small to maintain statistical significance of the protective variant effect (attenuation of signal).
Empirical Power:
- For CAD and IBD, power increased monotonically with stricter PRS thresholds.
- For Glaucoma, power peaked at the Top 50% threshold and declined at Top 25%, illustrating the "sweet spot" where enrichment benefits are maximized before sample size constraints degrade power.
Event Accrual (RMST): Enriched cohorts showed earlier event accumulation (lower RMST), suggesting shorter follow-up times could achieve target event counts. However, absolute reductions in follow-up time were modest (e.g., ~0.1 years for CAD).
Cost Implications:
- If genetic data is pre-existing, Top 25% enrichment reduced total projected costs by 41.8% (CAD) and 74.7% (IBD).
- If sequencing is required, the cost benefit is still positive but lower, as screening costs offset some savings.
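The reported reductions can be checked with simple arithmetic, and the same arithmetic shows why screening costs matter. The sketch below uses the approximate per-arm figures quoted above. The helper names are hypothetical, and the screening estimate assumes candidates are a random draw from the population and that every eligible candidate enrolls.

```python
from math import ceil


def percent_reduction(n_unenriched, n_enriched):
    """Relative reduction in per-arm sample size, in percent."""
    return 100.0 * (n_unenriched - n_enriched) / n_unenriched


def candidates_to_screen(n_enrolled, eligible_fraction):
    """Candidates that must be genotyped to enroll n_enrolled participants
    when only eligible_fraction of them pass the PRS cutoff (assumption:
    random sampling and full uptake among eligible candidates)."""
    return ceil(n_enrolled / eligible_fraction)


# Approximate per-arm sizes at 80% power: (unenriched, Top 25% PRS).
reported = {
    "CAD-PCSK9": (6100, 2400),
    "IBD-IL23R": (51700, 11100),
}

for pair, (n_full, n_top25) in reported.items():
    reduction = percent_reduction(n_full, n_top25)
    screened = candidates_to_screen(n_top25, 0.25)
    print(f"{pair}: {reduction:.1f}% fewer enrolled, {screened} screened per arm")
```

The rounded inputs give reductions of about 60.7% and 78.5%, consistent with the ~60% and ~78% quoted above. Under these assumptions, a Top 25% cutoff means genotyping four candidates for each participant enrolled, which is why the savings shrink when genetic data is not already available.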

5. Significance and Implications

Prospective Trial Design: This framework allows sponsors and regulators to simulate trial outcomes before initiation, optimizing enrollment criteria to reduce costs and duration.
Context-Dependent Optimization: The study indicates that a "one-size-fits-all" PRS threshold is suboptimal. For low-prevalence diseases (IBD) or high-effect variants (CAD), aggressive enrichment (Top 25%) is highly efficient. For diseases where the protective signal is weaker or sample sizes are limited (Glaucoma), moderate enrichment (Top 50%) performs better.
Feasibility of Genomic Trials: As genomic data becomes routine in healthcare, this approach provides a scalable foundation for integrating genetic risk into clinical trials, potentially transforming how therapeutic targets are validated.
Limitations & Future Work: The study relies on White British participants in the UK Biobank, limiting generalizability to diverse populations. It also treats protective variants as proxies for drug effects, an assumption that may underestimate therapeutic magnitude. Future work should integrate multi-ethnic biobanks and explore other biomarkers beyond PRS.

In conclusion, the paper establishes that PRS-guided prognostic enrichment is a powerful tool for improving clinical trial efficiency, but its implementation requires a tailored, data-driven approach that balances risk enrichment against the statistical power required to detect treatment effects.
