Cross-ancestry performance of Parkinson's disease polygenic risk scores in admixed Latin American populations

This study demonstrates that in admixed Latin American populations, polygenic risk scores for Parkinson's disease derived from large European GWAS currently outperform those from smaller ancestry-matched datasets, though methods incorporating functional annotations like SBayesRC offer the best predictive performance, highlighting the urgent need for larger, diverse genetic studies to ensure equitable clinical translation.

Published 2026-03-03

Original authors: Flores-Ocampo, V., Reyes-Perez, P., Ogonowski, N. S., Sevilla-Parra, G., Diaz-Torres, S., Leal, T. P., Waldo, E., Ruiz-Contreras, A. E., Alcauter, S., Arguello-Pascualli, P., Mata, I. F., Renteria, M. E., Medina-Rivera, A., Dennis, J. K.

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: A Recipe for Prediction

Imagine you are trying to bake the perfect cake (predicting who might get Parkinson's disease). To do this, you need a recipe (a Polygenic Risk Score, or PRS) that tells you which ingredients (genetic markers) matter most.

For a long time, scientists have only had access to recipes written by bakers from Europe. These recipes work great if you are baking for a European audience. But when you try to use those same European recipes to bake for Latin American populations, the cake often turns out flat or tastes weird. Why? Because Latin American populations are a unique "three-way mix" of Native American, European, and African ancestry. The genetic "ingredients" and how they interact are different.

This paper asks: How can we bake the best cake for Latin American people when our best recipes come from Europe, and our local recipes are still being written?

The Experiment: Testing Different Bakers

The researchers gathered a huge group of people from Latin America (about 3,300 individuals, split between those with Parkinson's and those without). They wanted to see which "baking method" worked best to predict the disease.

They tested four different "bakers" (statistical tools, grouped into three styles below) using three different "cookbooks" (data sources):

The Cookbooks (Data Sources):
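- The Giant European Cookbook: A massive collection of data from people of European ancestry: about 80,000 people with Parkinson's (or with a close relative who has it) and about 1.7 million people without. It's huge and detailed, but it doesn't know much about Latin American genetics.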
- The Small Local Cookbook: A tiny collection of data from about 1,500 Latin American people. It's culturally perfect, but it's so small it's missing many important details.
- The Mixed-World Cookbook: A large collection (about 49,000 cases and 2.4 million controls) mixing data from Europe, Latin America, Africa, and East Asia.

The Bakers (Methods):
- The Traditionalist (PRSice-2): Uses a simple "clump and threshold" approach. It's like picking the top 10 ingredients from a list. It's fast but often misses the nuance.
- The Smart Chef (SBayesRC): A newer method that uses "functional annotations." Think of this as a chef who not only looks at the ingredients but knows which ingredients are actually biologically important (like knowing that flour matters more than a random speck of dust).
- The Bridge Builder (PRS-CSx & BridgePRS): These methods try to "bridge" the gap. They take the big European data and the small local data and try to blend them together to create a custom recipe for the mixed population.

The Results: Who Won the Bake-Off?

Here is what they found, translated into plain English:

1. Size Matters (The "Big Cookbook" Wins)
Surprisingly, the Smart Chef (SBayesRC) using the Giant European Cookbook performed the best overall.

The Analogy: Even though the European recipe wasn't written for Latin Americans, it was so detailed and complete that it was still better than using a tiny, incomplete local recipe. The sheer volume of data from Europe outweighed the fact that the ancestry wasn't a perfect match.
The Catch: The local recipe (Latin American data) was just too small to be useful on its own yet.

2. The "Best" Depends on What You Measure

If you wanted to know how much risk a person had (the "odds"), the European-based score was the winner.
If you wanted to know how well the score could distinguish between sick and healthy people (the "AUC"), the score using the Mixed-World Cookbook was slightly better.
The Analogy: The European recipe told you how bad the cake might be, while the Mixed recipe was slightly better at telling you which cake was the "bad" one and which was the "good" one.

3. The "Mix" Matters
The researchers noticed that the more "European" ancestry a person had, the better the prediction worked.

The Analogy: If you are baking a cake that is 80% European flour and 20% local flour, the European recipe works great. If you are baking a cake that is 80% local flour, the European recipe starts to fail. The prediction gets weaker as the person's genetics get further away from the European data source.

The Takeaway: What Does This Mean for the Future?

The Good News:
We can already use the massive European data to help Latin American people, especially if we use smart tools (like SBayesRC) that know how to filter out the noise. It's better than nothing, and it's significantly better than using our tiny local data alone.

The Bad News:
We are still relying too much on European data. The "Local Cookbook" is too small. Until we gather more data from Latin American, African, and other underrepresented populations, we can't build the perfect, custom recipe for them.

The Solution:
The paper argues that we need to stop just "translating" European recipes. We need to hire more local bakers and write more local cookbooks. Programs like GP2 (Global Parkinson's Genetics Program) are doing exactly this: gathering more diverse data so that one day, we can predict Parkinson's risk accurately for everyone, regardless of their ancestry.

Summary in One Sentence

While we can currently use massive European genetic data to predict Parkinson's risk in Latin American populations (especially with smart tools), the most accurate predictions will only come when we finally build large, diverse genetic databases that truly represent the world's mixed populations.

1. Problem Statement

Polygenic Risk Scores (PRS) are powerful tools for predicting genetic liability to complex diseases like Parkinson's Disease (PD). However, their predictive accuracy is heavily dependent on the genetic similarity between the discovery Genome-Wide Association Study (GWAS) population and the target population.

The Disparity: Most large-scale GWAS are conducted in European (EUR) ancestry populations. Consequently, PRS derived from these studies show significantly reduced performance in non-European populations.
The Specific Challenge: Latin American (LatAm) populations present a unique and difficult case due to their three-way admixture (Native American/Admixed American [AMR], European [EUR], and African [AFR]). This creates complex local ancestry patterns and linkage disequilibrium (LD) structures that standard PRS methods struggle to model.
The Gap: While multi-ancestry methods exist, it is unclear whether they outperform single-ancestry methods when the target population is admixed and the available ancestry-matched discovery data (LatAm) is severely underpowered compared to EUR data.

2. Methodology

Data Sources:

Target Dataset: 3,315 individuals (1,872 PD cases, 1,443 controls) of Latin American ancestry from the Global Parkinson's Genetics Program (GP2). Genotyped on the Illumina NeuroBooster array and imputed using the TOPMed server.
Discovery GWAS Summary Statistics: Three distinct datasets were used to construct PRS:
1. EUR: Large-scale meta-analysis (~80k cases/proxy cases, ~1.7M controls).
2. AMR: Previous LatAm-specific GWAS (807 cases, 690 controls).
3. MAMA: Multi-ancestry meta-analysis (EUR, AMR, East Asian, African; ~49k cases, ~2.4M controls).

PRS Construction & Methods:
The study benchmarked four PRS tools representing two methodological classes:

Single-Ancestry Methods:
- PRSice-2: Uses Clumping and Thresholding (C+T).
- SBayesRC: A Bayesian method incorporating functional annotations to improve cross-ancestry portability.
Multi-Ancestry Methods:
- PRS-CSx: Jointly models ancestry-specific summary statistics and LD panels using a continuous shrinkage prior.
- BridgePRS: Uses a hierarchical model to "bridge" information from a large source ancestry (EUR) to a smaller target (AMR).
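
To make the simplest of these approaches concrete, here is a minimal sketch of clumping and thresholding in plain Python. It is an illustration under stated assumptions, not PRSice-2's implementation: the variant IDs, effect sizes, p-values, LD values, and the cut-offs `p_max` and `r2_max` are all hypothetical, and real tools also restrict clumping to a physical distance window, which is omitted here.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Variant(NamedTuple):
    snp_id: str
    beta: float      # per-allele log odds ratio from the discovery GWAS
    p_value: float   # association p-value from the discovery GWAS


def clump_and_threshold(
    variants: List[Variant],
    ld_r2: Dict[FrozenSet[str], float],
    p_max: float = 5e-8,
    r2_max: float = 0.1,
) -> List[Variant]:
    """Greedy C+T: walk variants from most to least significant, keep those
    passing the p-value threshold that are not in LD with a kept variant."""
    kept: List[Variant] = []
    for v in sorted(variants, key=lambda x: x.p_value):
        if v.p_value > p_max:
            break  # sorted by p-value, so nothing later can pass
        in_ld = any(
            ld_r2.get(frozenset((v.snp_id, k.snp_id)), 0.0) >= r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(v)
    return kept


def polygenic_score(dosages: Dict[str, float], kept: List[Variant]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2) over kept variants."""
    return sum(v.beta * dosages.get(v.snp_id, 0.0) for v in kept)


# Hypothetical toy data: snp_b is in strong LD with the more significant snp_a,
# and snp_d fails the p-value threshold.
toy_variants = [
    Variant("snp_a", 0.20, 1e-10),
    Variant("snp_b", 0.15, 1e-9),
    Variant("snp_c", -0.10, 1e-8),
    Variant("snp_d", 0.30, 1e-2),
]
toy_ld = {frozenset(("snp_a", "snp_b")): 0.8}
toy_kept = clump_and_threshold(toy_variants, toy_ld)
toy_score = polygenic_score(
    {"snp_a": 2, "snp_b": 1, "snp_c": 1, "snp_d": 2}, toy_kept
)
print([v.snp_id for v in toy_kept], round(toy_score, 3))
```

In practice the p-value threshold is not fixed in advance: several candidate thresholds are scored and the best one is picked on tuning data, which is why the study holds out a separate validation subset.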

Experimental Design:

The target dataset was split into a tuning subset (983 cases, 675 controls) and an independent validation subset (889 cases, 768 controls).
Performance Metrics:
- Odds Ratio (OR): Risk change per standard deviation (SD) increase in PRS.
- Nagelkerke's pseudo-$R^2$: Variance explained (transformed to the liability scale).
- Area Under the Curve (AUC): Discriminative ability (case vs. control).
Covariates: Models adjusted for sex, age, family history, and the first 10 principal components (PCs).
Stratification: Performance was further analyzed by stratifying the cohort into quartiles based on individual global European ancestry proportions.
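
As a rough illustration of two of these metrics, the sketch below computes an odds ratio per SD and an AUC for a raw score in plain Python. It is a simplified sketch, not the study's pipeline: it fits a logistic model with the standardized PRS as the only predictor (the study also adjusted for sex, age, family history, and 10 PCs), and the scores and labels are made-up toy values.

```python
import math
from statistics import mean, pstdev
from typing import List


def standardize(scores: List[float]) -> List[float]:
    """Rescale scores to mean 0 and SD 1 so effects are 'per SD'."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random case outscores a random control
    (ties count as one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def odds_ratio_per_sd(
    scores: List[float], labels: List[int], iterations: int = 25
) -> float:
    """Fit logit P(case) = b0 + b1 * z by Newton-Raphson and return exp(b1).
    Unadjusted: no covariates are included in this sketch."""
    z = standardize(scores)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * zi     # gradient w.r.t. slope
            h00 += w                # entries of X'WX
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)


# Hypothetical toy data: 1 = case, 0 = control.
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20, 0.90, 0.50]
toy_labels = [0, 0, 1, 1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_or = odds_ratio_per_sd(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}, OR per SD = {toy_or:.2f}")
```

The two metrics answer different questions: the OR describes how steeply risk rises with the score, while the AUC depends only on how cases and controls are ranked, which is one reason a single PRS need not win on both.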

3. Key Results

Overall Performance:

Best Explanatory Power: The SBayesRC method using EUR discovery statistics achieved the highest effect size and variance explained.
- OR: 2.02 (95% CI: 1.83–2.22).
- Liability-scale $R^2$: 0.031.
Best Discriminative Ability: The SBayesRC method using MAMA (multi-ancestry) discovery statistics yielded the highest AUC (0.67).
Method Comparison:
- SBayesRC outperformed all other methods across all metrics.
- PRS-CSx (EUR + AMR input) performed well, consistently outperforming PRSice-2 and BridgePRS.
- PRSice-2 and BridgePRS showed lower performance, likely due to the complexity of the three-way admixture and the small size of the AMR discovery sample.
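
The reported odds ratio and its confidence interval can be turned back into a log-odds effect and standard error, which is often useful for comparing scores. The sketch below assumes the interval is a standard Wald interval that is symmetric on the log scale; the source does not state how the interval was computed, so treat this as an approximation.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_from_ci(
    odds_ratio: float, ci_low: float, ci_high: float, level: float = 0.95
) -> Tuple[float, float, float]:
    """Recover (log OR, standard error, z statistic) from an OR and its CI,
    assuming the CI is exp(log OR +/- z_crit * SE)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return log_or, se, log_or / se


# Values reported above for SBayesRC with EUR discovery statistics.
log_or, se, z_stat = wald_from_ci(2.02, 1.83, 2.22)
print(f"log(OR) = {log_or:.3f}, SE = {se:.3f}, z = {z_stat:.1f}")
```

Under that assumption the effect is roughly 0.70 on the log-odds scale per SD of PRS with a standard error near 0.05, so the interval sits well away from an OR of 1.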

Impact of Ancestry Composition:

PRS performance was positively correlated with the proportion of European ancestry in the target individuals.
In the lowest EUR ancestry quartile (Q1), the OR was 1.85; in the upper quartiles (Q3/Q4), it rose to ~2.40.
Interestingly, the second quartile (moderate EUR ancestry) showed the highest $R^2$ but the lowest AUC, suggesting that increased heterogeneity in this group inflated variance metrics without improving case-control ranking.
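
The stratified analysis can be sketched as follows: individuals are binned into quartiles of their global European ancestry proportion, and any performance metric is then computed within each bin. This is a minimal sketch with made-up ancestry proportions; how the study estimated ancestry and handled boundary values is not described here, so those details are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List


def ancestry_quartiles(eur_proportion: List[float]) -> List[int]:
    """Label each individual 1-4 by quartile of European ancestry proportion.
    Values equal to a cut point fall in the lower quartile."""
    cuts = quantiles(eur_proportion, n=4)  # three cut points
    return [bisect_left(cuts, p) + 1 for p in eur_proportion]


def group_by_quartile(labels: List[int]) -> Dict[int, List[int]]:
    """Map quartile label -> indices of the individuals in that quartile."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, q in enumerate(labels):
        groups[q].append(index)
    return dict(groups)


# Hypothetical global European ancestry proportions for eight individuals.
toy_eur = [0.55, 0.10, 0.95, 0.35, 0.25, 0.80, 0.45, 0.65]
toy_quartiles = ancestry_quartiles(toy_eur)
toy_groups = group_by_quartile(toy_quartiles)
print(toy_quartiles)
print(toy_groups)
```

Each quartile holds only a quarter of the sample, so per-quartile estimates are noisier than the overall ones; that is worth keeping in mind when reading non-monotonic patterns such as the one in the second quartile.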

Clinical Context:

Adding the PRS to a baseline model (age, sex, family history) improved the AUC from ~0.69 to 0.728.
The PRS provided predictive power comparable to other established clinical risk factors (e.g., family history).

4. Key Contributions

Benchmarking in Three-Way Admixture: This is one of the first studies to rigorously evaluate PRS methods specifically in Latin American populations, which involve complex three-way admixture (AMR, EUR, AFR) rather than simple two-way admixture.
Sample Size vs. Ancestry Match: The study empirically demonstrates that, under current conditions, large, well-powered EUR GWAS outperform smaller, ancestry-matched LatAm GWAS for PRS construction in LatAm populations. The statistical power of the EUR dataset compensates for the ancestry mismatch better than the limited power of the AMR dataset.
Utility of Functional Annotations: The superior performance of SBayesRC highlights the value of incorporating functional genomic annotations, which appear to capture causal variants shared across ancestries despite differences in LD structure.
Validation of Multi-Ancestry Potential: While single-ancestry EUR data currently wins on effect size and variance explained, the study shows that multi-ancestry discovery data (SBayesRC with MAMA summary statistics) can maximize discriminative ability (AUC), suggesting a path forward as diverse GWAS sample sizes increase.

5. Significance and Implications

Equity in Genomics: The findings underscore the urgent need to expand genetic studies in underrepresented populations. Currently, relying on ancestry-matched but underpowered datasets yields inferior results compared to leveraging large European datasets.
Clinical Translation: While PRS for PD in LatAm populations currently has modest predictive power, it adds significant value when combined with clinical risk factors. However, the study notes that calibration (converting scores to absolute risk) was not evaluated, which is a critical step before clinical implementation.
Future Directions: The results suggest that as the Global Parkinson's Genetics Program (GP2) and other initiatives increase the sample sizes of non-European GWAS, multi-ancestry methods will likely surpass single-ancestry approaches. Until then, incorporating functional annotations (as in SBayesRC) is the most effective strategy for improving portability in admixed populations.

Conclusion:
The study concludes that for the current generation of PRS in Latin American populations, large European discovery GWAS combined with functional annotation-aware methods (SBayesRC) offer the best predictive performance. However, the full potential of equitable precision medicine in these populations awaits the expansion of large-scale, diverse GWAS discovery cohorts.