Multi-Stage Graph Attention Networks for Interpretable Alzheimer's Disease Classification from Genome-Wide Association Data

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: Solving the Alzheimer's Puzzle

Imagine the human genome (your DNA) as a massive library whose books together contain billions of letters. Scientists have long known that certain "typos" in these books can increase the risk of Alzheimer's disease.

For years, researchers have used a method called Polygenic Risk Scores (PRS). Think of this like a credit score. It adds up all the tiny "bad typos" you have across the library to give you a single number: "High Risk" or "Low Risk." It's helpful, but it's blunt. It tells you how much risk you have, but it doesn't tell you why or how those risks interact. It's like knowing your car won't start because the "engine score" is low, but not knowing if it's the battery, the spark plugs, or the fuel pump causing the issue.

The Problem: Alzheimer's isn't just about one bad typo. It's about how thousands of tiny typos talk to each other. If Gene A has a typo, it might only be dangerous if Gene B also has a specific typo. This is called epistasis (or gene-gene interaction). Traditional math struggles to find these hidden conversations because there are too many combinations to check.

The New Solution: A "Smart Network" Detective

The authors of this paper built a new kind of AI called a Graph Attention Network (GAT).

1. The Map (The Graph)

Instead of looking at genes as a flat list, they built a map.

Nodes (The Cities): Each city on the map is a gene.
Edges (The Roads): The roads connect genes that are known to work together (like genes in the same biological pathway or genes that are active in the same part of the brain).

2. The Detective (The GAT)

Imagine a detective walking through this map.

Old Method: The detective just counts how many "bad typos" there are across the whole map.
New Method (GAT): The detective looks at a specific gene (City A) and asks, "Who are my neighbors?" Then, the detective uses Attention to decide: "Is this neighbor important right now?"
- If a neighbor is a close friend (a strong biological link), the detective pays close attention.
- If a neighbor is a stranger, the detective ignores them.
- This allows the AI to learn complex patterns, like "Gene A is risky only if its neighbor Gene B is also risky."
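For readers who like to see ideas in code, here is a minimal sketch of the "pay more attention to some neighbors" step. It is not the authors' model: the gene names and feature values are made up, and a plain dot product stands in for the scoring function that a real GAT learns during training.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_score(h_i, h_j):
    """Hypothetical stand-in for the learned scoring function of a real GAT."""
    return sum(a * b for a, b in zip(h_i, h_j))

def attend(node, neighbors, features, score=dot_score):
    """Return per-neighbor attention weights and the attention-weighted
    average of the neighbors' feature vectors."""
    raw = [score(features[node], features[n]) for n in neighbors]
    weights = softmax(raw)
    dim = len(features[node])
    summary = [
        sum(w * features[n][d] for w, n in zip(weights, neighbors))
        for d in range(dim)
    ]
    return dict(zip(neighbors, weights)), summary

# Made-up two-dimensional risk features for three genes.
features = {
    "GENE_A": [1.0, 0.5],
    "GENE_B": [0.9, 0.6],   # similar to GENE_A, so it gets more attention
    "GENE_C": [-0.2, 0.1],  # dissimilar, so it gets less attention
}
weights, summary = attend("GENE_A", ["GENE_B", "GENE_C"], features)
print(weights)
```

The weights always sum to 1, so the detective is effectively splitting a fixed budget of attention among a gene's neighbors.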

3. The Three-Stage Training (The School)

The AI didn't learn everything at once. It went through three stages of school:

Stage 1 (Learning the Map): The AI studied the map and the gene risks to learn how to spot patterns.
Stage 2 (Adding Context): The researchers realized the map wasn't enough. They added "non-coding" risks (typos in the margins of the books that don't change the words but change how they are read). They injected this data into the AI to help it understand the bigger picture.
Stage 3 (Bias Removal): The AI was learning too much about the ancestry of the patients (e.g., "This group has more Alzheimer's because they are from a specific region, not because of their genes"). The researchers taught the AI to "forget" ancestry so it only focused on the actual disease biology.

The Results: A Better Prediction

When they tested this new system:

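The Old Way (Credit Score/PRS): Scored about 0.80 on AUROC, a measure of how well a model ranks people with the disease above people without it (0.5 is a coin flip, 1.0 is perfect).
The New Way (The Map Detective): Scored about 0.78 on its own.
The Team Up (Ensemble): When they combined the "Credit Score" with the "Map Detective," the score rose to 0.82.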

Why is this a big deal?
It suggests that looking at how genes talk to each other (the map) adds new information that the simple "credit score" misses. It's like adding a second camera angle to a security system; you catch things you would have missed before.

The "X-Ray" Vision (Interpretability)

One of the coolest parts of this paper is that the AI isn't a "black box." The researchers asked the AI, "Which genes were you looking at when you made your decision?"

The AI pointed its finger at specific genes and pathways, and guess what? It was right.

It highlighted APOE, the most famous Alzheimer's gene.
It found new suspects, like genes involved in iron-sulfur clusters (think of these as the tiny batteries inside your cells) and potassium channels (the switches that control brain electricity).
It even showed different patterns in the two groups: for healthy participants (the "control group"), the strongest signals involved cell maintenance and protein quality control, while for the "disease group" they involved protein clumping.

The Takeaway

This paper is like upgrading from a simple weather forecast ("It will rain") to a detailed meteorological model ("It will rain because a low-pressure system is colliding with a cold front over the mountains").

By using a Graph Neural Network, the researchers created a tool that doesn't just count genetic risks but understands the relationships between them. This makes the prediction more accurate and, more importantly, gives scientists a clear map of where to look for new treatments. It turns a mountain of confusing data into a readable story about how Alzheimer's develops.

1. Problem Statement

Alzheimer's Disease (AD) is a complex genetic trait where risk is distributed across numerous small-effect loci rather than a few large-effect variants. While Polygenic Risk Scores (PRS) effectively aggregate additive genetic risk, they suffer from three critical limitations:

Lack of Epistasis: PRS are linear sums and fail to capture non-additive gene-gene interactions (epistasis).
Limited Interpretability: PRS provide a single risk score without elucidating specific biological mechanisms or gene networks involved.
Ancestry Bias: Genetic models often struggle with population stratification, leading to biased predictions across different ancestries.
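The first limitation can be illustrated with a small sketch. The effect sizes and the two-variant interaction rule below are hypothetical and are not taken from the paper; the point is only that an additive score is, by construction, the sum of its single-variant parts, so it has no term that could represent a risk appearing only when two variants co-occur.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: effect size times risk-allele count (0, 1, or 2), summed."""
    return sum(beta * d for beta, d in zip(effect_sizes, dosages))

def epistatic_risk(dosages):
    """Hypothetical purely epistatic effect: risk is present only when
    BOTH variants carry at least one risk allele."""
    return 1.0 if dosages[0] > 0 and dosages[1] > 0 else 0.0

effect_sizes = [0.3, 0.3]  # made-up per-variant weights
genotypes = {
    "neither": [0, 0],
    "only_a": [1, 0],
    "only_b": [0, 1],
    "both": [1, 1],
}

prs = {name: polygenic_risk_score(g, effect_sizes) for name, g in genotypes.items()}
interaction = {name: epistatic_risk(g) for name, g in genotypes.items()}

for name in genotypes:
    print(f"{name:8s} PRS={prs[name]:.1f} interaction={interaction[name]:.1f}")
```

Here the PRS of the "both" genotype equals the sum of the two single-variant scores, while the interaction effect of "both" is not the sum of the single-variant interaction effects. A model limited to weighted sums cannot reproduce the second pattern.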

The authors aim to develop a deep learning framework that leverages Graph Neural Networks (GNNs) to model gene-gene interactions within a biological context, thereby improving predictive accuracy and providing biologically interpretable insights into AD etiology.

2. Methodology

The study proposes a three-stage Graph Attention Network (GAT) framework trained on individual-level GWAS data from 7,358 participants across seven Alzheimer's Disease Center cohorts.

A. Data Preprocessing and Feature Engineering

Input Data: Individual genotypes and phenotypes from AD cohorts, harmonized with 1000 Genomes Phase 3 data.
Gene Mapping: SNPs were mapped to genes using a hierarchical pipeline (functional consequences $\rightarrow$ proximity to Transcription Start Sites $\rightarrow$ MAGMA gene windows).
Node Features: Each gene node is assigned risk scores derived from:
- AD-specific GWAS summary statistics.
- 11 genetically correlated phenotypes (e.g., Schizophrenia, Fluid Intelligence, brain volumes) identified via LD Score Regression.
Intergenic PRS: Risk scores for non-coding SNPs (intergenic) were calculated separately and injected as global graph-level features.

B. Graph Construction

Two distinct graph construction strategies were evaluated:

Hippocampal Co-expression Graphs: Nodes are genes; edges represent Pearson correlation coefficients from post-mortem hippocampal transcriptomic data (keeping only the top 0.125%, 0.25%, or 0.375% of correlations).
Curated Pathway Graph: Nodes are genes; edges are derived from KEGG and Reactome pathways, Gene Ontology (GO) terms, and co-expression data.
- Optimization: The pathway graph underwent Forman-Ricci curvature pruning (removing ~75% of low-curvature edges) and Ricci flow rewiring to reduce over-smoothing and alleviate over-squashing, ensuring robust information flow.
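The curvature-pruning step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses the augmented Forman-Ricci curvature for unweighted, undirected graphs, $F(u,v) = 4 - \deg(u) - \deg(v) + 3\,T(u,v)$, where $T(u,v)$ is the number of triangles containing the edge; it assumes "low-curvature" means the most negative values; and it does not attempt the Ricci flow rewiring. The toy graph and the function names are hypothetical.

```python
from collections import defaultdict

def forman_curvature(edges):
    """Augmented Forman-Ricci curvature per edge of an unweighted,
    undirected graph: 4 - deg(u) - deg(v) + 3 * triangles(u, v)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    curvature = {}
    for u, v in edges:
        triangles = len(neighbors[u] & neighbors[v])
        curvature[(u, v)] = 4 - len(neighbors[u]) - len(neighbors[v]) + 3 * triangles
    return curvature

def prune_low_curvature(edges, drop_fraction):
    """Drop the given fraction of edges with the lowest curvature."""
    curvature = forman_curvature(edges)
    ranked = sorted(edges, key=lambda e: curvature[e])  # lowest curvature first
    n_drop = int(len(edges) * drop_fraction)
    return ranked[n_drop:]

# Toy graph: a triangle A-B-C attached to a small star centred on D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("D", "F")]
curvature = forman_curvature(edges)
kept = prune_low_curvature(edges, drop_fraction=0.5)
print(curvature)
print(kept)
```

In this toy graph the bridge between the triangle and the star has the lowest curvature and is removed first, and the densely connected triangle survives. The summary reports a drop fraction of roughly 75% for the pathway graph; 50% is used here only to keep the example small.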

C. Model Architecture: Multi-Stage GAT

The model consists of three sequential training stages:

Stage 1: GNN Encoder Training
- Uses GATConvV2 layers to compute attention coefficients for edges, dynamically weighting gene-gene interactions.
- Includes a Bilinear Context (BLC) module: This captures global gene-gene interactions beyond the immediate graph topology by projecting node representations into local and global subspaces, multiplying them, and re-injecting the result.
- Outputs graph-level embeddings via pooling (sum, mean, max) fed into fully connected layers for classification.
Stage 2: Transfer Learning with Intergenic Injection
- Weights from Stage 1 are transferred.
- Intergenic PRS (non-coding risk) are injected as graph-level features after the encoder.
- Progressive unfreezing of layers allows the model to learn how non-coding risk modulates gene-level representations.
Stage 3: Adversarial Debiasing
- A Gradient Reversal Layer (GRL) is added to an auxiliary head predicting ancestry (using the top 10 Principal Components).
- An adaptive scheduler dynamically adjusts the adversarial loss weight ($\lambda$) to minimize ancestry prediction ($R^2 < 0.05$) while maintaining classification performance (AUROC > threshold). This forces the model to learn ancestry-invariant patterns.
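The two ingredients of Stage 3 can be sketched in a framework-free form. A gradient reversal layer passes activations through unchanged and multiplies the gradient by $-\lambda$ on the way back, so the encoder is pushed to make ancestry harder to predict. The scheduler rule below is a guess at what an adaptive $\lambda$ might look like, not the authors' rule: the step size, the cap, and the AUROC floor are hypothetical, and the floor is left as a required argument because the summary does not state the threshold.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]

def update_lambda(lam, ancestry_r2, auroc, auroc_floor,
                  r2_target=0.05, step=0.1, lam_max=5.0):
    """Hypothetical adaptive rule: back off if classification suffers,
    push harder if ancestry is still predictable, otherwise hold."""
    if auroc < auroc_floor:
        return max(0.0, lam - step)
    if ancestry_r2 > r2_target:
        return min(lam_max, lam + step)
    return lam

grl = GradientReversal(lam=2.0)
activations = grl.forward([1.0, -2.0])      # unchanged
encoder_grads = grl.backward([0.5, -1.0])   # sign flipped and scaled by lam

# Made-up monitoring values; auroc_floor=0.7 is an illustrative choice.
lam_up = update_lambda(1.0, ancestry_r2=0.20, auroc=0.80, auroc_floor=0.7)
lam_down = update_lambda(1.0, ancestry_r2=0.20, auroc=0.60, auroc_floor=0.7)
lam_hold = update_lambda(1.0, ancestry_r2=0.01, auroc=0.80, auroc_floor=0.7)
print(encoder_grads, lam_up, lam_down, lam_hold)
```

In an autodiff framework the same idea would be written as a custom backward function rather than an explicit `backward` method.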

D. Ensemble Strategy

The GNN logits are combined with whole-genome PRS predictions using Elastic Net regression to create an ensemble classifier.
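As a rough illustration of the ensemble idea, the sketch below fits an elastic-net-penalized logistic model on two inputs per person, a GNN logit and a PRS value. It rests on several assumptions: the summary does not say whether the Elastic Net was fit as a linear or logistic model, the data are made up, and the hyperparameters (`alpha`, `l1_ratio`, learning rate, epochs) are arbitrary. The L1 part of the penalty is handled with a soft-thresholding (proximal) step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, amount):
    """Proximal step for the L1 penalty: shrink toward zero by `amount`."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_elastic_net_logistic(X, y, alpha=0.01, l1_ratio=0.5, lr=0.1, epochs=500):
    """Proximal gradient descent on logistic loss with an elastic net penalty."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errors = [p_i - y_i for p_i, y_i in zip(preds, y)]
        for j in range(n_features):
            grad = sum(e * row[j] for e, row in zip(errors, X)) / n
            grad += alpha * (1.0 - l1_ratio) * w[j]              # L2 part
            w[j] = soft_threshold(w[j] - lr * grad, lr * alpha * l1_ratio)  # L1 part
        b -= lr * sum(errors) / n
    return w, b

def predict(row, w, b):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

# Made-up rows of [gnn_logit, prs_score]; labels are 1 = case, 0 = control.
X = [[2.0, 1.5], [1.0, 0.5], [0.5, 1.0], [-1.0, -0.5], [-0.5, -1.5], [-2.0, -1.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_elastic_net_logistic(X, y)
print(w, b, predict([2.0, 1.5], w, b), predict([-2.0, -1.0], w, b))
```

Both inputs receive a positive weight on this toy data, which is the behavior one would expect if the GNN and the PRS each carry useful signal.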

3. Key Contributions

Novel Architecture: First application of a multi-stage GAT with bilinear context and adversarial debiasing specifically for AD classification using GWAS data.
Graph Optimization: Introduction of Forman-Ricci curvature pruning and rewiring to optimize biological pathway graphs for GNNs, addressing over-smoothing issues.
Interpretability: Development of a post-hoc analysis pipeline (gradient-based attribution, ablation studies, and subgraph extraction) to identify specific genes, edges, and biological pathways driving predictions.
Hybrid Signal Integration: Demonstrated that combining additive PRS with non-additive GNN signals yields superior performance compared to either method alone.

4. Results

Predictive Performance

Best GNN Model: The GAT with Bilinear Context on the Pathway Graph (Stage 2) achieved an AUROC of 0.78 (95% CI: 0.75–0.80).
Ensemble Performance: Combining Stage 2/3 GNN logits with Whole-Genome PRS via Elastic Net achieved an AUROC of 0.82 (95% CI: 0.79–0.84).
Comparison: The ensemble significantly outperformed both PRS alone (AUROC 0.80) and Deep Sets models (MLPs without graph structure), supporting the value of graph topology and epistatic modeling.
Ancestry Debiasing: Stage 3 models successfully reduced ancestry predictability ($R^2 < 0$, i.e., no better than always predicting the mean) without degrading classification performance.
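Because the results above are reported as AUROC rather than accuracy, a short sketch of the difference may help. AUROC is the probability that a randomly chosen case receives a higher score than a randomly chosen control, so it depends only on the ranking of scores, whereas accuracy depends on a decision threshold. The labels and scores below are made up.

```python
def auroc(labels, scores):
    """Fraction of (case, control) pairs in which the case scores higher;
    ties count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct calls when scores at or above the threshold are called cases."""
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)

labels = [1, 1, 1, 0, 0, 0]
scores = [0.90, 0.45, 0.40, 0.35, 0.30, 0.10]

ranking_quality = auroc(labels, scores)
fraction_correct = accuracy(labels, scores)
print(ranking_quality, fraction_correct)
```

In this toy example every case outranks every control, so the AUROC is perfect, yet accuracy at a 0.5 threshold is only two thirds because two cases fall below the cutoff. The two metrics are not interchangeable.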

Explainability and Biological Insights

Feature Importance: Ablation studies showed that removing the top 10% of important nodes or edges significantly reduced AUROC, while removing bottom nodes had no effect. Top nodes were highly connected (hubs) but not necessarily enriched for known AD hubs, suggesting novel network structures.
Gene Attribution:
- Known Genes: The model correctly identified APOE, TOMM40, APOA2, and AFM as top contributors.
- Cell Type Specificity: GSEA revealed strong enrichment in deep-layer inhibitory interneurons (LAMP5, CRABP1) and VIP interneurons, consistent with single-nucleus transcriptomic studies.
- Novel Pathways:
  - AD: Enrichment in MET/PTK2 signaling (hepatocyte growth factor receptor), Iron-sulfur cluster transfer, and Potassium channels (for Fluid Intelligence).
  - Control: Enrichment in ER calcium homeostasis and unfolded protein response.
Subgraph Analysis: Case-specific subgraphs showed unique enrichment for amyloid fiber formation and membrane proteolysis, while control subgraphs highlighted neuron maturation and metabolic homeostasis.

5. Significance and Conclusion

This study demonstrates that Graph Neural Networks can effectively capture complementary, non-additive genetic signals in AD that are missed by traditional linear PRS. By integrating biological priors (pathways) with data-driven graph learning and adversarial debiasing, the model achieves state-of-the-art classification accuracy while remaining interpretable.

The findings support the hypothesis that epistatic interactions and global genomic context are critical for understanding complex genetic architectures. The identified novel pathways (e.g., MET signaling, iron-sulfur metabolism) and cell-type vulnerabilities offer new hypotheses for AD pathogenesis and potential therapeutic targets. The framework sets a precedent for using interpretable deep learning to dissect the "missing heritability" in psychiatric and neurodegenerative disorders.