Differentiable Gene Set Enrichment Analysis for Pathway-Level Supervision in Transcriptomic Learning

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Problem: The "Chef" vs. The "Food Critic"

Imagine you are training a robot chef to cook a complex meal (the cell's response to a drug) from a recipe card (the drug's chemical structure). The ingredients of that meal are the individual genes.

The Chef's Goal (The Model): The robot is currently trained to get every single ingredient's taste exactly right. If it adds a pinch too much salt or a drop too little pepper, it gets a "bad grade." This mirrors current AI models, which try to predict the exact activity level of every single gene in a cell.
The Critic's Goal (The Scientist): However, when the meal is served, the food critic (the scientist) doesn't taste every single grain of rice individually. Instead, they look at the flavor profile. They ask: "Is the spicy flavor strong? Is the sweet flavor balanced?" In biology, these "flavor profiles" are called Pathways (groups of genes working together).

The Mismatch:
The paper points out a major disconnect. The robot is being graded on individual ingredients (genes), but the critic is judging the overall flavor (pathways).

If the robot makes a tiny mistake on each of 50 different ingredients, every individual grade might still look fine.
But those 50 tiny mistakes could completely ruin the "spicy" flavor profile, making the dish taste like dessert instead of curry.
Currently, the robot doesn't know it's ruining the flavor profile because it's only being graded on the individual ingredients.

The Solution: dGSEA (The "Smart Grading System")

The authors created a new tool called dGSEA (Differentiable Gene Set Enrichment Analysis). Think of this as a translator that lets the robot chef learn from the food critic while it is cooking, not just after.

Here is how it works, broken down into three simple steps:

1. Softening the Hard Edges (The "Blurry Lens")

Traditional methods for judging flavor profiles are "hard." They say, "This ingredient is definitely in the top 10 spiciest," or "It is definitely not." If an ingredient slips from rank 10 to rank 11, the score can jump abruptly; if it moves a little without changing places, the ranks do not change at all, so the ranking step gives the robot no hint about which way to adjust. This is like a light switch: it's either ON or OFF. You can't smoothly turn it up.

The Fix: dGSEA uses a "dimmer switch" (mathematically called soft sorting). Instead of saying "Gene A is #1," it says "Gene A is mostly #1, but maybe a little bit #2." This makes the grading system smooth and continuous, allowing the robot to learn from small corrections without getting confused by sudden jumps.

2. Keeping the Meaning (The "Universal Translator")

If you just smooth things out, you might lose the specific meaning of the original test. The authors made sure dGSEA speaks the same language as the old, trusted methods.

The Fix: They added a "calibration" step. It's like taking a new, high-tech thermometer and adjusting it so that "100 degrees" on the new device means exactly the same thing as "100 degrees" on the old, trusted thermometer. This ensures that if the robot learns to improve the "spicy" score, it actually means the dish is getting spicier in a way real scientists understand.

3. Speeding It Up (The "Express Lane")

Computing these smooth flavor profiles means comparing every ingredient against every other one, which becomes very slow once there are thousands of ingredients. It's like trying to count every single grain of sand on a beach to check the tide. Doing this at every step while the robot is cooking would take forever.

The Fix: The authors built a "shortcut" (called nyswin). Instead of counting every single grain of sand, they sample a few representative handfuls and estimate the rest. This makes the calculation fast enough to run at every step of training, so the robot can learn from the critic as it cooks.

The Result: A Better Chef

When they tested this new system:

Without dGSEA: The robot got good at predicting individual ingredients, but the overall "flavor profiles" (pathways) were often wrong or unstable.
With dGSEA: The robot stayed just as good at predicting individual ingredients (its scores even edged up slightly), AND it got the "flavor profiles" right more often.

The Takeaway

This paper introduces a way to teach AI models to care about the big picture (how groups of genes work together) while they are learning the details (individual genes).

By turning a slow, rigid, post-cooking inspection into a smooth, real-time coaching session, dGSEA helps AI models make predictions that are not just mathematically accurate, but biologically meaningful. It bridges the gap between "getting the numbers right" and "understanding the story the numbers tell."

1. Problem Statement

In transcriptomic-driven drug discovery, a critical objective mismatch exists between model training and downstream interpretation:

Training: Upstream models (predicting chemical-induced transcriptional profiles from molecular representations such as SMILES strings) are typically optimized using gene-level objectives (e.g., Mean Squared Error, Pearson correlation). These treat all genes as equally important.
Interpretation: Downstream analysis relies on pathway-level statistics, specifically Gene Set Enrichment Analysis (GSEA), which summarizes rank-based enrichment as a Normalized Enrichment Score (NES) to identify biological mechanisms.
The Conflict: GSEA is fundamentally non-differentiable due to hard ranking, discrete prefix accumulation, and extremum selection. Consequently, it cannot be used directly as a gradient-based training signal. A model may therefore achieve high gene-level accuracy (low MSE) yet fail to capture biologically meaningful pathway activation, or even reverse enrichment directions, because small perturbations in predicted scores can reorder genes.
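To make the conflict concrete, the sketch below computes a classical enrichment score with the standard weighted running sum (weight exponent $p = 1$). It is a minimal illustration, not the authors' code. The function name `classical_es` and the toy scores are hypothetical. The comments mark each of the three non-differentiable steps named above.

```python
def classical_es(scores, in_set, p=1.0):
    """Hard GSEA enrichment score (weighted running sum, weight exponent p).

    scores: list of per-gene scores; in_set: list of bools marking set members.
    """
    n_hit = sum(in_set)
    n_miss = len(scores) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset")

    # Hard ranking: order gene indices by score, largest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hit_norm = sum(abs(scores[i]) ** p for i in order if in_set[i])

    running, best = 0.0, 0.0
    for i in order:
        # Discrete prefix accumulation: step up on a hit, down on a miss.
        if in_set[i]:
            running += abs(scores[i]) ** p / hit_norm
        else:
            running -= 1.0 / n_miss
        # Extremum selection: keep the signed value with the largest |deviation|.
        if abs(running) > abs(best):
            best = running
    return best
```

Take a set containing only gene C (index 2) and scores `[3.0, 2.0, s_C, -1.0]`. Moving `s_C` from 2.001 to 1.999 swaps C past B and flips the enrichment score from about +0.67 to about -0.67, even though only one gene-level value moved, by 0.002. In this example, any smaller move that keeps the order leaves the score unchanged, so the ranking step offers no useful gradient.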

2. Methodology: Differentiable GSEA (dGSEA)

The authors propose dGSEA, a smooth, differentiable surrogate for classical GSEA that maps predicted gene-level scores to pathway enrichment scores with well-behaved gradients. The methodology consists of three core technical innovations:

A. Differentiable Relaxations of Non-Differentiable Operations

To enable gradient-based optimization, dGSEA replaces the discrete steps of classical GSEA with temperature-controlled continuous relaxations:

Soft Ranking: Replaces hard sorting with a sigmoid-based soft rank. For a gene $i$, the rank $r_i$ is approximated by summing sigmoid functions of score differences against all other genes, controlled by a temperature parameter $\tau_{rank}$. As $\tau_{rank} \to 0$, this converges to hard ranking.
Smooth Prefix Accumulation: Replaces the discrete running-sum curve with a smooth accumulation using a soft prefix indicator (sigmoid) controlled by $\tau_{prefix}$. This generates a continuous enrichment curve $C_{soft}(t)$.
Differentiable Extremum Aggregation: Replaces the selection of the maximum deviation (max/min) with a temperature-weighted softmax aggregation controlled by $\tau_{abs}$. This assigns weights to positions based on the magnitude of deviation, converging to the true extremum as the temperature vanishes.
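The three relaxations can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The exact formulas are assumed:

- a descending soft rank $\tilde r_i = 1 + \sum_{j \ne i} \sigma((s_j - s_i)/\tau_{rank})$;
- a prefix indicator $\sigma((t + 0.5 - \tilde r_i)/\tau_{prefix})$;
- softmax weights over $|C_{soft}(t)|/\tau_{abs}$.

The function names are hypothetical. Plain Python only evaluates the surrogate; to obtain gradients, one would write the same arithmetic in an autodiff framework.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_ranks(scores, tau_rank):
    """Soft descending rank: r_i = 1 + sum_{j != i} sigmoid((s_j - s_i) / tau)."""
    return [
        1.0 + sum(_sigmoid((sj - si) / tau_rank)
                  for j, sj in enumerate(scores) if j != i)
        for i, si in enumerate(scores)
    ]


def soft_es(scores, in_set, tau_rank=0.1, tau_prefix=0.1, tau_abs=0.1, p=1.0):
    """Smooth surrogate of the GSEA enrichment score (hypothetical sketch).

    scores: per-gene scores; in_set: bools marking gene-set membership.
    """
    n_genes = len(scores)
    ranks = soft_ranks(scores, tau_rank)
    weights = [abs(s) ** p for s in scores]
    hit_norm = sum(w for w, h in zip(weights, in_set) if h)
    n_miss = n_genes - sum(in_set)
    # Per-gene running-sum step: +w_i / N_R for members, -1 / N_miss otherwise.
    steps = [w / hit_norm if h else -1.0 / n_miss for w, h in zip(weights, in_set)]

    # Smooth prefix accumulation: C(t) = sum_i sigmoid((t + 0.5 - r_i) / tau) * step_i.
    curve = [
        sum(_sigmoid((t + 0.5 - r) / tau_prefix) * step for r, step in zip(ranks, steps))
        for t in range(1, n_genes + 1)
    ]

    # Soft extremum: softmax weights on |C(t)| favor the largest deviation.
    peak = max(abs(c) for c in curve)
    alphas = [math.exp((abs(c) - peak) / tau_abs) for c in curve]
    return sum(a * c for a, c in zip(alphas, curve)) / sum(alphas)
```

Take well-separated scores with all three temperatures set to `1e-3`: `soft_es([3.0, 2.0, 1.0, -1.0, -2.0], [True, True, False, False, False], 1e-3, 1e-3, 1e-3)` agrees with the hard score of 1.0 to within `1e-6`, in line with the convergence claim. At the default temperature of 0.1, nudging a set member past its neighbor (2.001 versus 1.999) changes the output only slightly instead of flipping its sign. Both the ranking and the prefix loops are $O(G^2)$, which is the cost the scalable approximation targets.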

B. Statistical Semantics Preservation (dNES)

To ensure the differentiable score remains biologically interpretable and comparable to classical GSEA:

Sign-Specific Robust Permutation Normalization: The authors introduce dNES (differentiable NES). They compute a null distribution via gene-label permutations and estimate the mean absolute enrichment score separately for positive and negative signs.
Robust Estimation: To handle outliers in the permutation distribution, they use a hybrid estimator combining a trimmed mean and a Winsorized mean.
$\kappa$-Calibration: A calibration factor $\kappa$ is introduced to align the scale of the differentiable null distribution with the classical NES scale, ensuring numerical comparability.
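A minimal sketch of the sign-specific normalization follows. It assumes several details the summary leaves open:

- a trimming fraction of 10%;
- an equal-weight mix of the trimmed and Winsorized means;
- $\kappa$ supplied as a fixed number rather than estimated (the paper's procedure for fitting $\kappa$ is not reproduced).

All names are hypothetical, and the permutation null scores are taken as a precomputed input list.

```python
def trimmed_mean(values, frac):
    """Mean after discarding the lowest and highest `frac` share of values."""
    xs = sorted(values)
    k = int(len(xs) * frac)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, frac):
    """Mean after clamping the lowest/highest `frac` share to the cut-off values."""
    xs = sorted(values)
    k = int(len(xs) * frac)
    lo, hi = xs[k], xs[len(xs) - 1 - k]
    return sum(min(max(x, lo), hi) for x in xs) / len(xs)


def robust_scale(values, frac=0.1, mix=0.5):
    """Hypothetical hybrid estimator: convex mix of trimmed and Winsorized means."""
    return mix * trimmed_mean(values, frac) + (1.0 - mix) * winsorized_mean(values, frac)


def dnes(es, null_es, kappa=1.0, frac=0.1, mix=0.5):
    """Sign-specific normalization of a (soft) enrichment score.

    null_es: enrichment scores from gene-label permutations (assumed precomputed).
    kappa: calibration factor, supplied here as a fixed number (assumption).
    """
    if es >= 0:
        pool = [x for x in null_es if x > 0]   # positive side of the null
    else:
        pool = [-x for x in null_es if x < 0]  # magnitudes of the negative side
    if not pool:
        raise ValueError("no permutation scores with the matching sign")
    return kappa * es / robust_scale(pool, frac, mix)
```

Take null scores `[0.2, 0.4, 0.6, -0.3, -0.3, -0.3]`. An observed score of 0.8 is divided by the positive-side scale 0.4, and -0.6 by the negative-side scale 0.3, giving dNES values of about 2.0 and -2.0. Splitting by sign matters because the positive and negative halves of a null distribution are generally not mirror images. The robust scale keeps a few extreme permutations from inflating the denominator.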

C. Scalable Implementation (nyswin)

The naive dGSEA formulation has $O(G^2)$ complexity due to all-pairs comparisons and full grid evaluation, which is prohibitively expensive inside a training loop at genome scale ($G \sim 10^4$). (Classical GSEA needs only a sort and a linear scan, but yields no gradients.) The authors propose nyswin, a scalable approximation:

Nyström Approximation: Reduces soft ranking complexity from $O(G^2)$ to $O(Gm)$ by selecting $m \ll G$ anchor points (quantiles of the score distribution) and comparing each gene against these anchors instead of against every other gene.
Windowed Grid: Restricts the prefix accumulation evaluation to a window around the median rank, reducing the running-sum calculation cost.
Result: This reduces complexity to near-linear, enabling end-to-end training on GPU.
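To see why anchors help, take $G = 10^4$ genes. All-pairs soft ranking then needs about $10^8$ sigmoid evaluations per profile, while $m = 256$ anchors (an arbitrary illustrative choice) need about $2.6 \times 10^6$. The sketch below compares an exact soft rank with a simple quantile-anchor approximation. It is an assumed stand-in in the spirit of the description, not the paper's nyswin: neither the Nyström construction nor the windowed grid is reproduced, and all names are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def exact_soft_ranks(scores, tau):
    """O(G^2) sigmoid soft rank (descending, 1-based), self term excluded."""
    return [
        1.0 + sum(_sigmoid((sj - si) / tau) for j, sj in enumerate(scores) if j != i)
        for i, si in enumerate(scores)
    ]


def anchor_soft_ranks(scores, m, tau):
    """O(G*m) approximation using m quantile anchors (hypothetical sketch)."""
    n_genes = len(scores)
    xs = sorted(scores)  # anchors are treated as constants (no gradient through them)
    # Mid-block quantiles: anchor k stands in for roughly G/m genes.
    anchors = [xs[int((k + 0.5) * n_genes / m)] for k in range(m)]
    block = n_genes / m
    # 0.5 offset: the exact rank 1 + sum_{j != i} equals 0.5 + sum over all j,
    # because the self term is sigmoid(0) = 0.5.
    return [0.5 + block * sum(_sigmoid((a - s) / tau) for a in anchors) for s in scores]
```

In this construction each anchor stands in for a block of $G/m$ genes. The approximation error stays within roughly half a block ($G/(2m)$ ranks), because the logistic smoothing averages the block-level counting error rather than amplifying it. Treating the anchors as constants is an assumption; gradients then flow only through each gene's own score.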

3. Key Contributions

First Differentiable GSEA: Presented by the authors as the first framework to make GSEA fully differentiable, allowing pathway-level enrichment to serve as an explicit training objective rather than just a post-hoc diagnostic.
Theoretical Guarantees: Proves that dGSEA converges to classical GSEA as temperature parameters vanish and provides bounds on gradient stability.
Scalability: The nyswin algorithm enables genome-scale evaluation, reducing computational bottlenecks from quadratic to near-linear time.
Hybrid Training Strategy: Demonstrates a practical workflow where dGSEA is used as an auxiliary structured loss alongside standard gene-level objectives.
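One way to write the hybrid objective is sketched below. The weight $\lambda$, the pathway collection $\mathcal{P}$, and the choice of a squared error between predicted and reference dNES are assumptions, since the summary does not give the exact form.

$$
\mathcal{L}(\theta) = \underbrace{\frac{1}{G}\sum_{g=1}^{G}\big(\hat{x}_g(\theta) - x_g\big)^2}_{\text{gene-level}} \;+\; \lambda \underbrace{\frac{1}{|\mathcal{P}|}\sum_{P \in \mathcal{P}}\big(\mathrm{dNES}_P(\hat{x}(\theta)) - \mathrm{dNES}_P(x)\big)^2}_{\text{pathway-level}}
$$

Dropping the gene-level term corresponds to the dGSEA-only ablation reported in the results, which the authors found collapses gene-level reconstruction.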

4. Results

The authors validated dGSEA using synthetic benchmarks and the LINCS L1000 dataset (978 landmark genes, ~10k compounds).

Semantic Fidelity: dGSEA closely matches classical GSEA.
- High Spearman correlation ($\rho \approx 0.87$) between dNES and classical NES.
- Running-sum curves preserve the trajectory and enrichment direction of classical GSEA.
- Permutation p-values are well-calibrated under the null hypothesis.
Numerical Stability: dGSEA exhibits markedly lower sensitivity to input noise than hard ranking (a 33% reduction in instability).
Training Performance (SMILES-to-Transcriptome):
- Gene-Level: Adding dGSEA as an auxiliary loss preserved gene-level accuracy (mean Pearson $r$: $0.449 \to 0.452$; RMSE: $0.420 \to 0.418$).
- Pathway-Level: Clear improvement in functional agreement:
  - Macro pathway correlation increased from 0.257 to 0.306 (+19%).
  - Sign accuracy improved from 0.620 to 0.641.
  - Pathway MSE decreased by ~10%.
- Ablation: Training only on dGSEA (without gene-level loss) resulted in catastrophic failure in gene-level reconstruction, confirming that dGSEA must be used as a complementary structured regularizer, not a standalone objective.
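The pathway-level metrics can be computed roughly as below. The definitions are assumptions, since the summary does not spell them out:

- macro correlation: the Pearson correlation across compounds for each pathway, averaged over pathways;
- sign accuracy: the fraction of (compound, pathway) pairs whose predicted and reference scores share a sign;
- pathway MSE: the mean squared difference over all pairs.

As a quick arithmetic check on the reported figures, $(0.306 - 0.257)/0.257 \approx 0.19$, matching the +19% gain.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def column(matrix, k):
    """Return column k of a row-major list-of-lists matrix."""
    return [row[k] for row in matrix]


def pathway_metrics(pred, true):
    """pred, true: [compound][pathway] matrices of pathway scores (e.g. NES).

    Definitions are assumptions, not the paper's evaluation code.
    """
    n_path = len(true[0])
    macro_r = sum(pearson(column(pred, k), column(true, k)) for k in range(n_path)) / n_path
    pairs = [(p, t) for p_row, t_row in zip(pred, true) for p, t in zip(p_row, t_row)]
    # Zero is counted as negative here; a real evaluation may treat ties differently.
    sign_acc = sum((p > 0) == (t > 0) for p, t in pairs) / len(pairs)
    mse = sum((p - t) ** 2 for p, t in pairs) / len(pairs)
    return {"macro_r": macro_r, "sign_acc": sign_acc, "mse": mse}
```

Macro averaging gives every pathway equal weight, so a model cannot score well by fitting a few large, easy pathways while ignoring the rest.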

5. Significance

Bridging the Gap: dGSEA addresses the disconnect between optimization objectives and biological interpretation in drug discovery. It explicitly guides models toward biologically coherent pathway responses.
General Paradigm: The "soften, align, accelerate" framework proposed here offers a blueprint for integrating other discrete, rank-based, or set-based statistical tests (e.g., connectivity scoring) into deep learning pipelines.
Practical Utility: By enabling pathway-aware optimization without sacrificing gene-level fidelity, dGSEA provides a mechanism to improve the reliability of in silico drug screening and mechanism-of-action studies, particularly when prediction accuracy is imperfect.

In summary, the paper presents a mathematically rigorous and computationally efficient method to embed biological pathway knowledge directly into the training loop of transcriptomic prediction models, leading to more biologically interpretable and reliable drug discovery outcomes.