Sparse autoencoders reveal organized biological knowledge but minimal regulatory logic in single-cell foundation models: a comparative atlas of Geneformer and scGPT

This study applies sparse autoencoders to Geneformer and scGPT to reveal that while these single-cell foundation models effectively encode organized biological knowledge and hierarchical abstraction, they largely fail to capture causal regulatory logic, as evidenced by their minimal response to specific transcription factor perturbations.

Ihor Kendiukhov

Published 2026-03-04

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: Peeking Inside the "Black Box"

Imagine two super-smart AI robots, Geneformer and scGPT, that have read millions of biology textbooks (single-cell data). They are great at guessing what a cell is doing or what happens if you change a gene. But scientists have a nagging question: Do these robots actually understand how biology works (the cause-and-effect rules), or are they just really good at spotting patterns (like noticing that people who buy umbrellas also buy raincoats)?

To answer this, the author built a special tool called a Sparse Autoencoder (SAE). Think of this tool as a high-tech X-ray machine that can look inside the robot's brain while it's thinking.
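For readers who want to see the gears, here is a minimal sketch of what that X-ray machine looks like in code: a sparse autoencoder reads one of the model's hidden states, expands it into far more candidate features than the model has dimensions, and learns to reconstruct the state using as few active features as possible. This is an illustrative PyTorch-style sketch, not the paper's code; all sizes and names are made up.

```python
# Minimal sparse autoencoder (SAE) sketch -- the "X-ray machine".
# Illustrative only: sizes, names, and the loss weighting are assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # d_features >> d_model: many more "books" than "shelves"
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h: torch.Tensor):
        f = torch.relu(self.encoder(h))  # only a few features fire at once
        h_hat = self.decoder(f)          # rebuild the hidden state from them
        return h_hat, f

def sae_loss(h, h_hat, f, l1_coeff=1e-3):
    recon = ((h - h_hat) ** 2).mean()   # reconstruct faithfully...
    sparsity = f.abs().mean()           # ...while keeping activations sparse
    return recon + l1_coeff * sparsity

sae = SparseAutoencoder(d_model=512, d_features=16384)
h = torch.randn(32, 512)                # a batch of hidden states
h_hat, f = sae(h)
print(sae_loss(h, h_hat, f))
```

The sparsity penalty is what turns the "solid wall" back into individual books: each hidden state gets explained by a handful of features rather than a dense blur.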

The Discovery: A Library of Hidden Books

When the author used the X-ray machine, they found something amazing:

  1. Massive Overcrowding (Superposition): The robots' brains have a limited number of "slots" (dimensions) to store information. But they are storing thousands more concepts than there are slots.

    • The Analogy: Imagine a library with only 1,000 shelves, but the librarian has 80,000 books. To make it work, they stack the books in a way that looks like a solid wall to the naked eye. You can't see the individual books unless you have a special decoder. The AI is doing this with biological concepts.
    • The Result: The author found 82,000+ hidden "features" (concepts) in Geneformer and 24,000+ in scGPT.
  2. Organized Knowledge: These hidden books aren't random. They are organized perfectly.

    • The Analogy: If you open the library, you don't just see a mess. You see a section for "Cell Division," a section for "Immune System," and a section for "Mitochondria."
    • The Result: The AI has learned the "vocabulary" of biology. It knows which genes belong to which pathways, just like a human biologist would (see the sketch after this list).
  3. The "U-Shape" Journey: As the information travels through the robot's layers (from input to output), the knowledge changes.

    • Early Layers: Focus on raw parts (like "ribosomes" or "DNA replication").
    • Middle Layers: Get abstract and messy (hard to label).
    • Late Layers: Re-organize into big-picture goals (like "cell differentiation" or "stress response").
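Here is the sketch promised in point 2. One common way to attach a label like "Cell Division" to a hidden feature is to take the genes that activate it most strongly and test them for enrichment against known pathway gene sets. Everything below (the gene sets, the example genes, the helper function) is a hypothetical illustration, not the paper's actual pipeline.

```python
# Toy feature-labeling sketch: which known pathway best explains the genes
# that most strongly activate one SAE feature? All data here is made up.
from scipy.stats import hypergeom

def label_feature(top_genes: set, pathways: dict, n_total_genes: int):
    """Return the pathway whose gene set overlaps top_genes most significantly."""
    best_name, best_p = None, 1.0
    for name, gene_set in pathways.items():
        overlap = len(top_genes & gene_set)
        # P(an overlap this large or larger by chance): hypergeometric tail
        p = hypergeom.sf(overlap - 1, n_total_genes, len(gene_set), len(top_genes))
        if p < best_p:
            best_name, best_p = name, p
    return best_name, best_p

pathways = {
    "Cell Division": {"MKI67", "CCNB1", "CDK1", "TOP2A"},
    "Immune System": {"CD3E", "CD8A", "IL2RA", "PTPRC"},
}
top_genes = {"MKI67", "CCNB1", "CDK1", "GAPDH"}  # genes driving one feature
print(label_feature(top_genes, pathways, n_total_genes=20000))
```

Run over every feature in every layer, this kind of test is what turns a pile of anonymous features into the organized "library sections" described above.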

The Twist: The Robot Knows the "What," But Not the "Why"

Here is the most critical finding. The author tested if the robot understood causal logic (the "If I pull this lever, that light turns on" relationship).

  • The Test: They simulated a real-world experiment: turning off specific genes (using CRISPR data) and seeing if the robot's internal "books" changed in a way that matched the specific gene's job.
  • The Result: The robot was bad at this.
    • It noticed that something changed (the cell state shifted).
    • But it didn't know specifically which gene caused which effect.
    • The Analogy: Imagine a detective who sees a crime scene and says, "Oh, a robbery happened! The window is broken, and the safe is open." But if you ask, "Did the butler do it, or the gardener?" the detective just shrugs and says, "I don't know, I just know a robbery happened."

The Stat: Out of 48 specific transcription factors (the "bosses" of genes), the robot only correctly identified the specific cause-and-effect relationship for 3 of them (6.2%).
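Reduced to code, the test looks roughly like this: compare SAE feature activations in control cells against cells where one transcription factor was knocked out, and count a success only if the most-shifted features are the ones annotated to that factor's targets. The sketch below is a hypothetical reconstruction of that logic; the arrays, labels, and cutoff are placeholders, not the paper's data.

```python
# Toy version of the perturbation test. Placeholder data and labels only.
import numpy as np

def perturbation_hit(f_control, f_knockout, feature_labels, tf_name, top_k=10):
    """f_control, f_knockout: (cells x features) SAE activation matrices."""
    # How much each feature's average activation shifts after the knockout
    delta = np.abs(f_knockout.mean(axis=0) - f_control.mean(axis=0))
    top = np.argsort(delta)[::-1][:top_k]
    # A "hit" means a strongly shifted feature names THIS factor's program --
    # merely detecting that "something changed" does not count
    return any(tf_name in label for i in top for label in feature_labels[i])

rng = np.random.default_rng(0)
f_control = rng.random((100, 2000))
f_knockout = f_control + rng.normal(0.0, 0.01, size=f_control.shape)
feature_labels = {i: [] for i in range(2000)}
feature_labels[123] = ["GATA1 target program"]  # hypothetical annotation
print(perturbation_hit(f_control, f_knockout, feature_labels, "GATA1"))
```

By the paper's account, the detective almost always fails this lineup: the shift is real, but it rarely points at the right suspect.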

Why Does This Matter?

  1. It's Not the Tool's Fault: The author tried training the X-ray machine on different types of cells (not just one type) to see if the robot was just confused by the data. It didn't help much. The problem is the robot itself.
  2. The Bottleneck: The current AI models are trained to predict the next word (or gene) based on patterns (a toy version of this objective follows the list). They are excellent at memorizing correlations (things that happen together) but terrible at learning the actual rules of the universe (cause and effect).
  3. The Future: To make these robots truly understand biology, we need to train them differently. Instead of just asking them to predict patterns, we need to teach them with experiments where they have to figure out why something happened.
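For point 2, here is that pattern-matching objective boiled down to a toy masked-prediction loss: hide some gene tokens and reward the model for guessing them back from context. The tiny model and vocabulary below are stand-ins; the real models' training objectives differ in detail.

```python
# Toy masked gene prediction loss. Illustrative only -- not Geneformer's
# or scGPT's actual training code.
import torch
import torch.nn as nn

VOCAB = 1000  # hypothetical gene vocabulary size

class TinyGeneModel(nn.Module):
    """Stand-in for a transformer: just an embedding plus a linear head."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):
        return self.head(self.emb(tokens))  # (batch, seq, vocab)

def masked_gene_loss(model, gene_tokens, mask_prob=0.15):
    mask = torch.rand(gene_tokens.shape) < mask_prob
    inputs = gene_tokens.clone()
    inputs[mask] = 0  # 0 = hypothetical [MASK] token id
    logits = model(inputs)
    # The model is graded on filling in co-occurrence patterns; it is never
    # asked what happens if a gene is switched off.
    return nn.functional.cross_entropy(logits[mask], gene_tokens[mask])

model = TinyGeneModel()
tokens = torch.randint(1, VOCAB, (8, 128))
print(masked_gene_loss(model, tokens))
```

Nothing in this loss ever asks "what happens if we intervene on this gene?", which is exactly the gap the perturbation test exposed.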

The Gift to the World

The author didn't just write a paper; they built two interactive websites (Feature Atlases).

  • Think of these as Google Maps for the AI's brain.
  • Anyone can go online, search for a gene, and see exactly which "hidden book" in the AI's brain is talking about it, which layer it lives in, and how it connects to other concepts.

Summary in One Sentence

These AI models have memorized the entire encyclopedia of biology and organized it beautifully, but they are still just pattern-matching machines that haven't quite learned the actual rules of how genes control life.
