Causal Circuit Tracing Reveals Distinct Computational Architectures in Single-Cell Foundation Models: Inhibitory Dominance, Biological Coherence, and Cross-Model Convergence

This study introduces causal circuit tracing and shows that two distinct single-cell foundation models, Geneformer and scGPT, share a conserved computational architecture marked by inhibitory dominance and biological coherence. Cross-model consensus pinpoints disease-associated domains, and CRISPRi validation shows these reflect co-expression structure rather than causal encoding.

Ihor Kendiukhov

Published 2026-03-05
📖 5 min read · 🧠 Deep dive

Imagine two super-intelligent robots, Geneformer and scGPT, that have read the gene-activity readouts of millions of human cells (single-cell RNA data) and learned to predict how cells behave. They are "foundation models," meaning they are huge, complex, and generally considered "black boxes": we know they work, but we don't know how they think.

This paper is like taking a pair of X-ray glasses to these robots to see their internal wiring. The author, Ihor Kendiukhov, developed a new method called "Causal Circuit Tracing" to figure out exactly how information flows inside these digital brains.

Here is the story of what they found, explained simply.

1. The Experiment: "Turning Off the Lights"

To understand how a robot thinks, you can't just watch it; you have to poke it.

  • The Method: The researchers identified specific "features" inside the robot's brain (think of these as tiny neurons that light up when the robot thinks about "DNA repair" or "making energy").
  • The Trick: They systematically turned off (ablated) one feature at a time and watched what happened to the rest of the brain.
  • The Result: They mapped out a massive wiring diagram showing which features control which others. It's like pulling a fuse in a house and seeing which lights go out, which get brighter, and which stay the same.
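The fuse-pulling idea can be sketched in a few lines. This is a toy stand-in, not the paper's actual pipeline: the weight matrix `W` is invented to play the role of feature-to-feature influence, and the propagation step is deliberately simplistic.

```python
import numpy as np

# Toy ablation sketch (illustrative only): invent a small wiring matrix,
# switch one feature off, and measure what changes downstream.
rng = np.random.default_rng(0)

n_features = 8
W = rng.normal(size=(n_features, n_features))  # hypothetical feature-to-feature wiring
np.fill_diagonal(W, 0.0)                       # no self-loops

def downstream_activity(active):
    """One propagation step: each feature receives the weighted sum
    of the upstream features that are still switched on."""
    return W @ active.astype(float)

baseline = downstream_activity(np.ones(n_features, dtype=bool))

# "Pull the fuse" on feature 3 and re-measure every other feature.
mask = np.ones(n_features, dtype=bool)
mask[3] = False
ablated = downstream_activity(mask)

delta = ablated - baseline  # per-feature effect of losing feature 3
print(np.round(delta, 2))
```

Repeating this for every feature in turn yields exactly the kind of wiring diagram the section describes: which lights go out, which get brighter, and which stay the same.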

2. The Big Discovery: The "Inhibitory" Brain

The most surprising finding was how these robots process information.

  • The Analogy: Imagine a busy office. In most offices, people shout instructions to each other (Excitatory). But in these robots, the dominant rule is "Silence is Golden."
  • The Finding: About 65% to 89% of the connections are inhibitory. This means that when one feature turns on, it usually tells the next feature to shut up or slow down.
  • Why it matters: It suggests these models work by filtering out noise. A feature only stays active if it's absolutely necessary. If you remove a necessary feature, the whole downstream process collapses because the "safety net" is gone.
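The sign bookkeeping behind that 65% to 89% figure is simple, and can be sketched as follows (the numbers below are invented; the paper's actual thresholds and statistics may differ). If ablating a source feature raises a target's activity, the lost connection was holding it down, i.e. inhibitory; if the target drops, the connection was excitatory.

```python
import numpy as np

# Hedged sketch with invented ablation deltas: positive delta means the
# target went UP when the source was removed (an inhibitory connection),
# negative means it went DOWN (an excitatory one).
deltas = np.array([0.8, 0.3, -0.5, 1.2, 0.1, -0.9, 0.4, 0.6, 0.2, 0.7])

inhibitory_fraction = float(np.mean(deltas > 0))
print(f"{inhibitory_fraction:.0%} of connections are inhibitory")
```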

3. Two Different Personalities

Even though both robots are trying to understand biology, they have very different "personalities" and organizational styles.

  • Geneformer (The Architect):

    • Style: It's a massive, cooperative network. It has thousands of tiny workers.
    • Focus: It organizes itself around RNA processing and chromatin (how DNA is packed). Think of it as a librarian organizing books.
    • Vibe: Very stable, but the connections are a bit weaker individually.
  • scGPT (The Energy Manager):

    • Style: It's more compact and competitive.
    • Focus: It organizes itself around mitochondria and energy (how cells make power). Think of it as a power plant manager.
    • Vibe: It has stronger, more dramatic connections. If you pull a wire here, the whole system reacts violently.

4. The "Universal Truths" (Convergence)

Despite their different personalities, the robots agreed on some fundamental truths about biology.

  • The Consensus: When the researchers compared the wiring diagrams, they found 1,142 specific pathways that both robots discovered independently.
  • The Significance: This is huge. It means these aren't just random glitches in the code; the robots have independently learned the real laws of biology.
  • The Example: Both robots figured out that DNA Damage leads to Cell Cycle Arrest (stopping the cell from dividing). They mapped this out as a clear, step-by-step chain of command, just like real biologists know it works.
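Conceptually, the consensus step is a set intersection over each model's traced wiring diagram. The edge labels below are invented for illustration; in the paper, intersecting the two real diagrams is what yields the 1,142 shared pathways.

```python
# Hedged sketch: cross-model consensus as the intersection of two sets
# of directed feature-to-feature links (labels invented for illustration).
geneformer_edges = {
    ("DNA damage", "cell cycle arrest"),
    ("RNA processing", "chromatin remodeling"),
    ("oxidative stress", "apoptosis"),
}
scgpt_edges = {
    ("DNA damage", "cell cycle arrest"),
    ("mitochondrial stress", "apoptosis"),
    ("oxidative stress", "apoptosis"),
}

consensus = geneformer_edges & scgpt_edges
print(sorted(consensus))
```

Because each model traced its circuits independently, any edge that survives the intersection, like the DNA-damage-to-arrest chain above, is unlikely to be a random glitch.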

5. The "Lens" Matters More Than the Camera

The researchers tested if the robots' brains changed based on what they were looking at (different cell types).

  • The Finding: The structure of the brain didn't change much based on the cell type. Instead, it changed based on how the brain was trained (the "lens").
  • The Metaphor: Imagine looking at a city through a foggy window versus a clear one. The city (the biology) is the same, but the clear window (a better-trained model) lets you see the streets (the biological circuits) much more clearly. Training the AI on many different cell types builds a cleaner map of how biology works.

6. The Limitation: "Knowing the Map, Not the Driver"

This is the most critical takeaway for the future.

  • What they got right: The robots are amazing at drawing the map. They know that "Process A" leads to "Process B." They know the big picture of how a cell works.
  • What they got wrong: They are bad at predicting the driver. If you ask, "If I knock out this specific gene, will the cell die?" the robots are only about 56% accurate (barely better than a coin flip).
  • The Reason: The robots learned correlation (things that happen together) rather than causation (things that cause each other). They know that umbrellas and rain always show up together, but they can't tell that the rain causes people to open umbrellas, not the other way around.
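To see why 56% is "barely better than a coin flip," it helps to simulate the baseline. This toy uses invented data: on a balanced knock-this-gene-out task, random guessing already scores about 50%.

```python
import random

# Hedged toy (data invented): a balanced binary task and a coin-flip
# predictor. Its accuracy hovers near 50%, so the ~56% the models reach
# is only a sliver above chance.
random.seed(1)
truth = [random.choice([0, 1]) for _ in range(100_000)]
coin = [random.choice([0, 1]) for _ in range(100_000)]

chance_acc = sum(t == p for t, p in zip(truth, coin)) / len(truth)
print(round(chance_acc, 3))
```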

Summary

This paper is a breakthrough because it finally lets us peek inside the "black box" of biological AI.

  1. We found the wiring: We can now see how these models process biological information.
  2. We found the rules: They rely heavily on "stopping" signals (inhibition) to work.
  3. We found the truth: Two different AI models independently discovered the same real biological pathways, proving they have learned genuine science.
  4. The warning: While they are great at understanding the system, they aren't yet perfect at predicting specific gene outcomes.

In short: We have successfully reverse-engineered the "operating system" of these biological AIs, revealing that they have learned the deep logic of life, even if they still need help with the fine print.