Retrieval and competition: how a protein foundation… — Plain-Language Explanation

Imagine you have a very smart, well-read librarian named ESM2. This librarian has read millions of books (protein sequences) and is famous for predicting what the very first word of a new story will be.

In the world of biology, there is a simple rule: almost every protein story starts with the word "Methionine" (let's call it "M"). Because the librarian has read so many books, they are incredibly confident that if you hand them a story with the first word hidden (masked), the answer is "M."

But here is the mystery the paper solves: Is the librarian actually looking at the clues in the story to figure this out? Or are they just guessing based on a habit?

The authors of this paper decided to peek behind the curtain of the librarian's brain (the computer model) to see how they make this guess. They found that the librarian isn't actually "reading" the first spot of the story to find "M." Instead, they are using a clever, multi-step trick that involves a Reference Book, a Special Query, and a Tug-of-War.

Here is how the trick works, broken down into simple parts:

1. The "Reference Book" (The BOS Token)

In the librarian's office, there is a special token at the very beginning of every file called BOS (Beginning of Sequence). Usually, people think this is just a boring label, like a sticky note saying "Start Here."

The paper discovered that for the librarian, this sticky note is actually a Reference Book. Inside this book, the librarian has permanently written down the answer: "M." This book stays exactly the same, no matter what story you are reading. It's a static memory of the rule.

2. The "Special Query" (The Detective's Note)

When the librarian needs to guess the first word, they don't just look at the first spot. They have to send a detective (an attention head) to check the Reference Book.

But the detective can only go to the Reference Book if they have a Special Note (a "query") that says, "I am at the very beginning of the story."

How is this note made? It's not made in one step. It's built up over several layers of the librarian's brain. Other detectives in earlier layers look at the "Start" label and pass that information along, slowly assembling a note that says, "We are at Position 0!"

The Twist: If the detective is standing at Position 5 (the middle of the story), the note they build looks different. It doesn't say "Go to the Reference Book." It says, "Ignore the Reference Book."

3. The "Tug-of-War" (Circuit Competition)

This is the most important part. The librarian doesn't just have one voice. They have many internal voices (circuits) shouting out predictions.

Voice A (The Positional Prior): This voice is always shouting "M!" whenever the detective is at Position 0. It's loud and consistent.
Voice B (The Context Circuits): These voices listen to the rest of the story. If the story is about a long chain of "A"s (Alanine), Voice B shouts, "No! It's an 'A'!"

The Result:

At the start of the story (Position 0), Voice A is usually the loudest, so the librarian predicts "M."
In the middle of the story, Voice A is still shouting "M!" (because the Reference Book is always there), but Voice B is shouting "A!" much louder. The librarian listens to the loudest voice, so they predict "A."

The paper found that the librarian doesn't have a "mute button" to silence Voice A in the middle of the story. Instead, Voice A is always there; it just gets outvoted by the context.
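
For readers who like to see the arithmetic, the tug-of-war can be sketched in a few lines of Python. This is only a toy illustration: the `PRIOR_M` value and the context numbers are made up for the example and are not taken from the paper. It shows a constant "M" contribution being added to context-driven scores, with the largest total winning.

```python
import math

def softmax(logits):
    """Convert a dict of raw scores (logits) into probabilities."""
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def predict(prior_logit_m, context_logits):
    """Add a constant 'M' prior to the context scores and pick the winner."""
    combined = dict(context_logits)
    combined["M"] = combined.get("M", 0.0) + prior_logit_m
    probs = softmax(combined)
    return max(probs, key=probs.get), probs

PRIOR_M = 3.0  # hypothetical constant contribution of Voice A

# Start of the story: the context has little to say.
start_winner, _ = predict(PRIOR_M, {"M": 0.0, "A": 0.5})
# Middle of a poly-alanine story: the context strongly favours "A".
middle_winner, _ = predict(PRIOR_M, {"M": 0.0, "A": 6.0})

print(start_winner, middle_winner)  # prints: M A
```

Note that `PRIOR_M` is identical in both calls. Nothing mutes Voice A; the winner changes only because Voice B gets louder.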

4. The "Geometric Gate" (RoPE)

How does the librarian know where they are in the story to build the right note? They use a mathematical tool called RoPE (Rotary Position Embedding).

Think of RoPE as a compass or a clock hand that rotates as you move through the story.

At Position 0, the compass points North. This aligns perfectly with the "Reference Book," allowing the detective to find it.
At Position 10, the compass points South. This misaligns the detective, so they can't find the Reference Book, even though the book is still sitting there.

The paper discovered that this alignment happens in two ways:

Direction: The compass points the right way (Angular alignment).
Strength: The detective's voice gets louder or softer depending on the position (Norm amplification).

If you break the compass (remove RoPE) while the competing voices are silenced, the librarian gets confused. They start shouting "M!" at every position in the story, not just the start, because they can no longer tell the difference between the beginning and the middle.
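
The compass picture can be made concrete with a two-dimensional sketch of a rotary embedding. It is a simplified illustration, not the model's code: it assumes a single frequency, it holds the un-rotated query and key fixed (in the real model the query content also changes with position), and the value of `THETA` is chosen purely so that ten steps make half a turn.

```python
import math

def rotate(vec, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

def rope_score(query, key, query_pos, key_pos, theta):
    """Dot product after rotating query and key by position * theta."""
    q = rotate(query, query_pos * theta)
    k = rotate(key, key_pos * theta)
    return q[0] * k[0] + q[1] * k[1]

THETA = math.pi / 10      # hypothetical frequency: 10 steps = half a turn
QUERY = (1.0, 0.0)        # illustrative un-rotated query
KEY = (1.0, 0.0)          # illustrative un-rotated reference key

aligned = rope_score(QUERY, KEY, 0, 0, THETA)      # same position: compass agrees
opposed = rope_score(QUERY, KEY, 10, 0, THETA)     # 10 steps away: compass opposes
shifted = rope_score(QUERY, KEY, 17, 7, THETA)     # both shifted by 7 steps

print(round(aligned, 6), round(opposed, 6), round(shifted, 6))
```

The score depends only on the offset between the two positions, which is why shifting both by the same amount leaves it unchanged.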

The Big Conclusion

The paper concludes that the librarian's confidence in predicting "M" at the start of a protein is not because they recognized a biological signal in that specific spot.

Instead, it's because:

They retrieved a stored rule from a "Reference Book" (BOS).
They used a "Special Note" built by a team of detectives to confirm they were at the start.
They won a "Tug-of-War" against other voices that were trying to guess based on the rest of the story.

Why does this matter?
The authors warn that if the librarian is confident, it doesn't mean they "understand" the biology. They might just be winning the tug-of-war with a statistical default. If the biology of a protein is unusual (e.g., it starts with something other than "M"), the librarian may still guess "M" because their internal "Reference Book" says so, not because of anything in that particular story.

To truly trust an AI's prediction in science, we can't just look at the answer; we have to understand the circuitry (the detectives, the compass, and the tug-of-war) to see if the AI is using real evidence or just falling back on a stored default.

Technical Summary: Retrieval and Competition in Protein Foundation Models

Problem Statement
Protein language models (PLMs) are increasingly relied upon to guide experimental and clinical decisions. However, a critical ambiguity remains: does a confident prediction reflect the recognition of biologically meaningful evidence within the input, or merely the retrieval of a stored statistical default applied independently of specific evidence? While both mechanisms yield identical predictions, only the former justifies trust in generalization. Existing mechanistic interpretability methods have identified internal features and decomposed representations but have largely treated the attention mechanism—the routing of information between positions—as a black box. Specifically, the role of positional encoding in directing information flow has received little scrutiny. This paper addresses this gap by tracing the computational pathway of a simple, near-universal biological rule: that proteins typically begin with methionine (Met).

Methodology
The authors analyzed ESM2-8M, a 6-layer, 8-million-parameter protein language model. They employed a suite of mechanistic interpretability techniques to dissect the prediction of methionine at the first position (position 0) when the initial residue is masked:

Ablation Studies: Systematic zero-ablation of attention heads and MLP modules at specific token positions (mask-only vs. all-positions) to identify necessary components.
Activation Patching: Injecting intermediate representations (queries, keys, values, attention outputs) from "clean" runs (mask at position 0) into "corrupted" runs (mask at internal positions) to isolate causal dependencies.
Circuit Discovery: A greedy search to identify minimal sets of components required for methionine prediction.
Norm–Direction Decomposition: A novel decomposition of attention scores within Rotary Position Embedding (RoPE) frequency bands. The authors separated the attention score ($Q \cdot K$) into a magnitude term (query norm) and an alignment term (cosine of the angle between query and key) to determine how positional information is encoded.
RoPE Manipulation: Swapping RoPE rotations between positions and applying uniform index shifts to test geometric causality.
Competition Experiments: Context elongation and destruction experiments using poly-alanine sequences to observe how competing signals affect the methionine logit.
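
As an illustration of the activation-patching idea listed above, the following minimal sketch uses a toy stand-in for a single readout head. It is not ESM2 and not the authors' code: `toy_forward`, `recovered_fraction`, and all numeric values are hypothetical. A "clean" run caches its activations, one cached activation is swapped into a "corrupted" run, and the fraction of the clean output that is restored is measured.

```python
def toy_forward(pos_signal, bos_value, patch=None):
    """Toy readout head (NOT ESM2): logit = query * value.

    `patch` optionally overwrites named activations with cached ones.
    """
    acts = {"query": pos_signal, "value": bos_value}
    if patch:
        acts.update(patch)  # inject activations taken from another run
    logit = acts["query"] * acts["value"]
    return logit, acts

def recovered_fraction(component, clean_in, corrupt_in):
    """Patch one clean activation into the corrupted run and report
    how much of the clean-vs-corrupted logit gap it restores."""
    clean_logit, clean_acts = toy_forward(*clean_in)
    corrupt_logit, _ = toy_forward(*corrupt_in)
    patched_logit, _ = toy_forward(
        *corrupt_in, patch={component: clean_acts[component]}
    )
    return (patched_logit - corrupt_logit) / (clean_logit - corrupt_logit)

# Hypothetical inputs: the runs differ only in the positional signal
# feeding the query; the BOS-derived value is identical in both.
clean = (1.0, 4.0)     # mask at the first position
corrupt = (0.25, 4.0)  # mask at an internal position

print(recovered_fraction("query", clean, corrupt))  # prints 1.0
print(recovered_fraction("value", clean, corrupt))  # prints 0.0
```

In this toy setup, patching the query restores the full effect while patching the value restores none, because the value is the same in both runs. On a real model the same bookkeeping would be done with framework hooks over actual intermediate tensors.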

Key Results

Circuit Competition, Not Local Recognition:
The model predicts methionine at position 0 not by detecting the position-0 identity locally, but through a competition between circuits. The methionine circuit contributes a roughly constant signal at the first position. At internal positions, context-dependent circuits produce competing signals that often outweigh the methionine prior. When context is destroyed (e.g., masking internal residues), the methionine signal re-emerges as dominant. This indicates the model does not "switch on" a rule for position 0; rather, it maintains a constant prior that is overridden by stronger contextual signals elsewhere.
The Methionine Circuit Architecture:
The prediction is mediated by a distributed, multi-layer circuit rather than a simple local rule:
- Readout (Layer 6): Head L6H8 acts as the final readout. It attends almost exclusively to the Beginning-of-Sequence (BOS) token and writes a methionine-specific signal into the output. Its output-value (OV) vector is highly aligned with the methionine logit gradient.
- Query Composition (Upstream Layers): L6H8 does not act independently. It relies on a query constructed upstream (primarily in Layers 1, 4, and 5) by attention heads that attend to the BOS token. These upstream heads use RoPE-mediated positional signals to assemble a query that, when rotated by L6H8's RoPE, aligns with the BOS key.
- Representation Building (MLPs): Early-layer MLPs (L1, L2, L4) are essential for building the key/value representations at BOS and the query at the masked position. Without these, the circuit fails.
The Active Role of the BOS Token:
Contrary to the view of BOS as a passive delimiter, the authors find it functions as an active computational node. It stores a stable, position-invariant reference representation (key and value) that the circuit retrieves. The specificity of the prediction (why it happens at position 0 and not elsewhere) is determined entirely by the query formed at the masked token, not by changes in the BOS representation.
Mechanism of Positional Selectivity (RoPE):
The authors introduce a norm–direction decomposition of attention scores across RoPE frequency bands. They find that positional selectivity arises from coupled changes in query norm and angular alignment:
- Dominant Band ($f_0$): Operates on both axes. At internal positions, the query norm for this band collapses by roughly seven-fold, and the cosine alignment with the BOS key drops from ~0.8 to near zero.
- Secondary Bands ($f_1, f_2, f_6$): Act primarily as alignment selectors, where cosine values flip sign or collapse, while norms remain relatively stable.
- RoPE as a Gate: In an isolated minimal circuit (without competing context circuits), removing RoPE causes the model to predict methionine at all positions. Thus, RoPE does not just enable the prediction at position 0; it actively gates the circuit to suppress the prior at internal positions.
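
The norm–direction decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each frequency band is a pair of consecutive dimensions (real RoPE implementations may pair dimensions differently), that the vectors passed in have already been rotated, and all numbers are invented for the example. Each band's contribution to the score is written as query norm × key norm × cosine.

```python
import math

def band_decomposition(query, key):
    """Split a dot product into per-band norm and cosine terms.

    `query` and `key` are flat lists of even length; consecutive pairs
    are treated as one 2D frequency band (an assumed layout).
    """
    bands = []
    for i in range(0, len(query), 2):
        qx, qy = query[i], query[i + 1]
        kx, ky = key[i], key[i + 1]
        q_norm = math.hypot(qx, qy)
        k_norm = math.hypot(kx, ky)
        score = qx * kx + qy * ky
        denom = q_norm * k_norm
        cos = score / denom if denom > 0 else 0.0
        bands.append({"q_norm": q_norm, "k_norm": k_norm,
                      "cos": cos, "score": score})
    return bands

# Hypothetical, already-rotated vectors with two bands each.
bos_key = [1.0, 0.0, 1.0, 0.0]
query_first = [4.0, 3.0, 1.0, 0.0]      # mask at the first position
query_internal = [0.0, 1.0, -1.0, 0.0]  # mask at an internal position

for name, q in [("first", query_first), ("internal", query_internal)]:
    for b, band in enumerate(band_decomposition(q, bos_key)):
        print(name, b, round(band["q_norm"], 3), round(band["cos"], 3))
```

In this toy input, band 0 changes in both norm and cosine between the two queries, while band 1 keeps its norm and only flips the sign of its cosine. Those are the two kinds of behaviour the decomposition is meant to tell apart.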

Significance and Claims
The paper claims that distinguishing between biological recognition and statistical retrieval requires resolution at the level of individual circuits, frequency bands, and query composition.

Mechanistic Verification: For even the simplest biological rule (proteins start with Met), the model's prediction is mediated by a distributed computational circuit rather than direct recognition. This suggests that for more complex tasks, the relationship between model confidence and underlying biological evidence will be increasingly obscured.
Interpretability of Confidence: High confidence does not guarantee the use of correct biological evidence; it may simply indicate that one circuit (the statistical prior) is dominating others.
General Computational Motifs: The findings highlight the active role of BOS tokens as reference frames and the use of query composition for positional routing, motifs that may be general to transformer architectures beyond protein models.
Limitations: The authors note that while the circuit architecture is identified in ESM2-8M, it remains an open question whether these specific motifs persist in larger models or across different PLM families. They also report that the upstream query composition involves both purely RoPE-driven heads and at least one context-dependent head (L4H3), and that the specific content this head relies on remains to be identified.

The study concludes that mechanistic verification is both necessary and challenging for predictions where the biological stakes are high, as the model's "knowledge" of biology may be a byproduct of statistical regularities retrieved via complex, distributed circuits.

Retrieval and competition: how a protein foundation model starts a protein