ZeroFold: Protein-RNA Binding Affinity Predictions from… — Plain-Language Explanation


Published 2026-03-26




Original authors: Josef Hanke (Yusuf Hamied Department of Chemistry, University of Cambridge, UK), Sebastian Pujalte Ojeda (Yusuf Hamied Department of Chemistry, University of Cambridge, UK), Shengyu Zhang (Yusuf Hamied Department of Chemistry, University of Cambridge, UK), Werngard Czechtizky (Medicinal Chemistry, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R and D, AstraZeneca, Sweden), Leonardo De Maria (Medicinal Chemistry, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R and D, AstraZeneca, Sweden), Michele Vendruscolo (Yusuf Hamied Department of Chemistry, University of Cambridge, UK)

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to predict how well two specific puzzle pieces will snap together. One piece is a protein, and the other is RNA. In the world of biology, these interactions are like a dance: the protein and RNA twist, turn, and change shape to find the perfect fit. If they fit well, they bind tightly; if not, they drift apart.

For a long time, scientists have struggled to predict exactly how strong this "dance" will be. Here is why: RNA is incredibly flexible. Unlike a rigid rock, RNA is more like a piece of cooked spaghetti. It wiggles and flops around in many different shapes.

The Old Problem: The "Freeze-Frame" Mistake

Previous computer models tried to solve this by taking a single "freeze-frame" photo of the RNA, guessing what shape it would take, and then calculating how well it fits the protein.

This approach is like trying to judge how well a dancer will perform from a single frozen photo. You miss all the movement, the flexibility, and the fact that they might change their pose to grab the partner's hand. By forcing the RNA into one static shape, the computer throws away crucial information about how it actually behaves.

The New Solution: ZeroFold

Enter ZeroFold, a new AI model created by researchers at the University of Cambridge and AstraZeneca. Instead of looking at the final "photo" (the predicted 3D structure), ZeroFold looks at the sketches the computer made before it decided on the final shape.

Think of it this way:

Old Method: The AI draws a single, rigid statue of the RNA and asks, "Does this statue fit?"
ZeroFold: The AI looks at the artist's rough, swirling sketches that show the RNA moving, stretching, and trying on different poses. It understands that the RNA is a "cloud of possibilities" rather than a single object.

These "sketches" are called pre-structural embeddings. They are a compact numerical code that captures information about the RNA's flexibility without committing to just one shape.

How ZeroFold Works

The Brain (Boltz-2): ZeroFold uses a massive, pre-trained AI brain called "Boltz-2" that already knows a lot about how proteins and RNA are built.
The Shortcut: Instead of waiting for Boltz-2 to finish drawing the final 3D picture, ZeroFold stops the process early and grabs the "thoughts" (embeddings) Boltz-2 was having while it was still figuring things out.
The Matchmaker: ZeroFold then uses a special attention mechanism (like a matchmaker) to see how the protein's "thoughts" and the RNA's "thoughts" interact.
The Prediction: Based on this interaction, it predicts the binding strength (affinity) directly from the sequences themselves (the amino-acid letters of the protein and the nucleotide letters of the RNA), without ever needing to see a 3D structure.

Why This Matters

To train this AI, the researchers built a massive library called PRADB, containing over 2,600 unique protein-RNA pairs with real-world measurements of how tightly they stick together.

When they tested ZeroFold:

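It was accurate: On a held-out test set, it achieved a Spearman rank correlation of 0.65 between predicted and measured binding strengths. The authors note that noise in the experiments themselves limits any model to roughly 0.6–0.7 on this kind of data, so ZeroFold is close to the ceiling the measurements allow.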
It was fair: When tested against other top models, ZeroFold didn't just win because it had "cheated" by seeing similar examples in its training data. Even when the test was made strictly harder (removing any similar examples), ZeroFold stayed strong while the others stumbled.
It's fast: Because it skips the step of building a 3D model, it avoids the most computationally expensive part of structure-based pipelines, which makes screening large numbers of candidate molecules practical.

The Big Picture

This matters because it tackles the problem of flexibility directly. It has long been assumed that a precise 3D map is needed to predict how molecules interact. ZeroFold suggests that, at least for binding strength, it is not. By understanding the potential shapes a molecule can take (the "cloud of possibilities"), we can predict how it will behave in the real world.

This opens the door to designing new medicines that target RNA (which is involved in many diseases) without needing to know the exact 3D structure of the target first. It's like learning to predict a dance partner's moves by understanding their rhythm and style, rather than memorizing a single pose.

1. Problem Statement

The accurate prediction of protein-RNA binding affinity is a critical unsolved problem in structural biology, essential for understanding gene regulation and designing RNA-targeting therapeutics. Current approaches face two primary challenges:

Structural Flexibility: Unlike proteins, RNA molecules exist as dynamic conformational ensembles. Traditional structure-based methods often rely on a single static 3D structure, discarding crucial ensemble information relevant to binding.
Data Scarcity and Generalization: Existing datasets are small (a few thousand data points), and many models suffer from data leakage or overfitting due to high sequence similarity between training and test sets. Furthermore, structure-based methods require experimentally resolved structures, which are unavailable for the vast majority of protein-RNA pairs.

2. Methodology: ZeroFold

The authors propose ZeroFold, a transformer-based model that predicts binding affinity directly from sequence by leveraging pre-structural embeddings.

Core Concept: Pre-Structural Embeddings

Instead of decoding a single static 3D structure (which collapses the conformational ensemble), ZeroFold extracts intermediate representations from a biomolecular foundation model (Boltz-2) before the structure decoding step.

Rationale: These embeddings implicitly encode the ensemble of possible conformations and their associated binding properties, retaining information lost when a dynamic system is forced into a single conformation.
Input: Boltz-2 is used because it natively supports a unified architecture for proteins, RNA, and DNA, and handles non-natural amino acids/modified nucleotides.

Model Architecture

ZeroFold consists of four main components:

Encoders (RNA & Protein):
- Extract two types of representations from the final layer of the Boltz-2 trunk:
  - Single representations ($s$): Per-residue/per-nucleotide embeddings (dim: 384).
  - Pair representations ($z$): Pairwise embeddings encoding spatial and evolutionary relationships (dim: 128).
- These pass through transition layers with residual connections. The RNA encoder includes a learned nucleic acid type embedding (RNA vs. DNA) concatenated to the single representation.
Cross-Modal Attention Module:
- Integrates the encoded protein and RNA representations to model interactions across the interface.
Affinity Prediction Head:
- Takes the joint representation and outputs a scalar estimate of binding affinity ($pK_D$).
Training Strategy:
- Sample Weighting: To prevent bias toward highly sampled families (e.g., ribosomal proteins), samples are weighted inversely proportional to the size of their protein and RNA sequence clusters.
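
As a rough illustration of how the components above fit together, the following Python sketch runs single-head cross-attention between protein and RNA single representations, pools the result, and maps it to a scalar; it then computes inverse-cluster-size sample weights. It is not the authors' implementation. The function names, the attention width `D_ATT`, the bidirectional attention, the mean pooling, the linear head, and the choice to multiply the two cluster sizes are all assumptions, and the pair representations and transition layers are omitted. Only the single-representation width of 384 comes from the text, and random numbers stand in for real Boltz-2 embeddings.

```python
import math
import random
from collections import Counter

S_DIM = 384   # single-representation width stated in the text
D_ATT = 16    # attention width: an arbitrary value chosen for this sketch


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: every query token attends over all context tokens."""
    q, k, v = matmul(queries, w_q), matmul(context, w_k), matmul(context, w_v)
    k_t = [list(col) for col in zip(*k)]
    scale = math.sqrt(len(q[0]))
    attn = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(attn, v), attn


def mean_pool(rows):
    """Average token vectors into one fixed-size vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict_pkd(protein_s, rna_s, params):
    """Hypothetical forward pass: cross-attend both ways, pool, apply a linear head."""
    p_ctx, _ = cross_attention(protein_s, rna_s, *params["protein_to_rna"])
    r_ctx, _ = cross_attention(rna_s, protein_s, *params["rna_to_protein"])
    joint = mean_pool(p_ctx) + mean_pool(r_ctx)  # list concatenation, length 2 * D_ATT
    return sum(w * x for w, x in zip(params["head_w"], joint)) + params["head_b"]


def random_matrix(n_rows, n_cols, rng):
    """Random stand-in for embeddings or weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_cols)] for _ in range(n_rows)]


def init_params(rng):
    """Untrained random parameters, useful for shape-checking only."""
    return {
        "protein_to_rna": tuple(random_matrix(S_DIM, D_ATT, rng) for _ in range(3)),
        "rna_to_protein": tuple(random_matrix(S_DIM, D_ATT, rng) for _ in range(3)),
        "head_w": [rng.gauss(0.0, 0.1) for _ in range(2 * D_ATT)],
        "head_b": 0.0,
    }


def inverse_cluster_weights(cluster_pairs):
    """Weight each (protein_cluster, rna_cluster) sample by 1 / (product of cluster sizes).

    Weights are rescaled to average 1 so the overall loss scale is unchanged.
    """
    p_sizes = Counter(p for p, _ in cluster_pairs)
    r_sizes = Counter(r for _, r in cluster_pairs)
    raw = [1.0 / (p_sizes[p] * r_sizes[r]) for p, r in cluster_pairs]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]
```

With trained parameters, `predict_pkd(protein_s, rna_s, params)` would return one $pK_D$ estimate per pair; with the random parameters used here the number is meaningless and only the shapes are informative. The weights returned by `inverse_cluster_weights` would multiply each sample's loss term so that large families do not dominate training.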

3. Key Contributions

A. PRADB Dataset

To address data scarcity, the authors constructed PRADB (Protein-RNA Affinity Database), a curated dataset of 2,621 unique protein-RNA pairs with experimentally measured affinities ($pK_D$).
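
This summary does not define $pK_D$. Under the standard convention, which is assumed here, $pK_D = -\log_{10} K_D$ with the dissociation constant $K_D$ expressed in mol/L, so a larger value means tighter binding. A minimal Python sketch of the conversion follows; the function names are illustrative.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant in mol/L to pK_D = -log10(K_D)."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the conversion: K_D in mol/L from pK_D."""
    return 10.0 ** (-pkd)


# Print the conversion for three illustrative dissociation constants.
for label, kd in [("1 nM", 1e-9), ("1 uM", 1e-6), ("1 mM", 1e-3)]:
    print(f"K_D = {label}: pK_D = {kd_to_pkd(kd):.1f}")
```

On this scale a 1 nM binder has a $pK_D$ of 9 and a 1 µM binder has a $pK_D$ of 6, so each unit corresponds to a tenfold change in $K_D$.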

Sources: Aggregated from four databases: ProNAB, BioLiP2, UTexas Aptamer Database, and PDBbind+.
Preprocessing: Redundant pairs were resolved (preferring PDBbind+ data), and strict clustering was applied to ensure no sequence in the test set shares >40% identity with the training set.
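
One way to enforce such a split, sketched here in Python under stated assumptions, is to treat protein clusters and RNA clusters as nodes of a graph, link the two clusters of every measured pair, and assign each connected component wholly to train or test. The cluster labels are assumed to be precomputed at the 40% identity level by an external tool. The function name, the greedy fill, and the component-based strategy itself are illustrative; the source does not confirm how the authors performed the split.

```python
import random
from collections import defaultdict


def component_split(pairs, test_fraction=0.2, seed=0):
    """Split (protein_cluster, rna_cluster) pairs so that no cluster spans train and test."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Link the protein cluster and RNA cluster of every measured pair.
    for p_cluster, r_cluster in pairs:
        root_p, root_r = find(("protein", p_cluster)), find(("rna", r_cluster))
        if root_p != root_r:
            parent[root_p] = root_r

    # Group pairs by connected component.
    components = defaultdict(list)
    for pair in pairs:
        components[find(("protein", pair[0]))].append(pair)

    groups = list(components.values())
    random.Random(seed).shuffle(groups)

    # Greedily fill the test set with whole components up to the target size.
    target = test_fraction * len(pairs)
    train, test = [], []
    for group in groups:
        (test if len(test) + len(group) <= target else train).extend(group)
    return train, test
```

Because every pair in a component stays together, no protein cluster and no RNA cluster can appear on both sides of the split. A drawback of this strategy is that one very large component can make the requested test fraction unreachable.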

B. Novel Evaluation Protocol

The authors introduced progressively stricter evaluation subsets to ensure fair comparisons with state-of-the-art (SOTA) models:

Baseline: Standard 40% identity threshold.
Strict Subsets: Sequences present in competitor training sets (CoPRA and DeePNAP) were systematically removed, and progressively tighter identity thresholds (70%, then 40%) were applied to eliminate data leakage.
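
The nesting of these subsets can be sketched as a sequence of filters. The snippet assumes each test record carries a precomputed `max_identity` field: the highest sequence identity (0 to 1) between that record and any sequence in a competitor's training set, with 1.0 meaning the sequence itself is present. That field, the record layout, the subset names, and the toy records are all hypothetical.

```python
def strict_subsets(test_records):
    """Return progressively stricter, nested views of a test set."""
    return {
        "baseline": list(test_records),
        "no_exact_overlap": [r for r in test_records if r["max_identity"] < 1.0],
        "identity_le_70": [r for r in test_records if r["max_identity"] <= 0.70],
        "identity_le_40": [r for r in test_records if r["max_identity"] <= 0.40],
    }


# Toy records, invented for illustration only.
records = [
    {"id": "a", "max_identity": 1.00},
    {"id": "b", "max_identity": 0.85},
    {"id": "c", "max_identity": 0.55},
    {"id": "d", "max_identity": 0.30},
]
for name, subset in strict_subsets(records).items():
    print(name, [r["id"] for r in subset])
```

Each stricter subset is contained in the previous one, so a model whose score holds up from the first subset to the last is less likely to be benefiting from overlap with its training data.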

C. Computational Efficiency

ZeroFold bypasses the computationally expensive structure decoding step required by structure-based pipelines, passing pre-structural embeddings directly to the affinity head. This enables high-throughput virtual screening.

4. Results

Performance Metrics

On a held-out test set (40% identity threshold), ZeroFold achieved:

Spearman Correlation Coefficient (SCC): 0.65
Pearson Correlation Coefficient (PCC): 0.63
RMSE: 1.47
MAE: 1.14

The authors note that an SCC of ~0.6–0.7 represents the practical upper bound imposed by experimental measurement noise, suggesting ZeroFold has approached the ceiling of current data quality.
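
For reference, the four metrics above can be computed as follows. This is a generic pure-Python sketch of the standard definitions, not the authors' evaluation code. The function names are illustrative, and Spearman's coefficient is computed as the Pearson correlation of tie-averaged ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(pred, true):
    """Root-mean-square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)
```

If the errors are in $pK_D$ units and $pK_D = -\log_{10} K_D$ (both assumptions here), an RMSE of 1.47 corresponds to a typical error of about $10^{1.47} \approx 30$-fold in $K_D$, which helps explain why the model separates weak from strong binders more reliably than it ranks binders of similar strength.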

Comparative Analysis

vs. CoPRA (Structure-based): CoPRA reported an SCC of 0.59 on its own benchmark (70% identity threshold). Under ZeroFold's stricter 40% threshold, CoPRA's performance dropped significantly (PCC fell from 0.50 to 0.22 as overlap was removed). ZeroFold maintained a stable PCC (~0.6) and outperformed CoPRA even when CoPRA had access to experimentally resolved structures.
vs. DeePNAP (Sequence-based): DeePNAP reported a high correlation (0.92) but suffered when evaluated on stricter subsets due to lack of sequence filtering. ZeroFold consistently outperformed DeePNAP across all filtered subsets, demonstrating superior generalization despite being trained on a smaller dataset.

Affinity Range Analysis

ZeroFold excels at coarse-grained discrimination (distinguishing weak binders from strong binders).
Performance drops for fine-grained ranking within high-affinity bands (SCC ~0.28), indicating limitations in resolving subtle differences between potent binders, likely due to experimental noise.

5. Significance and Conclusion

Paradigm Shift: The study supports the view that pre-structural embeddings are a better representation for flexible biomolecules than static predicted structures. By capturing ensemble information without committing to a single conformation, ZeroFold addresses the "flexibility bottleneck" in RNA modeling.
Generalization: ZeroFold demonstrates that high-performance affinity prediction is possible without experimentally resolved structures, making it applicable to the vast majority of protein-RNA pairs lacking structural data.
Future Outlook: The results suggest that future improvements in affinity prediction will depend less on architectural complexity and more on the generation of larger, higher-quality, and more consistently measured experimental datasets to push beyond the current noise ceiling.

In summary, ZeroFold establishes a new state of the art for protein-RNA binding affinity prediction by leveraging intermediate representations from foundation models, offering a robust, structure-free solution for drug discovery and functional genomics.

ZeroFold: Protein-RNA Binding Affinity Predictions from Pre-Structural Embeddings