A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

This paper demonstrates that the Boltz2 co-folding model, which leverages protein-ligand interactions for pretraining, produces small-molecule representations that outperform existing standalone models and complement them across diverse tasks, including ADMET prediction, generative modeling, and structure-guided optimization.

Original authors: Hyosoon Jang, Hyunjin Seo, Honghui Kim, Seonghyun Park, Taewon Kim, Yunhui Jang, Sungsoo Ahn

Published 2026-05-25

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to teach a robot how to understand the shape and behavior of tiny chemical building blocks (small molecules) to help design new medicines.

Usually, scientists teach these robots by showing them millions of isolated molecules, like looking at a single Lego brick in a vacuum and asking, "What does this do?" The robot learns the brick's shape, but it never sees how that brick actually snaps together with other pieces.

This paper evaluates a smarter way to teach the robot. Instead of looking at the brick alone, the robot learns from the brick while it is already snapped into a complex machine (a protein). Models trained to predict the shared 3D structure of a protein and a molecule together are called "co-folding" models.

Here is the breakdown of their discovery using simple analogies:

1. The Problem: Learning in a Vacuum

Most current AI models for chemistry are like students who only study flashcards of single words. They know what the word looks like, but they haven't seen it used in a sentence. They miss out on the context of how words interact with each other. In chemistry, this means the AI misses how a drug molecule interacts with the protein it's supposed to target.

2. The Solution: The "Co-Folding" Teacher

The researchers used a powerful, recently released AI model called Boltz2. Boltz2 was originally trained to predict the 3D structure that a protein and a drug molecule form together. It's like a master architect who knows exactly how every brick fits into a wall.

The big question was: Can we take this architect, who is an expert at building walls, and use it just to understand the bricks themselves?

3. The Experiment: Reusing the Architect

The team took the "brain" of Boltz2 (its internal representations) and applied it to small-molecule tasks, including tasks where no protein is present at all. In effect, they treated Boltz2 as a general-purpose teacher for small molecules.
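A model like Boltz2 produces one vector per atom (or token), while tasks such as property prediction need one fixed-size vector per molecule. The explanation does not say how the paper does this conversion; a common choice, assumed here, is a masked mean over the atoms. The sketch below uses made-up numbers as stand-ins for per-atom outputs of a frozen encoder given a ligand-only input. It does not call any real Boltz2 API, and the function name `pool_atom_embeddings` is hypothetical.

```python
import numpy as np

def pool_atom_embeddings(atom_emb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Masked mean over the atom axis (hypothetical helper, not a Boltz2 API).

    atom_emb: (batch, n_atoms, dim) per-atom vectors from a frozen encoder
    mask:     (batch, n_atoms), 1 for real atoms and 0 for padding
    returns:  (batch, dim), one fixed-size vector per molecule
    """
    mask = mask.astype(atom_emb.dtype)[..., None]   # (batch, n_atoms, 1)
    summed = (atom_emb * mask).sum(axis=1)          # padded atoms contribute zero
    counts = np.clip(mask.sum(axis=1), 1.0, None)   # guard against empty molecules
    return summed / counts

# Two molecules padded to 3 atoms; the second has only 2 real atoms.
atom_emb = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                     [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]]])  # last row is padding
mask = np.array([[1, 1, 1], [1, 1, 0]])
mol_emb = pool_atom_embeddings(atom_emb, mask)
print(mol_emb)  # molecule vectors (3, 4) and (3, 1)
```

Masking matters because molecules have different numbers of atoms: without it, the padding row (100, 100) would drag the second molecule's vector far away from its real atoms.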

They tested this "teacher" in three main ways:

  • The Exam (ADMET Prediction): They asked Boltz2 to predict ADMET properties: how a drug is absorbed, distributed, metabolized, and excreted, and whether it is toxic.
    • Result: Boltz2 scored as well as, or better than, models built specifically for these exams, even though it was never explicitly trained on them. Watching how molecules interact with proteins seems to have taught it useful rules of chemistry along the way.
  • The Creative Writing Class (Molecular Generation): They tried to teach a robot to invent new molecules. Usually, this is slow and trial-and-error.
    • Result: By using Boltz2's "understanding" as a guide, the robot learned to create valid, high-quality molecules twice as fast. It was like giving the robot a map of the city instead of letting it wander blindly.
  • The Treasure Hunt (Ligand Optimization): They asked the robot to find the perfect key (molecule) to fit a specific lock (protein).
    • Result: Instead of relying only on a single score for how well the key fits, the robot also used Boltz2's detailed "feelings" about the fit (its intermediate representations), which let it learn much faster. It found better keys with fewer attempts.
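In the "exam", a model's knowledge is usually tested by training a small predictor on top of its frozen representations. The explanation does not name the exact predictor. A standard protocol, assumed here, is a linear probe: the encoder stays fixed and only a linear model is fit on its embeddings. The sketch below uses closed-form ridge regression. Random synthetic features stand in for real Boltz2 embeddings, and a synthetic number stands in for a regression-type ADMET label such as solubility. All helper names are hypothetical.

```python
import numpy as np

def add_bias(X: np.ndarray) -> np.ndarray:
    """Append a column of ones so the probe can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge_probe(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression on frozen features; only these weights are trained."""
    Xb = add_bias(X)
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1.0 is a perfect fit."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 240 molecules x 16-dim frozen embeddings and a
# synthetic property that depends linearly on them plus a little noise.
X = rng.normal(size=(240, 16))
y = X @ rng.normal(size=16) + 0.5 + 0.05 * rng.normal(size=240)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]
w = fit_ridge_probe(X_train, y_train)
test_r2 = r2_score(y_test, add_bias(X_test) @ w)
print(f"held-out R^2 = {test_r2:.3f}")
```

Keeping the probe linear is deliberate: a weak head can only succeed if the useful information is already present in the embeddings, so the score reflects the representation rather than the head.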

4. The Secret Sauce: A New Perspective

The paper found that Boltz2 doesn't just memorize facts; it learns a different kind of map of the chemical world.

  • Existing models are like a map of a city's streets (molecular structure).
  • Boltz2 is like a map of the city's traffic and how cars interact with intersections (molecular interactions).

Because these maps are different, combining them (for example, using Boltz2 alongside an existing molecular model) creates a super-map that is better than either one alone.
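One simple way to build such a "super-map", assumed here because the explanation does not give the paper's exact method, is to standardize each model's embedding and concatenate the two before fitting a predictor. The toy data below are constructed so that the "structure" block and the "interaction" block each carry part of the signal. The sketch therefore illustrates why complementary features help; it does not reproduce the paper's results. The feature names and helpers are hypothetical.

```python
import numpy as np

def _add_bias(X: np.ndarray) -> np.ndarray:
    return np.hstack([X, np.ones((X.shape[0], 1))])

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge with an unpenalized intercept; returns test predictions."""
    Xb = _add_bias(X_train)
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    w = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y_train)
    return _add_bias(X_test) @ w

def r2(y_true, y_pred):
    return float(1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2))

def standardize(train, test):
    """Z-score a block with training statistics so neither map dominates by scale."""
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-8
    return (train - mu) / sd, (test - mu) / sd

rng = np.random.default_rng(0)
n_train, n_test = 300, 100
n = n_train + n_test
# Hypothetical stand-ins: "structure" features from an existing molecular model
# and "interaction" features from a co-folding model, each holding separate signal.
structure = rng.normal(size=(n, 8))
interaction = rng.normal(size=(n, 8))
y = structure @ rng.normal(size=8) + interaction @ rng.normal(size=8) + 0.1 * rng.normal(size=n)

tr, te = slice(0, n_train), slice(n_train, n)
s_tr, s_te = standardize(structure[tr], structure[te])
i_tr, i_te = standardize(interaction[tr], interaction[te])

scores = {
    "structure only": r2(y[te], ridge_fit_predict(s_tr, y[tr], s_te)),
    "interaction only": r2(y[te], ridge_fit_predict(i_tr, y[tr], i_te)),
    "concatenated": r2(y[te], ridge_fit_predict(np.hstack([s_tr, i_tr]), y[tr],
                                                np.hstack([s_te, i_te]))),
}
for name, score in scores.items():
    print(f"{name:>16}: R^2 = {score:.3f}")
```

Standardizing each block first keeps one model's larger numeric scale from swamping the other once they share a single regularized predictor.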

5. The Conclusion

The paper concludes that watching molecules interact with proteins is a powerful way to teach AI about molecules.

They showed that you don't need to train a model from scratch on millions of isolated molecules to get a great result. You can take a model trained on complex protein-drug interactions, strip away the protein, and use its "knowledge" of the drug to solve problems on its own.

In short: They showed that an AI trained to build complex structures (protein-drug complexes) is a brilliant teacher for the individual parts (small molecules), better than models that only study the parts in isolation. This makes Boltz2 a ready-to-use, "off-the-shelf" tool for drug discovery.
