PoseBusters: AI-based docking methods fail to generate… — Plain-Language Explanation

Original authors: Martin Buttenschoen, Garrett M. Morris, Charlotte M. Deane

Published 2026-06-09



Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to find the perfect key to fit into a very specific, complex lock. In the world of drug discovery, the "lock" is a protein in your body, and the "key" is a potential medicine (a molecule). The process of figuring out exactly how that key fits into the lock is called docking.

For years, scientists have used traditional, rule-based computer programs to do this. Recently, a new wave of "AI" programs (Deep Learning) has arrived, promising to do the job faster and better. These AI models are like brilliant students who have memorized millions of examples of keys and locks.

However, a new study called PoseBusters suggests that while these AI students are very good at memorizing the shape of the key, they are terrible at understanding the physics of how it actually works.

Here is a simple breakdown of what the paper found:

1. The "RMSD" Trap: Looking Good on Paper

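Scientists usually judge how well a docking program works by measuring RMSD (root-mean-square deviation). Think of RMSD as a ruler that measures the average distance between the atoms of the predicted key and the atoms of the real one. If the AI's prediction lies within 2 Angstroms (an Angstrom is one ten-billionth of a meter, roughly the size of an atom) of where the key actually sits in a real-life photo (a crystal structure), the AI gets a passing grade.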
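To make the ruler concrete, here is a minimal sketch of an RMSD calculation in Python. It assumes both poses list the same atoms in the same order; real evaluations typically also account for molecular symmetry, which this sketch ignores. The three-atom "ligand" and its coordinates are made up purely for illustration.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally ordered lists of
    (x, y, z) atom positions, in Angstroms."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))

RMSD_CUTOFF = 2.0  # Angstroms; the passing grade described above

# Hypothetical three-atom ligand: the crystal pose, and a prediction
# in which every atom is shifted by 1 Angstrom along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]

value = rmsd(predicted, crystal)
print(f"RMSD = {value:.2f} A, pass = {value <= RMSD_CUTOFF}")
# prints: RMSD = 1.00 A, pass = True
```

Note that nothing in this calculation asks whether the predicted molecule is chemically sensible; it only measures distance.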

The paper found that many AI programs get high scores on this ruler test. They say, in effect, "Look how accurate we are!"

2. The Reality Check: The "Impossible" Key

The problem is that these AI programs are so focused on matching the ruler measurement that they sometimes create physically impossible keys.

Imagine the AI predicts a key that:

Has a bond (a connection between atoms) that is stretched so thin it would snap like a dry twig.
Has a ring that is twisted into a pretzel, even though chemistry says that kind of ring (an aromatic ring, for example) should be flat like a pancake.
Has two parts of the key crashing into each other like two cars trying to drive through the same door at the same time.

The paper calls these "physically implausible." It's like the AI drew a picture of a key that looks right from a distance, but if you tried to build it, it would fall apart or break the lock.

3. Enter PoseBusters: The Inspector

To catch these bad predictions, the authors built a tool called PoseBusters. Think of PoseBusters as a strict building inspector or a quality control manager.

Instead of just measuring the ruler (RMSD), PoseBusters checks the "laws of physics" for every prediction:

Chemical Validity: Does the molecule make sense chemically? (e.g., Is the charge correct? Are the atoms connected properly?)
Geometry: Are the rings flat? Are the bonds the right length?
Clashes: Did the key crash into the lock or other parts of the machine?

If a prediction fails these checks, it is marked as "invalid," no matter how good the ruler measurement was.
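The flavor of these checks can be sketched in a few lines of Python. This is a simplified illustration, not the PoseBusters implementation: the function names, the 25% bond-length tolerance, and the 2 Angstrom clash distance are all assumptions chosen for the example, and only two of the checks (bond lengths and clashes with the protein) are shown.

```python
import math

def bond_lengths_ok(coords, bonds, tolerance=0.25):
    """bonds is a list of (atom_i, atom_j, ideal_length_in_angstroms).
    Returns True if every bond is within +/- tolerance (as a fraction)
    of its ideal length. The tolerance is an illustrative assumption."""
    for i, j, ideal in bonds:
        length = math.dist(coords[i], coords[j])
        if not (1 - tolerance) * ideal <= length <= (1 + tolerance) * ideal:
            return False
    return True

def no_clash(ligand, protein, min_distance=2.0):
    """True if no ligand atom sits closer than min_distance (Angstroms,
    an illustrative assumption) to any protein atom."""
    return all(math.dist(l, p) >= min_distance for l in ligand for p in protein)

def check_pose(ligand, bonds, protein):
    """Run every check; the pose is valid only if all of them pass."""
    report = {
        "bond_lengths": bond_lengths_ok(ligand, bonds),
        "no_protein_clash": no_clash(ligand, protein),
    }
    report["valid"] = all(report.values())
    return report

# Hypothetical two-atom ligand with one carbon-carbon single bond (~1.54 A).
bonds = [(0, 1, 1.54)]
protein = [(0.0, 4.0, 0.0)]

good_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
stretched_pose = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]   # bond far too long
clashing_pose = [(0.0, 3.0, 0.0), (1.5, 3.0, 0.0)]    # 1 A from the protein atom

for name, pose in [("good", good_pose), ("stretched", stretched_pose),
                   ("clashing", clashing_pose)]:
    print(name, check_pose(pose, bonds, protein))
```

The design point is the final `all(...)`: a single failed check is enough to mark the whole pose invalid, regardless of how close it is to the crystal structure.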

4. The Big Reveal: Old vs. New

The researchers tested five new AI docking methods against two older, traditional methods (AutoDock Vina and Gold).

On familiar locks (Training Data): When the AI was tested on locks it had seen before during its training, it looked amazing on the ruler test. One AI (DiffDock) seemed to beat the old methods.
The "Physics" Filter: But when PoseBusters checked the physics, the AI's performance dropped drastically. Many of its "perfect" predictions were actually impossible structures. The old, traditional methods, while slightly slower, produced keys that were both accurate and physically possible.
On new, unknown locks (Generalization): When the researchers tested the AI on completely new locks it had never seen (the "PoseBusters Benchmark set"), the AI struggled badly. It couldn't generalize to proteins unlike those in its training data. The old methods, which rely on physical rules rather than just pattern memorization, handled these new locks much better.

5. The "Tweak" Doesn't Fix Everything

The authors tried to help the AI by adding a "polishing" step after the prediction: energy minimization, which uses a physics-based energy model (called a force field) to relax the weird shapes into more realistic ones.

The Result: This helped the AI fix some of its broken keys, but it didn't make them better than the old traditional methods. The old methods were already starting with a solid foundation; the AI had to try to fix a broken foundation.
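As a rough intuition for what the polishing step does, the toy sketch below relaxes a single over-stretched bond by treating it as a spring and sliding downhill in energy. Everything here is an assumption made for illustration: a real force field combines many energy terms (bonds, angles, torsions, non-bonded interactions) over thousands of atoms, and the spring stiffness, step size, and iteration count are arbitrary.

```python
def bond_energy_gradient(length, ideal, stiffness):
    """Derivative of the spring energy 0.5 * stiffness * (length - ideal)**2."""
    return stiffness * (length - ideal)

def relax_bond(length, ideal=1.5, stiffness=1.0, step=0.1, iterations=200):
    """Plain gradient descent: repeatedly nudge the bond length downhill in energy."""
    for _ in range(iterations):
        length -= step * bond_energy_gradient(length, ideal, stiffness)
    return length

stretched = 3.0                    # Angstroms; an implausibly long bond
relaxed = relax_bond(stretched)
print(f"before: {stretched:.2f} A, after: {relaxed:.2f} A")
# prints: before: 3.00 A, after: 1.50 A
```

Minimization of this kind only moves a pose to the nearest low-energy arrangement. It can repair local geometry, but it cannot rescue a key that was placed in the wrong part of the lock.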

The Bottom Line

The paper concludes that AI-based docking methods are not yet ready to replace traditional tools.

While they are fast and can guess the right location, they often ignore the basic laws of chemistry and physics. To be truly "state-of-the-art," a method needs to pass two tests:

The Ruler Test: Is it in the right spot?
The Physics Test: Is it a real, buildable object?
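The difference between scoring with one test and scoring with both can be sketched as follows. The `Prediction` record and all the numbers are hypothetical, invented only to show how the success rate shrinks once the physics test is required as well.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    name: str
    rmsd: float              # Angstroms, measured against the crystal pose
    physically_valid: bool   # outcome of the physics checks

def success_rates(predictions, cutoff=2.0):
    """Return (ruler-only success rate, ruler-plus-physics success rate)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    n = len(predictions)
    ruler = sum(p.rmsd <= cutoff for p in predictions)
    both = sum(p.rmsd <= cutoff and p.physically_valid for p in predictions)
    return ruler / n, both / n

# Made-up results for four hypothetical protein-ligand complexes.
preds = [
    Prediction("complex_a", 0.8, True),
    Prediction("complex_b", 1.5, False),  # right spot, impossible molecule
    Prediction("complex_c", 1.9, True),
    Prediction("complex_d", 3.2, True),   # plausible molecule, wrong spot
]

ruler_rate, both_rate = success_rates(preds)
print(f"ruler only: {ruler_rate:.0%}, ruler + physics: {both_rate:.0%}")
# prints: ruler only: 75%, ruler + physics: 50%
```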

Currently, the traditional methods pass both tests far more reliably. The AI methods often pass the first but fail the second. The authors hope that by using their "PoseBusters" tool, developers can fix these AI models to respect physics better, leading to more trustworthy docking predictions in the future.

PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences