Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models

The paper introduces Hermes, a lightweight transformer model trained exclusively on large-scale, diverse DNA-encoded library (DEL) datasets. Hermes generalizes to predict protein-ligand binding across novel targets and chemical scaffolds, which suggests that unified DEL data can overcome the biases of traditional public affinity datasets for virtual screening.

Original authors: Maxwell Kleinsasser, Brayden J. Halverson, Edward Kraft, Sean Francis-Lyon, Sarah E. Hugo, Mackenzie R. Roman, Ben Miller, Andrew D. Blevins, Ian K. Quigley

Published 2026-02-17

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to teach a robot how to find the perfect key for a specific lock. In the world of medicine, the "lock" is a protein inside your body (often a culprit in diseases), and the "key" is a drug molecule. If the key fits the lock perfectly, it can stop the disease. This is called Protein-Ligand Binding.

For a long time, teaching robots to do this has been like trying to learn a language by reading thousands of different dictionaries written by different people, in different languages, with different spellings. It's messy, inconsistent, and full of errors.

Enter Hermes, a new AI model created by a team at Leash Biosciences. Here is the story of how Hermes learned to be a master key-finder, explained simply.

1. The Problem: The "Noisy Library"

Traditionally, scientists taught AI models using data from public databases. These databases are like a giant library where millions of people have dropped off notes about which keys fit which locks.

  • The Issue: One scientist wrote a note in 1990 using a specific test; another wrote one in 2020 using a totally different test. The notes are inconsistent, biased, and often wrong.
  • The Result: AI models trained on this "noisy library" struggle to generalize. They memorize the specific notes but fail when they see a new lock or a new type of key.

2. The Solution: The "Massive, Unified Factory"

The authors decided to stop reading the messy library and instead built their own massive, perfectly organized factory. This factory uses a technology called DEL (DNA-Encoded Libraries).

  • How it works: Imagine you have a library of 6.5 million tiny keys (chemical compounds). Each key has a unique DNA barcode attached to it, like a serial number.
  • The Test: You dump all 6.5 million keys into a pool with a specific protein lock. You wash away everything that doesn't stick. You then scan the DNA barcodes of the keys that did stick.
  • The Advantage: Because this is done in one giant, automated experiment with the same rules every time, the data is clean, consistent, and huge. It's like having a factory that tests billions of keys against hundreds of different locks in a single day, all following the exact same instruction manual.
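
For readers who think in code, the wash-and-scan step can be sketched as a barcode count. This is a minimal illustration, not the authors' pipeline: the function name, the control pool, the 2x enrichment cutoff, and the pseudocount are all assumptions made for the example.

```python
from collections import Counter

def enrichment_labels(selected_reads, control_reads, min_ratio=2.0, pseudocount=1.0):
    """Label each DNA barcode as stuck (1) or washed away (0).

    Hypothetical rule: a barcode counts as a binder when its share of the
    reads that survived the wash is at least `min_ratio` times its share
    of a control pool. The cutoff and pseudocount are illustrative only.
    """
    selected = Counter(selected_reads)
    control = Counter(control_reads)
    barcodes = set(selected) | set(control)
    # The pseudocount keeps barcodes seen in only one pool from dividing by zero.
    sel_total = sum(selected.values()) + pseudocount * len(barcodes)
    ctl_total = sum(control.values()) + pseudocount * len(barcodes)
    labels = {}
    for barcode in barcodes:
        sel_freq = (selected[barcode] + pseudocount) / sel_total
        ctl_freq = (control[barcode] + pseudocount) / ctl_total
        labels[barcode] = int(sel_freq / ctl_freq >= min_ratio)
    return labels

# Toy sequencing reads: each string stands in for one key's DNA serial number.
selected_reads = ["AAC", "AAC", "AAC", "AAC", "GTT"]
control_reads = ["AAC", "GTT", "GTT", "CCA", "CCA"]
labels = enrichment_labels(selected_reads, control_reads)
print(labels)
```

In this toy run only the "AAC" key shows up more often after the wash than before it, so it is the only one labeled as a binder.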

3. The Star: Hermes

The team built Hermes, a lightweight AI brain trained only on the data from this DEL factory.

  • What makes it special? Hermes has never seen a "traditional" lab report. It has never been told "this key fits with a strength of 5 stars." It only knows "this key stuck, and that one didn't."
  • The Magic: Despite never seeing the complex, messy data of the real world, Hermes learned the fundamental rules of how keys and locks interact. It learned the "shape" of a good fit so well that it can apply those rules to locks it has never seen before.
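
Learning from "stuck" versus "didn't stick" is a binary classification problem. The sketch below shows one common way such a model is scored during training, binary cross-entropy. The summary does not say which loss Hermes uses, so treat this as an assumption for illustration; the function name and numbers are made up.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Average penalty for predicted binding probabilities against 0/1 labels.

    Illustrative only: the loss Hermes was actually trained with is not
    stated in this summary.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Two keys: the first stuck (1), the second did not (0).
confident_and_right = binary_cross_entropy([0.9, 0.1], [1, 0])
confident_and_wrong = binary_cross_entropy([0.1, 0.9], [1, 0])
print(confident_and_right < confident_and_wrong)
```

The penalty is small when the model is confident and correct, and large when it is confident and wrong, which is what pushes a model toward the right "stuck or not" calls.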

4. The Test: Can it handle the real world?

To see if Hermes was truly smart or just memorizing the factory, they put it through three tough exams:

  1. The "New Lock" Test: They gave Hermes locks (proteins) it had never seen in the factory.
    • Result: Hermes did great. It figured out the new locks because it understood the underlying logic, not just the specific examples.
  2. The "New Key" Test: They gave Hermes keys made of materials (chemical structures) it had never seen.
    • Result: It handled these well too, showing it learned general principles, not just specific shapes.
  3. The "Real World" Test: They tested it against data from other labs (the messy library).
    • Result: Hermes performed surprisingly well, proving that the clean data from the factory taught it skills that transferred to the messy real world.
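
The first two exams come down to how the data is split before training: whatever is held out must never appear in the training set. A minimal sketch, assuming each record carries a protein ID and a scaffold ID (hypothetical field names; the paper's actual splitting procedure is not described here):

```python
def holdout_split(records, key, held_out):
    """Split records so every value of `key` in `held_out` lands only in the test set."""
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test

# Toy records: which key (scaffold) was tested on which lock (protein).
records = [
    {"protein": "P1", "scaffold": "S1", "bound": 1},
    {"protein": "P1", "scaffold": "S2", "bound": 0},
    {"protein": "P2", "scaffold": "S1", "bound": 0},
    {"protein": "P3", "scaffold": "S3", "bound": 1},
]

# "New Lock" exam: protein P3 is never seen during training.
new_lock_train, new_lock_test = holdout_split(records, "protein", {"P3"})
# "New Key" exam: scaffold S2 is never seen during training.
new_key_train, new_key_test = holdout_split(records, "scaffold", {"S2"})
print(len(new_lock_train), len(new_lock_test))
print(len(new_key_train), len(new_key_test))
```

The third exam needs no split at all: the model is trained on DEL data and then scored on measurements that come from other labs.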

5. The Superpower: Speed

There is another reason Hermes is a game-changer.

  • The Competitor (Boltz-2): Imagine a super-smart detective who solves a case by building a 3D model of the crime scene, analyzing every angle, and simulating physics. It's incredibly accurate, but it takes hours to solve one case.
  • Hermes: Imagine a detective who has seen millions of cases and instantly recognizes the pattern. It doesn't build a 3D model; it just looks at the description of the key and the lock.
  • The Result: Hermes is 500 to 700 times faster than the super-detective. In the time the other model takes to check a few thousand potential drug keys, Hermes can screen a few million.
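
The speedup figure converts directly into throughput. A back-of-the-envelope sketch, taking "a few thousand" to mean 5,000 compounds purely for illustration (that baseline is an assumption, not a number from the paper):

```python
def equivalent_throughput(baseline_compounds, speedup):
    """Compounds the faster model screens in the time the slower one screens `baseline_compounds`."""
    return baseline_compounds * speedup

baseline = 5_000  # hypothetical stand-in for "a few thousand"
low = equivalent_throughput(baseline, 500)
high = equivalent_throughput(baseline, 700)
print(f"{low:,} to {high:,} compounds")
```

With that assumed baseline, a 500x to 700x speedup works out to roughly 2.5 to 3.5 million compounds in the same amount of time.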

The Big Picture

This paper tells us that we don't need to rely on messy, inconsistent historical data to train AI for drug discovery. By using clean, massive, and consistent data (like the DEL factory), we can train smaller, faster AI models that are actually better at generalizing to new problems.

In a nutshell: Hermes is a fast, efficient AI that learned to find the right drug keys by watching a massive, perfectly organized factory, and it can now search for keys to protein locks it has never seen before. If the results hold up, this could help speed up the early stages of drug discovery.
