ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets

This paper introduces ShallowBench, a curated benchmark of 5,780 shallow-pocket targets, to evaluate and expose the limitations of current generative AI models in designing ligands for historically "undruggable" low-concavity binding surfaces, such as those on KRAS and MYC.

Original authors: Saket Reddy, Shiwei Liu

Published 2026-06-08

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a master architect trying to design a custom key that fits perfectly into a specific lock. So far, the best architects (AI models) have been trained to build keys for deep, cave-like locks. These caves are easy to work with because the walls surround the key on all sides, giving the architect clear boundaries to follow. The AI learns to "snuggle" the key into these deep holes, creating a tight, secure fit.

However, in the real world of medicine, many of the most dangerous "locks" (disease targets like KRAS and MYC) aren't deep caves at all. They are flat, open surfaces, like a tabletop or a smooth wall. These are the "undruggable" targets that have historically been extremely hard to hit with drugs.

This paper introduces a new testing ground called ShallowBench to see how well our AI architects can design keys for these flat surfaces.

The Problem: The "Flat Surface" Struggle

The authors found that current AI models are like architects who have only ever built keys for caves. When you ask them to design a key for a flat table:

  1. They get lost: Without deep walls to guide them, the AI doesn't know where to place the key. The key might just "float" in the air above the table instead of sticking to it.
  2. They make mistakes: The AI struggles to hold the key together properly, sometimes creating shapes that don't make chemical sense.
  3. They lose their grip: Even when they do place a key on the surface, it doesn't stick as well as a key placed in a cave.

How They Built the Test (ShallowBench)

To prove this, the researchers needed a fair test. They couldn't just use the old datasets because those were full of deep caves. So, they created a new dataset called ShallowBench from a massive library of 166,500 protein structures.

They used a clever "volume measurement" trick to find the flat ones:

  • Imagine placing a clear, domed lid over a protein surface.
  • They calculated the space inside the lid versus the space taken up by the protein atoms themselves.
  • If the difference (the "empty space" under the lid) was small, it meant the surface was flat and shallow.
  • They filtered out the deep caves and kept 5,780 flat targets that still had enough surface area to hold a drug.
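
The "lid" trick above can be read as a convex-hull comparison. As a rough illustration, here is a 2D cross-section version in plain Python: the protein surface becomes a height profile, the lid is the profile's upper convex hull, and the empty space is the area trapped between the two. This is a simplified analogy, not the authors' 3D pipeline, and the cutoffs `max_empty` and `min_width` (a stand-in for "enough surface area to hold a drug") are hypothetical values.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (position along the surface, height)


def upper_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of the points (the 'lid'), via Andrew's monotone chain."""
    hull: List[Point] = []
    for p in sorted(points):
        # Drop the previous vertex while it sits on or below the line from hull[-2] to p
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross < 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def area_under(poly: List[Point]) -> float:
    """Signed area between a polyline and height 0 (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(poly, poly[1:]))


def empty_space_under_lid(profile: List[Point]) -> float:
    """Area trapped between the lid and the surface: the 2D 'empty space'."""
    surface = sorted(profile)
    return area_under(upper_hull(surface)) - area_under(surface)


def is_shallow_candidate(profile: List[Point], max_empty: float = 1.0, min_width: float = 4.0) -> bool:
    """Keep surfaces that are nearly flat but still wide enough for a ligand (hypothetical cutoffs)."""
    xs = [x for x, _ in profile]
    wide_enough = max(xs) - min(xs) >= min_width
    return wide_enough and empty_space_under_lid(profile) <= max_empty


flat = [(float(x), 0.0) for x in range(7)]
pocket = [(0.0, 0.0), (1.0, 0.0), (2.0, -2.0), (3.0, -3.0), (4.0, -2.0), (5.0, 0.0), (6.0, 0.0)]
print(empty_space_under_lid(flat), empty_space_under_lid(pocket))  # 0.0 7.0
print(is_shallow_candidate(flat), is_shallow_candidate(pocket))    # True False
```

A deep cave leaves a lot of area under the lid, while a nearly flat surface leaves almost none, which is the signal this kind of filter relies on.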

They then split this into a "training" set and a "testing" set, keeping similar proteins on the same side of the split so the AI couldn't cheat by memorizing them.
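
This article does not say how the paper measured protein similarity. Real pipelines typically cluster proteins by sequence identity or structural similarity with dedicated tools; the sketch below uses a toy stand-in (Jaccard overlap of 3-letter sequence fragments) purely to show the idea of splitting by cluster instead of by individual protein. The similarity measure, the 0.5 threshold, the 20% test fraction, and the example sequences are all assumptions.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All length-k fragments of a sequence (a toy similarity fingerprint)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(seqs: Dict[str, str], threshold: float = 0.5) -> List[List[str]]:
    """Single-linkage clustering: any pair at or above the threshold lands in the same cluster."""
    names = sorted(seqs)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps lookups fast
            n = parent[n]
        return n

    prints = {n: kmers(seqs[n]) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(prints[a], prints[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def group_split(clusters: List[List[str]], test_fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Send whole clusters to the test set until it holds about test_fraction of all proteins."""
    total = sum(len(c) for c in clusters)
    train: List[str] = []
    test: List[str] = []
    for cluster in sorted(clusters, key=lambda c: (len(c), c)):  # small clusters first
        if len(test) + len(cluster) <= test_fraction * total:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


seqs = {"A": "MKTAYIAKQR", "A2": "MKTAYIAKQL", "B": "GSHMLEDPVA", "C": "WWPQRSTNEF", "D": "HHEECCKLMN"}
clusters = cluster_by_similarity(seqs)
train, test = group_split(clusters)
print(clusters)  # [['A', 'A2'], ['B'], ['C'], ['D']]
print(test)      # ['B']
```

Because any two proteins above the threshold are merged into the same cluster, and clusters are never divided, no test protein has a close relative (by this toy measure) in the training set.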

The Results: The AI Stumbles

The researchers tested three top-tier AI models on this new flat-surface test. Here is what happened:

  • The "Cave" Models Failed: Every single model performed worse on the flat surfaces than on the deep caves. Their predicted ability to "stick" to the target dropped significantly.
  • The "Broken Key" Problem: One model (TargetDiff) tried to hug the flat surface but ended up making chemically broken keys (molecules that wouldn't work in real life). It was so desperate to fit the shape that it forgot the rules of chemistry.
  • The "Valid but Loose" Problem: Another model (DiffSBDD) made chemically valid keys, but their shapes were so loose that they didn't fit the flat surface at all. It was like cutting a perfectly good key and then setting it down on the wrong side of the table.
  • The "Scoring" Model: A third model (SimpleSBDD) did the best at sticking, but it wasn't really "designing" new keys from scratch; it was just picking existing ones from a library that happened to fit okay.
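
In practice, chemical validity is usually reported as the share of generated molecules that a cheminformatics toolkit can parse and sanitize. The sketch below is a deliberately reduced stand-in that catches only one kind of "broken key" like the ones TargetDiff produced: an atom with more bonds than it can form. The valence table, the atoms-and-bonds representation, and the function names are illustrative assumptions, not the metric used in the paper.

```python
from typing import List, Sequence, Tuple

# Simplified maximum valences for neutral atoms (an assumption: real toolkits also
# handle formal charges, aromaticity, and other valence states).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "S": 6}

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)
Molecule = Tuple[Sequence[str], Sequence[Bond]]


def is_valence_valid(atoms: Sequence[str], bonds: Sequence[Bond]) -> bool:
    """True if every atom is a known element and no atom exceeds its maximum valence."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(el in MAX_VALENCE and total <= MAX_VALENCE[el] for el, total in zip(atoms, used))


def validity_rate(molecules: List[Molecule]) -> float:
    """Fraction of molecules that pass the valence check."""
    if not molecules:
        return 0.0
    return sum(is_valence_valid(atoms, bonds) for atoms, bonds in molecules) / len(molecules)


ethanol: Molecule = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])  # heavy atoms only
five_bond_carbon: Molecule = (["C", "F", "F", "F", "F", "F"], [(0, k, 1) for k in range(1, 6)])
print(validity_rate([ethanol, five_bond_carbon]))  # 0.5
```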

The Takeaway

The paper concludes that while AI is amazing at designing drugs for deep, cave-like pockets, it is currently blind to flat surfaces.

The authors suggest that to fix this, we can't just keep training the same way. We need to:

  • Teach the AI differently: Show it more examples of flat surfaces during training.
  • Change the rules: Create new "loss functions" (error scores the AI is trained to minimize) that punish it for letting the key "float" away from the flat surface.
  • Build new tools: Maybe the AI needs to learn to look at the whole protein landscape, not just the immediate hole, to understand how to anchor a drug to a flat wall.
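
One way to read the "don't float away" idea is a hinge penalty on distance: zero while every ligand atom stays within contact range of some protein atom, and growing quadratically as atoms drift beyond it. The sketch below is a hypothetical example of such a term in plain Python, not a loss proposed in the paper; the 4 Å contact cutoff and the toy coordinates are assumed values, and a real training loop would express this in an autodiff framework so gradients can flow to the atom positions.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def nearest_distance(atom: Coord, protein: Sequence[Coord]) -> float:
    """Distance from one ligand atom to the closest protein atom."""
    return min(math.dist(atom, p) for p in protein)


def floating_penalty(ligand: Sequence[Coord], protein: Sequence[Coord], contact_cutoff: float = 4.0) -> float:
    """Mean squared hinge: zero inside contact range, quadratic beyond it (hypothetical loss term)."""
    if not ligand:
        return 0.0
    excess = [max(0.0, nearest_distance(a, protein) - contact_cutoff) for a in ligand]
    return sum(e * e for e in excess) / len(ligand)


surface = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]  # a flat patch at z = 0
resting = [(1.0, 1.0, 3.5), (1.5, 1.0, 3.5)]
hovering = [(1.0, 1.0, 7.0)]
print(floating_penalty(resting, surface), floating_penalty(hovering, surface))  # 0.0 9.0
```

Added to a model's usual objective, a term like this would reward keys that rest on the table rather than hovering above it.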

In short: Our drug-design AI is a great cave explorer, but it's currently terrible at building on flat ground. ShallowBench is the map that shows us exactly where it's failing, so we can build better tools to tackle the "undruggable" diseases.