KyDab - a comprehensive database of antibody discovery selection campaigns.

KyDab is a comprehensive, publicly accessible database that curates over 120,000 paired antibody sequences and full-funnel selection data from standardized Kymouse immunization studies to support the development and evaluation of artificial intelligence models for antibody discovery.

Zhou, Q., Chomicz, D., Melvin, D., Griffiths, M., Yahiya, S., Reece, S., Le Pannerer, M.-M., Krawczyk, K.

Published 2026-03-27

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to find the perfect key to unlock a specific door (a disease). In the world of medicine, these "keys" are antibodies, and finding the right one usually involves a massive, expensive, and time-consuming treasure hunt.

For a long time, the "maps" we had for this treasure hunt were incomplete. They only showed us the keys that already worked and made it to the final lock. They didn't show us the thousands of keys that were too big, too small, or just didn't fit at all. Without seeing the failures, it's very hard for computers (Artificial Intelligence) to learn how to design the perfect key from scratch.

Enter KyDab: The "Full-Story" Map.

This paper introduces KyDab (Kymouse Antibody Database), a new, publicly accessible database that changes the game. Here is how it works, broken down simply:

1. The Lab Mouse "Factory"

The researchers used a special type of mouse called a Kymouse. Think of these mice as tiny, biological 3D printers. When you give them a piece of a virus or bacterium (an immunogen), their immune systems automatically start printing millions of different antibody "keys" to fight it. These mice are engineered so that the keys they print are already human-friendly, which makes them well suited to developing medicines.

2. The "Full-Funnel" Collection

Usually, when scientists do this, they test thousands of keys, throw away the bad ones, and only publish the few winners. It's like showing a movie but only releasing the final 5 minutes where the hero wins.

KyDab is different. It releases the entire movie, from the opening scene to the credits.

  • The Good: It includes the winning keys (antibodies that bind well).
  • The Bad: It includes the losing keys (antibodies that didn't work).
  • The Data: It has over 120,000 paired antibody sequences (each one pairing a heavy chain with its light chain), along with details on how they were tested.
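For readers who think in code, here is a minimal sketch of what one "full-funnel" record might look like. The field names are hypothetical and are not KyDab's actual schema, and the sequences are invented placeholders rather than real database entries. The point is only that winners and losers sit side by side in the same format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AntibodyRecord:
    """One paired antibody from a selection campaign (hypothetical schema)."""
    heavy_chain: str  # heavy-chain amino acid sequence
    light_chain: str  # light-chain amino acid sequence
    immunogen: str    # what the mouse was immunized with
    binds: bool       # True = a "winning key", False = a "losing key"


def funnel_summary(records):
    """Count how many records are binders and how many are non-binders."""
    counts = Counter(r.binds for r in records)
    return {
        "total": len(records),
        "binders": counts[True],
        "non_binders": counts[False],
    }


# Invented placeholder records, NOT real KyDab data.
records = [
    AntibodyRecord("EVQLVESGG", "DIQMTQSPS", "antigen_A", True),
    AntibodyRecord("QVQLQESGP", "EIVLTQSPG", "antigen_A", False),
    AntibodyRecord("QVQLVQSGA", "DIVMTQSPD", "antigen_A", False),
]

summary = funnel_summary(records)
print(summary)
```

A database that published only the winners would contain just the first record; the other two are the part of the story that usually goes missing.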

3. Why This Matters for AI

Imagine you are teaching a robot to bake the perfect cake.

  • Old Way: You only show the robot pictures of cakes that turned out perfectly. The robot tries to copy them but doesn't understand why a cake might fail (e.g., "Oh, I forgot the eggs, that's why it's flat").
  • KyDab Way: You show the robot pictures of perfect cakes and burnt cakes, flat cakes, and cakes that fell apart. You tell the robot exactly what ingredients were used for each.

Because KyDab includes the "failures" (negative data) and the "successes" (positive data) in a consistent, organized way, AI models can finally learn the real rules of antibody design. They can learn to predict which keys will fit before a human even picks up a test tube.
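To make the idea concrete, here is a toy sketch of learning from both successes and failures. It is a deliberately naive scorer that compares how often short sequence fragments appear in binders versus non-binders. It is not the authors' method and is far simpler than any real antibody model. All sequences and function names are invented for illustration.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into overlapping fragments of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(binders, non_binders, k=3):
    """Give each fragment a log-odds weight: positive if it is more common
    in binders, negative if it is more common in non-binders."""
    pos = Counter(m for s in binders for m in kmers(s, k))
    neg = Counter(m for s in non_binders for m in kmers(s, k))
    vocab = set(pos) | set(neg)
    # Add-one smoothing so fragments seen in only one group stay finite.
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        m: math.log((pos[m] + 1) / pos_total) - math.log((neg[m] + 1) / neg_total)
        for m in vocab
    }


def score(weights, seq, k=3):
    """Sum fragment weights; above zero leans 'binder', below leans 'non-binder'."""
    return sum(weights.get(m, 0.0) for m in kmers(seq, k))


# Invented toy sequences, NOT real antibody data.
binders = ["ARDYYGSSYFDY", "ARDYYGSGYFDV"]
non_binders = ["AKGGWELLRFDP", "AKGGWQLLRFDS"]

weights = train(binders, non_binders)
binder_like = score(weights, "ARDYYGSSYFDV")
failure_like = score(weights, "AKGGWELLRFDS")
print(binder_like > 0, failure_like < 0)
```

The contrast in `train` depends on the `non_binders` list: with only winners to look at, there is nothing to compare against, which is the gap that full-funnel data is meant to fill.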

4. The "Library" Analogy

Think of previous databases as a library that only has the Bestsellers section. Everyone knows what's in there, but it doesn't help you write a new story.

KyDab is like a library that includes:

  • The Bestsellers.
  • The manuscripts that were rejected by editors.
  • The drafts with typos.
  • The notes on why the editor rejected them.

This gives writers (scientists and AI) a much deeper understanding of what makes a story (or an antibody) successful.

The Bottom Line

The authors of this paper are essentially saying: "We built a massive, organized database of our entire antibody discovery process, including the mistakes. We are giving it away for free so that Artificial Intelligence can learn faster, cheaper, and better."

If it works as hoped, this could help researchers discover new medicines for cancer, viral infections, and other diseases faster, because the computers will finally have the "full story" to learn from.
