How to make the most of your masked language model for protein engineering

This paper introduces a flexible stochastic beam search sampling method for masked language models that optimizes protein properties by evaluating whole-sequence neighborhoods. Extensive in silico and in vitro antibody engineering experiments demonstrate that the choice of sampling strategy is at least as critical as the model itself.

Calvin McCarter, Nick Bhattacharya, Sebastian W. Ober, Hunter Elliott

Published Thu, 12 Ma

Imagine you are a master chef trying to create the perfect new recipe for a dish that cures a specific disease. You have a "Master Cookbook" (a massive AI trained on millions of existing recipes) and a "Base Recipe" (a starting antibody that works okay but needs tweaking).

Your goal is to make small changes to the Base Recipe to make it stronger, safer, and more effective. However, there are two big problems:

  1. The Recipe Book is huge: There are billions of possible ways to change the ingredients.
  2. Tasting is expensive: You can't just cook every single variation and taste it. You have to send a tiny batch to a real lab to test, which takes weeks and costs a fortune.

This paper is about how to use a computer AI to pick the best variations to send to the lab, so you don't waste time on bad ideas.

Here is the breakdown of their discovery, explained simply:

1. The Old Way: "The Blind Guessing Game"

Previously, scientists used a method called Gibbs Sampling. Imagine you are editing a sentence one word at a time.

  • You change word #1, see if it sounds good.
  • Then you change word #2, see if it sounds good.
  • Then word #3...

The problem is that this is like trying to find the best route through a maze by only looking at the next step. You might get stuck in a dead end, or you might miss a shortcut because you were too focused on the immediate next word. Also, doing this one word at a time is slow and computationally heavy.
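The one-word-at-a-time procedure above can be sketched in a few lines. This is a toy illustration, not the paper's code: the `conditional_probs` function is a hypothetical stand-in for a masked language model's forward pass, which would return the probability of each amino acid at a masked position given the rest of the sequence.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def conditional_probs(seq, pos):
    # Stand-in for a real masked-LM call: a uniform distribution.
    # A real model would condition on the unmasked context around `pos`.
    return {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}

def gibbs_sweep(seq, rng):
    # One Gibbs sweep: visit each position in turn, "mask" it, and
    # resample that position from its conditional distribution.
    seq = list(seq)
    for pos in range(len(seq)):
        probs = conditional_probs(seq, pos)
        tokens, weights = zip(*probs.items())
        seq[pos] = rng.choices(tokens, weights=weights, k=1)[0]
    return "".join(seq)

rng = random.Random(0)
variant = gibbs_sweep("QVQLVQSG", rng)
print(len(variant))  # → 8: same length as the input, only identities change
```

Note that each position needs its own model call, which is why a full sweep over a long sequence is slow, and why each step only "sees" one position at a time.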

2. The New Way: "The Whole-Page Scan" (Stochastic Beam Search)

The authors propose a smarter way: Stochastic Beam Search.

Instead of changing one word at a time and waiting, imagine you take the whole page of the recipe and ask the AI: "If I swap out these 5 ingredients at once, how good does the entire dish look?"

  • The Magic Trick: The AI is incredibly fast at calculating how "good" a full sentence (or protein sequence) is, even if it's just a tiny change away from the original.
  • The Beam: Instead of looking at just one path, the AI looks at a "beam" of 5 or 20 different variations simultaneously. It keeps the best ones and discards the bad ones, then expands those winners into new variations.
  • The "Stochastic" Part: To make sure they don't all end up with the exact same boring recipe, they add a little bit of "random noise" (like shaking the spice jar). This ensures they get a diverse set of candidates, not just clones of each other.
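The beam-plus-noise idea above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: `score` is a hypothetical stand-in for the model's whole-sequence (pseudo-)log-likelihood, and the "neighborhood" is taken to be all single-point mutants.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def score(seq):
    # Toy stand-in scorer (rewards alanine content). A real run would
    # use the masked LM's score for the full sequence.
    return seq.count("A") / len(seq)

def expand(seq):
    # All single-point mutants of `seq`: the whole-sequence neighborhood.
    for pos in range(len(seq)):
        for aa in ALPHABET:
            if aa != seq[pos]:
                yield seq[:pos] + aa + seq[pos + 1:]

def stochastic_beam_search(start, beam_width, steps, rng):
    beam = [start]
    for _ in range(steps):
        candidates = {c for s in beam for c in expand(s)}

        def noisy_score(s):
            # Gumbel noise perturbs the scores so the beam stays diverse
            # instead of collapsing onto near-identical top candidates.
            return score(s) - math.log(-math.log(rng.random()))

        beam = sorted(candidates, key=noisy_score, reverse=True)[:beam_width]
    return beam

rng = random.Random(0)
final = stochastic_beam_search("QVQLVQSG", beam_width=5, steps=2, rng=rng)
print(len(final))  # the beam keeps 5 candidates at every step
```

The key design point is that scoring happens on full candidate sequences, so many positions can change per step, and the Gumbel noise is what makes the search "stochastic" rather than greedily deterministic.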

Analogy: Think of it like a search party looking for a hidden treasure.

  • Old Way: One person walks forward one step, checks the ground, turns, walks another step. Very slow.
  • New Way: A helicopter flies over a wide area, spots 20 promising spots, and sends a team to check all of them at once.

3. The "Taste Test" (Guidance)

Sometimes, you don't just want a tasty dish; you want it to be healthy (low calories) and cheap to make. In the paper, these are "scoring functions" (like checking if the antibody is stable or if it might cause an allergic reaction).

The new method is flexible. It can say: "Okay, keep the AI's favorite recipes, but if a recipe is too 'spicy' (risky), down-rank it. If it's 'cheap' (easy to make), boost it."

They found that using this "taste test" guidance was just as important as the AI model itself. Even a slightly weaker AI model, if guided well, can beat a super-powerful AI model that is just guessing blindly.
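This down-rank/boost logic amounts to adding weighted property terms to the model's score. The sketch below is purely illustrative: `model_score`, `stability_penalty`, and `ease_bonus` are hypothetical stand-ins, not the paper's actual scoring functions.

```python
def model_score(seq):
    # Stand-in for the masked LM's whole-sequence score.
    return seq.count("A") / len(seq)

def stability_penalty(seq):
    # Toy "risky motif" check (e.g. an unwanted cysteine pair).
    return 0.5 if "CC" in seq else 0.0

def ease_bonus(seq):
    # Toy "cheap to make" proxy: low sequence complexity.
    return 0.1 if len(set(seq)) < 6 else 0.0

def guided_score(seq, w_stab=1.0, w_ease=1.0):
    # Keep the model's favorites, but down-rank risky sequences
    # and boost easy ones via weighted extra terms.
    return (model_score(seq)
            - w_stab * stability_penalty(seq)
            + w_ease * ease_bonus(seq))

safe, risky = "AAAGAAA", "AAACCAA"
print(guided_score(safe) > guided_score(risky))  # → True
```

Because the guidance lives in the scoring function rather than the model, the same search procedure can be re-targeted to new objectives just by changing the weights or swapping in different property terms.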

4. The Big Surprise

The researchers tested this on real antibodies in a real lab (not just on a computer).

  • Result: The method they invented (Stochastic Beam Search) produced significantly more successful antibodies than the old methods.
  • The "ESM-2" Surprise: They used a model trained on all proteins (like a general chef who knows everything about food) and it worked surprisingly well for antibodies (a specific type of dish), even though they had models trained only on antibodies.

The Takeaway for Everyone

If you are trying to design a new drug or protein:

  1. Don't just tweak one thing at a time. Look at the whole picture.
  2. Use a "Beam" approach. Explore many options at once, not just one path.
  3. Add a "Guide." Don't just trust the AI's gut feeling; tell it what specific goals you have (safety, cost, stability) and let it balance those goals.

In short: They turned the process of drug discovery from "shooting in the dark, one bullet at a time" into "firing a swarm of guided missiles that home in on the target."