Unavailability of experimental 3D structural data on… — Plain-Language Explanation

Original authors: Aydin Wells, Khalique Newaz, Jennifer Morones, Jianlin Cheng, Tijana Milenković

Published 2026-04-09

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine a protein as a long, tangled string of beads (amino acids). To do its job in your body, such as digesting food or fighting a virus, this string needs to twist and fold into a very specific, complex 3D shape, like an origami crane. This process is called protein folding.

For a long time, scientists have been great at figuring out what the final origami crane looks like. They have massive libraries (databases) of these finished shapes. But there's a big problem: we don't know what the string looks like while it's being folded.

Think of it like this: if you only have a photo of the finished crane, you know the final shape and nothing more. But if you want to understand how the paper folds, or why it sometimes crumples into a messy ball (misfolding of this kind is linked to diseases such as Alzheimer's), you need to see the string at every step of the folding process. The shapes at these steps are called intermediates.

The Big Problem: We Can't "See" the Steps

The authors of this paper went on a treasure hunt to find photos of these folding steps. They looked for two types of folding:

Post-translational (The "Free Fall"): The whole string is made, then it tries to fold itself in a test tube.
Co-translational (The "Assembly Line"): The string is being built bead-by-bead by a machine (the ribosome) inside your cells, and it starts folding while it's still being made.

The Hunt Results:

For the "Free Fall" type: They found only two studies with actual photos of the folding steps. It's like trying to learn how to bake a cake by looking at only two blurry snapshots of the batter rising.
For the "Assembly Line" type: They found only four studies. Even worse, these studies only looked at very short strings (small proteins).

Why is this hard?
Taking a photo of a folding protein is incredibly difficult. It's like trying to take a high-definition photo of a hummingbird's wings mid-flap with a camera that has a slow shutter speed. The protein moves too fast, and the "camera" (experimental methods such as X-ray crystallography or NMR spectroscopy) usually captures only the final, stable pose.

The "Magic Crystal Ball" That Failed

Because we lack photos of the folding steps, scientists hoped that AlphaFold2 (a famous AI that predicts protein shapes) could act as a crystal ball. They thought: "If AlphaFold2 is so good at guessing the final shape, maybe it can guess the intermediate shapes too!"

The authors tested this. They fed the AI the sequence of a protein during its folding process and asked, "What does it look like right now?"

The Result: The AI failed miserably.

The Analogy: Imagine you show a master architect a half-built house and ask, "What does the house look like right now?" The architect (the AI), trained only on finished houses, keeps drawing the finished house, ignoring the scaffolding and the half-built walls. In other words, the AI tries to force the intermediate shape into a final shape.
The Data: When the AI tried to predict the folding steps, it was often no better than random guessing. It couldn't capture the messy, dynamic reality of a protein in motion.
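To make "how good is a guess?" concrete, here is a minimal sketch of one common way to score a predicted shape against a reference shape: the root-mean-square deviation (RMSD) between matching atoms. This explanation does not say which metric the authors used, so RMSD is an assumption here. The function name `rmsd` and the toy coordinates are illustrative only, and the sketch assumes the two shapes have already been laid on top of each other (superimposed).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long coordinate lists.

    Assumption: both structures are already superimposed, so position i in
    `predicted` corresponds to position i in `reference`.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("structures must be non-empty and the same length")
    total = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        total += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(total / len(predicted))


# Toy data (hypothetical): three "beads" on a straight line.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
identical = list(reference)
shifted = [(x, y + 2.0, z) for (x, y, z) in reference]

print(rmsd(identical, reference))  # a perfect guess scores 0.0
print(rmsd(shifted, reference))    # every bead is off by 2 units
```

A lower score means the guess sits closer to the reference. Real comparisons would first superimpose the two structures and would use coordinates read from structure files rather than hand-written lists.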

The "Proxy" Solution (The Best We Can Do for Now)

Since we can't take photos of the steps and the AI can't guess them, the authors looked at a clever workaround they developed in previous work.

Instead of trying to predict the dynamic steps, they took the final finished shape and simply chopped off the end of the string, then chopped off a bit more, and so on.

The Analogy: Imagine you have a finished origami crane. To guess what it looked like halfway through, you just unfold the last few folds. You aren't seeing the real folding process (which might have involved the paper crumpling differently), but it's a "proxy" (a stand-in) that gives you a rough idea of the progression.
The Finding: Surprisingly, these "proxy" steps were just as good (or just as bad) as the AI's guesses. This suggests that current tools are missing the "physics" of how proteins move and change over time.
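The chopping idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' actual code: the function name `proxy_intermediates`, the `step` and `min_length` parameters, and the toy coordinates are all hypothetical. It also assumes the "end" being removed is the last-made end of the chain (the ribosome builds a protein from its N-terminus to its C-terminus), and that the remaining beads keep exactly the positions they had in the finished shape.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def proxy_intermediates(
    residues: List[Coord], step: int, min_length: int = 1
) -> List[List[Coord]]:
    """Return progressively shorter copies of a finished structure.

    Each proxy keeps the first k residues with their final coordinates
    unchanged; nothing is refolded, the tail is simply removed.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    proxies = []
    length = len(residues) - step
    while length >= min_length:
        proxies.append(residues[:length])
        length -= step
    return proxies


# Toy "finished shape" (hypothetical): 10 beads on a straight line.
final_structure = [(float(i), 0.0, 0.0) for i in range(10)]
proxies = proxy_intermediates(final_structure, step=3)
lengths = [len(p) for p in proxies]
print(lengths)  # [7, 4, 1]
```

Because the kept beads never move, each proxy is only a cut-down copy of the final shape. That is why it is a stand-in rather than a true picture of a folding step.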

Why Should You Care?

Understanding these folding steps isn't just an academic puzzle.

Disease: Many diseases happen because proteins fold incorrectly (misfold) and clump together. If we knew the "steps" where things go wrong, we could design drugs to stop the bad folding before it happens.
New Tools Needed: The paper concludes that we need a new generation of tools. We can't just use tools designed for static, finished shapes. We need "video cameras" (new experimental tech) and "motion-simulators" (new AI models) that understand the dynamics of folding.

The Bottom Line

We have a library of millions of finished protein shapes, but we are blind to the journey they take to get there. Current AI, no matter how smart, is like a photographer who only knows how to take pictures of the destination, not the journey. To cure diseases and understand life at a molecular level, we need to develop new ways to watch the movie of protein folding, not just look at the final frame.

Unavailability of experimental 3D structural data on protein folding dynamics and necessity for a new generation of structure prediction methods in this context

1. Problem Statement

2. Methodology

3. Key Contributions

4. Key Results

A. Data Scarcity

B. AlphaFold2 Performance on Intermediates

C. Comparison: AlphaFold2 vs. "Proxy" Intermediates

D. Existing Computational Approaches

5. Significance and Future Directions
