AI-Powered Protein Dynamics: Decoding Hidden Molecular Movements!
Using AI to decode the hidden movements behind protein behavior
Proteins are dynamic machines, but their movements are often hard to see. MD and experiments are limited; is machine learning going to help? And how are emus (yes, the Australian bird) involved? Only one way to find out!
Mapping Protein Motion

Scientists developed BioEmu, a biological emulator that rapidly creates protein conformational distributions. Image credit: Science.
Proteins Do the Work: But How?
Proteins are the functional engines of life. They catalyze reactions, transmit signals, and form the structures of your cells! No surprise, then, that they play a central role in drug development and biotech.
Thanks to next-generation sequencing and tools like AlphaFold, it’s easy to access a protein’s sequence and structure. But we still have trouble exploring the most important piece: protein function.
Many proteins are dynamic, shifting between states: open and closed, folded and partially unfolded, bound and unbound. These small, rare changes define the function of many proteins, such as actin in your muscles and many kinases.
So, mapping these rare, fast, small changes is important! But it’s also hard with our current technologies:
Single molecule experiments and cryo-EM: Powerful and precise! But expensive and time-consuming.
Molecular dynamics (MD) simulations: In theory, ideal to map the conformational landscapes, but they are limited by force-field accuracy and the computational cost of running long simulations.
Machine Learning to the Rescue?
Machine learning could be an alternative to traditional methods. There are two routes:
Machine-learned MD: These models cut computational costs 2-3 times compared to traditional MD, but they are still under development.
Emulators: Instead of simulating every atom, these models try to learn the distribution of protein structures directly.
Some early emulators worked for small molecules but don’t scale to proteins. Others can predict alternative protein conformations but don’t produce real distributions, meaning states weighted by their probabilities. Neither is there yet.
So, can we develop a system that’s fast, accurate, and cheap, that can predict conformational states and their probabilities? Is it even possible?
Bringing the (Bio)Emu
Enter today’s paper. The authors developed BioEmu, short for biological emulator. This generative deep learning system is designed to emulate protein conformational states with MD-level precision, all at a tiny fraction of the computational cost!
But how did they pull it off? Let’s find out the secret sauce.
BioEmu Model and Training
Let’s look at the model architecture and how they trained it. It’s pretty interesting, but a bit technical (well, at least for me!).
TL;DR: BioEmu doesn’t simulate the atoms in the proteins. Instead, it generates thousands of realistic protein structures to mimic how the proteins change shape. BioEmu learnt from existing protein structures, simulations, and experimental data to predict how proteins look, move, fold, and change, without worrying about the underlying physics.
Now it’s time for the details!
Model Architecture
The first step uses the AlphaFold2 “evoformer” to turn protein sequences into encodings that the model can work with.
These representations are fed into a diffusion-based generative model, which turns them into realistic structures in only 30-50 denoising steps.
In this way, 10,000 independent protein structures can be sampled in just minutes to hours on a single GPU! Insane.
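To get a feel for what a diffusion sampler does in those few dozen steps, here is a toy sketch in plain Python. It is not BioEmu's code: the "protein" is a single made-up coordinate with two states (closed and open), and an exact formula stands in for the neural network that a real emulator would learn. The names, the noise schedule, and the state populations (STATE_MEANS, STATE_WEIGHTS, sample_conformations, and so on) are all assumptions for illustration.

```python
import math
import random

# Hypothetical two-state toy system: one coordinate, e.g. a domain-opening distance.
STATE_MEANS = (-2.0, 2.0)    # closed, open (arbitrary units)
STATE_WEIGHTS = (0.7, 0.3)   # equilibrium populations the sampler should recover
STATE_SIGMA = 0.5            # width of each state


def noisy_score(x, abar):
    """Exact score d/dx log p_t(x) of the noised two-state mixture.

    `abar` is the fraction of signal left at this noise level.
    A real emulator would approximate this function with a trained network.
    """
    var = abar * STATE_SIGMA ** 2 + (1.0 - abar)
    diffs = [x - math.sqrt(abar) * m for m in STATE_MEANS]
    logs = [math.log(w) - 0.5 * d * d / var for w, d in zip(STATE_WEIGHTS, diffs)]
    top = max(logs)  # subtract the max for numerical stability
    resp = [math.exp(value - top) for value in logs]
    return -sum(r * d for r, d in zip(resp, diffs)) / (sum(resp) * var)


def sample_conformations(n_samples=2000, n_steps=50, seed=0):
    """Draw independent samples by denoising pure noise in `n_steps` steps."""
    rng = random.Random(seed)
    # Assumed linear noise schedule; not taken from the paper.
    betas = [0.01 + 0.29 * i / max(n_steps - 1, 1) for i in range(n_steps)]
    abars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        abars.append(running)

    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # every sample starts from pure noise
        for t in range(n_steps - 1, -1, -1):
            # Ancestral (DDPM-style) reverse step driven by the score.
            x = (x + betas[t] * noisy_score(x, abars[t])) / math.sqrt(1.0 - betas[t])
            if t > 0:
                x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

Calling sample_conformations() returns 2,000 samples, and the share that lands in the closed state (x < 0) should come out close to the 0.7 set at the top. Because each sample starts from its own noise draw, the samples are independent and easy to parallelize, unlike consecutive frames of an MD trajectory.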
Multi-stage Training
The authors integrated different complementary data sources:
BioEmu was pre-trained on a subset of the AlphaFold dataset, with tweaks to encourage conformational diversity.
The model was fine-tuned on over 200 ms of MD simulations (that’s a lot!), across thousands of small-to-medium proteins.
Finally, BioEmu was fine-tuned again on 500,000 experimental stability measurements.
And voilà, the model is ready to predict how proteins look and behave! How did it perform?
Capturing Functional Conformations
The first test for BioEmu was to predict known biologically relevant conformational changes. The team used 4 sets, each testing for a different property:
Sequence generalization: Testing the ability of the model to predict features it hasn’t seen before. BioEmu outperforms previous models in capturing multiple states.
Domain motions: Proteins that undergo large-scale conformational changes after binding. BioEmu recovered both open and closed states with high precision in 84% of cases.
Local unfolding: Some proteins have chains that unfold or detach from the main structure as part of a signalling pathway. The model predicted the correct folded and unfolded states in 70-80% of examples.
Cryptic pocket formation: Transient pockets that open up to bind a small molecule. BioEmu correctly predicts the bound state in 86% of cases, but it generates correct structures for the unbound state in only 56% of the proteins. This suggests a bias in the training data!
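Percentages like these come from asking, for each protein, whether any generated structure lands close enough to each known reference state. I have not reproduced the paper's exact metric; the sketch below is a minimal version of that kind of check, assuming a plain RMSD cutoff on structures that are already superimposed. The function names and the 3.0 threshold are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    No superposition is done here; a real evaluation would align the
    structures first (for example with the Kabsch algorithm).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def state_coverage(samples, reference_states, threshold=3.0):
    """Map each reference state name to True if any sample is within `threshold`."""
    return {
        name: any(rmsd(sample, reference) <= threshold for sample in samples)
        for name, reference in reference_states.items()
    }
```

A protein counts as a success for, say, domain motions only if both the open and the closed entries come back True; averaging that over a benchmark set gives a coverage percentage.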
So, it worked well! But the team was not done.
Emulating MD Distributions
MD simulations are powerful, but they struggle to sample rare events within affordable simulation times. Could we use BioEmu as a shortcut?
Fast-Folder Cross Validation
The team used 12 fast-folding proteins from supercomputer-scale MD runs.
They fine-tuned the model on 11 of the proteins, then tested it on the held-out 12th. The results were amazing! BioEmu emulated the free-energy landscapes very closely, using between 1 GPU-minute and 1 GPU-hour.
An acceleration of 10,000 times compared to MD simulations!
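How do you get a free-energy landscape out of a pile of sampled structures? The standard route is Boltzmann inversion: project each sample onto a coordinate, histogram it, and convert populations to free energies with F = -kT ln(p). The sketch below assumes a single 1D coordinate (for example, a fraction of native contacts) and uses a hypothetical function name; it illustrates the general technique, not the paper's analysis code.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_profile(values, n_bins=20, temperature=300.0):
    """Histogram a 1D coordinate and return (bin_center, free_energy) pairs.

    Free energies are in kcal/mol, shifted so the most populated bin is 0.
    Empty bins are skipped because their free energy is undefined.
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # guard against all-identical values
    counts = [0] * n_bins
    for value in values:
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    top = max(counts)
    kt = K_B * temperature
    return [
        (low + (i + 0.5) * width, kt * math.log(top / count))
        for i, count in enumerate(counts)
        if count > 0
    ]
```

Comparing a landscape like this with one computed from a long MD trajectory is one way to check how closely the two agree. Rarely visited regions need many samples to estimate well, which is where cheap sampling helps.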
CATH Domain
CATH domains are common protein building blocks. BioEmu reproduced native states and alternative substates, and its secondary structures matched those of the simulated structures.
The team also tested how the model scaled with data. They trained it with 1%, 10%, and 100% of the CATH domains, and they saw the model’s performance improve. This implies that more MD data could improve the model further!
Larger and Disordered Systems
Large and disordered proteins are a nightmare for MD because of computational constraints. Will BioEmu work?
Complexin II: A protein involved in neurotransmission. BioEmu captured its flexibility and key helices.
Tetraspanin CD9: BioEmu predicted known open/closed states in the extracellular loops, matching MD.
Predicting Protein Stability
Finally, BioEmu also learned to predict protein stability, which is crucial for protein applications. BioEmu accurately matched experimental stability across thousands of mutants.
And, unlike black-box predictors, you can see why a mutation destabilizes a protein!
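One way to read stability off an ensemble is to count how many sampled structures are folded. Under a simple two-state assumption, the folded fraction gives a folding free energy, and the difference between mutant and wild type gives the familiar ΔΔG. The helper names below are hypothetical, and deciding which structures count as "folded" is left out; treat this as a sketch of the textbook relation rather than the paper's procedure.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def folding_free_energy(folded_fraction, temperature=300.0):
    """Two-state folding free energy in kcal/mol; negative favors the folded state."""
    if not 0.0 < folded_fraction < 1.0:
        raise ValueError("folded_fraction must be strictly between 0 and 1")
    ratio = folded_fraction / (1.0 - folded_fraction)
    return -R_KCAL * temperature * math.log(ratio)


def ddg(folded_fraction_wt, folded_fraction_mut, temperature=300.0):
    """Mutant minus wild type; positive values mean the mutation destabilizes the fold."""
    return folding_free_energy(folded_fraction_mut, temperature) - folding_free_energy(
        folded_fraction_wt, temperature
    )
```

For example, a wild type that is folded in 99% of samples and a mutant folded in 90% give a positive ddg(0.99, 0.9), so the mutation is destabilizing. Since the samples are actual structures, you can then inspect the unfolded ones to see which part of the protein came apart.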
Fast, Low Cost, and Precise: Where Is The Catch?
Okay, that was a lot, I know! But BioEmu is really cool. To summarize:
It’s a generative machine learning system to approximately sample the conformational distribution of proteins.
Through that, it explores two key aspects of protein functions: protein conformations and stability.
Generating the MD and experimental training data is expensive upfront, but once the model is trained, BioEmu delivers results orders of magnitude faster and cheaper.
It’s complementary, not competitive, with MD: BioEmu can be used to generate initial guesses, which MD can simulate and refine.
It’s a powerful model! I loved the focus on efficiency and not having overlapping or similar sequences in training and test sets (sometimes this is forgotten).
But BioEmu is not without limitations:
Everything is generated based on data, so unlike a physics simulation, the model has no understanding of what’s actually happening.
It doesn’t generalize to other systems, or even to any temperature other than the training temperature of 300 K.
There is no confidence metric, so you can’t know how reliable the model is, aside from comparing it to MD or experiments.
But I guess these are all great places for the model to improve!
This was a cool paper. A bit complicated for me, but I’m excited for the fusion of MD and machine learning. I have done some simulations (oxDNA for the win), and I think there is space for innovation in this area. And if you want to learn more, go and read this paper here!
If you made it this far, thank you! Do you have experience with simulations? Do you think they could help lab work more? Reply and let me know!
P.S. Know someone interested in machine learning? Share this with them!
More Room:
Tracking DNA Origami With Cells: Ah, I love some good old DNA origami + cells. This study tracks how ligand-functionalized DNA origami nanorods interact with cancer cells in real time. Using single particle tracking, the researchers showed that nanorods with EGFR-targeting ligands selectively bind to target cells, while non-functionalized ones do not. They also found that ligand density affects binding behavior, especially for aptamers. Useful findings!
More DNA Origami and Cells? Oh yeah! This review highlights recent advances in using DNA-based strategies to engineer cell receptors for better control of cellular signaling. It covers both genetic methods and non-genetic approaches using DNA nanostructures and functional nucleic acids to modulate receptor behavior. DNA origami offers precision for controlling receptor clustering, while dynamic DNA devices respond to stimuli. Emerging DNA logic circuits and nanorobots enable programmable control of signaling. Let’s stay updated!
Is BioElectronics Coming? Maybe. I hope so, because it’s cool! In the meantime, this review explores the use of biological templates (DNA, RNA, viruses, and proteins) for fabricating nanoelectronic devices. DNA is highlighted for its programmability and precision, enabling the assembly of complex nanoscale structures. Other biomolecules, such as viruses and lipids, offer biocompatibility and self-repair potential. While promising, challenges like scalability and stability remain. I’ll be reading this!