AI-Designed Enzyme Engineering: Machine Learning Catalyzes Biotech Breakthroughs!
How catalytic motifs and protein backbones can be assembled into functional enzymes from scratch
Enzymes are the catalytic heart of biology, and a fundamental tool in biotech. Yet, we suck at producing new ones!
Can machine learning shorten the time between you and your dream enzyme? Apparently, yes!
Building Enzymes, Piece by Piece

Researchers introduce Riff-Diff, a hybrid computational design pipeline that creates new enzymes with good catalytic activity from scratch. Image credits: Nature.
Enzymes: Accelerating Life
Enzymes accelerate chemistry, and that’s how life happens.
With their high selectivity, efficiency, and mild reaction conditions, the natural catalysts are also scientists’ best friends in drug production and biotechnology.
But identifying natural enzymes for a specific reaction is a huge hassle! You have to sift through countless proteins, and the necessary high-throughput screenings are expensive. Terrible!
Wouldn’t it be awesome to have a better way?
It would be amazing to simply walk to your computer, type your desired function, and get a protein out! And computational protein design promises exactly that.
But we’re not there yet (unfortunately).
There has been amazing progress, especially with AI and ML tools! But the main problem remains the low efficiency of the initial design.
Most designed enzymes start weak, and to compensate, scientists have to fall back on high-throughput screening and directed evolution, which brings us right back to slow and costly!
If you want speed, the real win is designing efficient enzymes from the start.
Why Designed Enzymes Fail
Designed enzymes require extreme precision:
Active-site placement: The correct amino acids have to be placed in the active site with high accuracy.
Backbone design and compatibility: The protein backbone has to support the catalytic function of the active site and be compatible with it.
And even a deviation of less than 1 Å can inactivate an enzyme!
This was pretty much impossible until not long ago. But today, we have the right tools: we just have to combine them properly!
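To make that "less than 1 Å" figure concrete: the usual yardstick is the root-mean-square deviation (RMSD) between matched atoms. Here is a minimal sketch, assuming both sets of coordinates are already superimposed in the same frame (real pipelines align the structures first). The function name and the toy coordinates are mine, purely for illustration, and do not come from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two matched lists of (x, y, z) points.

    Assumes the two structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented toy coordinates for three catalytic atoms (not real data).
designed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
observed = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (0.0, 1.5, 0.0)]

deviation = rmsd(designed, observed)
print(f"active-site RMSD: {deviation:.2f} Å")
```

Shifts of a few tenths of an ångström per atom already add up, which is why sub-ångström placement is such a demanding target.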
Riff-Diff: Putting the Pieces Together
The authors introduce Riff-Diff, a computational pipeline for the de novo design of enzymes. They focused on the precise placement of catalytic arrays of amino acids into de novo protein backbones.
The goal: useful activity with minimal experimental optimization!
Riff-Diff merges atomistic modelling with machine learning. In short, the pipeline goes: choose catalytic residues → get a library of artificial motifs → stick them on artificial backbones → iteratively improve it.
But you know me, so you also get a long and detailed answer! The major components are:
Catalytic arrays and artificial motif libraries
Here, catalytic arrays are specific arrangements of 3 amino acids that catalyze the chemistry, taken from existing proteins. They are embedded in short helical fragments, with the functional groups held fixed while the backbone position varies, which turns each array into a library of artificial motifs. Riff-Diff ranks the artificial motifs and uses only the highest-quality ones!
Backbone diffusion
The next step is designing the protein backbone. The authors used RFdiffusion, a staple of protein design. In short, RFdiffusion starts from a random “cloud” of residues and iteratively denoises it into a plausible protein backbone. The authors gave particular attention to the binding pocket, critical for the enzyme’s substrate specificity. Real enzymes bury the substrate deep in the pocket, so Riff-Diff places an α-helix as a “channel placeholder” during diffusion. The helix is removed once the backbone is ready, leaving a deep pocket, ready for the substrate!
Iterative refinement loop
After diffusion, Riff-Diff refines the backbones iteratively. Each cycle starts with LigandMPNN designing the protein sequence from the 3D structure. The proteins are then relaxed, the sequence redesigned, and the structure predicted. The best-scoring ones start the cycle again! After the backbone refinement, Rosetta Coupled-Moves is used for fine packing of ligand interactions. The top designs are ranked using various metrics: AlphaFold confidence, active-site RMSD, and Rosetta energies.
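How do you turn several metrics into one ranking? The post does not spell out how Riff-Diff weights them, so take this as a minimal sketch of one common option, a rank-sum: rank the designs on each metric separately, add up the ranks, and sort. The `Design` class, the function names, and every number below are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    plddt: float             # structure-prediction confidence, higher is better
    active_site_rmsd: float  # Å, lower is better
    rosetta_energy: float    # Rosetta energy units, lower is better


def rank_positions(designs, key, reverse=False):
    """Map each design name to its position (0 = best) on a single metric."""
    ordered = sorted(designs, key=key, reverse=reverse)
    return {design.name: position for position, design in enumerate(ordered)}


def rank_designs(designs):
    """Order designs by the sum of their per-metric ranks (assumed equal weights)."""
    by_plddt = rank_positions(designs, lambda d: d.plddt, reverse=True)
    by_rmsd = rank_positions(designs, lambda d: d.active_site_rmsd)
    by_energy = rank_positions(designs, lambda d: d.rosetta_energy)

    def total(design):
        return by_plddt[design.name] + by_rmsd[design.name] + by_energy[design.name]

    return sorted(designs, key=total)


# Invented toy values, not results from the paper.
toy = [
    Design("design_A", plddt=92.0, active_site_rmsd=0.4, rosetta_energy=-310.0),
    Design("design_B", plddt=88.0, active_site_rmsd=0.9, rosetta_energy=-295.0),
    Design("design_C", plddt=95.0, active_site_rmsd=0.6, rosetta_energy=-320.0),
]

ranked = rank_designs(toy)
print([design.name for design in ranked])
```

A rank-sum sidesteps the problem that the three metrics live on very different scales, at the cost of ignoring how large the gaps between designs are.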
Well, that’s something! So, “one-shot” enzyme design from catalytic arrays!
And after this, they tested it. They applied Riff-Diff to two campaigns, using catalytic arrays validated from previous studies as a starting point.
Campaign 1: Retro-Aldolase
Retro-aldolases are enzymes that break a carbon-carbon bond, splitting one molecule into two smaller ones. They are a well-studied model system, and enzymes of this kind are even involved in glycolysis.
Scientists have created artificial versions, using the classic “compute-then-evolve” approach.
Using the catalytic triad of the artificial enzyme RA95.5-8F (catchy name!), the authors selected 36 designs for experiments (RAD1-36). 35 were cloned and expressed, and 91% showed activity!
RAD29 and RAD35 were the stars, with the highest activities (kcat ≈ 3.1–3.6×10⁻² s⁻¹): a 500,000x acceleration over the background! Still lower than evolved variants, but much higher than previously designed enzymes!
And the enzyme showed high thermal stability, an edge that designed proteins have over natural ones. RAD35 was also highly substrate-selective.
So, it worked! With a small experimental screen.
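What does the 500,000x acceleration quoted for RAD29 and RAD35 mean energetically? A back-of-the-envelope sketch using transition-state theory: a rate acceleration corresponds to lowering the activation barrier by RT·ln(acceleration). This assumes the figure is kcat divided by the uncatalyzed rate constant and a temperature of 25 °C, neither of which is stated in this post. The function name is mine.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def barrier_reduction_kcal(rate_acceleration, temperature_k=298.15):
    """Drop in activation free energy (kcal/mol) implied by a rate acceleration.

    Uses ddG = R * T * ln(k_cat / k_uncat); the temperature is an assumption.
    """
    if rate_acceleration <= 0:
        raise ValueError("rate acceleration must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(rate_acceleration)


ddg = barrier_reduction_kcal(5e5)
print(f"barrier lowered by about {ddg:.1f} kcal/mol")
```

Under those assumptions, the acceleration works out to a barrier reduction of a little under 8 kcal/mol.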
Campaign 2: MBHase
Time for a more challenging target!
The Morita-Baylis-Hillman (MBH) reaction is a harder, non-biological reaction. Previous computational MBHase designs required extensive evolution to reach useful rates.
The authors started with two evolved MBH catalytic arrays as inputs and applied Riff-Diff to scaffold them into backbones. And most designs (93-94%) showed measurable MBH activity!
MBH48 even outperformed BH32.8, a variant evolved for 8 selection rounds! Still below the highest evolved MBH rates, but approaching useful territory.
Impressive: few designs, but considerable activity!
Surprising Structure-Function Relationships
The authors solved the crystal structures of their designed enzymes.
The crystal structures showed amazing agreement with the design models, often with deviations below 1 Å. Riff-Diff has near-atomic accuracy! But what they found analysing the data surprised them.
If the atomic placement works, why does the activity vary wildly?
The authors merged the crystal structures with molecular dynamics simulations to learn two lessons:
Flexibility and dynamics are important: Many designed active sites move into non-productive conformations.
Substrate-bound geometry matters: Metrics obtained without substrate were poor predictors of activity. However, enzyme-substrate complex models (AlphaFold3 predictions) produced metrics that correlated better with measured activity.
In short, atomic placement is necessary, but not sufficient, with substrate positioning and other components playing a major role.
Riff-Diff allows you to put the same catalytic array in different backbones and see how catalysis changes. This makes it an excellent tool for these studies!
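How would you check whether a metric "correlates with measured activity"? One standard choice is the Spearman rank correlation, which only asks whether the two orderings agree. The post does not say which statistic the authors used, so this is a sketch under that assumption. It uses the simple no-ties formula, and the numbers are invented for illustration, not taken from the paper.

```python
def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1). Assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this simple sketch assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented toy values: a predicted metric and a measured activity per design.
metric_scores = [0.9, 0.4, 0.7, 0.1]
measured_activity = [10.0, 3.0, 12.0, 1.0]

rho = spearman(metric_scores, measured_activity)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the metric orders the designs almost the same way the assay does; a value near 0 means the metric tells you little about which designs will be active.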
Wrapping it Up
Such cool work!
You know how excited I am by computational design, be it DNA or proteins. And enzymes are the ultimate challenge!
Riff-Diff scaffolds known catalytic arrays into novel backbones with atomic precision. It produced multiple high-activity de novo enzymes with just tens of designs tested experimentally!
Plus, the pipeline worked on both biological and non-biological chemistries.
Now, it’s not perfect:
Activity prediction is not great: No good correlation between predicted ranking metrics and catalytic activity.
Conformational dynamics: Many designed active sites move into unproductive conformations.
Catalytic array selection: The big one. Riff-Diff relies on known, good starting arrays, but for novel chemistry, this first step remains hard.
But an amazing step towards one-shot enzyme design!
Of course, Riff-Diff can be used to produce new enzymes for biomedicine and sustainable chemistry, no doubt. And the designed enzymes can be improved via directed evolution!
But I also see it as a powerful tool to push our understanding of enzymes. We can’t design new and better enzymes because we don’t understand them: Riff-Diff can help with that!
I highly recommend reading the full paper here!
If you made it this far, thank you! What do you think of designed enzymes? Do you think it’s a waste of time? Reply and let me know!
P.S: Know someone interested in AI-protein design and SynBio? Share it with them!
More Room:
New Architectures for RNA: Never a dull moment in nucleic acid research! This study reports the high-resolution crystal structure of a high-affinity GTP-binding RNA aptamer, revealing an unusual and previously unseen G-quadruplex architecture. GTP is directly incorporated into a layer of a two-tier G-quadruplex, explaining the aptamer’s strong affinity and specificity, while additional noncanonical tetrads and base-pair stacking further stabilize the structure. Awesome! I wonder if we can see the same in DNA…
Never Enough Artificial Proteins: If you want to read more about protein design, here you go! This paper presents an integrated protein design strategy that combines a latent generative landscape model with molecular dynamics simulations and experimental validation to explore functional protein sequence space. Using this workflow, the authors designed artificial multidomain, ATP-driven copper transporters that exhibit native-like structure and activity.
DNA Origami Data Storage: Yes, even more things to do with DNA origami. This paper introduces a DNA data storage strategy inspired by chromosome condensation, using long single-stranded DNA to encode information and fold it into highly compact DNA origami structures. This approach achieves 100% data payload with no redundancy and reaches extremely high spatial storage densities, exceeding those of natural chromosomes. The stored information can be fully recovered by disassembling the structures and sequencing the DNA, highlighting a promising path toward ultra–high-density DNA-based data storage.