Molecular Hard Drives: DNA Origami Rewrites Data Storage!

Can DNA origami bring us a better way to store our cat videos?

Let’s hope so! Read on to find out.

DNA Learns to Edit

Researchers created DONLDS, a DNA origami-based data storage system that allows easy deletion and insertion of data. Image credits: Nature.

Data Storage Problems

We’re drowning in data.

From cat videos to TikTok and ChatGPT, we create an absurd amount of information. We’re talking about 400 million terabytes per day! And all that information lives in data centers. These energy-hungry beasts have rows and rows of metal cabinets containing servers, cables, and storage drives.

And the demand for data storage keeps growing!

That’s why researchers are always looking for new materials to expand our hard drives. Microsoft is betting on glass; others are pushing the limits of semiconductors or exploring new polymers.

And then, there is the most familiar option of all: DNA!

Why DNA Data Storage?

DNA already stores the genetic information of all life on Earth.

And it has been doing that for billions of years! DNA is an incredible storage material:

Ultrahigh information density: 1 gram of DNA can store 1 million terabytes of data!
Long-term stability: DNA can last for thousands, or even millions, of years!
Immune to electromagnetic interference: Unlike electronics, DNA isn’t bothered by magnetic or electrical noise.

Add in the advances in DNA sequencing and synthesis, and you have the most promising material for the future of data storage!

In some ways, DNA data storage is already a reality. Companies like Biomemory and CacheDNA are working on it now. It’s early days, and there are still kinks to work out, but it’s here!

At least, for some data.

Cold Vs Hot Data Storage

Most of today’s DNA storage systems are built for cold data storage.

Now, you might not be deep into storage tech (just like me). So, let’s make it simple:

Cold storage: For data you rarely access but want to preserve for the long term.
Hot storage: For data that needs to be accessed frequently and quickly, typically kept on expensive, high-performance media like SSDs.

These cold-storage systems generally rely on DNA synthesis to write data and DNA sequencing to read it back. This is powerful, but relatively slow and expensive at scale!

Scientists have tried alternatives suited for hot storage. Most of them struggle with the deletion and insertion of new files, a must for hot data storage. To change a single piece of data, you might have to rewrite a huge chunk of the system!

Is there a better way?

Storing Data in DNA Origami

This brings us to today’s paper!

The authors introduce DONLDS, a catchy name that stands for DNA origami nanostructure-enabled linked data storage. In practice, it recreates linked lists (yes, from computer science!) at the molecular scale.

Each element of a linked list is called a node, and it contains two things:

Data
A pointer, telling you where the next node is.

This makes linked lists particularly good for inserting and deleting elements! Just change the pointer, and you can easily modify your data. Sounds like it could solve our DNA data storage problem!
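If linked lists are new to you, here is a minimal sketch in Python of the idea described above. It is the plain computer-science version, not the molecular one, and the names (Node, insert_after, delete_after) are mine, chosen for illustration. Notice that inserting or deleting only rewires a pointer or two; the rest of the list is never touched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: str
    next: Optional["Node"] = None  # pointer to the next node (None = end of list)


def to_list(head: Optional[Node]) -> list[str]:
    """Walk the pointers from the head and collect the data in order."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


def insert_after(node: Node, data: str) -> Node:
    """Insert a new node right after `node` by rewiring two pointers."""
    new = Node(data, node.next)
    node.next = new
    return new


def delete_after(node: Node) -> None:
    """Remove the node right after `node` by pointing past it."""
    if node.next is not None:
        node.next = node.next.next


# Build A -> B -> D, then edit it without rewriting the other nodes.
head = Node("A")
b = insert_after(head, "B")
insert_after(b, "D")
insert_after(b, "C")           # now A -> B -> C -> D
after_insert = to_list(head)
delete_after(head)             # drops B, leaving A -> C -> D
after_delete = to_list(head)
```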

But how do you implement it in practice?

DNA Origami as Storage Nodes

You know it: DNA origami is my favourite!

It uses a long, single-stranded DNA as a scaffold, together with many smaller “staple” strands, to create nanoscale structures. Completely customizable! You control shape, size, and molecular placement with nanometer precision.

In this case, the authors designed 3 different DNA origami nanostructures (DONs):

Triangular DON (TDON)
Rectangular DON (RDON)
Cross-shaped DON (CDON)

Each acts as a storage node, with a data domain and a pointer domain.

The data domain stores the data. The writing scheme is simple and smart. The authors use streptavidin (SA) attached to biotin-modified strands to represent the “1” signal at specific sites on the DNA origami surface. If SA is absent at that site, that means “0”. DNA origami’s spatial control is key: the SA sites are at least 20 nm apart to avoid interference. How do you read the data? You use atomic force microscopy (AFM) to image everything!
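To make the writing scheme concrete, here is a small Python sketch. The site coordinates are hypothetical (I made up a simple grid with 25 nm spacing); only the "SA present = 1, SA absent = 0" rule and the 20 nm minimum spacing come from the paper as described above.

```python
from itertools import combinations
from math import dist

# Hypothetical layout: (x, y) positions in nm on the origami surface.
SITES_NM = [(0, 0), (25, 0), (50, 0), (0, 25),
            (25, 25), (50, 25), (0, 50), (25, 50)]
MIN_SPACING_NM = 20  # from the text: SA sites sit at least 20 nm apart


def spacing_ok(sites, min_nm=MIN_SPACING_NM):
    """True if every pair of sites is at least `min_nm` apart."""
    return all(dist(a, b) >= min_nm for a, b in combinations(sites, 2))


def write_bits(bits: str):
    """Return the sites that get streptavidin, i.e. the positions of the '1' bits."""
    if len(bits) != len(SITES_NM):
        raise ValueError("need exactly one bit per site")
    return [site for site, bit in zip(SITES_NM, bits) if bit == "1"]


def read_bits(occupied) -> str:
    """Rebuild the bit string from the sites where SA was seen in the image."""
    occupied = set(occupied)
    return "".join("1" if site in occupied else "0" for site in SITES_NM)


pattern = "01001000"
occupied = write_bits(pattern)      # two sites carry SA
recovered = read_bits(occupied)     # same bit string comes back
```

In the real system, the "read" step is an AFM image rather than a list of coordinates, so expect noise that this sketch ignores.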

On the edges of the structures, you find the pointer domains. Each structure has a prior and a next pointer. These are single-stranded DNA sequences that can bind to complementary operation strands. Operation strands guide the assembly of different structures, letting you create a controlled linked list: node 1 → node 2 → node 3. Or whatever order you want!
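Here is a rough Python sketch of how complementary pointers can fix the order of the nodes. All sequences below are made up for illustration, and the way I build an operation strand (one half pairing with a "next" pointer, the other half with a "prior" pointer) is my simplifying assumption, not the authors' actual design.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that base-pairs with `seq` (both written 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def make_operation_strand(next_pointer: str, prior_pointer: str) -> str:
    """Toy linker: first half binds one node's next pointer, second half another's prior."""
    return reverse_complement(next_pointer) + reverse_complement(prior_pointer)


# Hypothetical pointer sequences, invented for this example only.
nodes = {
    "node1": {"prior": "AAGGCCTA", "next": "GATTACAG"},
    "node2": {"prior": "CTGACTGA", "next": "TTCAGGCA"},
    "node3": {"prior": "GGATCCAA", "next": "ACACGTGT"},
}


def linked_pair(op: str, nodes: dict):
    """Return (a, b) if `op` joins a's next pointer to b's prior pointer, else None."""
    for a_name, a in nodes.items():
        for b_name, b in nodes.items():
            if a_name != b_name and op == make_operation_strand(a["next"], b["prior"]):
                return (a_name, b_name)
    return None


op_1_2 = make_operation_strand(nodes["node1"]["next"], nodes["node2"]["prior"])
op_2_3 = make_operation_strand(nodes["node2"]["next"], nodes["node3"]["prior"])
order = [linked_pair(op_1_2, nodes), linked_pair(op_2_3, nodes)]
```

Because each operation strand matches exactly one next/prior pair, choosing which strands you add is what decides the order of the list.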

Encoding Data in DNA Origami

So, this is the general idea.

The authors assigned different data types to different DNA origami nodes:

Triangular DON (TDON)
Used to encode English letters. The authors encode each letter in the DNA origami with an 8-bit scheme. This includes an orientation code (to break symmetry) and an alphabetical code. The storage accuracy for each letter ranges between ~91% and ~96%! The team also linked the nodes into dimer structures to form the sentence “HELLO WORLD”, with connection accuracy over 90%.
Cross-shaped DON (CDON)
Used to encode Arabic numerals. CDON uses four SA positions to store digits from 0 to 9 with high accuracy. The team used this system to encode the coordinates for cities like New York, Beijing, Brasilia, and Sydney. Latitude and longitude are encoded into binary and stored in linked CDON structures!
Rectangular DON (RDON)
Used to encode Chinese characters. Chinese characters are complex, so the authors used a 16-bit encoding, with two RDON nodes representing one character. The node is divided into an index domain, an orientation domain, and a character domain to improve readout accuracy. The authors successfully encoded 8 Chinese characters, and then stored a traditional Chinese maxim using an RDON tetramer, reaching 48-bit capacity at a data density of 222.22 Gbit/cm²!
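Two quick bits of arithmetic for the list above, sketched in Python. First, four SA positions give 4 bits, which is enough for the digits 0 to 9; I assume plain binary here, and the paper's actual digit-to-site mapping may differ. Second, the density figure can be sanity-checked using only the two reported numbers (48 bits and 222.22 Gbit/cm²). The digit string "4071" is just a made-up example.

```python
def digit_to_bits(digit: int) -> str:
    """4-bit binary for one decimal digit; each bit would map to one SA position."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be 0-9")
    return format(digit, "04b")


def number_to_nodes(digits: str) -> list[str]:
    """One 4-bit pattern per digit, i.e. one CDON node per digit (assumed)."""
    return [digit_to_bits(int(d)) for d in digits]


node_patterns = number_to_nodes("4071")  # hypothetical digits of a coordinate

# Density check: what area do 48 bits occupy at the reported density?
BITS = 48
DENSITY_GBIT_PER_CM2 = 222.22
NM2_PER_CM2 = 1e14  # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2

area_cm2 = BITS / (DENSITY_GBIT_PER_CM2 * 1e9)
area_nm2 = area_cm2 * NM2_PER_CM2       # about 21,600 nm^2 for the whole tetramer
area_per_node_nm2 = area_nm2 / 4        # about 5,400 nm^2 per RDON
```

About 5,400 nm² per node would fit a rectangle of roughly 90 nm by 60 nm. That size is my guess from the arithmetic; the dimensions are not given above.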

Dynamic Editing: Insertion and Deletion

So far, this is a fairly standard DNA storage system. Cool, but not revolutionary.

Dynamic editing sets this system apart!

Do you want to add or remove information? With DONLDS, instead of rewriting everything, you change connections and switch around nodes. Much easier! Let’s see how it works.

The secret lies in the operation strands. Once you have a linked list of DNA origami nodes, you can use strand displacement to detach them, reattach them, or swap their order around.

The team first tested with CDON dimers.

They connected → disconnected → reconnected them using toehold-mediated strand displacement. The measured accuracy was ~80% for connection and ~95% for disconnection! Pretty good.
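What does toehold-mediated strand displacement look like? Here is a deliberately crude Python toy. It assumes the linker strand carries a short unpaired "toehold" and that a release strand complementary to the whole linker can grab that toehold and peel the linker off the pointer. The sequences, and the rule "whoever pairs with more bases wins", are my simplifications. Real displacement is governed by kinetics and thermodynamics, not by counting bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that base-pairs with `seq` (both written 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_match(a: str, b: str) -> int:
    """Length of the longest stretch shared by `a` and `b`."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best


def paired_bases(strand: str, target: str) -> int:
    """How many contiguous bases of `target` the `strand` can pair with."""
    return longest_match(reverse_complement(strand), target)


def displaces(invader: str, incumbent: str, target: str) -> bool:
    """Toy rule: the invader wins if it pairs with more of `target` than the incumbent."""
    return paired_bases(invader, target) > paired_bases(incumbent, target)


# Hypothetical sequences, invented for this example only.
TOEHOLD = "TCTCCA"                      # unpaired overhang on the linker
DOMAIN = "CTGTAATCTCAGTCAG"             # part of the linker bound to the pointer
linker = TOEHOLD + DOMAIN
pointer = reverse_complement(DOMAIN)    # incumbent: pairs with DOMAIN only
release = reverse_complement(linker)    # invader: pairs with TOEHOLD + DOMAIN

disconnects = displaces(release, pointer, linker)       # toehold gives the invader the edge
no_toehold = displaces(pointer, pointer, linker)        # equal pairing: nothing happens
```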

But they went even further, with a more complex system!

They first created a list to store “DNA-HELIX-1953”, with TDON encoding “DNA”, RDON encoding “HELIX” (as Chinese characters), and CDON encoding “1953”.

Then, they deleted part of the list and inserted new nodes, converting the stored data to “DNA-STORAGE-1988”. In this new linked list, the authors report a 100% recovery rate over 24 samples, using thresholds to interpret SA signals as 1, 0, or invalid.
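The "1, 0, or invalid" read-out can be pictured as a simple three-way threshold. Here is a hedged Python sketch: the signal scale (0 to 1) and the cut-off values 0.3 and 0.7 are hypothetical placeholders I picked, not the thresholds used in the paper.

```python
from typing import Optional


def call_bit(signal: float, low: float = 0.3, high: float = 0.7) -> str:
    """Classify one normalized SA signal as '1', '0', or '?' (invalid)."""
    if signal >= high:
        return "1"
    if signal <= low:
        return "0"
    return "?"  # too ambiguous to call either way


def read_node(signals: list[float]) -> Optional[str]:
    """Return the node's bit string, or None if any site is invalid."""
    calls = "".join(call_bit(s) for s in signals)
    return None if "?" in calls else calls


clean = read_node([0.9, 0.1, 0.05, 0.8])    # every site is clearly 1 or 0
ambiguous = read_node([0.9, 0.5, 0.1, 0.8])  # second site falls between thresholds
```

Rejecting ambiguous reads instead of guessing is one way a read-out can trade a few discarded images for fewer wrong bits.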

From DNA Strands to USB Sticks

Are you going to use DNA as a USB stick?

Not soon. But it’s so exciting! I was impressed by DNA data storage while researching this paper. And it’s starting to really pick up! So much research, and so many commercial applications.

This is a cool paper, with a strong proof-of-concept of a novel approach to hot DNA storage. Linked lists are real data structures, and the authors perfectly replicated them at the molecular scale. Editing operations included!

Now, there are 3 main problems:

It was only tested in a highly controlled lab environment.
AFM readout is impractical. There are lots of advances here, though, especially in speed, so I might be wrong.
Synthetic DNA is still expensive. But the price is going down!

All in all, a super cool paper! Go and read all the details here.

If you made it this far, thank you! What do you think of DNA data storage? Do you think we’re seeing the beginning of something new? Reply and let me know!

More Room:

Psychedelic Plants: Psychedelic compounds are getting a lot of attention for their medical use. But we actually know little about their natural synthesis! In this paper, the authors map the full biosynthetic pathway of DMT in plants and reconstruct pathways for multiple natural psychedelics in a single plant system. Using metabolic engineering and enzyme design, they also create non-natural halogenated variants with therapeutic potential. This work establishes a plant-based platform for the scalable production of psychedelic compounds. Awesome!
Fluctuating DNA Computers: Add another problem to the list for DNA origami computing: structural fluctuations, which can cause signal leakage in surface-confined DNA computing. In this study, the authors design a rigid double-layer DNA origami platform that reduces these fluctuations, enabling more accurate positioning of components. This improves signal fidelity, lowers leakage, and enhances logic gate performance, providing a more reliable platform for DNA-based computing.
Better DNA Synthesis? Most modern biotech relies on synthetic DNA. But if you’ve ever ordered some, you’ll know: it’s not always the best quality. Enzymatic DNA synthesis is promising but limited by poor enzyme access to primers. In this study, the authors use tetrahedral DNA nanostructures to organize primers in an upright, well-spaced configuration, improving enzyme accessibility and reaction efficiency. This leads to higher yields, fewer errors, and successful synthesis of DNA for data storage, demonstrating a more efficient approach to DNA synthesis.