
zkEVM: A Layer 2 Solution to Ethereum Scaling Problem
The concept of Virtual Machines (VMs) can now be called “age-old”; it first came into existence when IBM introduced hardware-assisted virtualization in 1972. Now is the era of blockchain, or distributed ledger technology, and the VM concept has evolved into what we hear about now and then: the Ethereum Virtual Machine (better known as the EVM). For context, the EVM is a virtual CPU that runs smart contracts written in Solidity. When a smart contract is executed, the code gets compiled int...

Progress from the Polygon DAO
The Polygon DAO has been heads down, shipping programs and resources and growing the Polygon ecosystem at lightning speed. Despite the bear market volatility, Polygon DAO remained focused on building communities and projects on the Polygon network. We are excited to share an update on the DAO's initiatives and community developments. Let's see what the team has been up to over the last six months. Polygon Village: Polygon DAO is the centerpiece of Polygon's long-term strategy of ...

How Polygon is Bridging the Gap Between Blockchains and Real-World Applications
It’s not unreasonable to say that blockchains need real-world use cases to gain mass adoption. Web3 has been slow to gain traction outside of the crypto community. While there are many reasons for this, the primary one is that most people don’t see any use case for Web3 in their daily lives. Why are real-world use cases important for blockchains? Real-world use cases are important to keep blockchains economically and socially sustainable. Having a flow of value coming from real-world sources allows the...
A full-stack ecosystem for developers to build and grow

According to ethereum.org, data availability (DA) is “the guarantee that the block proposer published all transaction data for a block and that the transaction data is available to other network participants.”
But how exactly do you guarantee data is available?
For most Layer 1s (L1s), it’s pretty straightforward. L1 nodes know transaction data is available by downloading and executing it themselves. This is how nodes verify blocks and is at the core of how blockchains work.
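That full-download check can be sketched in miniature. The code below is a toy, not Ethereum's actual data structure (real chains use Merkle-Patricia tries and re-execute state), but the principle is the same: a node convinces itself data is available by fetching all of it and checking it against a root committed in the block header.

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a simple binary Merkle root over raw transaction data."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    layer = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:          # duplicate the last node on odd layers
            layer.append(layer[-1])
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

def verify_block(txs: list[bytes], header_tx_root: bytes) -> bool:
    """An L1 full node proves availability to itself: it downloads every
    transaction and checks the set against the committed root."""
    return merkle_root(txs) == header_tx_root

txs = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)
assert verify_block(txs, root)          # have all the data: verifiable
assert not verify_block(txs[:2], root)  # missing data: verification fails
```

The catch, of course, is that this only scales as far as every node's bandwidth and disk, which is exactly the bottleneck the rest of this article is about.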
Layer 2s (L2s) change the paradigm. L2s (specifically rollups) use fancy cryptographic proofs to guarantee blocks are valid without nodes having to execute every transaction. This unlocks massive benefits and new L1 designs!
But not so fast. Rollups still need data to be available, just for different reasons: users need the transaction data to reconstruct the L2 state and to exit back to the L1, and optimistic rollups additionally need it to construct fraud proofs.

So how do we scale? Seems like we are back to where we started.
Introducing DA layers
DA layers specialize in, as you might expect, assuring nodes that data is available. This can take different forms, including:
DA blockchains
DA committees
DA middleware
Data sharding
We’re only going to discuss the first two, but here are a few resources if you want to learn about DA middleware and data sharding.
Because it’s still very expensive to post data on Ethereum, most rollup teams are posting their data off-chain. This design technically classifies them as validiums.
Ethereum’s data-sharding roadmap solves the problem and enables cheap rollup data, but to be safe, let’s assume we’re a year away from the first major upgrade. In the meantime, rollup teams have two major options: DA committees and DA blockchains.
DA committees are selected entities that hold off-chain copies of the transaction data and promise to make it available in case of emergency. These committees often have 7-10 members and are a slight improvement over fully relying on the rollup operator.
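A committee's availability guarantee ultimately boils down to counting attestations. Here is a toy sketch of that check in Python; the member names, the 5-of-7 threshold, and the use of HMAC in place of real digital signatures are all illustrative assumptions, not any specific committee's protocol.

```python
import hashlib
import hmac

# Hypothetical 7-member DA committee: each member holds the rollup data
# off-chain and attests (here: HMACs) to the hash of what it stores.
COMMITTEE_KEYS = {f"member-{i}": f"secret-{i}".encode() for i in range(7)}
THRESHOLD = 5  # require 5-of-7 attestations before trusting availability

def attest(member: str, data_hash: bytes) -> bytes:
    """A member signs the hash of the data batch it claims to hold."""
    return hmac.new(COMMITTEE_KEYS[member], data_hash, hashlib.sha256).digest()

def data_considered_available(data_hash: bytes, sigs: dict[str, bytes]) -> bool:
    """Accept the batch only if enough committee members attested to it."""
    valid = sum(
        1 for member, sig in sigs.items()
        if member in COMMITTEE_KEYS
        and hmac.compare_digest(sig, attest(member, data_hash))
    )
    return valid >= THRESHOLD

batch_hash = hashlib.sha256(b"rollup batch #42").digest()
sigs = {m: attest(m, batch_hash) for m in list(COMMITTEE_KEYS)[:5]}
assert data_considered_available(batch_hash, sigs)   # 5-of-7: accepted
assert not data_considered_available(batch_hash, {}) # no attestations: rejected
```

Note what this buys you: the trust assumption shrinks from "the rollup operator is honest" to "at least 5 of these 7 named parties are honest," which is better, but still a small, permissioned set.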

DA blockchains take the idea a few steps further by replacing small, permissioned committees with large, permissionless committees that have strong economic incentives to behave.
A common mistake is thinking that data availability = data storage. However, this is not the case.
An easy way to think about the difference is on a time dimension.
DA layers make sure nodes can access data on a short time horizon. Their main goal is to smoothly progress blockchain state, and they typically do not make assurances about longer time horizons. As ethereum.org puts it, “data availability is relevant when a block is yet to pass consensus.”
In fact, DA layers might even discard the data after a few weeks. In Ethereum’s next major upgrade, this data will be pruned after ~2 weeks.
Data storage layers make sure data is available on a longer time horizon and are closer to the cloud storage solutions most web2 developers are familiar with. Of course, it’s not hard to imagine web3 developers opting for decentralized versions like Arweave.
There are many things that can be built on top of DA layers. Let’s touch on three:
As we mentioned earlier, validiums are common today. Even after Ethereum has implemented its own sharded DA layer, it’s likely that rollup teams will still use off-chain data to reduce costs. Developers have always pushed the boundaries of what’s possible.
Sovereign rollups not only use DA layers for data availability but also for consensus. Applications are likely good candidates to become sovereign rollups (rather than smart contract rollups or validiums) if they need full control over state transitions yet don’t want to worry about a validator set.
In his recent talk, Balaji Srinivasan envisions a future where “fiat information” competes with “crypto information.” He describes “reliable data feeds” using crypto oracles like Chainlink, where IRL metadata is posted on chain. That data could be posted onto DA layers.


It’s the early days for DA layers. Polygon Avail, EigenDA, and Celestia are all still in testnet, and Ethereum data sharding is 1-3 years away, depending on the upgrade in question.
However, there’s plenty to look forward to. Let’s highlight what seems to be a common endgame across the board. Most teams envision something like this:
Progressively increasing block sizes and sharding them across the network
Relieving nodes of downloading full blocks using KZG commitments
Maintaining low verification costs with data availability sampling
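The power of data availability sampling comes from a simple probability argument. With 2x erasure coding, a block can be reconstructed from any half of its chunks, so an adversary hiding data must withhold more than half of them; each uniformly random sample then hits a missing chunk with probability at least 1/2. A minimal sketch of the resulting confidence bound (sample counts are illustrative):

```python
# With 2x erasure coding, an unavailable block is missing more than half of
# its chunks, so each independent random sample detects the withholding with
# probability > 1/2. The chance that k samples ALL miss is then < (1/2)^k.

def max_escape_probability(samples: int) -> float:
    """Upper bound on the chance an unavailable block passes k samples."""
    return 0.5 ** samples

for k in (10, 20, 30):
    print(f"{k} samples -> escape probability < {max_escape_probability(k):.2e}")
# Around 30 samples already bound the failure chance below one in a billion,
# which is why light clients can gain strong assurance with tiny downloads.
```

This is the trick that lets verification cost stay nearly constant while block sizes grow: each node samples a handful of chunks, and collectively the network reconstructs everything.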
Eventually, we get to a place where DA layers enable high throughput applications while trust-minimized light clients verify on mobile devices.
That’s right: performance and decentralization!
Hopefully, this article helped you gain more familiarity with data availability. The goal was to offer a broad overview and address common misperceptions about the topic.
There are many deep dives into how it works, so if you want to jump down the rabbit hole, here are some resources:
As always, this article is based on a snapshot in time, and web3 moves very quickly. The technology and timelines mentioned might change.
To keep up with the latest, I recommend following along with sources like the Polygon website, the Polygon DAO blog, and The Village Times newsletter. And to get involved, come join us at Polygon DAO.