Special thanks to Luca Donno for the feedback.
Discussing rollups means working through a substantial amount of technical information, clearing up common misunderstandings, and acknowledging a few controversies. For this article, I've decided to use the framework by L2BEAT, particularly their well-regarded risk assessment model, as a foundation.
L2BEAT provides thorough information on how rollups work and their associated risks, but it appears to be more suited for readers who already have a lot of familiarity with rollup technology. It can be quite overwhelming for newcomers. My objective here is to simplify this information and make it more accessible.
In analyzing rollup risks, L2BEAT categorizes them into five areas: State Validation, Data Availability, Exit Window, Sequencer Failure, and Proposer Failure. Each rollup listed has a brief description under these categories, usually with additional details that can be difficult to interpret. Let's break down these categories, first at a high level and then in detail.
State Validation: Refers to the process of verifying the accuracy and correctness of state transitions of the rollup. This is about ensuring that the transactions and data are correct and haven't been tampered with.
Data Availability: Focuses on how accessible the data of the rollup is. Good data availability means users and validators can fully verify the blockchain's history and recover the state.
Exit Window: The time during which users can withdraw their assets from the rollup if they need to exit quickly, e.g. due to changes or issues with the rollup. A longer exit window is better because it gives users more security.
Sequencer Failure: Deals with the reliability and functionality of the sequencer. If the sequencer fails, it can lead to potential delays or loss of transaction data.
Proposer Failure: Focuses on the risks related to the proposer. If the proposer fails, it can lead to issues in processing transactions or updating the state of the rollup.
By understanding these risks, we can better assess the safety and reliability of rollups, which is essential for (some) people (sometimes).
The core difference between zk and optimistic rollups lies in their way of finalizing the state. By state, we understand the balance of all accounts, including smart contracts and regular wallets, deployed on a rollup.
Consider the rollup as a processor: it takes the existing state, processes the latest block's transactions, and updates it. This processor is what we call the state change function.
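To make the "processor" picture concrete, here is a minimal sketch of a state change function. It assumes the state is just a mapping from account names to balances and that a block is a list of simple transfers; the `Transfer` type and `apply_block` name are illustrative only, and a real rollup executes full smart contract logic rather than plain transfers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical transaction type: move `amount` from sender to recipient."""
    sender: str
    recipient: str
    amount: int


def apply_block(state: dict, block: list) -> dict:
    """State change function: take the old state and a block, return the new state."""
    new_state = dict(state)  # copy, so the previous state stays untouched
    for tx in block:
        # Skip transfers that are non-positive or that the sender cannot afford.
        if tx.amount <= 0 or new_state.get(tx.sender, 0) < tx.amount:
            continue
        new_state[tx.sender] -= tx.amount
        new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
    return new_state


old_state = {"alice": 10, "bob": 0}
block = [Transfer("alice", "bob", 4), Transfer("bob", "alice", 9)]
new_state = apply_block(old_state, block)  # the second transfer is skipped
```

Everything that follows about state validation boils down to one question: how does L1 become convinced that the new state really is the output of this function?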
State validation is a critical aspect of rollup security and functionality, and it varies significantly between zk-rollups and optimistic rollups:
zk-Rollups use zero-knowledge proofs to validate transactions on L2 before posting to L1. In zk-rollups, the operator generates a cryptographic proof (a SNARK or STARK) for each batch of transactions processed on L2. This proof, which verifies the correctness of all transactions in the batch, is then posted to L1.
Optimistic Rollups rely on operators who run the rollup nodes. Transactions are aggregated and executed on L2, and the state is periodically committed to L1 in a batch without immediate verification. Instead, the rollup assumes transactions are valid by default (hence "optimistic") unless challenged with a fraud proof.
A fraud proof is an interactive protocol between the challenger (a full node) and the operator responsible for submitting the state (proposer), and includes the following steps:
The challenger suspects that a state transition processed by the rollup is incorrect and submits the challenge together with justification.
The proposer responds with a counterargument.
Depending on the protocol, there might be several rounds of responses.
Eventually, the process resolves, either because one party fails to respond in time or because the evidence is conclusive.
After the dispute is resolved, the state transition is either confirmed or the rollup is rolled back to the last valid state.
There are two main types of challenges: single and multi-round.
Previously, Optimism used single-round challenges. This involved re-executing the questioned transaction on L1. However, this approach required the rollup to store all state transitions, which was data-intensive. Moreover, since all transactions were re-executed on L1, it also demanded significant amounts of gas, making it expensive. For these reasons, and to enhance security, Optimism has paused the ability to post fraud proofs and has been working on an alternative solution for almost two years.
Arbitrum uses interactive multi-round challenges. In this system, when a dispute arises, the parties recursively break down the block into segments, and the segments are isolated one by one to identify the incorrect transaction. This method is known as a bisection protocol.
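The idea behind bisection can be sketched in a few lines. The example below is a simplification under stated assumptions: both parties have already committed to the full list of intermediate state commitments, they agree on the first one and disagree on the last one, and the function names are hypothetical. It is not Arbitrum's actual protocol, where the halving happens over several on-chain rounds.

```python
import hashlib


def commit(state: int) -> str:
    """Toy state commitment: a hash of the state value."""
    return hashlib.sha256(str(state).encode()).hexdigest()


def find_disputed_step(proposer: list, challenger: list) -> int:
    """Return index i where both agree on commitment i but disagree on i + 1."""
    if len(proposer) != len(challenger):
        raise ValueError("both traces must cover the same number of steps")
    lo, hi = 0, len(proposer) - 1
    if proposer[lo] != challenger[lo] or proposer[hi] == challenger[hi]:
        raise ValueError("need agreement at the start and disagreement at the end")
    # Invariant: the parties agree at lo and disagree at hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if proposer[mid] == challenger[mid]:
            lo = mid  # the disagreement lies in the second half
        else:
            hi = mid  # the disagreement lies in the first half
    return lo


# Toy execution: each step adds 1 to the state; the proposer cheats after step 5.
honest = [commit(s) for s in range(9)]
cheating = [commit(s) for s in [0, 1, 2, 3, 4, 5, 106, 107, 108]]
step = find_disputed_step(cheating, honest)
```

Once the dispute is narrowed to a single step, only that one step has to be re-executed on L1, which is far cheaper than replaying everything.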
In optimistic rollups, the security depends on having just one honest node. If this node catches any fraudulent transactions, it can submit a fraud proof to challenge them. This means that optimistic rollups operate on a fairly low trust model, where a single honest participant is enough to keep the system secure.
Currently, however, users must accept significant trust assumptions when entering an optimistic rollup. In an ideal setup, the ability for anyone to submit a fraud proof would be a key feature. The reality is different. Many rollups, even ones as big as Optimism, are still in the process of developing their systems to accept fraud proofs, which means that this critical mechanism isn't available yet.
On the flip side, Arbitrum has put in place a system where only specific approved entities can submit fraud proofs, with L2BEAT being an example of one such entity. Assuming these entities are independent of the rollup's creators, this arrangement needs significantly less trust.
Rollups work by processing transactions off the L1 blockchain and then posting either fraud proofs (in case of incorrect state transitions) or validity proofs back onto L1. Data availability plays a crucial role here: for transactions to be verified, the transaction data must be available. This availability allows nodes to rerun the transactions, confirm their validity and, if required, submit a fraud proof. This mechanism is the backbone of integrity in optimistic rollups.
Fraud proofs are also the reason why withdrawing funds from optimistic rollups takes such a long time. The funds need to be locked, usually for seven days, to allow enough time for the transactions to be checked for correctness. However, there are workarounds like using bridges that operate full nodes; they can instantly verify transactions and confirm their validity, which allows them to take risks on behalf of a user.
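The withdrawal delay is simple date arithmetic. The sketch below assumes a seven-day challenge period counted from the moment the relevant state root is posted on L1; the constant and function names are illustrative, and actual rollups may count the period differently.

```python
from datetime import datetime, timedelta, timezone

# Assumed challenge period, matching the "usually seven days" mentioned above.
CHALLENGE_PERIOD = timedelta(days=7)


def can_finalize_withdrawal(root_posted_at: datetime, now: datetime,
                            successfully_challenged: bool = False) -> bool:
    """A withdrawal unlocks once the challenge period passes without a successful fraud proof."""
    if successfully_challenged:
        return False
    return now - root_posted_at >= CHALLENGE_PERIOD


posted = datetime(2024, 1, 1, tzinfo=timezone.utc)
too_early = can_finalize_withdrawal(posted, posted + timedelta(days=3))
ready = can_finalize_withdrawal(posted, posted + timedelta(days=7))
```

A bridge that runs its own full node does not need to wait: it can check the state transition itself and pay the user out early, carrying the remaining risk until the period ends.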
With zk-rollups, the process is simpler. Here, the operator posts a validity proof directly on L1. This proof confirms that all transactions are correct as soon as they are posted. However, validity proofs don't address one issue: recovering the state of the rollup. For this reason, zk-rollups must also ensure that data is made available.
The challenge of storing a vast amount of transaction data in a publicly accessible way is known as the data availability problem and it lies at the heart of the blockchain trilemma. Discussing solutions to the data availability problem goes beyond what we cover today. Still, if you remember Reed-Solomon codes from my earlier pieces on zk-STARKs, you're on the right track.
Traditionally, rollups have posted their data directly on L1 blockchains. However, as the concept of modular blockchains gains traction, new solutions are emerging:
Blockchains like Celestia are built specifically to solve the data availability problem. They provide a dedicated space for storing and accessing data.
Solutions like EigenDA, secured by Ether that's restaked with EigenLayer, offer high-throughput options for managing data availability.
L2BEAT assesses how and where data is made available. This can significantly differ from one project to another:
Arbitrum and Optimism make transaction data fully accessible on Ethereum.
zk-Rollups only publish the data necessary to recover state changes. Exceptions are Polygon zkEVM, Scroll, and Linea, which post the entire transaction set.
Validiums store the transaction data off-chain and manage it through committees rather than directly on-chain.
As we know, exiting an optimistic rollup involves a delay to account for the possibility of a successful fraud proof, while with zk-rollups exits happen right away. However, the L2BEAT risk assessment category highlights a separate problem: the risk that comes with malicious updates to the rollup's smart contracts and whether users are given sufficient time to withdraw their funds when they choose to.
The architecture of a rollup consists of several types of smart contracts:
Core: Responsible for managing the state root (the summary of the current rollup state), processing transactions, and facilitating communication between L1 and L2.
Bridge: Enables the movement of assets between L1 and L2. Needed for users to deposit and withdraw funds.
Governance and Upgrades: Often combined with timelocks, these provide a process for proposing, voting on, and implementing changes.
Verification: In optimistic rollups, verification contracts manage the fraud proofs. In zk-rollups, they handle the verification of validity proofs.
Escrow: Manages the holding and transferring of assets within the rollup.
The upgradability of smart contracts introduces both flexibility and vulnerability. Currently, the update feature is typically managed by a multisig, so the integrity of the rollup is dependent on the security of this multisig:
Arbitrum uses a 9 out of 12 multisig. The public identity of its participants gives transparency but also makes them easy targets for malicious attacks.
Optimism uses a 5 out of 7 multisig. The identities of the participants are private.
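What an m-of-n multisig means for security can be worked out with a little arithmetic. The sketch below is a plain-Python illustration, not contract code, and the helper names are hypothetical: it checks whether an upgrade has enough approvals, and computes how many keys an attacker has to compromise versus how many signers have to go missing for upgrades to become impossible.

```python
def is_approved(owners: set, approvals: set, threshold: int) -> bool:
    """An upgrade passes when at least `threshold` distinct owners approved it."""
    return len(approvals & owners) >= threshold  # ignore approvals from non-owners


def keys_to_compromise(threshold: int, total: int) -> int:
    """Keys an attacker must control to push through a malicious upgrade."""
    return threshold


def keys_to_block(threshold: int, total: int) -> int:
    """Unavailable or lost keys needed to make reaching the threshold impossible."""
    return total - threshold + 1


owners = {f"signer{i}" for i in range(12)}
approvals = {f"signer{i}" for i in range(9)} | {"outsider"}
passed = is_approved(owners, approvals, threshold=9)

nine_of_twelve = (keys_to_compromise(9, 12), keys_to_block(9, 12))
five_of_seven = (keys_to_compromise(5, 7), keys_to_block(5, 7))
```

A higher threshold makes a malicious upgrade harder, but it also lowers the number of lost or unresponsive keys it takes to freeze the multisig.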
The ability to upgrade smart contracts ideally includes a delay between announcing changes and implementing them, providing users with a chance to leave the rollup if they're wary of malicious updates or don't agree with the changes. Yet, as highlighted by L2BEAT, the reality is that many rollups lack any safeguards against the misuse of their upgrade capabilities, meaning that the changes to the smart contracts apply immediately.
Arbitrum stands out with a three-day waiting period for any contract changes. However, this delay can still be overridden by the Security Council. During this standard waiting period, users have a two-day window to push a transaction through to L1 and exit the rollup.
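The relation between the upgrade delay and the exit window can be written as a subtraction. This is a sketch under an assumption: the one-day figure for forcing a transaction through to L1 is inferred from the gap between the three-day delay and the two-day window quoted above, and the function name is illustrative.

```python
from datetime import timedelta


def exit_window(upgrade_delay: timedelta, forced_exit_delay: timedelta) -> timedelta:
    """Time users effectively have to leave before an announced upgrade takes effect."""
    # If forcing an exit takes longer than the upgrade delay, there is no window at all.
    return max(upgrade_delay - forced_exit_delay, timedelta(0))


# Assumed values: 3-day upgrade delay, 1 day to force a transaction onto L1.
with_timelock = exit_window(timedelta(days=3), timedelta(days=1))
# A rollup whose upgrades apply immediately leaves no time to exit.
instant_upgrade = exit_window(timedelta(0), timedelta(days=1))
```

The same subtraction shows why an override by the Security Council matters: if the delay can be skipped, the effective window drops to zero regardless of the standard waiting period.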
If a sequencer or proposer encounters problems, it can significantly disrupt a rollup. Here’s a breakdown:
Sequencer: Collects, orders, and processes transactions, updating the rollup's state. A failure in its operations can lead to delays in transaction processing, loss of transaction order integrity, and a halt in the rollup's ability to update its state.
Proposer: Responsible for submitting transaction batches and their state roots to L1. Proposer failures include submitting incorrect state roots or failing to post transactions to L1. The requirement for proposers to post a bond aims to reduce this risk by providing a financial disincentive for fraudulent or negligent behavior.
Verifiers: Full nodes in optimistic rollups or a verifier contract in zk-rollups. They ensure the correctness of transactions and state transitions.
With an understanding of the roles involved, let's walk through a typical day in the life of a transaction:
A user initiates the process by submitting a transaction to the rollup.
The sequencer picks up the transaction, orders it with others, and processes the batch, calculating the rollup's new state.
The proposer confirms the integrity of the batch and submits it to L1 with the new state root.
Verifiers check the correctness of the transaction and its inclusion in the state root.
If no issues are detected and no fraud proof is submitted, the transaction is considered finalized on L1.
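The steps above can be sketched as three small functions, one per role. This is a toy model under several assumptions: the state is a balance mapping, the state root is a hash of the serialized state (real rollups use Merkle-style commitments), and the function names are made up for illustration.

```python
import hashlib
import json


def state_root(state: dict) -> str:
    """Toy state root: hash of the state serialized with sorted keys."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def sequence(state: dict, txs: list) -> dict:
    """Sequencer: execute (sender, recipient, amount) transfers in order."""
    new_state = dict(state)
    for sender, recipient, amount in txs:
        if 0 < amount <= new_state.get(sender, 0):
            new_state[sender] -= amount
            new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def propose(txs: list, new_state: dict) -> dict:
    """Proposer: bundle the batch with the new state root for posting to L1."""
    return {"txs": txs, "state_root": state_root(new_state)}


def verify(prev_state: dict, submission: dict) -> bool:
    """Verifier: re-execute the batch and compare the resulting state root."""
    return state_root(sequence(prev_state, submission["txs"])) == submission["state_root"]


genesis = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 3)]
honest = propose(txs, sequence(genesis, txs))
forged = propose(txs, {"alice": 0, "bob": 10})  # proposer claims a wrong state
```

Here `verify(genesis, honest)` succeeds while `verify(genesis, forged)` fails. Note that the verifier can only do its job because the transaction data is available to it, which is the data availability point made earlier.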
Most rollups currently offer the option to submit transactions directly to L1 via a bridge contract. This can be done either from within the rollup or directly on L1. However, this solution has limitations if the transaction includes assets native to the rollup, which may not be interoperable with L1. Moreover, if the operators don't perform their roles, any funds locked in various protocols could not be moved.
To address potential proposer failures, Arbitrum provides a mechanism that allows users to self-propose blocks if the proposer has been inactive for 6 days and 8 hours. Optimism currently doesn't have a similar feature, which means users don't have options for direct intervention if the proposer fails to act.
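The inactivity rule amounts to a single timestamp comparison. The sketch below assumes the clock starts at the proposer's last submission and that the limit is inclusive; both details, along with the names, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Inactivity threshold quoted above: 6 days and 8 hours.
PROPOSER_INACTIVITY_LIMIT = timedelta(days=6, hours=8)


def anyone_can_propose(last_proposal_at: datetime, now: datetime) -> bool:
    """True once the proposer has been silent for at least the inactivity limit."""
    return now - last_proposal_at >= PROPOSER_INACTIVITY_LIMIT


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
after_six_days = anyone_can_propose(last, last + timedelta(days=6))
after_a_week = anyone_can_propose(last, last + timedelta(days=7))
```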
Aiming for decentralization of the operators is seen as a way to increase the security of rollups. However, decentralization introduces other challenges, such as potential delays in transaction processing, and an increased surface for MEV.
L2BEAT, with some help from Vitalik, grouped rollups into three tiers based on the security considerations discussed above. Here's an overview of how these stages are defined.
Stage 0: Full Training Wheels. The rollup is operated by its creators:
Self-identification as a rollup.
Posting L2 state roots on L1 for withdrawals.
Providing data availability.
Open-source software for reconstructing the state is available.
Stage 1: Limited Training Wheels. Governance begins to transition to smart contracts:
Use of a proper proof system.
At least 5 external actors can submit a fraud proof.
Users can exit without the operator's coordination.
At least a 7-day exit window for users in case of unwanted upgrades.
A well-structured Security Council, with at least a 75% consensus threshold and diversity in membership.
Stage 2: No Training Wheels. The rollup is fully operated through smart contracts:
Council acts only in case of undeniable bugs, e.g. failure to submit a valid proof.
Temporary control to the council in specific, bug-related scenarios.
Upgrades permitted with a minimum 30-day delay.
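Because each stage builds on the previous one, the classification can be sketched as a cumulative checklist. The requirement labels below are my own shorthand for the bullet points above, not official L2BEAT identifiers, and the real assessment involves far more judgment than a set comparison.

```python
from typing import Optional

# Shorthand labels for the requirements listed above (hypothetical names).
STAGE_REQUIREMENTS = [
    {"identifies_as_rollup", "posts_state_roots_on_l1",
     "provides_data_availability", "open_source_state_reconstruction"},
    {"proper_proof_system", "five_external_fraud_provers",
     "exit_without_operator", "seven_day_exit_window",
     "security_council_75_percent"},
    {"council_limited_to_bugs", "thirty_day_upgrade_delay"},
]


def classify_stage(satisfied: set) -> Optional[int]:
    """Return the highest stage whose requirements, and all lower ones, are met."""
    stage = None
    for level, requirements in enumerate(STAGE_REQUIREMENTS):
        if not requirements <= satisfied:
            break  # a missing requirement blocks this stage and every later one
        stage = level
    return stage


stage_0_only = classify_stage(STAGE_REQUIREMENTS[0])
stage_1 = classify_stage(STAGE_REQUIREMENTS[0] | STAGE_REQUIREMENTS[1])
skipped_stage_1 = classify_stage(STAGE_REQUIREMENTS[0] | STAGE_REQUIREMENTS[2])
```

The last example evaluates to Stage 0: a 30-day upgrade delay does not help if, for instance, nobody outside the team can submit a fraud proof.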
With Ethereum's shift towards a rollup-centric roadmap, the security of rollups has become crucial for both users and the broader crypto ecosystem. Despite advancements, there remain significant security gaps and trust assumptions in many widely used rollups.
Optimistic rollups were the first to launch, with both Optimism and Arbitrum becoming available to the public in 2021. The tech behind optimistic rollups is more mature, with Arbitrum making significant progress in its security. Optimism, while significantly lagging in security aspects, has shifted its focus towards improving the interoperability of its OP stack, which allows for the creation of other rollups using Optimism's technology. Despite security concerns, users appear relatively unbothered by these shortcomings.
On the ZK rollup front, Starknet, based on STARK technology, launched in 2021, marking an early entry. The first SNARK-based ZK rollup, zkSync, launched only in Q1 2023, making it a new kid on the block(chain). This freshness is evident in the risk assessments, which show that many security issues are yet to be addressed.
In summary, while both types of rollups advance the scalability and efficiency of Ethereum, they each come with distinct security considerations. Although we haven't yet experienced a major incident involving rollup security, the growing number of implementations suggests it's only a matter of time before we encounter some form of breach and loss of funds. L2BEAT's work in monitoring the security of these systems is therefore essential for the users and the entire crypto space.
Ethereum documentation, Scaling
Yuan Han Li, WTF is Data Availability?
Charles Yu, Optimism & Arbitrum: Tracking Decentralization Progress
James Prestwich, The Definitive Guide to Sequencing
Luca Donno, Introducing Stages