1. Introduction

Somnia, a new EVM-compatible Layer 1 blockchain, recently announced that it can achieve 1 million TPS with sub-second finality, tested on a network of 100 globally distributed nodes. While this high TPS was tested using ERC-20 transfers, it represents a significant breakthrough in blockchain scalability.

To achieve this level of performance, Somnia introduces four key innovations that improve efficiency compared to traditional EVM chains:

Autobahn-inspired BFT consensus – a seamless, high-throughput, low-latency BFT protocol that avoids the trade-offs of traditional and DAG-based consensus mechanisms.
Compiled EVM bytecode – accelerating sequential execution.
IceDB – a high-performance database optimized for storing blockchain state.
Optimized data propagation and compression – maximizing the utilization of network bandwidth.

While all these features are interesting, covering them all in one article would be too much. In this article, we’ll focus on Autobahn, the core innovation behind Somnia’s consensus. We’ll explore the improvements Autobahn brings compared to both traditional BFT and DAG-based BFT, as well as how it works.

2. Definition of Hangover and Seamlessness

To highlight inefficiencies in current consensus protocols, Autobahn introduces the concepts of hangovers and seamlessness. Below is the definition of hangover from the Autobahn paper:

“Definition 1:* A hangover is any performance degradation caused by a blip that persists beyond the return of a good interval."*

Simply put, a blip is an event that temporarily stalls a BFT consensus protocol. A hangover is a period of performance degradation that continues even after a blip ends.

For example, if a consensus protocol gets stuck and cannot make progress for 30 seconds, this is a blip. After the blip ends, if the system does not immediately return to full performance (e.g., lower TPS or higher latency than normal), the period between the blip ending and the network's full recovery is called a hangover.

Some hangovers are inevitable. For instance, if the system experiences message delays or network instability, bandwidth temporarily drops, causing TPS to decrease and latency to increase. No matter how well-designed a protocol is, it cannot exceed network speed and must simply wait for the network to return to normal.

However, there is another type of hangover known as a protocol-induced hangover. As described in [1], protocol-induced hangovers result from suboptimal system design, where the protocol itself introduces unnecessary delays. This leads to the concept of seamlessness, which the Autobahn paper defines as:

"Definition 2: A partially synchronous system is seamless if (i) it experiences no protocol-induced hangovers, and (ii) it does not introduce any mechanisms that make the protocol newly susceptible to blips."

The paper highlights that traditional BFT prioritizes low commit latency under normal network conditions but suffers from protocol-induced hangovers after blips. On the other hand, DAG-based BFT is more resilient to hangovers but comes with the trade-off of higher commit latency.

In the next section, we will examine how these two types of BFT protocols behave during a blip caused by a Byzantine leader preventing the network from reaching consensus. This will show why traditional BFT suffers from hangovers, while DAG-based BFT mitigates them at the cost of higher commit latency.

3. Problems with Current BFT Protocols

In this scenario, we consider a case where the leader of the protocol intentionally stalls the system by attempting to prevent other validator nodes from reaching consensus. Let’s examine how traditional BFT and DAG-based BFT behave during and after the blip to understand their inefficiencies.

3.1 Traditional BFT Protocols

Traditional BFT doesn’t decouple data dissemination from consensus (data ordering). This means that, in order to commit a batch of transactions (a block), a selected leader orders transactions into a block and then disseminates this block to other validators so that they can vote and agree on the order of transactions before committing it. We have discussed in the Narwhal blog [3] that this leads to redundant data broadcasting, where transactions get broadcast twice: first from validators sharing transactions in the mempool with others and second during the consensus process. In addition to inefficiency, leaders can easily force the protocol into a hangover by simply failing.

In traditional BFT styles, users submit transactions to a mempool. After that, a leader picks transactions from its mempool, orders them, puts them in a block, and enters a consensus protocol so that other nodes can vote and agree on it. As the protocol continues, users keep filling mempools with new transactions while old transactions get committed and processed by validators. When a leader fails, new blocks aren’t proposed, which prevents new transactions from being committed and processed. Once the system resumes, it cannot immediately process new transactions because it must first clear the backlog of pending transactions that accumulated during the blip. However, traditional BFT protocols allow the system to commit only one block at a time, and each block can contain a limited number of transactions. As a result, the longer the blip lasts, the longer the hangover period—the time it takes for the protocol to process the accumulated backlog.

A solution where the leader proposes a block containing only transaction digests (hashes) can help reduce network bandwidth and may shorten the time required for block commitment during consensus. However, validators who do not have the actual transactions must still fetch the data from other nodes before voting to verify that the proposed hashes correspond to real, existing transactions. As a result, data dissemination can still become a bottleneck in the consensus process, increasing the risk of a timeout and causing a blip.

3.2 DAG-based BFT Protocols

In round-based DAG approaches like Narwhal and Bullshark, data dissemination is decoupled from data ordering. In each round of the data layer, all validators act as proposers, independently proposing their own batches of transactions (vertices). Each vertex must reference a supermajority (2f+1) of vertices from the previous round and must be recognized by a supermajority of validators before it can be added to a validator's local view. Together, the data stored in each validator's local view forms a DAG structure, where vertices represent batches of transactions and edges represent references between them.

For a vertex v at round r to be accepted into a validator's local view, it must be acknowledged by 2f+1 validators. A validator acknowledges a vertex by sending its signature back to the proposer of that vertex. This ensures that if two validators include vertex v in their DAGs, both will have exactly the same vertex (along with its associated transactions and references), preventing equivocation. As a result, both validators perceive the same causal history of v, ensuring they maintain a consistent view of the DAG.

Disseminating data in this way ensures that all validators will eventually perceive the same DAG, ensuring eventual agreement on the same data. Once a DAG is constructed, validators can build a consensus protocol on top of it to interpret the DAG and totally order transactions. Separating data dissemination from transaction ordering makes the protocol more resilient to hangovers. Since DAG construction is leaderless, all validators actively disseminate their own batches of transactions to construct the DAG. This means data is disseminated at network speed, regardless of the latency of the consensus layer—unlike traditional BFT protocols, where data dissemination is tied to the pace of consensus.

Let’s use Bullshark as an example to illustrate the impact of a malicious leader. Every two rounds, there is a special predefined vertex called an anchor, which serves as a reference point to facilitate consensus ordering. An anchor is committed once it is referenced by at least f+1 vertices in the following round. Once an anchor is committed, its entire causal history (i.e., all vertices in previous rounds that have a path leading to the anchor) can also be committed.

If a validator knows that its vertex will become an anchor in the next round, it might attempt to stall consensus by deliberately not broadcasting its vertex, causing a blip at the consensus layer and delaying commitment. However, this does not affect the data layer, since data dissemination continues independently. Once an anchor in a future round is added to the DAG and committed, all transactions in its causal history (backlog) are committed at once.

Unlike traditional BFT protocols, which rely on a leader to build a block—committing only a limited number of transactions at a time—DAG-based protocols can commit the entire backlog at once in constant time, regardless of the blip’s duration.

However, the trade-off is that each vertex requires three message delays before it can be added to a DAG, resulting in 6–12 message delays before a transaction is committed—even when the system experiences no blips. In contrast, traditional BFT requires only 3–5 message delays. To address this latency, alternative approaches like pipelining or uncertified DAGs have been introduced, as seen in Mysticeti [5].

4. Autobahn

Autobahn considers traditional BFT protocols to be non-seamless because the hangover duration depends on the blip duration. In contrast, it views DAG-based BFT as nearly seamless since it eliminates hangovers but suffers from higher latency during normal operation compared to traditional BFT approaches. These limitations highlight the need for a protocol that combines the best aspects of both approaches. Hence, Autobahn aims to achieve seamlessness like DAG-based BFT while preserving the low latency of traditional BFT, all while maintaining high throughput.

4.1 The Idea of Autobahn

Similar to DAG-based protocols, Autobahn decouples data dissemination from consensus logic, forming two distinct layers: the data dissemination layer and the consensus layer.

Note: To keep things simple, I will avoid using complex symbols and notations and instead explain the protocol in a way that is easier to follow. While this explanation won’t cover every technical detail, it should still provide a clear understanding of how the system works.

4.2 Autobahn’s Data Dissemination Layer

This layer allows validators to disseminate data independently from consensus and at the pace of the network, while the consensus protocol operates on top to enable validators to agree on snapshots of the disseminated data. Although this layer serves a similar function to DAG-based BFT protocols, such as Narwhal, the data structure and dissemination process are fundamentally different.

Unlike Narwhal, Autobahn's data layer does not operate on a round-based basis. Instead, each validator maintains its own lane, which can be thought of as a sequential chain of transaction batches. Assume that a validator 𝑣 maintains its lane 𝑉 with length 𝑙 (i.e., 𝑙 batches of transactions exist in the lane, and each batch has a proof of availability (PoA) associated with it). Here’s how the validator broadcasts its new transaction batch to other validators:

Validator 𝑣 creates and broadcasts a proposal for batch 𝐵
Validator 𝑣 assembles a batch of transactions 𝐵 for its lane 𝑉 and broadcasts the proposal PROP(𝐵) to other validators. The proposal consists of:
- The batch itself: 𝐵
- The position of 𝐵 in lane 𝑉: 𝑙+1
- The hash of the parent proposal (where the parent of batch 𝐵 is the batch at position $l$ , the one immediately before 𝐵).
- The PoA (certificate) of the parent batch.
Validators verify and vote on PROP(𝐵)
When a validator receives PROP(𝐵), it checks the validity of the proposal by verifying that:
- It has already voted for the parent of the proposal.
- It has not already voted for this lane position (𝑙+1) before.
If both conditions are met, the validator stores the PoA of the parent batch and votes for PROP(𝐵) by replying to 𝑣 with its signature. Additionally, if the validator has not yet received the proposal for the parent batch, it does not immediately vote. Instead, it stores PROP(𝐵) in a buffer and waits.
Validator 𝑣 aggregates votes to form a PoA for 𝐵
Validator 𝑣 collects 𝑓+1 votes for batch 𝐵, forming a PoA for 𝐵 (denoted as cert(𝐵)). This PoA will be included in the proposal for the next batch (batch at position 𝑙+2). The validator can also broadcast cert(𝐵) immediately if the next batch is not yet ready. This helps the consensus leader quickly determine that batch 𝐵 is ready to be committed without waiting for the next batch from 𝑣.

Important Notes

There are two important notes here. The first one is that, in DAG-based BFT, validators must wait until $2𝑓+1$ vertices exist in the current round before moving to the next round and, therefore, need to compete to get their certificates quickly, or their vertices might be ignored (i.e., not referenced by any vertex in the next round), causing unnecessary commitment delays. Autobahn validators, on the other hand, do not need to keep up with others or wait for a round advancement, allowing each validator to construct its lane independently and asynchronously at its own pace.

The second note is that a PoA on the data layer requires only $𝑓+1$ votes, which means it does not enforce non-equivocation. Instead, a PoA only ensures that at least one honest validator has stored the batch associated with it, making it available for others to request if needed (hence, it is also called proof of availability). Since a malicious validator can arbitrarily fork its lane, the consensus protocol is responsible for preventing equivocation and ensuring that only one lane per validator is accepted.

4.3 Autobahn’s Consensus Layer

To achieve low latency, Autobahn’s consensus follows a traditional leader-centric PBFT-style approach with several improvements. In my opinion, the three most important improvements are:

Block Proposal: Instead of having validators agree on an order of transactions, Autobahn’s block proposal contains a snapshot cut of all data lanes. This allows Autobahn to achieve DAG-based BFT-like properties, such as:
- By agreeing on a proposed lane cut, validators can commit an arbitrarily large backlog, as it constitutes the causal history of the cut.
- Consensus throughput scales horizontally based on the number of lanes (i.e., the number of validators).
Fast Path: In a gracious interval—where the network is synchronous and there are no faulty nodes—a leader can assemble a commit certificate and let validators commit a proposal directly, skipping the second voting phase. This reduces the message delay from 5 down to 3 rounds.
Parallel Multi-Slot Processing: Instead of committing snapshots sequentially or pipelining consensus, Autobahn allows its consensus to process multiple snapshots in parallel, further reducing consensus latency.

Autobahn’s consensus operates in a series of slots. Within each slot, a leader proposes a lane tip snapshot, while other validators cooperate to reach consensus and commit the proposal. A slot in Autobahn is equivalent to a "height" in traditional BFT. In Autobahn, a proposal contains a lane tip snapshot, which includes one tip from each lane, whereas in traditional BFT, a proposal is a block containing an ordered list of transactions.

Like other leader-centric, partially synchronous BFT protocols, Autobahn’s consensus consists of two voting phases:

Prepare Phase: Ensures view synchronization, helping validators agree on the same view.
Confirm Phase: Finalizes consensus, allowing validators to commit the agreed-upon proposal from the Prepare phase.

Each slot progresses through a view-based process, where the leader aggregates and broadcasts messages to validators, ensuring fast and efficient consensus.

4.4 Slot Initialization (View ω = 0)

4.4.1 Propose

A leader of slot S can start the slot at view ω = 0 when it receives a commit certificate for the previous slot S-1, CommitQC(P’, S-1), where P’ is the proposal that gets committed at slot S-1. The leader creates a proposal P, which is a lane cut—a list of certified batches, each from a different validator's lane. The leader then broadcasts the proposal along with CommitQC(P’, S-1).

4.4.2 Prepare

Validators

Once a validator receives the proposal P, it checks its validity, ensuring that the proposal comes from the correct leader and that it has not yet voted in the Prepare phase in the current view. If the checks pass, the validator casts a vote for P by replying to the leader with the PREP-VOTE(P) message.

Leaders

Slow-path
The leader aggregates 2f+1 PREP-VOTE(P) messages from different validators into a prepare certificate for P, PrepareQC(P). This certificate ensures that at least f+1 honest validators are in this view. The leader can then broadcast PrepareQC(P) to other validators, which moves the protocol to the Confirm phase.

Fast-path
If the system is in a gracious interval, where the leader is able to aggregate n PREP-VOTE(P) messages, instead of forming a PrepareQC(P), the leader can upgrade the certificate to a CommitQC(P, S), commit the proposal P locally, and broadcast CommitQC(P, S) to other validators, allowing them to skip the Confirm phase and commit the proposal P directly

Confirm
Once a validator receives PrepareQC(P), it acknowledges the certificate and casts a vote on the Prepare certificate by responding back to the leader with CONFIRM-ACK(P). The leader aggregates 2f+1 CONFIRM-ACK(P) messages into a Commit certificate for the proposal P, CommitQC(P), which is then broadcast to all validators.

Commit
Once a validator receives CommitQC(P), it commits the certified batches contained in the proposal (the lane cut). However, the validator does not only commit the batches in the lane cut—it also commits all batches across all lanes, starting from the lane cut and extending back to the last committed point (slot S-1).
Finally, since the structure of the committed batches is unstructured, similar to DAG-based BFT, a validator can use any deterministic rule to establish a total order for all transactions committed in slot S.

4.5 Handling View Changes (view ω ≠ 0)

A view change can occur when a majority of validators fail to see progress in the current view. Since these validators cannot determine the exact cause—whether it is message delays, network partitions, or a faulty leader who may intentionally or unintentionally fail to send messages—they may attempt to trigger a leader change by demanding a new leader and proceeding with a new proposal. Each view has only one leader, and here is how the protocol progresses to a new view.

4.5.1 Start Timer

At each slot S and view ω, each validator maintains its local timer. A validator starts the timer of view ω of slot S once it receives a ticket T(ω, S). A ticket for view ω - 1, T(ω - 1, S), is a pass that allows the leader of view ω to start the new view ω.

For view ω = 0, the ticket is simply the commit certificate of the previous slot S - 1, CommitQC(P’, S - 1).
For view ω > 0 of slot S, the ticket is the timeout certificate formed from 2f+1 timeout messages from different validators, indicating that at least f+1 honest validators are ready to move to the next view. The validator can stop the timer for view ω once it can commit slot S or receive a ticket T(ω + 1, S), indicating that the network is ready to move to the next view.

4.5.2 Trigger Timeout

Once a validator's local timer expires, the validator broadcasts a timeout message TIMEOUT(ω, S, Highest_PrepareQC, Highest_P) to other validators. Highest_PrepareQC and Highest_P represent the highest PrepareQC and highest proposal view locally observed by the validator, respectively. These two values are important because they help validators commit the latest proposal that could have been committed at some correct validators.

If a validator receives f+1 timeout messages for view ω, it joins the revolt and broadcasts its own timeout message even if its timer for view ω has not yet expired. This is because f+1 timeouts already guarantee that at least one honest validator has failed to see progress, meaning the protocol will eventually move to the next view.

4.5.3 Forming a Timeout Certificate

When a validator receives a timeout message for slot S and view ω, it can respond with a commit certificate CommitQC of slot S if it has one. Otherwise, it waits until receiving 2f+1 timeout messages for ω, which can be aggregated into a timeout certificate TC(S, ω).

4.5.4 Advancing the View

A validator that forms a timeout certificate (TC) moves to the next view ω + 1 and resets its timer. If the validator is the leader of view ω + 1, it begins the Propose phase by broadcasting a proposal along with the timeout certificate TC(S, ω) as evidence that the network is ready to advance.

To ensure liveness and consistency, the leader must propose the latest valid proposal that could have been committed by some correct validators. The leader selects a proposal from the timeout certificate TC(S, ω) using the following steps:

Extract two candidate proposals from TC(S, ω):
- Candidate 1: The highest Highest_PrepareQC contained in TC(S, ω). (Recall: TC(S,ω) is formed using 2f+1 timeout messages, each containing a Highest_PrepareQC and a Highest_P.)
- Candidate 2: The highest Highest_P that appears at least f+1 times in TC(S, ω).
Choose the winning proposer:
- The leader selects the higher of the two candidates.
- If there is a tie, Candidate 1 (Highest_PrepareQC) takes priority.

If the system is operating normally, the first view (ω = 0) is sufficient for the system to decide on a value, commit a proposal, and move to the next slot. However, during blips, when validators fail to see progress and cannot reach consensus on a proposal, the system transitions to new views until the network stabilizes. Eventually, once the network recovers, all validators will be able to decide on a value at a later view (though we cannot predict exactly when).

This concludes how Autobahn consensus works.

4.6 Parallel Multi-slot Processing

In traditional BFT protocols, protocol latency is often reduced by pipelining slots, meaning slot S+1 can piggyback on slot S. However, Autobahn argues that this pipeline structure introduces challenges, such as liveness concerns, since resolving these issues requires additional logic. Moreover, the performance of one slot can directly impact the next slot, leading to unnecessary delays.

As explained earlier, the leader of slot S can begin the slot once it receives the ticket CommitQC(P’, S-1). However, this means that the leader of slot S must wait for slot S-1 to end before starting. In the parallel multi-slot approach, the leader of slot S can start as soon as it receives the Prepare message for slot S-1, rather than waiting for the entire slot to finish. However, unlike pipelining approaches, each slot in Autobahn still operates independently—only the start time of a slot depends on the previous slot, but each slot has its own consensus messages. In contrast, pipelining allows one consensus message to contain multiple messages from different slots, each from a different phase.

Running multiple consensus instances, one for each slot, could lead to a later slot being committed before an earlier one. This means that if validators commit a later slot before an earlier one, the committed lane cut is more recent, and its causal history contains batches from the earlier slot’s proposal that haven't been committed yet. In this scenario, Autobahn does not terminate earlier slots once a more recent one is committed. Instead, it allows the earlier slots to continue running until they complete, but their output is simply ignored since their batches are already included in the causal history of the more recent proposal that has been committed.

5. Additional Insights and Discussions

5.1 Non-blocking Sync

Notice that Autobahn’s data dissemination layer more closely resembles the linear chain structure of traditional BFT rather than the DAG-based BFT. The overall protocol functions like parallel consensus, where each validator creates its own chain of blocks. At certain points, validators cooperate to take a snapshot of the lane cut, merging their individual chains into a single chain through total ordering.

In the data layer, a batch of transactions requires only f+1 votes from validators to be added to a lane (which can be thought of as a chain owned by a validator). This guarantees availability but does not enforce equivocation prevention, as that responsibility is handled by the consensus logic.

An availability certificate alone is sufficient to guarantee that a batch associated with the certificate exists and is stored by at least one honest validator—even if some validators have not yet received it. Since batch B contains the certificate of its parent batch (the batch immediately before it), this creates a recursive structure where the PoA of B not only guarantees the existence of B but also the entire history of B (all preceding batches in the same lane). As a result, a validator can vote on a proposal containing only a lane snapshot (a snapshot of certified batches across all lanes) even if it has not yet received those batches or their history, allowing missing data to be fetched asynchronously.

This approach prevents blocking the consensus process, unlike traditional BFT, where nodes must fully sync data before voting—causing bottlenecks.

5.2 View Change and Proposal Candidate

In normal traditional BFT, Highest_PrepareQC is sufficient to help prevent equivocation, where one group might commit one proposal on one view while another commits a different proposal on another view.

The need for Highest_P arises due to the introduction of the fast path. Since the commit phase is skipped, some validators might commit a proposal from the fast path, while others might experience network delays or faulty leaders. The choice of selecting Highest_P that appears at least f+1 times ensures that when a new view takes place, the honest leader will pick the same proposal that should be committed in the unfinished fast path.

6. Summary

In this blog, we explored Autobahn, the inspiration behind Somnia’s consensus protocol. We saw how Autobahn combines the best aspects of both traditional BFT and DAG-based BFT, achieving seamlessness while maintaining low latency.

Although Autobahn benefits from data dissemination and consensus decoupling, similar to DAG-based approaches, it introduces its own unique method of handling data dissemination. Autobahn’s data layer allows validators to build their lanes independently and at their own pace, enabling horizontally scalable and non-blocking synchronization.

The consensus layer, built on top of the data layer, inherits core principles from traditional BFT but optimizes efficiency by leveraging the structure of the data layer. Instead of committing fixed-size blocks, Autobahn commits lane snapshots (lane cuts) along with their entire backlog, improving performance.

Additionally, Autobahn incorporates further optimizations, such as fast-path execution and multi-slot processing, which further minimize consensus latency. As a result, Autobahn achieves low latency, hangover resistance, and high throughput, making it an ideal consensus mechanism for large-scale blockchain adoption.

7. References

[1] https://arxiv.org/pdf/2401.10369
[2] https://mirror.xyz/contributiondaoblog.eth/5nbATpfI0G-ryvhCAexc0_bCdVAk1AQbqC0HZ8hTNgc
[3] https://mirror.xyz/contributiondaoblog.eth/k3oqzOxwRXXDWLmzLYNxEoLKpoPA25ke1EOBLvqOzYg
[4] https://mirror.xyz/contributiondaoblog.eth/5nbATpfI0G-ryvhCAexc0_bCdVAk1AQbqC0HZ8hTNgc
[5] https://mirror.xyz/contributiondaoblog.eth/_VgDknMw1ysJlrXQwjpZnOC2sYRcPEEYPMakzDfoa2o

More from ContributionDAO

More from ContributionDAO

No activity yet

More from ContributionDAO

ContributionDAO

No activity yet

More from ContributionDAO

Somnia - Understanding Autobahn, the idea behind Somnia Consensus

Somnia - Understanding Autobahn, the idea behind Somnia Consensus

No activity yet

No activity yet

1. Introduction

2. Definition of Hangover and Seamlessness