Simple Introduction to Blockchain for Software Engineers

Recently I bumped into Andreessen Horowitz’s Crypto Startup School, which has published a playbook for starting crypto companies. I had always been interested in learning about blockchain, so I watched the lecture “Blockchain Primitives: Cryptography and Consensus” by Stanford Professor of Computer Science Dan Boneh. It was one of the best talks I had ever seen on blockchain. I am well versed in distributed systems and understand how they are architected, how they scale, and so on. Professor Boneh explained how a blockchain is similar to and different from a distributed system, which made a lot of sense to me (finally). I decided to document the main parts of the lecture for those who understand distributed systems and want to demystify blockchain tech. Disclaimer: All the images below are copyright of a16z.com. You can watch the video in full here

It seems a blockchain has multiple layers, just like a multi-tier application (which has a data layer, an application layer and a frontend layer). This was new to me. In a blockchain, there are 4 layers

Layer 1: Consensus Layer — this layer makes sure everyone sees the same data. This is similar to a centralized database layer where every user or application using the database will see the exact same data.

The characteristics of this layer are

What is consensus?

In a traditional system, the state of the system is replicated on all hosts and there is a known number of servers, each of which is authorized. This makes sure every host is aware of the most recent state of the system. In a blockchain, anyone can write data, and it was once proved that consensus is impossible to achieve if you don’t have authentication and you don’t know how many people are participating in the network. Nakamoto’s method of using proof of work gets around this problem.

How do blocks get added to the blockchain? There are miners and there are participants (regular users like me) in the blockchain (BC). Each participant has a secret key which signifies ownership. You own the token and data and no one can take them away from you unless you sign them away using your secret key or your secret key is stolen. This is similar in spirit to the public-key authentication used in SSL, or to other authentication we use in a distributed system.
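To make "signing with a secret key" concrete, here is a minimal sketch using only the Python standard library. Note the swap: Bitcoin and Ethereum use elliptic-curve signatures, while this toy uses a Lamport one-time signature, which I picked only because it can be built from a hash function. The function names and the message are my own illustrations, not anything from the lecture.

```python
import hashlib
import secrets

def keygen():
    """Create a one-time key pair: 256 pairs of random secrets and their hashes."""
    secret_key = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    public_key = [[hashlib.sha256(s).digest() for s in pair] for pair in secret_key]
    return secret_key, public_key

def message_bits(message: bytes):
    """Hash the message and return the 256 bits of the digest."""
    digest = hashlib.sha256(message).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(secret_key, message: bytes):
    """Reveal one secret per digest bit; only the key owner can do this."""
    return [secret_key[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(public_key, message: bytes, signature) -> bool:
    """Anyone holding the public key can check the revealed secrets."""
    if len(signature) != 256:
        return False
    bits = message_bits(message)
    return all(
        hashlib.sha256(sig).digest() == public_key[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, bits))
    )

# Hypothetical usage: the owner signs a transfer, everyone else verifies it.
secret_key, public_key = keygen()
transfer = b"alice pays bob 1 token"
signature = sign(secret_key, transfer)
```

A Lamport key must only ever sign one message, which is one reason real blockchains use other schemes. The ownership idea is the same, though: without the secret key, nobody can produce a signature that verifies.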

  1. Participants sign transactions with their secret keys, and these transactions are sent to miners

  2. The consensus system picks a random leader amongst multiple miners

  3. The randomly chosen leader orders the transactions into a block, posts the block to the blockchain, and gets paid in the native currency (ETH, in the case of Ethereum). The other miners verify the block, and if it fails verification, the block is discarded.

  4. The leader gets to decide the order of transactions in the block. This is a very powerful ability for the miner. If a lot of people are trying to sell currencies on the blockchain, the leader can determine who gets to sell first. This power can be “misused”. The problem of a single miner deciding the order of transactions in a block is still unsolved.

  5. Sybil attack: since a miner gets paid in the currency every time they add a block to the blockchain, a miner could create thousands of miner accounts so that one of their accounts is almost always chosen and paid. This is called a “Sybil attack”. Nakamoto’s innovation solved this problem by forcing miners to commit “resources”.

  6. In Bitcoin the resource commitment is proof of work: to pretend to be 1000 miners, one needs to have 1000 machines. Proof of work is the first generation of blockchain consensus. It wastes a lot of energy (computers computing blocks and competing to submit to the blockchain first), and we may be moving away from it. Future mechanisms, such as “proof of stake”, are going to be much more energy efficient. Proof of stake is already being adopted by other blockchains.

  7. “Proof of work” and “proof of stake” are the two major consensus mechanisms cryptocurrencies use to verify new transactions, add them to the blockchain, and create new tokens. Proof of work, pioneered by Bitcoin, uses mining to achieve those goals. Proof of stake, which is employed by Cardano, the ETH2 blockchain, and others, uses staking to achieve the same things.

  8. See more explanation from Coinbase here
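The mining steps above can be sketched in a few lines of Python. This is a toy illustration of the proof-of-work idea, not Bitcoin’s actual block format: the JSON encoding, the `DIFFICULTY` value and the function names are all my own assumptions. A miner searches for a nonce that makes the block hash start with a number of zeros, and every other miner can check the result with a single hash.

```python
import hashlib
import json

# Hypothetical difficulty: the hash must start with this many hex zeros.
DIFFICULTY = 4

def block_hash(prev_hash: str, transactions: list, nonce: int) -> str:
    """Hash the block contents; the transaction order is part of the hash."""
    payload = json.dumps(
        {"prev": prev_hash, "txs": transactions, "nonce": nonce}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def mine(prev_hash: str, transactions: list, difficulty: int = DIFFICULTY):
    """Try nonces until the hash meets the difficulty target (the 'work')."""
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

def verify_block(prev_hash, transactions, nonce, claimed_hash,
                 difficulty: int = DIFFICULTY) -> bool:
    """Other miners recompute one hash to accept or discard the block."""
    digest = block_hash(prev_hash, transactions, nonce)
    return digest == claimed_hash and digest.startswith("0" * difficulty)

# Hypothetical usage: the leader picks the order of transactions and mines.
genesis = "00" * 32
txs = ["alice pays bob 1", "carol pays dave 2"]
nonce, digest = mine(genesis, txs)
```

Mining takes many hash attempts while verification takes one. That asymmetry is what makes pretending to be 1000 miners cost roughly 1000 machines' worth of work. Because the transaction order is part of the hashed payload, the order the leader chose is also locked in once the block is accepted.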

  9. Layer 1.5: Compute Layer (blockchain computer). This is kind of the operating system of the blockchain. Think of it as the compute layer of a distributed system, where your system may be performing large distributed operations such as map reduce, DB lookups, business logic execution etc.

  10. You write applications and post them on the blockchain, and these apps respond to outside events. These events are called transactions: people send transactions to apps running on the blockchain, and the apps process them.

  11. Unlike a distributed app (such as a banking app) where we cannot verify the business logic and have to trust the app is doing the right thing, the blockchain apps can be verified by anyone.

  12. Deploying an app on the blockchain: this is very similar to writing a distributed app. You create the app in Solidity or another language, compile it to a format such as EVM bytecode for Ethereum, and then post the code to the blockchain. The app starts with an initial state (the purple block below)

  13. When the app processes a transaction, it changes its state. Each transaction moves the app from one state to the next
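Here is a minimal sketch of the "initial state plus transactions" idea, written in Python rather than Solidity so it can run anywhere. The `TokenApp` class, the transaction tuples and the `replay` helper are hypothetical names I made up for illustration. The point is that anyone who has the code, the initial state and the ordered transactions can replay them and arrive at the same state, which is what makes the app verifiable.

```python
from dataclasses import dataclass, field

@dataclass
class TokenApp:
    """A toy on-chain app: its whole state is a table of balances."""
    balances: dict = field(default_factory=dict)

    def apply(self, tx) -> bool:
        """Process one transaction (sender, receiver, amount) and update state."""
        sender, receiver, amount = tx
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False  # invalid transactions leave the state unchanged
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        return True

def replay(initial_balances: dict, transactions: list) -> dict:
    """Rebuild the current state from the initial state and the ordered log."""
    app = TokenApp(dict(initial_balances))
    for tx in transactions:
        app.apply(tx)
    return app.balances

# Hypothetical usage: any verifier can recompute the same final state.
initial = {"alice": 10}
log = [("alice", "bob", 4), ("bob", "carol", 1), ("carol", "alice", 5)]
final_state = replay(initial, log)
```

The last transaction in the log is rejected because carol does not hold enough tokens, and every honest verifier replaying the log reaches the same conclusion.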

  14. Transactions per second: the Ethereum network can process only about 15 transactions per second, which is about 1.3M transactions per day. Typical large distributed systems such as Amazon or Instagram can process billions of transactions a day. This is the biggest bottleneck of blockchains, so apps have to be designed to work within this constraint.

  15. Ethereum’s decentralized design ends up limiting the number of transactions it can process to just 15 per second. Since demand for Ethereum far exceeds 15 transactions per second, the result is long waits and fees as high as $200 per transaction. Ultimately, this prices out many users and limits the types of applications Ethereum can handle today.
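The arithmetic behind the throughput numbers is easy to check. The sketch below uses the 15 transactions per second quoted above. The one-billion-transaction workload is just a round number I picked to stand in for "billions per day" and is not a measured figure for any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def transactions_per_day(tps: float) -> float:
    """Convert a transactions-per-second rate into a daily capacity."""
    return tps * SECONDS_PER_DAY

def days_to_process(total_transactions: float, tps: float) -> float:
    """How many days a chain running at `tps` needs to clear a workload."""
    return total_transactions / transactions_per_day(tps)

ethereum_daily = transactions_per_day(15)           # 1,296,000, about 1.3M
backlog_days = days_to_process(1_000_000_000, 15)   # hypothetical 1B workload
```

At 15 transactions per second, a workload of one billion transactions would take more than two years to clear, which is why apps have to be careful about what they put on chain.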

  16. Execution environments: similar to the execution environments of Java, Node or Python applications, blockchains have their own execution environments.

  17. Bitcoin provides Bitcoin Script, a restrictive language without support for loops, so it cannot do many things. Ethereum provides a more general-purpose execution environment. Ethereum uses the EVM as its virtual machine, but many newer blockchains are moving to WebAssembly, which allows apps to be written in a regular language (such as Java) and compiled to WebAssembly bytecode instead of EVM bytecode.

  18. Gas fee: any data that you put on the blockchain is expensive. You need to pay a fee, known as the gas fee, to write data to the blockchain. Blockchain app developers have to write their programs to avoid storing data on the blockchain as much as possible.

  19. Layer 2: Application layer. This sits above the Compute layer and is the layer where applications are built. It is similar to the application server layer of a distributed system, which acts as a bridge between the frontend layer (or user layer) and the data/compute layers. In a distributed system, we write these in languages like Java, Python, Node.js etc., but in blockchain the applications are written in languages like Solidity, Move and Motoko. These applications run on the Compute Layer, or the blockchain computer.

  20. Production bugs can cost a lot on blockchain as you can jeopardize transactions worth millions of dollars. So the choice of programming language is important.

  21. Layer 3: User Interface layer. This is similar to the frontend layer of a distributed system. It is the user-facing layer of the blockchain. Users don’t talk to the blockchain directly; they talk to web applications running in the cloud. These web applications get data from the decentralized apps running on the blockchain.

  22. As of late 2021, most of the development is happening in Layer 2 (the application layer)

  23. Layer 1 and Layer 1.5 can sometimes be combined. You can think of this as an infrastructure layer (think EC2), which we, the distributed systems guys, take for granted :-)

  24. That’s all for now! I hope this explanation was simple and intuitive for novice engineers, but if not, I highly recommend watching the video.