# 100x faster EVM traces with Python

By [banteg](https://paragraph.com/@banteg) · 2022-07-26

---

An introduction to [evm-trace](https://github.com/apeworx/evm-trace), a fast and correct Python library to work with Ethereum Virtual Machine traces.

There are two common trace formats, which we’ll refer to as Geth-style and Parity-style traces. Geth traces provide opcode-level resolution, while the most commonly used Parity trace format provides a call tree.

Our goal today would be to make opcode-resolution traces 100x faster, making them from a little-known Parity trace format. We’ll also make them compatible with Geth traces so they can be used as a drop-in replacement.

debug\_traceTransaction
=======================

[Geth traces](https://geth.ethereum.org/docs/rpc/ns-debug#debug_tracetransaction) return frames containing the full state of the virtual machine at each step of the execution, including the program counter, opcode, remaining gas, gas cost of the operation, call depth, items on the stack, a snapshot of the memory, and a storage diff.

Each subcall starts with a fresh and separate memory, stack, and storage changes.

It’s a powerful but a bit unwieldy format, since you need to do extra work to know things like what’s the current contract being called or what’s the calldata.

I started digging into this when I [stumbled upon](https://github.com/ledgerwatch/erigon/issues/4637) several transactions that crash Erigon if you request a debug trace. It appeared that they generated very large traces, and soon enough I was able to confirm why when Erigon [re-enabled streaming](https://github.com/ledgerwatch/erigon/pull/4647) for heavy RPC endpoints.

In Geth format each frame contains the full copy of the EVM memory, which can lead to traces that don’t fit in the system memory. A contract expanding the memory to 100kb could generate traces [as large as 70 GB](https://gist.github.com/banteg/98dbccbf6e2a3f997199a1b16eb93c5a).

This can be addressed by streaming and [iteratively decoding](https://github.com/ApeWorX/ape/pull/885) the response as you get it from the Ethereum client. This is what ape does by default with Geth traces when you call `receipt.get_transaction_trace()`.

If we look up the memory expansion gas cost formulas at [evm.codes](https://www.evm.codes/about#memoryexpansion), it becomes apparent that EVM memory is not that expensive and we can possibly construct or find transactions with even larger traces.

![memory_cost = (memory_size_word \*\* 2) / 512 + (3 \* memory_size_word)](https://storage.googleapis.com/papyrus_images/70c90df0940d633e9944e025cb590037f4216ff6b56ef16db48e9062085c3a04.png)

memory\_cost = (memory\_size\_word \\\*\\\* 2) / 512 + (3 \\\* memory\_size\_word)

After sampling peak memory usage over 10,000 most recent blocks, we find even larger traces. For [one of those](https://etherscan.io/tx/0x9ce97c7e8040002df869fff04caa9b5f176dd11859ab4af317bd86fd4455918d), Erigon hangs up on us after 30 minutes, after returning 460 GB and 898,836 frames.

We can patch Erigon to [increase the write timeout](https://github.com/ledgerwatch/erigon/blob/b6ea28ea806b922617eec3df2ea3ffa663581ad5/rpc/rpccfg/rpccfg.go#L35), and learn that the full trace consists of 960,198 frames, takes up 524 GB and 34m7s to fetch, before deserialization or processing. We can calculate that we got this trace at 275 MB/s and **469 frames/s**.

Keep that number in mind, because with a method presented below, we can recompute the same trace using 137 MB of data (3916x less), which we can fetch in 5.6s, deserialize in 7.1s and **replay in 4.6 seconds**, with **full processing taking 117x less time** than just fetching a Geth trace.

![contracts with peak memory usage](https://storage.googleapis.com/papyrus_images/211eefe56b51f2921c5a98eafca469295559748fa9188602cdb99a00a52a13be.png)

contracts with peak memory usage

trace\_replayTransaction
------------------------

[Parity traces](https://openethereum.github.io/JSONRPC-trace-module) are quite different from Geth, the ad hoc traces come in three different flavors: `trace`, `vmTrace`, and `stateDiff`.

> Parity traces are supported in Erigon, Nethermind, Besu, Foundry. They are not supported in Geth, Hardhat, Ganache. Anything beyond `trace` type requires an archive node.

### trace

The `trace` is the most widely used mode, it returns a call tree, similar to what you can see on [ethtx.info](https://ethtx.info/) or when calling `receipt.show_trace()` in [ape](https://github.com/apeworx/ape). If you need a raw call tree, you can get it from `chain.provider.get_call_tree(tx)`. You can also provide a custom display class and only display the parts you care about.

![Lido.executeVote as raw and decorated call tree](https://storage.googleapis.com/papyrus_images/e393de6bb7b114b929fcf98b1aa4609b354dee08a290de13d93d7bc178dbe3bc.png)

Lido.executeVote as raw and decorated call tree

### stateDiff

The `stateDiff` mode shows the changes of balance, code, nonce, and storage grouped by account.

![a raw stateDiff](https://storage.googleapis.com/papyrus_images/0d561339d705d705843f013ebde71484d4e32189d1d0cb0a2566ff6c3ce4c0e3.png)

a raw stateDiff

A more interesting problem here is how to map storage keys to readable names like `balanceOf[account]`. I’ve seen it solved in [Tenderly](https://dashboard.tenderly.co), which is a proprietary tool.

Maybe you, the reader, could be the one to tackle this problem and release it as an open source tool for everyone to use.

![a decoded stateDiff in tenderly](https://storage.googleapis.com/papyrus_images/bc262d784448157246470d0fa2abaf4ef30c47fb7696640eded83e58e9b64448.png)

a decoded stateDiff in tenderly

### vmTrace

The `vmTrace` mode is one of the most enigmatic and rarely used, mainly because it was never really documented. This mode will be the main focus of this article and I’ll show you how you can make a drop-in replacement for Geth traces that requires 1000x less data and works 100x faster.

vmTrace returns a nested structure of incremental updates which we can use to replay the transaction, while tracking stack, memory, storage at each step of the execution. Of course, the incremental format doesn’t come for free and requires some legwork.

vmTrace struct
--------------

Here is what we get in a `vmTrace` response:

*   `VMTrace` represents a call and contains all subcalls
    
    *   `code` EVM bytecode to be executed
        
    *   `ops` list of `VMOperation` to be executed
        
*   `VMOperation` represents a step of the execution
    
    *   `pc` program counter
        
    *   `cost` gas cost of the operation
        
    *   `ex` the result of the execution, could be `null` if the operation has reverted
        
    *   `sub` list of `VMTrace` subcalls
        
    *   `op` opcode name
        
    *   `idx` index in the call tree
        
*   `VMExecutedOperation` represents side effects of executing the operation
    
    *   `used` incorrectly named, shows the remaining gas
        
    *   `push` the items to be placed on the stack
        
    *   `mem` the memory delta
        
    *   `store` the storage delta
        
*   `MemoryDiff` represents a memory update
    
    *   `off` memory offset where to write the data
        
    *   `data` the bytes to write starting at the offset
        
*   `StorageDiff` represents a storage write
    
    *   `key` storage key to write to
        
    *   `val` value to write
        

I defined this structure using [msgspec](https://jcristharif.com/msgspec/) to get fast deserialization and type validation.

Missing pieces
--------------

Now we need to replay the transaction, applying the incremental updates and untangling the nested structure into a flat execution order.

I borrowed the real [Stack](https://github.com/ethereum/py-evm/blob/master/eth/vm/stack.py) and [Memory](https://github.com/ethereum/py-evm/blob/master/eth/vm/memory.py) implementations from [py-evm](https://github.com/ethereum/py-evm), which is a great EVM reference itself.

I wrote a small harness to do [pairwise comparison of each frame](https://github.com/banteg/vmtrace/blob/main/demo.py) of my synthetic trace with Geth trace frames, which exited after the first divergence in stack or memory. I learned that Geth sends frames after memory expansion but before anything from `ex` has been applied.

Memory was no problem, but the stack was missing a few important pieces. We had `push`, but no info about the number of items popped from the stack by each opcode.

### Popcodes

I’ve assembled this list by hand using [this file](https://github.com/ethereumjs/ethereumjs-monorepo/blob/master/packages/evm/src/opcodes/functions.ts) from `ethereumjs` as a reference. These opcodes pop a fixed amount of items:

1.  `EXTCODEHASH`, `ISZERO`, `NOT`, `BALANCE`, `CALLDATALOAD`, `EXTCODESIZE`, `BLOCKHASH`, `POP`, `MLOAD`, `SLOAD`, `JUMP`, `SELFDESTRUCT`
    
2.  `SHL`, `SHR`, `SAR`, `REVERT`, `ADD`, `MUL`, `SUB`, `DIV`, `SDIV`, `MOD`, `SMOD`, `EXP`, `SIGNEXTEND`, `LT`, `GT`, `SLT`, `SGT`, `EQ`, `AND`, `XOR`, `OR`, `BYTE`, `SHA3`, `MSTORE`, `MSTORE8`, `SSTORE`, `JUMPI`, `RETURN`
    
3.  `RETURNDATACOPY`, `ADDMOD`, `MULMOD`, `CALLDATACOPY`, `CODECOPY`, `CREATE`
    
4.  `CREATE2`, `EXTCODECOPY`
    
5.  —
    
6.  `STATICCALL`, `DELEGATECALL`
    
7.  `CALL`, `CALLCODE`
    

These opcodes can be expressed as a formula:

*   `LOG0-LOG4` pops 2-6 items
    
*   `SWAP1-SWAP16` pops¹ 2-17 items
    
*   `DUP1-DUP16` pops¹ 1-16 items
    

¹ not really, this is only true for how the vmTrace format is served

### Opcode incognita

Some transactions reconciled, but the stack still diverged, especially for large transactions. Using foundry’s debugger I found that the incremental updates have missed occasional pushes for several opcodes, namely `SMOD`, `CHAINID`, `COINBASE`. I [fixed this](https://github.com/ledgerwatch/erigon/pull/4806) in Erigon and the patch has already made it to `2022.07.03-alpha` release.

This finding has made me question a lot of things. Most people fully trust the data they get from the node, but Ethereum is a very complex system and bugs happen. It’s great having so many clients and references and different approaches to check on each other.

Performance
-----------

I’ve added full block tracing with opcode-level resolution and an option to return a `memoryview` of the memory `bytearray` instead of copying it for each frame. It works like a pointer and comes with a caveat that you must process the frame immediately after receiving it from the iterator, otherwise the value may be mutated. It should work for most streaming workflows with traces, since you are usually scanning for some specific frames and don’t look back.

I’ve benchmarked evm-trace against the largest traces I could identify. Most of them were replayed in under 3 seconds, with no significant slowdown even when a lot of memory is allocated. I’ve also benchmarked it against 1000 most recent Ethereum mainnet blocks. It was pleasant seeing the needle has moved from seconds per trace to traces per second.

![evm-trace performance benchmark](https://storage.googleapis.com/papyrus_images/f3ac613e18ac2d4a38c3b8fda490d8f59f0575ef50f963f5536917908afc7b8a.png)

evm-trace performance benchmark

When dealing with full blocks, you can expect to fetch a median of 27.4 transactions per second from the node and **replay 108.8 transactions per second** at around **252,000 frames per second**. When requesting traces of individual transactions, your client replays all transactions in the block up to the requested transaction index to get to the correct state and discards all other traces. To give you the intuition, here is how many frames you would usually encounter in a block:

![](https://storage.googleapis.com/papyrus_images/d3021abfe79eecaa3b1fcfc2be2d29063f368587e9ab57780963197461650186.png)

Give it a try
-------------

This lightning-fast drop-in replacement for Geth-style traces is available in the latest [evm-trace](https://github.com/ApeWorX/evm-trace) and soon will be integrated into [ape](https://github.com/ApeWorX/ape), your new favorite Ethereum development and data science framework.

References
----------

*   Rust web3 [VMTrace type](https://docs.rs/web3/latest/web3/types/struct.VMTrace.html)
    
*   Erigon’s [instructions.go](https://github.com/ledgerwatch/erigon/blob/devel/core/vm/instructions.go) and [trace\_adhoc.go](https://github.com/ledgerwatch/erigon/blob/devel/cmd/rpcdaemon/commands/trace_adhoc.go)
    
*   [EVM chapter](https://github.com/ethereumbook/ethereumbook/blob/develop/13evm.asciidoc) from ethereumbook
    
*   [EVM. How does it work? Part 2.](https://medium.com/coinmonks/ethereum-virtual-machine-evm-how-does-it-work-part-2-4198401d2a11)
    
*   [Getting Deep Into EVM: How Ethereum Works Backstage](https://hackernoon.com/getting-deep-into-evm-how-ethereum-works-backstage-ac7efa1f0015)

---

*Originally published on [banteg](https://paragraph.com/@banteg/100x-faster-evm-traces-with-python)*
