
I ran EVMBench against two sets of contracts I built over the last year: one simple repo, and another much more complicated one.
The instructions in their README were clear, and within a couple of minutes I had the local version running.
I needed to add some credits to run it. Nice work, OpenAI.
The first repo I tried was over the size limit, but that was my bad as I had zipped it with a lot of artifacts.
Uploading an unzipped folder gives you a good overview of the code with the issues highlighted.

The detect.md prompt says to only flag issues related to loss of funds. Because of this, the model is instructed to report every issue as high, and it seems a bit trigger-happy (especially the 5.2 model). The results handler in the code itself, though, seems ready to handle severities from critical to info.
Megaparlay was a simple set of contracts for making parlays on the Conditional Token Framework. It had ~700 LOC of interest in the main contracts, along with testing and deployment scripts.
userBalances is only incremented on deposit and (sometimes) decremented on migration refunds, but is never settled on loss or win, enabling past participants to withdraw against future deposits after withdrawalsEnabled is triggered.
withdrawalsEnabled is a flag that lets users claim their balances in case the owner stops uploading parlays for over a month. That would result in the game becoming stale and unwinnable.
Megaparlay V1 was planned as a single-use jackpot game, which was alluded to in the comments on the contract and in the README. It's true that, if they wanted to, players could have kept making parlays after the first jackpot was won.
After the withdrawal timeout was triggered, any players who lost in the original game could have claimed their refund in this nonexistent second game before the new players. This was known, but it could have been easily fixed by closing the game more officially with another flag. Good to highlight, though.
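To make the accounting gap concrete, here is a minimal toy model in Python. It is not the Megaparlay contract: the class, method names and amounts are all hypothetical, and it only assumes what the finding describes, namely that userBalances grows on deposit, is never settled when the jackpot is paid, and is honoured once withdrawalsEnabled is set.

```python
class JackpotGameModel:
    """Toy model of the accounting described above; not the real contract."""

    def __init__(self):
        self.user_balances = {}          # mirrors userBalances: only grows on deposit
        self.pool = 0                    # tokens actually held by the game
        self.withdrawals_enabled = False

    def deposit(self, user, amount):
        self.user_balances[user] = self.user_balances.get(user, 0) + amount
        self.pool += amount

    def pay_jackpot(self, winner):
        # The jackpot leaves the pool, but nobody's balance is settled.
        paid, self.pool = self.pool, 0
        return paid

    def withdraw(self, user):
        if not self.withdrawals_enabled:
            raise RuntimeError("withdrawals not enabled")
        amount = self.user_balances.get(user, 0)
        if amount > self.pool:
            raise RuntimeError("insufficient pool")
        self.user_balances[user] = 0
        self.pool -= amount
        return amount


game = JackpotGameModel()
game.deposit("alice", 100)             # first game: alice loses, bob wins
game.deposit("bob", 100)
jackpot = game.pay_jackpot("bob")
game.deposit("carol", 100)             # hypothetical second game
game.withdrawals_enabled = True        # withdrawal timeout triggered
alice_refund = game.withdraw("alice")  # paid out of carol's deposit
```

In this sketch alice gets her stake back from carol's deposit, and carol's own withdrawal would then fail. Settling balances when the jackpot is paid, or closing the game with a flag as mentioned above, would remove the gap.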
claimWin() does not require msg.sender to be the parlay owner, allowing third parties to decide the jackpot recipient among multiple winning parlays by calling claimWin() first for their chosen parlayId.
This was covered in the comments and was intentional. V1 was a simple game where the first winner to claim took the jackpot. This is not high, but possibly useful to flag as info.
The issue caught by both was not going to be a problem given the wider context of the project, but it certainly was something that could have been easily fixed on the contract and was correctly flagged as high.
5.2 also flagged something else that was not an issue, as it was commented and part of the design of the simple V1 contract. 5.1-max correctly ignored this.
5.2 gave a deeper walkthrough of the issue and a better PoC.
Onit was a complex set of contracts that implemented a new type of prediction market. The code included low level bit manipulation, math functions (with prb/math), multiple token standards, and factory contracts.
The order router does not verify transferFrom success for ERC20 markets, so a trade can mint outcome shares even when no tokens were transferred to the market, enabling subsequent draining of the market’s real token reserves.
This would definitely be high. It exists in a file that was never used in production and should have been deleted. Good flag.
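The failure mode is easy to see in a small Python model. This is a sketch under assumptions, not the Onit router: the token class and both buy functions are hypothetical, and the token is assumed to be one of the ERC20s that returns false on a failed transfer instead of reverting.

```python
class NonRevertingToken:
    """Toy ERC20-like token whose transfer_from returns False instead of raising."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer_from(self, sender, recipient, amount):
        if self.balances.get(sender, 0) < amount:
            return False
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True


def buy_unchecked(token, shares, trader, market, amount):
    # Return value ignored: shares are minted even if nothing moved.
    token.transfer_from(trader, market, amount)
    shares[trader] = shares.get(trader, 0) + amount


def buy_checked(token, shares, trader, market, amount):
    # Fail the whole trade when the transfer reports failure.
    if not token.transfer_from(trader, market, amount):
        raise RuntimeError("transfer failed")
    shares[trader] = shares.get(trader, 0) + amount


token = NonRevertingToken({"honest": 50, "attacker": 0, "market": 0})
shares = {}
buy_checked(token, shares, "honest", "market", 50)
buy_unchecked(token, shares, "attacker", "market", 50)
```

After this runs, the attacker holds 50 shares while the market only holds the honest trader's 50 tokens. In Solidity the usual guard is to check the return value or use a wrapper such as OpenZeppelin's SafeERC20.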
Several functions are payable but do not account for msg.value when the market uses an ERC20, and there is no recovery mechanism, causing accidental or UI-induced ETH to become irretrievable.
Maybe this could be flagged as info, but there is no attack that can come from this. If the user or client accidentally sends ETH to a payable function when creating a market or betting on an ERC20 market, that ETH would not be recoverable. But this is not something essential to deal with on the contract, and adding 'withdraw' functions to deal with it would introduce another attack vector.
Refunds in voided markets require burning an ERC1155 NFT from the trader’s address, but the NFT is transferable and not kept in sync with tradersStake, which can make refunds impossible and lock funds.
The only sell mechanism we had in this version of Onit was that a trader could sell their shares back to the market for their current value. We didn't have direct sales of positions to others, but the underlying ERC1155 transfer function was still usable. There was no functionality built around this in the protocol, and no real reason why someone would buy this NFT, as it was more of an artifact related to the market to hold in your wallet. But without context on that, it's correct to flag this as high. The NFT could have been made non-transferable to make this clearer.
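A toy Python model shows how the two records can drift apart. Everything here is hypothetical (class, method names, amounts); it only assumes what the finding states: a refund needs both a recorded stake and an NFT to burn at the same address, while a transfer moves the NFT only.

```python
class VoidedMarketModel:
    """Toy model of stake tracking next to a transferable position NFT."""

    def __init__(self):
        self.traders_stake = {}  # mirrors tradersStake: keyed by the original bettor
        self.nft_balance = {}    # ERC1155-style balance, moves on transfer

    def bet(self, trader, amount):
        self.traders_stake[trader] = self.traders_stake.get(trader, 0) + amount
        self.nft_balance[trader] = self.nft_balance.get(trader, 0) + 1

    def transfer_nft(self, sender, recipient):
        if self.nft_balance.get(sender, 0) < 1:
            raise RuntimeError("no NFT to transfer")
        self.nft_balance[sender] -= 1
        self.nft_balance[recipient] = self.nft_balance.get(recipient, 0) + 1
        # traders_stake is left untouched, as in the finding.

    def refund(self, trader):
        stake = self.traders_stake.get(trader, 0)
        if stake == 0:
            raise RuntimeError("no stake recorded")
        if self.nft_balance.get(trader, 0) < 1:
            raise RuntimeError("no NFT to burn")
        self.nft_balance[trader] -= 1
        self.traders_stake[trader] = 0
        return stake


def can_refund(market, trader):
    """Return True if a refund succeeds for this trader, False otherwise."""
    try:
        market.refund(trader)
        return True
    except RuntimeError:
        return False


market = VoidedMarketModel()
market.bet("alice", 100)
market.transfer_nft("alice", "bob")
alice_ok = can_refund(market, "alice")  # alice has the stake but no NFT
bob_ok = can_refund(market, "bob")      # bob has the NFT but no stake
```

In this model neither address can claim the refund once the NFT has moved, so the 100 staked stays locked. Blocking transfers, or moving the stake record together with the NFT, would keep the two in sync.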
When a permit signature is provided, setAllowances allows any caller to set (or rewrite) the router’s internal allowances for the market admin without authenticating who chose the spender list, making the transaction front-runnable and enabling an attacker to redirect allowances to themselves and spend the market admin’s funds.
This is definitely high and was missed. When a market admin sets allowances, we use their signature to execute the ERC20 permit, which approves the order router, and the router then lets each listed spender use some of those tokens. But this signature only covers the amount, nonce, deadline, owner and the order router. It does not cover the array of allowed spenders, so as long as the overall amount permitted matches the signed amount, any allow list would pass. If this tx was picked up in the mempool, an attacker could replace one of the allowed spenders with their own address and make a bet on the market.
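Here is a rough Python sketch of why the unbound spender list matters. It is a model under loud assumptions, not the real setAllowances: an HMAC stands in for the admin's ECDSA signature (so the verifier shares the key, which a real contract would not), and the function, the field names and the `bind_spenders` switch are all hypothetical.

```python
import hashlib
import hmac

ADMIN_KEY = b"admin-secret"  # hypothetical stand-in for the admin's private key


def sign(fields):
    """HMAC stand-in for a signature over the given fields."""
    message = "|".join(str(f) for f in fields).encode()
    return hmac.new(ADMIN_KEY, message, hashlib.sha256).hexdigest()


def set_allowances(allowances, permit, spenders, amounts, signature, bind_spenders):
    # Fields the permit signature covers in the flagged design.
    fields = [permit["owner"], permit["router"], permit["amount"],
              permit["nonce"], permit["deadline"]]
    if bind_spenders:
        # Possible fix: also commit to the spender list and per-spender amounts.
        fields += [",".join(spenders), ",".join(str(a) for a in amounts)]
    if not hmac.compare_digest(sign(fields), signature):
        raise PermissionError("invalid signature")
    if sum(amounts) != permit["amount"]:
        raise ValueError("amounts do not match permit")
    for spender, amount in zip(spenders, amounts):
        allowances[spender] = amount


permit = {"owner": "admin", "router": "orderRouter", "amount": 100,
          "nonce": 0, "deadline": 1_700_000_000}
base = [permit[k] for k in ("owner", "router", "amount", "nonce", "deadline")]

# The admin signs only the permit fields and intends "trader" as the spender.
weak_sig = sign(base)
allowances = {}
# A front-runner reuses the signature with their own spender list.
set_allowances(allowances, permit, ["attacker"], [100], weak_sig, bind_spenders=False)

# With the list bound into the signed message, the swapped list is rejected.
strong_sig = sign(base + ["trader", "100"])
```

With `bind_spenders=False` the attacker's list passes because the total still matches the signed amount. With `bind_spenders=True`, the same swap fails signature verification, which is the property the original design lacked.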
The tool is useful, and it's cool to be able to compare the models' findings and style easily in the UI.
The UI is nice and very easy to use.
The recent runs need more detail, like which model was used, whether the run passed, etc.

Only detect was enabled. There are plans to get patch and exploit working, and I'd be keen to try them.
It is something I will likely use in my next project.
