I ran EVMBench against two sets of contracts I built over the last year: one simple repo and one much more complicated one.

Setup

The instructions in their README were clear, and within a couple of minutes I had the local version running.
I needed to add some credits to run it - nice work, OpenAI.
The first repo I tried was over the size limit, but that was my bad, as I had zipped it with a lot of artifacts.
Uploading an unzipped folder gives you a good overview of the code with the issues highlighted.

Runs

The detect.md prompt says to only flag issues related to loss of funds. Because of this, the model is instructed to report every issue as high, and it seems a bit trigger-happy (especially the 5.2 model). The results handler in the code itself, though, seems ready to handle severities from critical to info.

Megaparlay

Megaparlay was a simple set of contracts for making parlays on the Conditional Tokens Framework. It had ~700 LOC of interest in the main contracts, along with testing and deployment scripts.

V-001 [High] (Both) Stale userBalances accounting enables double-withdrawal and drains future jackpot funds

userBalances is only incremented on deposit and (sometimes) decremented on migration refunds, but is never settled on loss or win, enabling past participants to withdraw against future deposits after withdrawalsEnabled is triggered.

My Feedback

withdrawalsEnabled is a flag that lets users claim their balances if the owner stops uploading parlays for over a month. Without it, the game would become stale and unwinnable.

Megaparlay V1 was planned as a single-use jackpot game, which was alluded to in the comments on the contract and in the README. It's true that, if they wanted to, players could have kept making parlays after the first jackpot was won.

After the withdrawal timeout was triggered, any players who lost in the original game could have claimed a refund from this nonexistent second game ahead of the new players. This was known, and could have been easily fixed by closing the game more officially with another flag. Good to highlight though.
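To make the stale-balance issue concrete, here is a toy Python model of the accounting as the finding describes it. It is not the Megaparlay contract: the class, method names, and amounts are all hypothetical, and the hypothetical second round only exists to show where the refund money would come from.

```python
class JackpotGame:
    """Toy model of the reported accounting, not the real contract."""

    def __init__(self):
        self.user_balances = {}          # mirrors userBalances: only grows on deposit
        self.pot = 0                     # funds actually held by the contract
        self.withdrawals_enabled = False

    def deposit(self, user, amount):
        self.user_balances[user] = self.user_balances.get(user, 0) + amount
        self.pot += amount

    def settle_win(self, winner):
        # The jackpot leaves the contract, but no balance entry is cleared.
        payout, self.pot = self.pot, 0
        return payout

    def withdraw(self, user):
        if not self.withdrawals_enabled:
            raise RuntimeError("withdrawals not enabled")
        amount = self.user_balances.get(user, 0)
        if amount > self.pot:
            raise RuntimeError("insufficient funds")
        self.user_balances[user] = 0
        self.pot -= amount
        return amount


game = JackpotGame()
game.deposit("alice", 10)                  # round one, loses
game.deposit("bob", 10)                    # round one, wins
round_one_payout = game.settle_win("bob")  # the whole pot is paid out

game.deposit("carol", 10)                  # hypothetical second round
game.withdrawals_enabled = True            # the one-month timeout fires
alice_refund = game.withdraw("alice")      # stale balance, paid from carol's deposit
```

In this sketch alice's round-one stake was already paid out to bob, yet her balance entry still lets her withdraw, and carol is left with a balance the contract can no longer cover.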

V-002 [High] (5.2 only) Anyone can front-run and choose which winning parlay receives the jackpot

claimWin() does not require msg.sender to be the parlay owner, allowing third parties to decide the jackpot recipient among multiple winning parlays by calling claimWin() first for their chosen parlayId.

My Feedback

This was covered in the comments and was intentional. V1 was a simple game where the first winner to claim took the jackpot. This is not a high, but it is possibly useful to flag as info.

Megaparlay Review

The issue caught by both models was not going to be a problem given the wider context of the project, but it could have been easily fixed in the contract and was correctly flagged as high.
5.2 also flagged something that was not an issue, as it was commented and part of the design of the simple V1 contract. 5.1-max correctly ignored this.
5.2 gave a deeper walkthrough of the issue and a better PoC.

Onit

Onit was a complex set of contracts that implemented a new type of prediction market. The code included low level bit manipulation, math functions (with prb/math), multiple token standards, and factory contracts.

V-001 [High] (Both) Unchecked ERC20 transferFrom allows minting shares without payment

The order router does not verify transferFrom success for ERC20 markets, so a trade can mint outcome shares even when no tokens were transferred to the market, enabling subsequent draining of the market’s real token reserves.

My Feedback

This would definitely be high. It exists in a file that was never used in production and should have been deleted. Good flag.
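The failure mode in this finding can be sketched in a few lines of Python. This is a toy model rather than the Onit router: the token class stands in for an ERC20 that returns false on failure instead of reverting, and all names here (NoRevertToken, Router, check_return) are made up for illustration.

```python
class NoRevertToken:
    """ERC20-like token that returns False on failure instead of raising."""

    def __init__(self):
        self.balances = {}

    def transfer_from(self, sender, recipient, amount):
        if self.balances.get(sender, 0) < amount:
            return False  # failure is signalled only through the return value
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True


class Router:
    """Toy order router; check_return toggles the missing success check."""

    def __init__(self, token, check_return):
        self.token = token
        self.check_return = check_return
        self.shares = {}

    def buy(self, trader, amount):
        ok = self.token.transfer_from(trader, "market", amount)
        if self.check_return and not ok:
            raise RuntimeError("transferFrom failed")
        # Shares are minted whether or not tokens actually moved.
        self.shares[trader] = self.shares.get(trader, 0) + amount


token = NoRevertToken()            # mallory holds no tokens at all

unchecked = Router(token, check_return=False)
unchecked.buy("mallory", 100)      # succeeds, nothing was paid

checked = Router(token, check_return=True)
try:
    checked.buy("mallory", 100)
    checked_buy_failed = False
except RuntimeError:
    checked_buy_failed = True
```

With the check disabled, mallory ends up with 100 shares while the market's token balance is still zero, which is the precondition for draining real reserves later.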

V-002 [High] (5.2 only) Payable router/factory functions can permanently lock ETH sent with ERC20 flows

Several functions are payable but do not account for msg.value when the market uses an ERC20, and there is no recovery mechanism, causing accidental or UI-induced ETH to become irretrievable.

My Feedback

Maybe this could be flagged as info, but there is no attack that can come from it. If the user or client accidentally sends ETH to a payable function when creating a market or betting on an ERC20 market, that ETH would not be recoverable. But this is not something essential to deal with in the contract, and adding 'withdraw' functions to handle it would open up another attack vector.

V-003 [High] (5.2 only) Voided-market refunds can be permanently blocked by transferable ERC1155 NFTs

Refunds in voided markets require burning an ERC1155 NFT from the trader’s address, but the NFT is transferable and not kept in sync with tradersStake, which can make refunds impossible and lock funds.

My Feedback

The only sell mechanism we had in this version of Onit was that a trader could sell their shares back to the market for their current value. We didn't have direct sales of positions to others, but the underlying ERC1155 transfer function was still usable. There was no functionality built around this in the protocol, and no real reason why someone would buy this NFT, as it was more of an artifact of the market to hold in your wallet. But without context on that, it's correct to flag this as high. The NFT could have been made non-transferable to make this clearer.
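The desync between the NFT and tradersStake is easy to see in a small Python model. This is only a sketch of the behaviour the finding describes, with hypothetical names and amounts, not the actual Onit refund logic.

```python
class VoidedMarket:
    """Toy model: refunds need both a recorded stake and an NFT to burn."""

    def __init__(self):
        self.traders_stake = {}  # mirrors tradersStake, keyed by the original trader
        self.nft_balance = {}    # ERC1155-style balance of the position NFT

    def buy(self, trader, stake):
        self.traders_stake[trader] = stake
        self.nft_balance[trader] = 1

    def transfer_nft(self, sender, recipient):
        # Plain token transfer: the stake record does not move with the NFT.
        if self.nft_balance.get(sender, 0) < 1:
            raise RuntimeError("nothing to transfer")
        self.nft_balance[sender] -= 1
        self.nft_balance[recipient] = self.nft_balance.get(recipient, 0) + 1

    def refund(self, trader):
        stake = self.traders_stake.get(trader, 0)
        if stake == 0:
            raise RuntimeError("no stake recorded")
        if self.nft_balance.get(trader, 0) < 1:
            raise RuntimeError("no NFT to burn")
        self.nft_balance[trader] -= 1
        self.traders_stake[trader] = 0
        return stake


def can_refund(market, trader):
    """Return True if a refund call for this trader would succeed."""
    try:
        market.refund(trader)
        return True
    except RuntimeError:
        return False


market = VoidedMarket()
market.buy("alice", 50)
market.transfer_nft("alice", "bob")           # alice sends the NFT away

alice_can_refund = can_refund(market, "alice")  # has the stake, no NFT
bob_can_refund = can_refund(market, "bob")      # has the NFT, no stake
```

After the transfer neither party can pass both checks, so in this model the 50 staked stays locked unless bob sends the NFT back.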

V-004 [High] (5.2 only) Permit-based allowance updates are front-runnable, enabling allowance hijacking and theft of market admin funds

When a permit signature is provided, setAllowances allows any caller to set (or rewrite) the router’s internal allowances for the market admin without authenticating who chose the spender list, making the transaction front-runnable and enabling an attacker to redirect allowances to themselves and spend the market admin’s funds.

My Feedback

This is definitely high and was missed. When a market admin sets allowances, we use their signature to execute the ERC20 permit, which approves the order router, which in turn lets each listed spender use some of those tokens. But this signature only covers the amount, nonce, deadline, owner and the order router. It does not cover the array of allowed spenders, so as long as the total amount permitted matches the signed amount, any allow list would pass. If this tx was picked up in the mempool, an attacker could replace one of the allowed spenders with their own address and make a bet on the market.
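A rough Python sketch of why the signature does not protect the spender list. Big assumptions here: an HMAC with a shared key stands in for the admin's ECDSA permit signature, and the router, field names, and amounts are all hypothetical. The only point is to show which fields are bound by the signature and which are not.

```python
import hashlib
import hmac

ADMIN_KEY = b"admin-secret"  # stand-in for the admin's private key (assumption)


def sign_permit(key, owner, spender, value, nonce, deadline):
    """Sign only the permit fields; the spender list is not part of the message."""
    msg = f"{owner}|{spender}|{value}|{nonce}|{deadline}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class Router:
    """Toy router with a setAllowances-style entry point callable by anyone."""

    def __init__(self):
        self.allowances = {}

    def set_allowances(self, owner, value, nonce, deadline, signature,
                       spenders, amounts):
        expected = sign_permit(ADMIN_KEY, owner, "router", value, nonce, deadline)
        if not hmac.compare_digest(expected, signature):
            raise RuntimeError("bad permit signature")
        if sum(amounts) != value:
            raise RuntimeError("amounts do not match permitted value")
        # Nothing above ties `spenders` to what the admin intended.
        for spender, amount in zip(spenders, amounts):
            self.allowances[(owner, spender)] = amount


# The admin signs a permit for 100 tokens, intending alice=60 and bob=40.
sig = sign_permit(ADMIN_KEY, "admin", "router", 100, 0, 9999)

# An attacker copies the pending call and swaps one spender for themselves.
router = Router()
router.set_allowances("admin", 100, 0, 9999, sig,
                      spenders=["attacker", "bob"], amounts=[60, 40])
attacker_allowance = router.allowances[("admin", "attacker")]
```

The signature check passes because every signed field is unchanged, and the total still matches, so the attacker's address ends up with alice's share of the allowance.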

Overall Feedback

The tool is useful, and it's cool to be able to easily compare the models' findings and style in the UI.
The UI is nice and very easy to use.
It needs more detail on the recent runs, like which model was used, whether it passed, etc.
Only detect was enabled. There are plans to get patch and exploit working, and I'd be keen to try them.
It is something I will likely use in my next project.

jamco.eth
