# Testing evmbench

By [jamco.eth](https://paragraph.com/@jamesmccomish) · 2026-02-22

evm, paradigm, openai, llm

---

I ran [EVMBench](https://paradigm.xyz/evmbench) against two sets of contracts I built over the last year: one simple repo, and another much more complicated one.

Setup
=====

*   The instructions in their [README](https://github.com/paradigmxyz/evmbench) were clear, and within a couple of minutes I had the local version running
    
*   I needed to add some credits to run - nice work OpenAI
    
*   The first repo I tried was over the size limit, but that was my bad as I had zipped it with a lot of artifacts included
    
*   Adding an unzipped folder gives you a good overview of the code with issues highlighted
    

Runs
====

![](https://storage.googleapis.com/papyrus_images/480decb5c52ba82b211acc364f95fdd6f2be3f4fec5cc440a77736ca9fa37727.png)

The `detect.md` prompt says to only flag issues related to loss of funds. Because of this, the model is instructed to **only report issues as high**, and it seems a bit trigger-happy (especially with the 5.2 model). That said, the [results handler](https://github.com/paradigmxyz/evmbench/blob/main/backend/resultsvc/routers/v1.py) in the code itself seems ready to handle severities from critical to info.

Megaparlay
----------

[Megaparlay](https://github.com/jamesmccomish/megaparlay) was a simple set of contracts for making parlays on the Conditional Token Framework. It had ~700 LOC of interest in the main contracts, along with testing and deployment scripts.

#### V-001 \[High\] (Both) Stale userBalances accounting enables double-withdrawal and drains future jackpot funds

`userBalances` _is only incremented on deposit and (sometimes) decremented on migration refunds, but is never settled on loss or win, enabling past participants to withdraw against future deposits after_ `withdrawalsEnabled` _is triggered._

#### My Feedback

`withdrawalsEnabled` is a flag that lets users claim their balances in case the owner stops uploading parlays for over a month. That would result in the game becoming stale and unwinnable.

Megaparlay V1 was planned as a single-use jackpot game, which was alluded to in the comments on the contract and in the README. It's true that, if they wanted to, players could have kept making parlays after the first jackpot was won.

After the withdrawal timeout was triggered, any players who lost in the original game could have claimed their refund in this nonexistent second game before the new players. **This was known but could have been easily fixed by closing the game more officially with another flag, good to highlight though.**
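To make the accounting problem concrete, here is a minimal Python sketch of it. This is a toy model and not the Megaparlay contract: the `JackpotGame` class, its method names, and the amounts are all made up for illustration. The only thing taken from the finding is that the per-user balance goes up on deposit and is never settled on a win or loss.

```python
class JackpotGame:
    """Toy model of the stale-balance accounting; not the real contract."""

    def __init__(self):
        self.pot = 0                   # tokens currently held by the game
        self.user_balances = {}        # incremented on deposit, never settled
        self.withdrawals_enabled = False

    def deposit(self, user, amount):
        self.pot += amount
        self.user_balances[user] = self.user_balances.get(user, 0) + amount

    def claim_win(self):
        # The jackpot is paid out, but no user_balances entry is cleared.
        payout, self.pot = self.pot, 0
        return payout

    def withdraw(self, user):
        if not self.withdrawals_enabled:
            raise RuntimeError("withdrawals not enabled")
        amount = self.user_balances.get(user, 0)
        if amount > self.pot:
            raise RuntimeError("insufficient funds")
        self.user_balances[user] = 0
        self.pot -= amount
        return amount


game = JackpotGame()
game.deposit("alice", 100)             # first game
game.deposit("bob", 100)
first_jackpot = game.claim_win()       # bob wins, the pot is emptied

game.deposit("carol", 100)             # hypothetical second game
game.withdrawals_enabled = True        # timeout triggered
alice_refund = game.withdraw("alice")  # alice lost game one, yet is paid from carol's deposit
```

In this model carol's own withdrawal then fails because the pot is empty, which is the "past participants withdraw against future deposits" scenario from the finding.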

#### V-002 \[High\] (5.2 only) Anyone can front-run and choose which winning parlay receives the jackpot

`claimWin()` _does not require_ `msg.sender` _to be the parlay owner, allowing third parties to decide the jackpot recipient among multiple winning parlays by calling_ `claimWin()` _first for their chosen parlayId._

#### My Feedback

This was covered in the comments and was intentional. V1 was a simple game where the first winner to claim took the jackpot. **This is not high, but possibly useful to flag as info.**
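A rough Python sketch of the "first claim wins" behaviour, for readers who want to see what the model flagged. Everything here is hypothetical (class name, parlay ids, amounts), and it assumes the payout goes to the owner of the claimed parlay, which is my reading of the finding rather than a copy of the contract logic.

```python
class FirstClaimJackpot:
    """Toy model: the first valid claim_win call takes the whole pot."""

    def __init__(self, pot, parlay_owners, winning_ids):
        self.pot = pot
        self.parlay_owners = parlay_owners    # parlay_id -> owner
        self.winning_ids = set(winning_ids)

    def claim_win(self, caller, parlay_id):
        # caller is deliberately unused: there is no msg.sender check.
        if self.pot == 0:
            raise RuntimeError("jackpot already claimed")
        if parlay_id not in self.winning_ids:
            raise RuntimeError("not a winning parlay")
        payout, self.pot = self.pot, 0
        return self.parlay_owners[parlay_id], payout


jackpot = FirstClaimJackpot(300, {1: "alice", 2: "bob"}, winning_ids=[1, 2])
recipient, payout = jackpot.claim_win("carol", 2)  # carol claims on behalf of bob's parlay
```

Here a third party decides that bob, not alice, receives the jackpot. Since the funds still go to a genuine winner, this matches the intended V1 design rather than a loss of funds.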

### Megaparlay Review

*   The issue caught by both was not going to be a problem given the wider context of the project, but it certainly was something that could have been easily fixed on the contract and was correctly flagged as high.
    
*   5.2 also flagged something else that was not an issue, as it was commented and part of the design of the simple V1 contract. 5.1-max correctly ignored this.
    
*   5.2 gave a deeper walkthrough of the issue and a better PoC.
    

Onit
----

[Onit](https://github.com/onit-labs/pm-contracts) was a complex set of contracts that implemented a new type of prediction market. The code included low level bit manipulation, math functions (with prb/math), multiple token standards, and factory contracts.

#### V-001 \[High\] (Both) Unchecked ERC20 transferFrom allows minting shares without payment

_The order router does not verify_ `transferFrom` _success for ERC20 markets, so a trade can mint outcome shares even when no tokens were transferred to the market, enabling subsequent draining of the market’s real token reserves._

#### My Feedback

**This would definitely be high**. **It exists in a file that was never used** in production and should have been deleted. Good flag.
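The pattern behind this finding can be sketched in a few lines of Python. This is a simplified model, not the Onit router: `NonRevertingToken`, `Market`, and both `buy_*` methods are hypothetical names. It assumes a token that signals a failed `transferFrom` by returning `False` instead of reverting, which some ERC20 tokens do.

```python
class NonRevertingToken:
    """ERC20-style token that returns False on failure instead of reverting."""

    def __init__(self):
        self.balances = {}

    def transfer_from(self, sender, recipient, amount):
        if self.balances.get(sender, 0) < amount:
            return False
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True


class Market:
    def __init__(self, token):
        self.token = token
        self.shares = {}

    def buy_unchecked(self, trader, amount):
        # Bug pattern: the return value is ignored, so shares are minted anyway.
        self.token.transfer_from(trader, "market", amount)
        self.shares[trader] = self.shares.get(trader, 0) + amount

    def buy_checked(self, trader, amount):
        # Fix pattern: fail the whole trade if the transfer did not happen.
        if not self.token.transfer_from(trader, "market", amount):
            raise RuntimeError("transferFrom failed")
        self.shares[trader] = self.shares.get(trader, 0) + amount


token = NonRevertingToken()
token.balances["market"] = 1_000      # real reserves from honest traders
market = Market(token)
market.buy_unchecked("mallory", 50)   # mallory holds no tokens at all
```

After this call mallory holds 50 shares while the market's reserves are unchanged. In Solidity the usual way to get the `buy_checked` behaviour is a wrapper such as OpenZeppelin's `SafeERC20.safeTransferFrom`.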

#### V-002 \[High\] (5.2 only) Payable router/factory functions can permanently lock ETH sent with ERC20 flows

_Several functions are_ `payable` _but do not account for_ `msg.value` _when the market uses an ERC20, and there is no recovery mechanism, causing accidental or UI-induced ETH to become irretrievable._

#### My Feedback

**Maybe this could be flagged as info, but there is no attack that can come from this**. If the user or client accidentally sends ETH to a payable function when creating a market or betting on an ERC20 market, that ETH would not be recoverable. But this is not something essential to deal with on the contract, and adding 'withdraw' functions to deal with it would open up another attack vector.

#### V-003 \[High\] (5.2 only) Voided-market refunds can be permanently blocked by transferable ERC1155 NFTs

_Refunds in voided markets require burning an ERC1155 NFT from the trader’s address, but the NFT is transferable and not kept in sync with_ `tradersStake`_, which can make refunds impossible and lock funds._

#### My Feedback

The only sell mechanism we had in this version of Onit was that a trader could sell their shares back to the market for their current value. We didn't have direct sales of positions to others, but the underlying ERC1155 transfer function was still usable. There was no functionality built around this on the protocol, and no real reason why someone would buy this NFT, as it was more of an artifact related to the market to hold in your wallet. **But without context on that, it's correct to flag this as high.** The NFT could have been made non-transferable to make this clearer.
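Here is a small Python sketch of how the NFT and the stake can drift apart. It is a toy model under my own assumptions: `VoidedMarket`, `transfer_nft`, and `refund` are hypothetical names, and the refund rule (caller must have a stake and must hold the NFT to burn) is a simplification of what the finding describes.

```python
class VoidedMarket:
    """Toy model: refunds need both a recorded stake and the NFT to burn."""

    def __init__(self):
        self.traders_stake = {}   # trader -> staked amount
        self.nft_owner = {}       # token_id -> current holder

    def bet(self, trader, token_id, amount):
        self.traders_stake[trader] = amount
        self.nft_owner[token_id] = trader

    def transfer_nft(self, token_id, sender, recipient):
        if self.nft_owner.get(token_id) != sender:
            raise RuntimeError("not the holder")
        self.nft_owner[token_id] = recipient   # traders_stake is not updated

    def refund(self, caller, token_id):
        stake = self.traders_stake.get(caller, 0)
        if stake == 0:
            raise RuntimeError("no stake recorded for caller")
        if self.nft_owner.get(token_id) != caller:
            raise RuntimeError("burn failed: caller does not hold the NFT")
        del self.nft_owner[token_id]
        del self.traders_stake[caller]
        return stake


def can_refund(market, caller, token_id):
    try:
        market.refund(caller, token_id)
        return True
    except RuntimeError:
        return False


market = VoidedMarket()
market.bet("alice", token_id=1, amount=100)
market.transfer_nft(1, "alice", "bob")
alice_ok = can_refund(market, "alice", 1)   # alice has the stake but not the NFT
bob_ok = can_refund(market, "bob", 1)       # bob has the NFT but no stake
```

Neither party can get the 100 back in this model unless bob sends the NFT back to alice first. Blocking transfers, or moving the stake along with the NFT, would keep the two in sync.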

#### V-004 \[High\] (5.2 only) Permit-based allowance updates are front-runnable, enabling allowance hijacking and theft of market admin funds

_When a permit signature is provided,_ `setAllowances` _allows any caller to set (or rewrite) the router’s internal allowances for the market admin without authenticating who chose the spender list, making the transaction front-runnable and enabling an attacker to redirect allowances to themselves and spend the market admin’s funds._

#### My Feedback

**This is definitely high and was missed**. When a market admin sets allowances, we use their signature to execute the ERC20 permit that approves the order router, which in turn lets each allowed spender use some of those tokens. But this signature only covers the amount, nonce, deadline, owner and the order router. It does not cover the array of allowed spenders, so as long as the overall amount permitted matches the signed amount, any allow list would pass. If this tx was picked up in the mempool, an attacker could replace one of the allowed spenders with their own address and make a bet on the market.
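The gap is easier to see in code. Below is a simplified Python model, not the actual router: `sign`, `permit_fields`, and this `set_allowances` are hypothetical, and an HMAC with a shared key stands in for the admin's real ECDSA permit signature purely to keep the example self-contained.

```python
import hashlib
import hmac

ADMIN_KEY = b"admin-secret"   # stand-in for the admin's private key


def sign(fields):
    message = "|".join(str(f) for f in fields).encode()
    return hmac.new(ADMIN_KEY, message, hashlib.sha256).hexdigest()


def permit_fields(owner, router, value, nonce, deadline):
    # The signed data covers these five fields only, not the spender list.
    return (owner, router, value, nonce, deadline)


def set_allowances(signature, owner, router, value, nonce, deadline, spenders):
    """Unsafe model: `spenders` ({address: amount}) is never authenticated."""
    expected = sign(permit_fields(owner, router, value, nonce, deadline))
    if not hmac.compare_digest(signature, expected):
        raise RuntimeError("bad signature")
    if sum(spenders.values()) != value:
        raise RuntimeError("amounts do not match the permitted value")
    return dict(spenders)


signature = sign(permit_fields("admin", "router", 100, 0, 9999))
intended = {"alice": 60, "bob": 40}
tampered = {"alice": 60, "mallory": 40}   # front-runner swaps bob for themselves
accepted = set_allowances(signature, "admin", "router", 100, 0, 9999, tampered)
```

The tampered list passes because the totals still add up to the signed amount. Possible fixes would be to bind the spender list to something the admin signs, or to only let the admin themselves submit the call.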

Overall Feedback
================

*   The tool is useful and it's cool to be able to compare the models' findings / style easily in the UI
    
*   The UI is nice and very easy to use.
    
*   Need more detail on the recent runs, like what model was used, did it pass etc.
    
    ![](https://storage.googleapis.com/papyrus_images/4683219c3b02fef4e30666382d06e0dd02bacf29abb5d73c7d1489a84b09e9a3.png)
    
*   Only _detect_ was enabled. There are plans to get _patch_ and _exploit_ working and I'd be keen to try them.
    
*   It is something I will likely use in my next project.

---

*Originally published on [jamco.eth](https://paragraph.com/@jamesmccomish/testing-evmbench)*