
I just ran the full statistical autopsy on Alpha Arena (Oct 17–30). 6 LLMs. 1,067 closed trades. 319 open.
Here’s who’s winning, who’s gambling, and what it says about AI decision-making (and your boss).
Data: NOF1 + Hyperliquid API. Snapshot generated Oct 31, 2025.
Leaderboard (Account Value ≠ Skill)

Key: 42.9% of Grok-4’s positions are still open (the highest share of any model).
That’s not missing data, it’s longer hold times. A personality trait.
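For anyone checking the math, here is a minimal sketch of how an open-position share can be computed. It assumes the share is defined as open positions divided by all positions (open plus closed); the post does not spell out the formula, so treat that definition and the function name `open_share` as my illustration. The only inputs are the field-wide totals quoted above.

```python
def open_share(closed: int, open_: int) -> float:
    """Fraction of all positions that are still open.

    Assumed definition: open / (open + closed).
    """
    total = closed + open_
    if total == 0:
        raise ValueError("no positions to measure")
    return open_ / total


# Field-wide totals from this post: 1,067 closed trades, 319 open.
print(f"{open_share(1067, 319):.1%}")  # prints 23.0%
```

Under that definition the whole field sits near 23% open, which is the baseline that makes Grok-4’s 42.9% stand out.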
Leverage Efficiency (Returns per risk unit)

Myth: More leverage = more alpha
Truth: Qwen3 risks 2x more and loses more per trade
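A rough sketch of the two metrics behind this section, under stated assumptions: expectancy is taken as average realized PnL per closed trade, and "returns per risk unit" as total PnL divided by total dollars risked. Both definitions, the function names, and the sample numbers are hypothetical illustrations, not the exact pipeline behind the charts.

```python
from statistics import mean
from typing import Sequence


def expectancy(pnls: Sequence[float]) -> float:
    """Average realized PnL per closed trade, in dollars."""
    return mean(pnls)


def expectancy_from_win_rate(pnls: Sequence[float]) -> float:
    """Same quantity, decomposed as win_rate * avg_win + loss_rate * avg_loss."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p <= 0]
    win_rate = len(wins) / len(pnls)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0  # zero or negative
    return win_rate * avg_win + (1 - win_rate) * avg_loss


def return_per_risk_unit(pnls: Sequence[float], risked: Sequence[float]) -> float:
    """Total PnL divided by total dollars put at risk (assumed definition)."""
    total_risk = sum(risked)
    if total_risk <= 0:
        raise ValueError("total risk must be positive")
    return sum(pnls) / total_risk


# Hypothetical trades, for illustration only (not Alpha Arena data).
sample_pnls = [1.0, -0.5, 0.25, -0.25]
sample_risk = [2.0, 2.0, 1.0, 1.0]
print(expectancy(sample_pnls))
print(return_per_risk_unit(sample_pnls, sample_risk))
```

Normalizing by risk is what separates "made money" from "made money efficiently": doubling the risk per trade only helps if PnL more than doubles with it.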

Top 3 never repeat a losing pattern
Gemini? Keeps inventing new bad trades

Claude’s BTC trades = +$3.77
Qwen3’s best (SOL) = +$0.08

Grok-4 risks 15% more per trade than Claude but holds 2.4x longer
That’s its edge (or trap)

Gemini trades 6x more and loses money
DeepSeek waits. Wins.

Not a bug.
Grok-4 holds longer: more concurrent positions, less scalping. It’s a position trader, not a day trader. Account value includes unrealized PnL, so rankings are fair.

Claude is the only model getting better.
Grok-4? Consistent: no tilt, no improvement.
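One simple way to test "getting better" is to compare a model's win rate in the first half of its trades against the second half. The sketch below assumes exactly that split, with closed trades listed in chronological order; `win_rate_drift` is a hypothetical helper, and the analysis behind the chart may use a different window.

```python
from typing import Sequence


def win_rate(pnls: Sequence[float]) -> float:
    """Share of closed trades with positive PnL."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def win_rate_drift(pnls_in_time_order: Sequence[float]) -> float:
    """Late-half win rate minus early-half win rate.

    Positive means the model improved over the period, negative means
    it got worse, and a value near zero means it stayed consistent.
    """
    if len(pnls_in_time_order) < 2:
        raise ValueError("need at least two closed trades")
    mid = len(pnls_in_time_order) // 2
    early = pnls_in_time_order[:mid]
    late = pnls_in_time_order[mid:]
    return win_rate(late) - win_rate(early)


# Hypothetical sequence: two losses early, two wins late.
print(win_rate_drift([-1.0, -1.0, 1.0, 1.0]))
```

A half-split is crude but hard to overfit; with only two weeks of trades, anything fancier mostly measures noise.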
The Trading Personalities (Data-Backed + Origin Stories)

Take DeepSeek V3.1, the current account value king ($14,553). Born in 2023 from High-Flyer Quant, a Chinese hedge fund that stockpiled 10,000+ Nvidia GPUs for financial alpha-hunting, DeepSeek was spun off as an AGI lab to chase "non-financial" AI. But the quant roots show: it reportedly trained V3 on ~$6M (vs. GPT-4's $100M+), using 1/10th the compute of Llama 3.1. Pure efficiency.
In Arena, that translates to dip-buying mastery (38.5% of entries near lows) and 100% loss recovery. It's the patient strategist because it had to be: born from a fund that thrived on contrarian bets, not hype. (Fun fact: its R1 model sent Nvidia's stock down roughly 17% in a single day in Jan 2025. Talk about market impact.)
Then there's Qwen3 Max, the 20x-leverage wildcard ($12,830 value but -$0.19 expectancy). You'd expect discipline from Alibaba Cloud's trillion-parameter beast. After all, Alibaba's PAI lab powers quant workflows for banks, fraud detection, and portfolio optimization. Their ecosystem is steeped in finance: Qwen3 crushes AIME math benchmarks (80.6% score) and handles 262k-token contexts for dissecting financial reports or legal docs. It's tuned for enterprise rigor, with $52B poured into cloud AI for "systematic analysis."
Yet here it is: 55.7% BTC concentration, a dangerous 4-5% risk per trade, and a win rate that tanked 10% mid-competition. Why the gambler vibe? My read: Alibaba's massive, noisy training data (blending social sentiment, code, and markets) fosters high-conviction exploration via its MoE architecture, which is great for bold ideas and disastrous for risk management. It's like a quant fund's rogue algo: systematic on paper, but wired for volatility. (Pro tip: Use it for math-heavy tasks, not live trading.)
Key takeaway: Origins shape outcomes. DeepSeek's hedge-fund thriftiness breeds winners; Qwen3's scale invites overreach. When picking an AI, audit its backstory, not just its benchmarks.
Picking an AI Trading Partner

Hiring Analogy
Claude = Your next Head of Trading
Grok-4 = The PM who won’t sell early
Qwen3 = The trader you fire after one bad month
The Expert’s Dilemma
Worry about:
1. Unlearning Risk — Qwen3 uses 20x and keeps losing
2. Overtrading — Gemini’s 282 trades = noise
3. Hold Risk — Grok-4’s 42.9% open = drawdown vulnerability
DeepSeek wins today — with hedge-fund efficiency, dip-buying mastery, and $14,553 in the bank.
Claude is the disciplined dark horse — best expectancy (+$0.20), 100% recovery, and learning over time.
Grok-4 holds with patience — 42.9% open, no tilt, no panic.
Qwen3 loses with ego — 20x leverage, -$0.19 per trade, and a crashing win rate.
You’re not picking an LLM.
You’re picking a trading psyche, and now we can measure it.
Want the Live Pulse for your robot? Use my x402 interface here ->
https://04dcqgnj.nx.link/v1/nof1/seasons/series_1/winner-prediction
P.S. I turn live Hyperliquid data into monetized signals in <5 min. Quant funds, prediction markets, AI teams: let’s talk.
Matt Dyer