
Time: ~1 hour | Cost: < $1
Setup: Claude Code + Factory CLI (Big shoutout for the mathematical rendering!)
Result: Mathematically rigorous proof of CDF monotonicity
I built move-gaussian, an on-chain Gaussian distribution library for Sui Move. The library provides CDF, PDF, PPF (inverse CDF), and sampling functions—core statistical primitives living natively on-chain.
I wanted to formally verify the library's mathematical properties. The sui-prover handled most verification tasks: overflow safety (45 specs), arithmetic primitives, and point evaluations. But one property stood out as both critical and difficult: proving the CDF is monotonically increasing.
A CDF (Cumulative Distribution Function) answers: "What's the probability that a random variable is less than or equal to z?" By definition, this probability must increase as z increases—you're accumulating more probability mass as you move right on the number line, never less. If a CDF implementation decreases anywhere, it's not a valid CDF.
For a Gaussian CDF specifically, monotonicity is load-bearing. It guarantees the inverse CDF (PPF) is well-defined and unique, which means sampling via inverse transform actually works. It ensures probability intervals make sense—P(a < X ≤ b) = CDF(b) - CDF(a) must be non-negative. Without monotonicity, the entire statistical machinery breaks down.
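In symbols, writing F for the CDF:

F(z) = P(X ≤ z),  and  z₁ ≤ z₂  ⇒  F(z₁) ≤ F(z₂),  so  P(a < X ≤ b) = F(b) − F(a) ≥ 0.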
The implementation uses rational polynomial approximation: degree-12 polynomials P(z) and Q(z), evaluated via Horner's method in fixed-point arithmetic.
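For context, here is a minimal Python mirror of that fixed-point Horner pattern. SCALE and the coefficients are illustrative placeholders, not the library's actual values:

# Illustrative fixed-point Horner evaluation mirroring the Move pattern.
# SCALE and the example coefficients are hypothetical, not the library's.
SCALE = 10**18  # fixed-point scale factor

def horner_eval(coeffs, x):
    # Evaluate the polynomial (highest-degree coefficient first); each step
    # performs the multiply-then-rescale (acc * x) // SCALE that the prover flags.
    acc = 0
    for c in coeffs:
        acc = (acc * x) // SCALE + c
    return acc

# 1 + 2z at z = 0.5, everything scaled by SCALE: expect 2 * SCALE
print(horner_eval([2 * SCALE, 1 * SCALE], SCALE // 2))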
My first attempt was direct SMT verification with sui-prover, which is built on Boogie and Z3. It failed immediately. The prover explores all possible intermediate values, not just the ones our coefficients produce, and found theoretical overflow paths:
error: code should not abort
    ┌─ ./sources/math.move:154:8
    │
154 │     (a * x) / SCALE
    │     ^
    │
    = at ./sources/erf.move:233: horner_eval_p
    = a = 115792089237316195423570985008687907853269984665640564039458...
That value of a is near the u256 maximum, a path the real code never takes, but Z3 doesn't know that. Even switching to real arithmetic (sui-prover's to_real() bypasses overflow), Z3's nonlinear real arithmetic theory times out around degree 6-8. Our problem is degree 22.
My mathematical intuition said: stop verifying the computation, verify the algebra. For monotonicity, we need CDF'(z) ≥ 0. For a rational function P/Q, this means proving N(z) = P'(z)·Q(z) - P(z)·Q'(z) ≥ 0—an algebraic problem, not a numerical one.
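Why the numerator alone suffices: by the quotient rule,

(P(z)/Q(z))′ = (P′(z)·Q(z) − P(z)·Q′(z)) / Q(z)² = N(z) / Q(z)²

and Q(z)² > 0 wherever Q(z) ≠ 0, so the derivative has the same sign as N(z). Proving N(z) ≥ 0 on [0, 6] therefore proves monotonicity on that interval.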
I prompted Claude toward algebraic approaches. It searched external documentation and surfaced Sturm's Theorem (1829)—a classical result for counting polynomial roots in an interval. The theorem is elegant: build the Sturm sequence (polynomial remainder sequence), count sign changes at interval endpoints, and the difference equals the number of roots. If N(z) has zero roots in [0, 6] and N(0) > 0, then N(z) > 0 everywhere.
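Concretely, the Sturm sequence is a polynomial remainder chain (rem denotes the remainder of polynomial division):

p₀ = N,  p₁ = N′,  pₖ₊₁ = −rem(pₖ₋₁, pₖ),  stopping when the remainder is a constant.

If V(x) is the number of sign changes in p₀(x), p₁(x), …, pₘ(x), then V(a) − V(b) equals the number of distinct real roots of N in (a, b].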

Claude generated a Python script using SymPy with exact rational arithmetic. No floating-point errors. No numerical instability. Just pure polynomial algebra: V(0) = 11 sign changes, V(6) = 11 sign changes, root count = 0, and N(0) > 0.
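The shape of that script, sketched here with small demo polynomials standing in for the library's degree-12 P and Q (the real run feeds in the fixed-point coefficients as exact SymPy Rationals):

from sympy import symbols, sturm, diff

z = symbols('z')

# Demo stand-ins for the library's degree-12 P and Q; exact integer/rational
# coefficients keep every step of the computation exact.
P = z**3 + 3*z
Q = z**2 + 1

# Quotient-rule numerator: (P/Q)' >= 0 iff N = P'Q - PQ' >= 0, since Q**2 > 0.
N = (diff(P, z) * Q - P * diff(Q, z)).expand()  # here: z**4 + 3

chain = sturm(N)  # Sturm sequence: N, N', then negated polynomial remainders

def sign_changes(polys, x0):
    # Evaluate the chain at x0 exactly and count sign flips, skipping zeros.
    vals = [p.subs(z, x0) for p in polys]
    vals = [v for v in vals if v != 0]
    return sum(1 for a, b in zip(vals, vals[1:]) if (a > 0) != (b > 0))

# Sturm's theorem: V(a) - V(b) = number of distinct real roots in (a, b].
num_roots = sign_changes(chain, 0) - sign_changes(chain, 6)
assert num_roots == 0 and N.subs(z, 0) > 0
print("N(z) > 0 on [0, 6], so the CDF is monotonically increasing there")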
Theorem verified: N(z) > 0 for all z ∈ [0, 6]. Therefore the CDF is monotonically increasing.

What struck me most was Claude's role as a living encyclopedia of mathematical knowledge. It's not always 100% correct, but it seems to have an answer for everything if you ask the right question. The human provides domain expertise, quality control, and strategic direction: I knew to look for algebraic approaches when Z3 failed, and I could verify that the proof structure made sense. The AI provides breadth of knowledge, implementation speed, and pattern matching across domains: it surfaced Sturm's theorem, generated a working Python script in minutes, and connected the dots while I pointed out the direction.
But this only works because we're not blindly trusting AI output. The proof stands on Sturm's theorem, which has 200 years of mathematical scrutiny behind it. The implementation uses Python and SymPy, decades-old tools with extensive testing, and exact rational arithmetic that avoids floating-point uncertainty entirely. The sui-prover verified the supporting facts: overflow bounds (45 specs, all passing), point evaluations, and arithmetic primitive correctness. Claude found the path; the path itself is ancient and trusted.
The formal verification took an hour, and it was simultaneously a learning experience, an application experience, and a research experiment. These categories blur together now: when AI can collapse the gap between knowing something exists and implementing it, the distinction between "learning" and "doing" all but disappears.