This is the third in our series on DeFi curation, following our analyses of setting LTVs and caps for collaterals and the relevance of JIT MEV. We develop a quantitative framework for estimating liquidation sizes and their implications for risk parameter calibration in DeFi lending. Using over 6,400 liquidation events from Aave v3, we model the ratio of liquidation size to pool depth on the log scale and find it is best captured by a three-component Gaussian Mixture Model (GMM). This approach explains anomalies introduced by liquidator behaviour, most notably round-number clustering around $1,000. We show that while typical liquidations are nearly constant in dollar terms across pool sizes, the tails deviate, requiring conservative treatment. To address this, we impose a minimum floor on expected liquidation sizes, ensuring robustness for smaller or under-sampled pools. Our estimates indicate that for pools larger than $1 billion, liquidation events will remain below 6 basis points with 99% certainty and below 32 basis points with 99.9% certainty. For smaller pools, we extrapolate the values from the $1 billion pool size, offering a conservative and tractable framework for stress testing and setting collateral risk parameters.
Understanding the mechanics behind asset onboarding decisions provides valuable insight into risk management. In our previous work, we outlined how to use historical data to estimate the price slippage, $s(V)$, for a given liquidation volume, $V$. We now focus instead on the conditional distribution of liquidation size given pool depth, $p_V(v \mid D)$. Directly estimating this density is difficult, so we instead work with the normalised liquidation ratio $r = V / D$, whose density is related by $p_V(v \mid D) = \tfrac{1}{D}\, p_r(v / D)$.
Inverting the CDF allows us to estimate the expected slippage $s(V_{99})$, where $V_{99} = D \cdot F_r^{-1}(0.99)$ is the 99th percentile of liquidation volumes for a given pool depth $D$.
We analysed 6400 liquidation events from Kaiko's Aave v3 dataset, filtering for:
Minimum liquidation size of $10 (removing dust)
USD-denominated debt assets (USDT, USDC, DAI, etc.)
In this analysis we discarded Compound data because it had an FX issue between cTokens and the underlying debt asset that couldn't be easily resolved. We will likely redo the analysis in the future when that is rectified.
Columns were taken from two of Kaiko's endpoints: /lending.v1/events for liquidation data and /lending.v1/snapshots for snapshotting the total pool liquidity, giving the following columns:
blockchain
protocol
address (pool address)
liquidation_debt_asset_symbol
transaction_hash
total_liquidity (deposits)
liquidation_debt_amount_in_asset (liquidation amount)
We grabbed data for blocks with liquidation events only and created a multi-index on blockchain, protocol, address, liquidation_debt_asset_symbol and transaction_hash, with the values being total_liquidity from snapshots and liquidation_debt_amount_in_asset from events.
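As a rough sketch of that join (the frame names events_df and snapshots_df are placeholders, and it assumes the snapshot rows have already been restricted to the liquidation blocks and carry the same key columns as the events):

import pandas as pd

# Placeholder inputs: `events_df` from /lending.v1/events and `snapshots_df`
# from /lending.v1/snapshots, assumed pre-filtered to liquidation blocks.
index_cols = [
    "blockchain",
    "protocol",
    "address",
    "liquidation_debt_asset_symbol",
    "transaction_hash",
]

# Pair each liquidation with the pool's total liquidity at that block.
events = events_df.set_index(index_cols)[["liquidation_debt_amount_in_asset", "amount"]]
snapshots = snapshots_df.set_index(index_cols)[["total_liquidity"]]
df = events.join(snapshots, how="inner")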
Liquidation Ratio: debt_amount / total_liquidity
Depletion Percentage: (initial_liquidity - min_liquidity) / initial_liquidity
Note that if you use the Kaiko dataset, please be aware that liquidations need to be normalised into USD: do not simply divide liquidation_debt_amount_in_asset by total_liquidity. An easy workaround is to imply the FX rate from some existing fields:
# Calculate implied price (amount is in tokens and liq is in USD)
df['implied_price'] = df['liquidation_debt_amount_in_asset'] / df['amount']
# Convert total_liquidity to USD
df['total_liquidity_usd'] = df['total_liquidity'] * df['implied_price']
# Calculate correct liquidation ratio
df['liquidation_ratio'] = df['liquidation_debt_amount_in_asset'] / df['total_liquidity_usd']
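For completeness, a minimal sketch of the depletion-percentage metric defined above, assuming the frame is sorted chronologically within each pool:

# Depletion Percentage per pool: (initial_liquidity - min_liquidity) / initial_liquidity
# Assumes rows are ordered by block within each (blockchain, protocol, address) group.
pool_keys = ["blockchain", "protocol", "address"]
grouped = df.groupby(level=pool_keys)["total_liquidity_usd"]
df_depletion = (grouped.first() - grouped.min()) / grouped.first()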
It is immediately clear that the data follows a log-scale distribution, but there are also interesting anomalies, notably to the left of the peak in the log-transformed dataset.
Interestingly, if we look at the distribution of absolute USD amounts in the log-ratio region of -14 to -16, we see a strange pattern!
Plotting all the data, it becomes apparent that this is due to liquidator preference for round numbers. When viewed in absolute terms, the liquidations show several spikes at different dollar thresholds, most notably $1k, which was hugely dominant in the wstETH liquidations.
As a side remark, running a liquidation bot to liquidate at $990 instead of $1k might be highly profitable!
This has implications for the estimation of distributions, because fits tend to chase this spike, which reflects liquidator behaviour rather than the liquidations themselves. It also means that kernels estimated from historical data would likely misrepresent the tail risk.
The candidate models largely behave like Gaussians: for example, a Student's t-distribution with a large degrees-of-freedom parameter converges to a normal distribution. We can look instead at a Gaussian Mixture Model to capture the "hole" that is left around $500-1k, where liquidators leave opportunities on the table despite health deterioration.
We started the solver with two Gaussians at -18 and -16 straddling the "hole" and one at -13, where the true mean appears to reside, and obtained a significantly improved fit.
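A minimal sketch of such a fit with scikit-learn, assuming the liquidation_ratio column computed above (the initialisation values are the log-scale means mentioned here):

import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a 3-component GMM to the log liquidation ratio, seeding two components
# straddling the "hole" (-18, -16) and one near the apparent true mean (-13).
log_ratio = np.log(df["liquidation_ratio"].to_numpy()).reshape(-1, 1)
gmm = GaussianMixture(
    n_components=3,
    means_init=np.array([[-18.0], [-16.0], [-13.0]]),
    random_state=0,
)
gmm.fit(log_ratio)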
For a given pool depth $D$, we can now calculate $V_{99} = D \cdot \exp\!\big(F^{-1}_{\mathrm{GMM}}(0.99)\big)$, where $F^{-1}_{\mathrm{GMM}}$ is the inverse CDF (quantile function) of the GMM fitted to the log liquidation ratio. This gives us the 99th percentile liquidation volume conditional on pool depth.
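Since the mixture CDF has no closed-form inverse, one way to obtain that quantile is to root-find on the weighted sum of normal CDFs (a sketch, reusing the fitted gmm above):

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def gmm_log_ratio_quantile(gmm, q):
    # Numerically invert the 1-D Gaussian mixture CDF on the log ratio.
    weights = gmm.weights_
    means = gmm.means_.ravel()
    sigmas = np.sqrt(gmm.covariances_.ravel())
    cdf = lambda x: float(np.sum(weights * norm.cdf(x, loc=means, scale=sigmas)))
    lo = means.min() - 10 * sigmas.max()
    hi = means.max() + 10 * sigmas.max()
    return brentq(lambda x: cdf(x) - q, lo, hi)

# 99th percentile liquidation volume for a pool of depth D (in USD)
D = 1e9
v_99 = D * np.exp(gmm_log_ratio_quantile(gmm, 0.99))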
We estimated the following percentiles analytically:
99.9th: -5.758
99th: -7.448
95th: -9.030
which gives the following estimates for $F^{-1}_{\mathrm{GMM}}(q)$, so when exponentiated we have (in basis points):
$r \le 31.6$ bps with 99.9% confidence
$r \le 5.8$ bps with 99% confidence
$r \le 1.2$ bps with 95% confidence
Practically this means for a pool size of $10m you expect liquidations of just under $6k with 99% certainty.
We also investigated whether the size of the pool made a difference to liquidations, to test the hypothesis that borrowers tend to come in at a specific minimum size, so that small pools would see higher liquidation ratios. Under this hypothesis, the liquidation ratio should decrease as a power of the pool size.
This would alter the expectation framework: rather than assuming a stationary conditional density $p_r(r)$ independent of depth, we would incorporate the power-law scaling $r \propto D^{\beta}$, changing the quantile function to $V_q(D) \propto D^{1+\beta}\,\exp\!\big(F^{-1}_{\mathrm{GMM}}(q)\big)$.
The dominance of large Aave pools introduces some uncertainty at smaller pool sizes, but overall the regression seems to hold relatively stable.
On the log–log fit we get a slope of $\beta \approx -1$, so the liquidation ratio scales almost inversely with pool size $D$; equivalently, the dollar size of a typical liquidation remains almost constant across depths.
TL;DR: no.
The standard error in the slope is 0.04 so it's within practical reason to estimate the gradient as -1.
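A sketch of that regression, assuming the total_liquidity_usd and liquidation_ratio columns computed earlier:

import numpy as np
from scipy.stats import linregress

# Log–log fit of liquidation ratio against pool depth; a slope near -1 means
# dollar liquidation sizes are roughly constant across depths.
log_depth = np.log(df["total_liquidity_usd"].to_numpy())
log_ratio = np.log(df["liquidation_ratio"].to_numpy())
fit = linregress(log_depth, log_ratio)
print(f"slope = {fit.slope:.2f} +/- {fit.stderr:.2f}")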
From a modelling perspective, treating the gradient as $\beta = -1$ has clear implications. It indicates that liquidation events are executed in fixed-dollar lots, rather than as a fixed percentage of pool depth, implying that the conditional distribution of dollar liquidation sizes scales sub-linearly with $D$. Formally, if $r \propto D^{\beta}$, then liquidation sizes scale as $V = rD \propto D^{1+\beta}$. With $\beta = -1$, this reduces to $V \propto D^{0}$, a constant, reinforcing the interpretation that liquidations are bounded primarily by transaction cost economics and bot incentives rather than pool-specific dynamics. For risk calibration, this means that extrapolating liquidation risk across depth should account for a nearly depth-independent stress size, with deviations from $\beta = -1$ capturing only second-order effects. In practice, this provides justification for applying fixed-dollar stress tests across pools of different sizes, while reserving depth-scaled adjustments for the tail quantiles where empirical departures from the idealised slope become material.
While the regression shows that liquidation ratios scale nearly inversely with depth $D$, the practical implications differ depending on pool size.
When fitting the absolute dollar liquidations directly, we see artefacts suggesting that the ratio formulation captures the tail behaviour more accurately: the fitted tail distribution underperforms in the extremes when working in absolute dollar terms.
For small pools (<$50M), the flat $6k-lot structure implies disproportionately large liquidation ratios; however, these pools are not systemically important: their small liquidity base makes them less relevant to the slippage-driven spirals that threaten protocol stability.
Given that the dataset is strongly biased towards pools of $1bn+ in size, it is prudent to make the conservative assertion that these approximations of liquidation size are only valid for pools of >$1bn.
For smaller pools, we impose a conservative floor by taking the minimum liquidation size to be that of a $1bn pool, $V_{\min} = r_{1\mathrm{bn}} \cdot \$1\mathrm{bn}$, where $r_{1\mathrm{bn}}$ is the loss ratio inferred from a $1bn pool.
Thus, the floored expected liquidation is
$V_{99}(D) = r_{1\mathrm{bn}} \cdot \max(D, \$1\mathrm{bn})$
where $D$ is the pool size, and $r_{1\mathrm{bn}} = \exp\!\big(F^{-1}_{\mathrm{GMM}}(0.99)\big) \approx 5.8$ bps.
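In code, the floored estimate is simply a max against the $1bn figure (a sketch; R_99 here is the exponentiated 99th-percentile log ratio from the GMM fit above):

# Floored 99th-percentile liquidation size: below $1bn of depth the dollar
# amount is held at the $1bn figure; above it, it scales with the pool.
R_99 = 5.8e-4        # ~exp(-7.448), the fitted 99th-percentile ratio
FLOOR_DEPTH = 1e9    # $1bn

def expected_liquidation_99(pool_depth_usd, r_99=R_99):
    return r_99 * max(pool_depth_usd, FLOOR_DEPTH)

# e.g. a $200m pool still gets the $1bn floor, i.e. roughly $580k
expected_liquidation_99(2e8)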
We studied 6,400+ Aave v3 liquidations and modelled liquidation size $V$ as a ratio $r = V/D$ of pool depth $D$. On the log scale, the body of $r$ is close to Gaussian but exhibits structure driven by liquidator behaviour, most visibly a round-number spike around $1k and a "hole" just below that threshold. A 3-component Gaussian Mixture Model (GMM) on $\ln r$ captures this behaviour and materially reduces tail error relative to single-family fits.
We found typical dollar liquidation sizes are nearly flat across depth (≈$6k in our sample), while tail quantiles grow sub-linearly with $D$.
For risk calibration, compute $V_{99}(D)$ from the GMM and pair it with a venue-specific slippage curve to estimate the expected slippage $s(V_{99})$. For small or poorly represented pools, apply a minimum pool-size floor of $1 billion.
We estimate the max single liquidation size, with 99% certainty, on pools smaller than $1 billion to be around $580k, scaling with sizes over $1 billion as 0.058% of the pool in dollar terms.
This framework opens several directions for refinement and validation. First, expanding coverage beyond Aave v3 to include Aave v2/v1 and Compound will test the portability of the fixed-lot interpretation across protocols. For Compound in particular, resolving the FX treatment of cTokens is essential to establish a comparable baseline.
Second, the methodology should be validated out-of-sample with newer data and across different market regimes. This will help identify whether the observed GMM structure is stable or if liquidation dynamics shift as gas costs, MEV opportunities, or liquidator strategies evolve. Such stress testing will clarify whether the fixed-dollar clustering remains robust or drifts with infrastructure changes and arbitrage competition.
Third, greater granularity in borrower and pool-level features, such as concentration risk, asset volatility, or liquidity fragmentation, would allow conditioning the mixture model on explanatory variables rather than treating all liquidations as exchangeable. This would move the framework from a descriptive fit toward a predictive tool that distinguishes systemic pools from idiosyncratic ones.
Finally, incorporating this distributional framework into a full risk calibration pipeline, combining liquidation size distributions with venue-specific slippage models, will allow systematic derivation of supply caps and LTV limits. This end-to-end integration is the natural next step toward making liquidation modelling a standard component of protocol-level risk management.
Alex McFarlane