This is the third in our series on DeFi curation, following our analyses of setting LTVs and caps for collaterals and the relevance of JIT MEV. We develop a quantitative framework for estimating liquidation sizes and their implications for risk parameter calibration in DeFi lending. Using over 6,400 liquidation events from Aave v3, we model the ratio of liquidation size to pool depth on the log scale and find it is best captured by a three-component Gaussian Mixture Model (GMM). This approach explains anomalies introduced by liquidator behaviour, most notably round-number clustering around $1,000. We show that while typical liquidations are nearly constant in dollar terms across pool sizes, the tails deviate, requiring conservative treatment. To address this, we impose a minimum floor on expected liquidation sizes, ensuring robustness for smaller or under-sampled pools. Our estimates indicate that for pools larger than $1 billion, liquidation events will remain below 6 basis points with 99% certainty and below 32 basis points with 99.9% certainty. For smaller pools, we extrapolate the values from the $1 billion pool size, offering a conservative and tractable framework for stress testing and setting collateral risk parameters.
Understanding the mechanics behind asset onboarding decisions provides valuable insight into risk management. In our previous work, we outlined how to use historical data to estimate the price slippage, $s(V)$, for a given liquidation volume, $V$. We now focus instead on the conditional distribution of liquidation size given pool depth, $p_V(v \mid D)$. Directly estimating this density is difficult, so we instead work with the normalised liquidation ratio $r = V / D$, whose density is related by $p_V(v \mid D) = \tfrac{1}{D}\, p_r(v / D)$.
Inverting the CDF allows us to estimate the expected slippage $s(V_{99})$, where $V_{99} = D \cdot F_r^{-1}(0.99)$ is the 99th percentile of liquidation volumes for a given pool depth $D$.
We analysed 6400 liquidation events from Kaiko's Aave v3 dataset, filtering for:
Minimum liquidation size of $10 (removing dust)
USD-denominated debt assets (USDT, USDC, DAI, etc.)
In this analysis we discarded Compound data because it had an FX issue between cTokens and the underlying debt asset that couldn't be easily resolved. We will likely redo the analysis in the future when that is rectified.
Columns were taken from two of Kaiko's endpoints: /lending.v1/events for liquidation data and /lending.v1/snapshots for snapshotting the total pool liquidity, giving the following columns:
blockchain
protocol
address (pool address)
liquidation_debt_asset_symbol
transaction_hash
total_liquidity (deposits)
liquidation_debt_amount_in_asset (liquidation amount)
We grabbed data for blocks with liquidation events only and created a multi-index on blockchain, protocol, address, liquidation_debt_asset_symbol and transaction_hash, with the values being total_liquidity from snapshots and liquidation_debt_amount_in_asset from events.
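As a rough sketch of that join (the frame names events_df and snapshots_df are placeholders, and it assumes the snapshot rows have already been restricted to the liquidation blocks and carry the same key columns as the events):

import pandas as pd

# Placeholder inputs: `events_df` from /lending.v1/events and `snapshots_df`
# from /lending.v1/snapshots, assumed pre-filtered to liquidation blocks.
index_cols = [
    "blockchain",
    "protocol",
    "address",
    "liquidation_debt_asset_symbol",
    "transaction_hash",
]

# Pair each liquidation with the pool's total liquidity at that block.
events = events_df.set_index(index_cols)[["liquidation_debt_amount_in_asset", "amount"]]
snapshots = snapshots_df.set_index(index_cols)[["total_liquidity"]]
df = events.join(snapshots, how="inner")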
Liquidation Ratio: debt_amount / total_liquidity
Depletion Percentage: (initial_liquidity - min_liquidity) / initial_liquidity
Note that if you use the Kaiko dataset, please be aware that liquidations need to be normalised into USD: do not simply divide liquidation_debt_amount_in_asset by total_liquidity. An easy workaround is to imply the FX rate from some existing fields:
# Calculate implied price (amount is in tokens and liq is in USD)
df['implied_price'] = df['liquidation_debt_amount_in_asset'] / df['amount']
# Convert total_liquidity to USD
df['total_liquidity_usd'] = df['total_liquidity'] * df['implied_price']
# Calculate correct liquidation ratio
df['liquidation_ratio'] = df['liquidation_debt_amount_in_asset'] / df['total_liquidity_usd']
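For completeness, a minimal sketch of the depletion-percentage metric defined above, assuming the frame is sorted chronologically within each pool:

# Depletion Percentage per pool: (initial_liquidity - min_liquidity) / initial_liquidity
# Assumes rows are ordered by block within each (blockchain, protocol, address) group.
pool_keys = ["blockchain", "protocol", "address"]
grouped = df.groupby(level=pool_keys)["total_liquidity_usd"]
df_depletion = (grouped.first() - grouped.min()) / grouped.first()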
It is immediately clear that the data follows a log-scale distribution, but there are also interesting anomalies, notably to the left of the peak in the log-transformed dataset.
Interestingly, if we look at the distribution of absolute USD amounts in the log-ratio region of -14 to -16, we see a strange pattern!
Plotting all the data, it becomes apparent that this is due to liquidator preference for round numbers. When viewed in absolute terms, the liquidations show several spikes at different dollar thresholds, most notably $1k, which was hugely dominant in the wstETH liquidations.
As a side remark, running a liquidation bot to liquidate at $990 instead of $1k might be highly profitable!
This has implications for the estimation of distributions, because fits tend to chase this spike, which reflects liquidator behaviour rather than the liquidations themselves. It also means that kernels estimated from historical data would likely misrepresent the tail risk.
The candidate models largely behave like Gaussians: for example, a Student's t-distribution with a large degrees-of-freedom parameter converges to a normal distribution. We can look instead at a Gaussian Mixture Model to capture the "hole" that is left around $500-1k, where liquidators leave opportunities on the table despite health deterioration.
We started the solver with two Gaussians at -18 and -16 straddling the "hole" and one at -13, where the true mean appears to reside, and obtained a significantly improved fit.
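A minimal sketch of such a fit with scikit-learn, assuming the liquidation_ratio column computed above (the initialisation values are the log-scale means mentioned here):

import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a 3-component GMM to the log liquidation ratio, seeding two components
# straddling the "hole" (-18, -16) and one near the apparent true mean (-13).
log_ratio = np.log(df["liquidation_ratio"].to_numpy()).reshape(-1, 1)
gmm = GaussianMixture(
    n_components=3,
    means_init=np.array([[-18.0], [-16.0], [-13.0]]),
    random_state=0,
)
gmm.fit(log_ratio)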
For a given pool depth $D$, we can now calculate $V_{99} = D \cdot \exp\!\big(F^{-1}_{\mathrm{GMM}}(0.99)\big)$, where $F^{-1}_{\mathrm{GMM}}$ is the inverse CDF (quantile function) of the GMM fitted to the log liquidation ratio. This gives us the 99th percentile liquidation volume conditional on pool depth.
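Since the mixture CDF has no closed-form inverse, one way to obtain that quantile is to root-find on the weighted sum of normal CDFs (a sketch, reusing the fitted gmm above):

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def gmm_log_ratio_quantile(gmm, q):
    # Numerically invert the 1-D Gaussian mixture CDF on the log ratio.
    weights = gmm.weights_
    means = gmm.means_.ravel()
    sigmas = np.sqrt(gmm.covariances_.ravel())
    cdf = lambda x: float(np.sum(weights * norm.cdf(x, loc=means, scale=sigmas)))
    lo = means.min() - 10 * sigmas.max()
    hi = means.max() + 10 * sigmas.max()
    return brentq(lambda x: cdf(x) - q, lo, hi)

# 99th percentile liquidation volume for a pool of depth D (in USD)
D = 1e9
v_99 = D * np.exp(gmm_log_ratio_quantile(gmm, 0.99))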
We estimated the following percentiles analytically:
99.9th: -5.758
99th: -7.448
95th: -9.030
which gives the following estimates for $F^{-1}_{\mathrm{GMM}}(q)$, so when exponentiated we have (in basis points):
$r \le 31.6$ bps with 99.9% confidence
$r \le 5.8$ bps with 99% confidence
$r \le 1.2$ bps with 95% confidence
Practically this means for a pool size of $10m you expect liquidations of just under $6k with 99% certainty.
We also investigated whether the size of the pool made a difference to liquidations, to test the hypothesis that borrowers tend to come in at a specific minimum size, so that small pools would see higher liquidation ratios. Under this hypothesis, the liquidation ratio should decrease as a power of the pool size.
This would alter the expectation framework: rather than assuming a stationary conditional density $p_r(r)$ independent of depth, we would incorporate the power-law scaling $r \propto D^{\beta}$, changing the quantile function to $V_q(D) \propto D^{1+\beta}\,\exp\!\big(F^{-1}_{\mathrm{GMM}}(q)\big)$.
The dominance of large Aave pools introduces some uncertainty at smaller pool sizes, but overall the regression seems to hold relatively stable.
On the log–log fit we get a slope of $\beta \approx -1$, so the liquidation ratio scales almost inversely with pool size $D$; equivalently, the dollar size of a typical liquidation remains almost constant across depths.
TL;DR: no.
The standard error in the slope is 0.04 so it's within practical reason to estimate the gradient as -1.
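A sketch of that regression, assuming the total_liquidity_usd and liquidation_ratio columns computed earlier:

import numpy as np
from scipy.stats import linregress

# Log–log fit of liquidation ratio against pool depth; a slope near -1 means
# dollar liquidation sizes are roughly constant across depths.
log_depth = np.log(df["total_liquidity_usd"].to_numpy())
log_ratio = np.log(df["liquidation_ratio"].to_numpy())
fit = linregress(log_depth, log_ratio)
print(f"slope = {fit.slope:.2f} +/- {fit.stderr:.2f}")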
From a modelling perspective, treating the gradient as $\beta = -1$ has clear implications. It indicates that liquidation events are executed in fixed-dollar lots, rather than as a fixed percentage of pool depth, implying that the conditional distribution of dollar liquidation sizes scales sub-linearly with $D$. Formally, if $r \propto D^{\beta}$, then liquidation sizes scale as $V = rD \propto D^{1+\beta}$. With $\beta = -1$, this reduces to $V \propto D^{0}$, a constant, reinforcing the interpretation that liquidations are bounded primarily by transaction cost economics and bot incentives rather than pool-specific dynamics. For risk calibration, this means that extrapolating liquidation risk across depth should account for a nearly depth-independent stress size, with deviations from $\beta = -1$ capturing only second-order effects. In practice, this provides justification for applying fixed-dollar stress tests across pools of different sizes, while reserving depth-scaled adjustments for the tail quantiles where empirical departures from the idealised slope become material.
While the regression shows that liquidation ratios scale nearly inversely with depth $D$, the practical implications differ depending on pool size.
When fitting the absolute dollar liquidations directly, we see artefacts suggesting that the ratio formulation captures the tail behaviour more accurately: the fitted tail distribution underperforms in the extremes when working in absolute dollar terms.
For small pools (<$50M), the flat $6k-lot structure implies disproportionately large liquidation ratios; however, these pools are not systemically important: their small liquidity base makes them less relevant to the slippage-driven spirals that threaten protocol stability.
Given that the dataset is strongly biased towards pools of $1bn+ in size, it is prudent to make the conservative assertion that these approximations of liquidation size are only valid for pools of >$1bn.
For smaller pools, we impose a conservative floor by taking the minimum liquidation size to be that of a $1bn pool, $V_{\min} = r_{1\mathrm{bn}} \cdot \$1\mathrm{bn}$, where $r_{1\mathrm{bn}}$ is the loss ratio inferred from a $1bn pool.
Thus, the floored expected liquidation is
$V_{99}(D) = r_{1\mathrm{bn}} \cdot \max(D, \$1\mathrm{bn})$
where $D$ is the pool size, and $r_{1\mathrm{bn}} = \exp\!\big(F^{-1}_{\mathrm{GMM}}(0.99)\big) \approx 5.8$ bps.
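In code, the floored estimate is simply a max against the $1bn figure (a sketch; R_99 here is the exponentiated 99th-percentile log ratio from the GMM fit above):

# Floored 99th-percentile liquidation size: below $1bn of depth the dollar
# amount is held at the $1bn figure; above it, it scales with the pool.
R_99 = 5.8e-4        # ~exp(-7.448), the fitted 99th-percentile ratio
FLOOR_DEPTH = 1e9    # $1bn

def expected_liquidation_99(pool_depth_usd, r_99=R_99):
    return r_99 * max(pool_depth_usd, FLOOR_DEPTH)

# e.g. a $200m pool still gets the $1bn floor, i.e. roughly $580k
expected_liquidation_99(2e8)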
We studied 6,400+ Aave v3 liquidations and modelled liquidation size $V$ as a ratio $r = V/D$ of pool depth $D$. On the log scale, the body of $r$ is close to Gaussian but exhibits structure driven by liquidator behaviour, most visibly a round-number spike around $1k and a "hole" just below that threshold. A 3-component Gaussian Mixture Model (GMM) on $\ln r$ captures this behaviour and materially reduces tail error relative to single-family fits.
We found typical dollar liquidation sizes are nearly flat across depth (≈$6k in our sample), while tail quantiles grow sub-linearly with $D$.
For risk calibration, compute $V_{99}(D)$ from the GMM and pair it with a venue-specific slippage curve to estimate the expected slippage $s(V_{99})$. For small or poorly represented pools, apply a minimum pool-size floor of $1 billion.
We estimate the max single liquidation size, with 99% certainty, on pools smaller than $1 billion to be around $580k, scaling with sizes over $1 billion as 0.058% of the pool in dollar terms.
This framework opens several directions for refinement and validation. First, expanding coverage beyond Aave v3 to include Aave v2/v1 and Compound will test the portability of the fixed-lot interpretation across protocols. For Compound in particular, resolving the FX treatment of cTokens is essential to establish a comparable baseline.
Second, the methodology should be validated out-of-sample with newer data and across different market regimes. This will help identify whether the observed GMM structure is stable or if liquidation dynamics shift as gas costs, MEV opportunities, or liquidator strategies evolve. Such stress testing will clarify whether the fixed-dollar clustering remains robust or drifts with infrastructure changes and arbitrage competition.
Third, greater granularity in borrower and pool-level features, such as concentration risk, asset volatility, or liquidity fragmentation, would allow conditioning the mixture model on explanatory variables rather than treating all liquidations as exchangeable. This would move the framework from a descriptive fit toward a predictive tool that distinguishes systemic pools from idiosyncratic ones.
Finally, incorporating this distributional framework into a full risk calibration pipeline, combining liquidation size distributions with venue-specific slippage models, will allow systematic derivation of supply caps and LTV limits. This end-to-end integration is the natural next step toward making liquidation modelling a standard component of protocol-level risk management.
Alex McFarlane