
Applied scientist studying algorithmic reputation and identity.

This is a technical note, intended for researchers and technical teams, especially those designing airdrops. This work uses the Hyperline blockchain analytics platform and the Octan algorithmic reputation score.
In our recent article, "Onchain PageRank as Predictor of Future Revenue," we described a surprising property of Arbitrum Airdrop #1: A small percentage of active addresses were eligible according to the airdrop targeting, but did not claim the airdrop. Despite being only 1.5% of active addresses, they contributed 10.2% of revenue in the year after the airdrop.
Designating this cohort as “unclaims”, as contrasted with eligible and claims, the following table illustrates the behavior. Claims and unclaims partition the eligible set.

The contribution of first year revenue from unclaims is larger than would be expected from the size of the cohort. In the following visualization, the entire width of the bar is the eligible cohort and the light blue bar is the unclaim cohort.

Note that the outsized contribution of the green bar to revenue is expected and desired: it implies that the airdrop claimants were contributing network participants. However, the unclaim cohort was even more revenue-efficient than the claiming cohort.
This is unexpected. Addresses that claimed the airdrop would reasonably be expected to contribute more than addresses that did not claim it. The opposite is true.
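One way to make the comparison concrete is to divide a cohort's share of revenue by its share of addresses. This definition of "revenue efficiency" is our assumption for illustration, not one stated in the original analysis, and the function name is hypothetical. The inputs are the two figures quoted earlier for the unclaim cohort (1.5% of active addresses, 10.2% of first-year revenue).

```python
def revenue_efficiency(revenue_share: float, address_share: float) -> float:
    """Ratio of a cohort's revenue share to its address share.

    A value of 1.0 means the cohort contributes revenue exactly in
    proportion to its size. This definition is an assumption.
    """
    if address_share <= 0:
        raise ValueError("address_share must be positive")
    return revenue_share / address_share


# Figures quoted in the text for the unclaim cohort.
unclaim_efficiency = revenue_efficiency(revenue_share=0.102, address_share=0.015)
print(f"unclaim efficiency: {unclaim_efficiency:.1f}x")
```

Under this definition the unclaim cohort contributes roughly 6.8 times the revenue its size alone would suggest.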
Previous articles have applied a pagerank-based reputation score, similar to that published by Octan Network. These results show that the top 22.4% of addresses by pagerank are more revenue-efficient than the 22.4% of addresses that claimed the airdrop. However, when the next 1.5% of addresses by pagerank are compared to unclaims, the unclaims again demonstrate extreme revenue efficiency.
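The cohort construction behind this comparison can be sketched as follows. This is a minimal illustration, assuming a plain mapping from address to score. The function name, the placeholder addresses, and the scores are hypothetical, and this is not the Octan or Hyperline implementation.

```python
def pagerank_cohorts(scores, top_frac=0.224, next_frac=0.015):
    """Split addresses into the top `top_frac` by score and the
    `next_frac` that immediately follow them in the ranking."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = round(len(ranked) * top_frac)
    n_next = round(len(ranked) * next_frac)
    return ranked[:n_top], ranked[n_top:n_top + n_next]


# Hypothetical scores for 1,000 placeholder addresses (not real data).
scores = {f"addr_{i}": 1.0 / (i + 1) for i in range(1000)}
top_cohort, next_cohort = pagerank_cohorts(scores)
print(len(top_cohort), len(next_cohort))
```

The first cohort is sized to match the claims (22.4%) and the second to match the unclaims (1.5%), so each pagerank cohort is compared against an airdrop cohort of equal size.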

An interesting observation is that the unclaim cohort has many high outliers. A boxplot of the first year revenue illustrates:

Comparing the mean first year revenue to the median also illustrates the impact of outliers. The outliers cause the mean to be much higher for unclaims, while for the median, the reverse is the case.
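The effect is easy to reproduce on toy numbers. The revenue values below are invented purely for illustration and are not taken from the Arbitrum data. A single very large value is enough to pull the mean of one cohort above the other while its median stays lower.

```python
from statistics import mean, median

# Hypothetical first-year revenue per address (arbitrary units, not real data).
claims = [4.0, 5.0, 6.0, 7.0, 8.0]
unclaims = [1.0, 2.0, 3.0, 4.0, 500.0]  # one bot-like outlier

for name, revenue in (("claims", claims), ("unclaims", unclaims)):
    print(f"{name}: mean={mean(revenue):.1f} median={median(revenue):.1f}")
```

Because the median ignores the magnitude of extreme values, a large gap between mean and median is a quick signal that a few addresses dominate a cohort's total.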


The highest value addresses in the unclaim cohort are shown below.

These addresses show bot-like behavior on Arbiscan. They have more transactions than would normally come from a human, and the transactions are more frequent and constant than a human would produce.

This, together with the outlier analysis in the previous section, provides evidence that the outsized impact of the unclaim cohort is due to bots included in that cohort.
Transactions per minute provides one useful criterion for differentiating bots from humans. Here, we can define tx/min as:
(count of transactions) / (final transaction time - first transaction time, in minutes)
where the set of transactions is taken over the year following the airdrop.
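The definition above can be sketched in a few lines. The function name and the sample timestamps are hypothetical, and the handling of edge cases (fewer than two transactions, or all transactions at the same instant) is our assumption, since the definition does not cover them.

```python
from datetime import datetime


def tx_per_minute(timestamps):
    """Transaction count divided by the minutes between the first and
    last transaction of one address."""
    if len(timestamps) < 2:
        return 0.0  # assumption: no measurable span, treat the rate as 0
    span_minutes = (max(timestamps) - min(timestamps)).total_seconds() / 60
    if span_minutes == 0:
        return float("inf")  # assumption: all transactions at one instant
    return len(timestamps) / span_minutes


# Hypothetical address: 4 transactions spread over one hour.
txs = [datetime(2024, 1, 1, 12, m) for m in (0, 20, 40)]
txs.append(datetime(2024, 1, 1, 13, 0))
rate = tx_per_minute(txs)
print(f"{rate:.4f} tx/min")
```

Note that this measure rewards short bursts: an address with a handful of transactions in a few minutes scores higher than one with thousands spread over a year, so it may be worth pairing with a minimum transaction count.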
If we exclude the 25% of addresses with the highest tx/min, we find that the outsized impact of the unclaim cohort is greatly reduced:
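The exclusion step can be sketched as a simple rank-and-cut. The addresses, rates, and revenue values below are hypothetical, and whether the original analysis ranked addresses within the cohort or across all active addresses is not stated, so this sketch ranks whatever set it is given.

```python
def drop_highest_rate(rates, drop_frac=0.25):
    """Return the addresses that remain after removing the `drop_frac`
    of addresses with the highest tx/min."""
    ranked = sorted(rates, key=rates.get)  # ascending by rate
    n_keep = len(ranked) - int(len(ranked) * drop_frac)
    return ranked[:n_keep]


# Hypothetical tx/min and first-year revenue per address (not real data).
rates = {"a": 0.001, "b": 0.002, "c": 0.004, "d": 3.5}
revenue = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 90.0}

kept = drop_highest_rate(rates)
kept_revenue = sum(revenue[addr] for addr in kept)
print(kept, kept_revenue)
```

In this toy example the single high-rate address carries most of the revenue, so removing it changes the cohort total far more than it changes the cohort size.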

The unclaim cohort shows a surprising contribution to first-year revenue. This is explained by bot-like behavior among the highest contributors in the cohort. Eliminating addresses with very high transaction rates brings the revenue contribution from this cohort in line with expectations.
Airdrop targeting is traditionally aimed at human users, and the bots are a confounding factor in comparing targeting methodologies. While this does not provide prescriptive guidance for eliminating bots from an airdrop or an onchain analysis, it does demonstrate that the outsized impact of the unclaim cohort is likely unimportant when comparing the historical airdrop to algorithmic pagerank targeting.