
Ethereum PoS Attack and Defense
Ethereum is a notoriously adversarial environment. Ethereum has even been compared to a “dark forest” - acknowledging the terrifying game-theoretic concept from the Three body Problem that being visible to other entities in the universe is an unavoidable precursor to being destroyed by them. This reputation mostly comes from weaknesses in the application layer (insecure smart contracts) or the social layer (users being manipulated to give up their private keys or unwittingly sign transactions...
Client diversity on Ethereum's consensus layer
This is a detailed accompaniment to an introductory article I wrote at ethereum.org Ethereum has multiple interoperable clients developed and maintained in different languages by independent teams. This is a major achievement and can provide resilience to the network by limiting the effects of a bug or attack to only the portion of the network running the affected client. However, this strength is only realized if users distribute roughly evenly across the available clients. At present, the v...

Ethereum's energy consumption
After eight years of proof-of-work mining, Ethereum finally became green on 15th September 2022. It did so by swapping out proof-of-work mining for a new proof-of-stake based consensus mechanism, using ETH instead of energy to secure the network. Ethereum's proof-of-stake mechanism now uses just ~0.0026 TWh/yr across the entire global network. The energy consumption estimate for Ethereum comes from a CCRI (Crypto Carbon Ratings Institute)(opens in a new tab)↗ study. They generated bottom...

Ethereum PoS Attack and Defense
Ethereum is a notoriously adversarial environment. Ethereum has even been compared to a “dark forest” - acknowledging the terrifying game-theoretic concept from the Three body Problem that being visible to other entities in the universe is an unavoidable precursor to being destroyed by them. This reputation mostly comes from weaknesses in the application layer (insecure smart contracts) or the social layer (users being manipulated to give up their private keys or unwittingly sign transactions...
Client diversity on Ethereum's consensus layer
This is a detailed accompaniment to an introductory article I wrote at ethereum.org Ethereum has multiple interoperable clients developed and maintained in different languages by independent teams. This is a major achievement and can provide resilience to the network by limiting the effects of a bug or attack to only the portion of the network running the affected client. However, this strength is only realized if users distribute roughly evenly across the available clients. At present, the v...

Ethereum's energy consumption
After eight years of proof-of-work mining, Ethereum finally became green on 15th September 2022. It did so by swapping out proof-of-work mining for a new proof-of-stake based consensus mechanism, using ETH instead of energy to secure the network. Ethereum's proof-of-stake mechanism now uses just ~0.0026 TWh/yr across the entire global network. The energy consumption estimate for Ethereum comes from a CCRI (Crypto Carbon Ratings Institute)(opens in a new tab)↗ study. They generated bottom...

Subscribe to jmc

Subscribe to jmc
Share Dialog
Share Dialog
<100 subscribers
<100 subscribers
Quadratic funding - the mechanism that currently determines the value of Gitcoin grant funding -is inherently vulnerable to Sybil attacks. Sybil attacks are individual humans dividing themselves into multiple “virtual humans” in order to gain additional voting weight. In traditional banking and voting systems, Sybil resistance comes from “KYC” (know-your-customer) which links personal identifying information to some action. In Web3, “KYC” is generaly minimized because it undermines the core ethos of censorship resistance and permissionlessness. This means other methods are required to identify which participants in a grant round are real individual humans, and which are not.
The goal of Sybil defense is to increase the investment of time and money required for an attacker to convice a grant review system that they are > 1 person to the extent that as rational attacker would not do it. Defenders constantly attempt to push this cost up while minimizing their own expenses, while attackers constantly try to pull the attack cost down. The greater the size of the exploitable pool of funds, the higher cost an attacker will be willing to pay. At the same time, extremely low-cost Sybil attacks are often worthwhile for attackers because even a low success rate can still be profitable if the attack cost is sufficiently low. This means that a robust Sybil defense structure requires systems that identify cheap, simple attacks very effectively and efficiently as well as more complex defenses against sophisticated attacks.
The simplest, cheapest form of Sybil attack is simply to generate a large number of addresses and try to vote with all of them. Ordinarily, these will be sifted out of the grant review system by human reviewers because they usually fail even basic proof-of-personhood checks. However, there is a substantial cost associated with these human reviews. To optimize the Sybil defense mechanism for high efficacy and low cost, detection of these cheap Sybil attacks must be automated using computationally inexpensive algorithms.
Levenshtein distance is a concept from information theory that can measure the difference between two text strings. It is commonly used in natural language processing. The Levenshtein distance between string a and string b is the minimum number of edits required to transform a into b. The fewer changes required to change one string to the other, the smaller the Levenshtein distance. The algorithm, in its naive recursive form, is as follows:

In the context of Gitcoin grants, this idea can be applied to account names that are suspiciously similar. Very similar names might indicate low effort Sybil attempts, for example following account names stand out as very similar and worth investigating to ensure they are genuine individual voters:
Accounts separated by a single edit might not be quite as obvious as those above. Here are some more examples of some slightly more subtle examples of accounts separated by a single edit:
All of these accounts are easily identified with the constraint Levenshtein distance <= 1.
The Levenshtein algorithm is computionally expensive in its naive form because it requires recursion over all possible pairs of accounts. However, it can be made performant using matrix algebra. This approach was recently taken by the FDD Sybil defense squad using a list of ~298,000 registered Gitcoin accounts (meaning the Levenshtein distance algorithm was applied to a matrix of 298,00002 elements). The result was a file with 129,297,647 rows indicating all pairs with edit distances less than 4. As the distance threshold decreases, the number of accounts flagged also decreases, but the likelihood that the flagged accounts are Sybils increases. The threshold distance therefore acts as a calibration slider that the Sybil defense squad can use to balance Sybil detection rate against cost of computation.
The technique has been applied to the full dataset of all Gitcoin accounts as a way to quantify the bulk number of suspicious accounts, but it could also run as a service that runs periodically to flag accounts that get created in some time window. This could become integrated as a gadget in the grant application workflow.
To demonstrate that the Levenshtein distance really is identifying Sybil-like behaviour, some of the accounts identified by the algorithm can be examined manually. These three accounts were flagged by the Levenshtein algorithm as potential Sybils:



Examine these accounts in more detail using links: Account 1, Account 2, Account 3.
The account names are separated by a single edit (Levenshtein distance ==1). All three accounts have donated a total of precisely $37, divided identically between the same eight projects in the same round. None of the accounts have participated in bounties, hackathons, tips or kudos. All three were created at about the same time and have no activity prior to, or following, those eight votes. These seem to be obvious Sybils that were correctly flagged by the Levenshtein algorithm. Furthermore, the three accounts above were just three of eight accounts that were all separated by a single edit, and all shared the same Gitcoin activity profile and voting record.
Creating multiple accounts with very similar names is a very lazy attack, and probably easily identified humans in the grant review system. However, identifying them by implementing a Levenshtein distance algorithm to the entire set of accounts in a grant round removes some of the load on the human reviewers and acts as a cost-optimization because the algorithmic apporoach is fast and cheap.
Levenshtein distance for beginners
Quadratic funding - the mechanism that currently determines the value of Gitcoin grant funding -is inherently vulnerable to Sybil attacks. Sybil attacks are individual humans dividing themselves into multiple “virtual humans” in order to gain additional voting weight. In traditional banking and voting systems, Sybil resistance comes from “KYC” (know-your-customer) which links personal identifying information to some action. In Web3, “KYC” is generaly minimized because it undermines the core ethos of censorship resistance and permissionlessness. This means other methods are required to identify which participants in a grant round are real individual humans, and which are not.
The goal of Sybil defense is to increase the investment of time and money required for an attacker to convice a grant review system that they are > 1 person to the extent that as rational attacker would not do it. Defenders constantly attempt to push this cost up while minimizing their own expenses, while attackers constantly try to pull the attack cost down. The greater the size of the exploitable pool of funds, the higher cost an attacker will be willing to pay. At the same time, extremely low-cost Sybil attacks are often worthwhile for attackers because even a low success rate can still be profitable if the attack cost is sufficiently low. This means that a robust Sybil defense structure requires systems that identify cheap, simple attacks very effectively and efficiently as well as more complex defenses against sophisticated attacks.
The simplest, cheapest form of Sybil attack is simply to generate a large number of addresses and try to vote with all of them. Ordinarily, these will be sifted out of the grant review system by human reviewers because they usually fail even basic proof-of-personhood checks. However, there is a substantial cost associated with these human reviews. To optimize the Sybil defense mechanism for high efficacy and low cost, detection of these cheap Sybil attacks must be automated using computationally inexpensive algorithms.
Levenshtein distance is a concept from information theory that can measure the difference between two text strings. It is commonly used in natural language processing. The Levenshtein distance between string a and string b is the minimum number of edits required to transform a into b. The fewer changes required to change one string to the other, the smaller the Levenshtein distance. The algorithm, in its naive recursive form, is as follows:

In the context of Gitcoin grants, this idea can be applied to account names that are suspiciously similar. Very similar names might indicate low effort Sybil attempts, for example following account names stand out as very similar and worth investigating to ensure they are genuine individual voters:
Accounts separated by a single edit might not be quite as obvious as those above. Here are some more examples of some slightly more subtle examples of accounts separated by a single edit:
All of these accounts are easily identified with the constraint Levenshtein distance <= 1.
The Levenshtein algorithm is computionally expensive in its naive form because it requires recursion over all possible pairs of accounts. However, it can be made performant using matrix algebra. This approach was recently taken by the FDD Sybil defense squad using a list of ~298,000 registered Gitcoin accounts (meaning the Levenshtein distance algorithm was applied to a matrix of 298,00002 elements). The result was a file with 129,297,647 rows indicating all pairs with edit distances less than 4. As the distance threshold decreases, the number of accounts flagged also decreases, but the likelihood that the flagged accounts are Sybils increases. The threshold distance therefore acts as a calibration slider that the Sybil defense squad can use to balance Sybil detection rate against cost of computation.
The technique has been applied to the full dataset of all Gitcoin accounts as a way to quantify the bulk number of suspicious accounts, but it could also run as a service that runs periodically to flag accounts that get created in some time window. This could become integrated as a gadget in the grant application workflow.
To demonstrate that the Levenshtein distance really is identifying Sybil-like behaviour, some of the accounts identified by the algorithm can be examined manually. These three accounts were flagged by the Levenshtein algorithm as potential Sybils:



Examine these accounts in more detail using links: Account 1, Account 2, Account 3.
The account names are separated by a single edit (Levenshtein distance ==1). All three accounts have donated a total of precisely $37, divided identically between the same eight projects in the same round. None of the accounts have participated in bounties, hackathons, tips or kudos. All three were created at about the same time and have no activity prior to, or following, those eight votes. These seem to be obvious Sybils that were correctly flagged by the Levenshtein algorithm. Furthermore, the three accounts above were just three of eight accounts that were all separated by a single edit, and all shared the same Gitcoin activity profile and voting record.
Creating multiple accounts with very similar names is a very lazy attack, and probably easily identified humans in the grant review system. However, identifying them by implementing a Levenshtein distance algorithm to the entire set of accounts in a grant round removes some of the load on the human reviewers and acts as a cost-optimization because the algorithmic apporoach is fast and cheap.
Levenshtein distance for beginners
No activity yet