
How to Upload Files on Arweave With Python
Throughout history, the need to store information has been crucial, both for individuals and organizations. Traditionally, paper documents and photographs were kept in physical folders and secured at home. The invention of computers revolutionized storage, transitioning from floppy disks to CDs, USBs, hard drives, and finally, solid-state drives. Recognizing the growing demand for data storage, the internet embraced cloud services, which emerged rapidly like mushrooms. Massive data centers, l...

Meet AO: The Hyper-Parallel Computer
The ao hyper-parallel computer hit the scene in February 2024 and is already making waves among developers worldwide. This innovative system takes inspiration from the actor model, enabling processes to run concurrently and communicate seamlessly without waiting for each other. Each processing unit within ao adheres to a core data protocol that runs on Arweave blockchain-like decentralized storage solution. This distributed network of nodes creates a unified experience for users, presenting a...

Exploring Doug DeMuro's Cars Dataset
This analysis is part of a data challenge brought by Desights, a platform built by the Ocean Protocol to . If you’re not familiar with the latter, Ocean Protocol is paving the way for a new data economy using the power of blockchain. They provide several services, like earning and trading mechanisms, data challenges, and the Ocean Market, where datasets, reports, algorithms and more, can be published on Web3 decentralized storage infrastructures like Arweave and IPFS. Desights team is focused...
Where Data Science and AI, converge with Web3. Build 🔨 | Write ✏️ | Code 💻

How to Upload Files on Arweave With Python
Throughout history, the need to store information has been crucial, both for individuals and organizations. Traditionally, paper documents and photographs were kept in physical folders and secured at home. The invention of computers revolutionized storage, transitioning from floppy disks to CDs, USBs, hard drives, and finally, solid-state drives. Recognizing the growing demand for data storage, the internet embraced cloud services, which emerged rapidly like mushrooms. Massive data centers, l...

Meet AO: The Hyper-Parallel Computer
The ao hyper-parallel computer hit the scene in February 2024 and is already making waves among developers worldwide. This innovative system takes inspiration from the actor model, enabling processes to run concurrently and communicate seamlessly without waiting for each other. Each processing unit within ao adheres to a core data protocol that runs on Arweave blockchain-like decentralized storage solution. This distributed network of nodes creates a unified experience for users, presenting a...

Exploring Doug DeMuro's Cars Dataset
This analysis is part of a data challenge brought by Desights, a platform built by the Ocean Protocol to . If you’re not familiar with the latter, Ocean Protocol is paving the way for a new data economy using the power of blockchain. They provide several services, like earning and trading mechanisms, data challenges, and the Ocean Market, where datasets, reports, algorithms and more, can be published on Web3 decentralized storage infrastructures like Arweave and IPFS. Desights team is focused...
Where Data Science and AI, converge with Web3. Build 🔨 | Write ✏️ | Code 💻

Subscribe to Marco Rodrigues

Subscribe to Marco Rodrigues


Share Dialog
Share Dialog
<100 subscribers
<100 subscribers
The following is a data analysis project, aimed at analysing user behavior of the StakeWise platform. The results should provide the DeFi protocol with insightful information about its customers and possible recommendations.
StakeWise is an Ethereum liquid staking service, that allows to stake ETH in the Ethereum 2.0 network and earn rewards. It is called a liquid service because users are able to unstake at any time or use their staked ETH to farm throughout DeFi. It provides a non-custodial staking solution where the staked ETH goes into a shared pool among other participants. Every deposit results in the minting of sETH2 (StakeWise’s ERC-20 standard staked ETH token), which is always backed by ETH at a 1:1 ratio. As long as the user holds sETH2, he/she will be generating rETH2 (staking rewards). These rewards do not physically exist, but their record is stored in the Beacon Chain, so they can be measured and used to back rETH2 in a 1:1 ratio. These tokens are based on the ERC-20 standard, meaning they can be transferred or exchanged with other tokens and used to exit from staking or access more opportunities to earn income in other DeFi protocols.
The full report can be obtained in the Ocean Market as an NFT. See it below.
nft://137/0x0fe28573b94614c9d5ec0dd71db4f5569920e9d5/?showBuying=true&showMeta=true
The project is split into two main steps:
Data Mining: The on-chain data was obtained using Etherscan API and Python’s requests library. Besides, prices of ETH were also extracted using Bitstamp API.
Data Analysis: The analysis contains several key questions. In this entry, we’ll look into some of them.
The first move was to use the Etherscan explorer to extract the addresses that interacted with the protocol, through this link. The two main addresses are:
sETH2 token contract: 0xFe2e637202056d30016725477c5da089Ab0A043A
rETH2 token contract: 0x20bc832ca081b91433ff6c17f85701b6e92486c5
However, to get the addresses I only used the sETH2 contract address, because the rETH2 is generated from the first.
Saying that I obtained a list of 4744 addresses, which were used to extract important features through the Etherscan API and Python requests library.
import os
import pandas as pd
import requests
from dotenv import load_dotenv
load_dotenv("keys.env")
# etherscan API key
api_key = os.getenv('etherscan_api_key')
# contract wallets
sETH2_contract_wallet = "0xFe2e637202056d30016725477c5da089Ab0A043A"
rETH2_contract_wallet = "0x20BC832ca081b91433ff6c17f85701B6e92486c5"
# etherscan base URL
base_url = 'https://api.etherscan.io/api'
# addresses dataset
df = pd.read_csv('addresses.csv')
The Python script above shows the necessary imports to extract and filter the data. The API key, can be obtained on Etherescan, and the addresses.csv is the list extracted before.
The following script is responsible for extracting meaningful data from each address on the list, by using actions, which are in fact API endpoints.
import os
import pandas as pd
import requests
def get_data(address, action, contract_address=None):
params = {
'module': 'account',
'action': action,
'contractaddress': contract_address,
'address': address,
'tag': 'latest',
'apikey': api_key}
retries = 3
delay = 5
for _ in range(retries):
try:
response = requests.get(base_url, params=params).json()
print(address)
return response['result']
except requests.exceptions.RequestException as e:
print(f"Error: {e}")
print("Retrying after delay...")
time.sleep(delay)
print("Failed to retrieve data")
return None
To see the dictionary with the different actions visit the full report NFT. Data transformation was also applied in order to get extra features, such as:
Number of transactions
The total transacted
The average number of transactions
The average gas prices
The timestamp of the first transaction
As I mentioned at the beginning, the price of ETH was also extracted during certain periods in order to compare how it has an influence on customer behaviour.
import os
import pandas as pd
import requests
def get_data(address, action, contract_address=None):
params = {
'module': 'account',
'action': action,
'contractaddress': contract_address,
'address': address,
'tag': 'latest',
'apikey': api_key}
retries = 3
delay = 5
for _ in range(retries):
try:
response = requests.get(base_url, params=params).json()
print(address)
return response['result']
except requests.exceptions.RequestException as e:
print(f"Error: {e}")
print("Retrying after delay...")
time.sleep(delay)
print("Failed to retrieve data")
return None
The script above takes the Closing price from Bitstamp, but it also generates a data frame with this information. Note that the feature first_transact_hour is also created, this will be used to merge with the main data frame, that contains all the other features.
In case you want to explore the full dataset and jump all the data mining steps you can find it for free in the following data NFT.
nft://137/0xcf546aa4ca1831284d5dd9af05e4c2b6c93784c7/?showBuying=true&showMeta=true&size=small
The dataset contains all these features:
Addresses: ETH wallets that interacted with the protocol.
sETH2_balance: the balance of sETH2.
rETH2_balance: the balance of rETH2.
ETH_balance: the balance of ETH.
n_transactions: the number of transactions between the wallet and the protocol by wallet.
total_transactions: is the total transacted by wallet.
average_transactions: the average transacted by wallet.
average_gas_price: the average gas price resulting from transactions by wallet.
average_transactions: the average transacted by wallet.
first_transact_timestamp: the timestamp of the first transaction by wallet.
first_transact_value: the value of the first transaction.
first_transact_hour: the hour when the first transaction occurred.
Cohort_ID: An ID to group wallets by Month and Year.
Full_deposit: If ETH balance is null the full deposit is "yes", if not is "No"
close: the closing price of Ethereum at the hour of the first transaction.
The way StakeWise generates revenue is simple. The users stake their ETH and receive sETH2, with the same value as the ETH they have staked. The staked ETH goes to a pool, and once this pool reaches 32 ETH, another node is created. An Ethereum node generates around 5% APY. StakeWise rewards the users with the biggest cut of this percentage, and they take 10% of the profits.
Let’s start by analysing the 20 Cohort IDs with the biggest average revenue per user.

By looking at the chart above, there’s a considerable increase in the average revenue in August 2023. Nonetheless, there’s no actual correlation between the Ethereum price and the revenue generated. This seems a direct cause of people not unstaking their sETH2. The total balance of this token by cohort is much higher in August 2023 than in the other months. We can notice a slight decrease in the staking percentage, meaning that many people are staking more at the moment. This secures the platform but decreases the rewards for the users.
Let’s now see if users have been increasing their activity on the platform over time.

In the chart, when the price was higher, the transactions seemed to be fewer, as we have seen before the revenue wasn’t much during these periods. This could mean, that the users are more concerned with selling their ETH, instead of staking when the price is high. However, from February 2021 until August 2021, there is some correlation between the transactions and the price. We see for instance, that at the beginning of the bear market in June 2022, the transactions skyrocketed.
Let’s analyse one last chart, and see if the transactions tend to change in average value. For that, we take the average value transacted balance (in ETH) per user by cohort ID.

We see in the chart above that on August 2023 and August 2022, the average transaction values, were much higher than in the other cohorts. The average is quite high compared to the others, which might mean we are in the presence of strong outliers. Wallets that made huge transactions can unbalance the results.
First, it is worth mentioning that the results presented in this report, are very few compared to the ones in the full report. As for the data gathering, several strategies could have been taken, but using Etherscan seemed to be the easiest path to grab on-chain data of the StakeWise protocol. Then came the analysis of the data. Concerning the revenue, we have seen that it has been increasing over time. Users with more money or institutions seem to be entering the protocol as well, this can be caused by the release of version 3 soon. Also, it could mean that people are not exiting their staking, probably due to steady price fluctuations. However, this high staking demand, led to a decrease in the staking rewards, which could lead to a loss of users in the future, so the platform needs to introduce new strategies, such as adjustments in the protocols fee, the 10% cut can be decreased depending on the demand. Promoting decentralization and being more transparent regarding their nodes is also key to retain customers.
Briefly, the data shows a very good progression in this protocol. It is new and has space to grow, roughly they seem to be taking good strategies, and my remarks could only serve as a plus, to the already good job being done.
The following is a data analysis project, aimed at analysing user behavior of the StakeWise platform. The results should provide the DeFi protocol with insightful information about its customers and possible recommendations.
StakeWise is an Ethereum liquid staking service, that allows to stake ETH in the Ethereum 2.0 network and earn rewards. It is called a liquid service because users are able to unstake at any time or use their staked ETH to farm throughout DeFi. It provides a non-custodial staking solution where the staked ETH goes into a shared pool among other participants. Every deposit results in the minting of sETH2 (StakeWise’s ERC-20 standard staked ETH token), which is always backed by ETH at a 1:1 ratio. As long as the user holds sETH2, he/she will be generating rETH2 (staking rewards). These rewards do not physically exist, but their record is stored in the Beacon Chain, so they can be measured and used to back rETH2 in a 1:1 ratio. These tokens are based on the ERC-20 standard, meaning they can be transferred or exchanged with other tokens and used to exit from staking or access more opportunities to earn income in other DeFi protocols.
The full report can be obtained in the Ocean Market as an NFT. See it below.
nft://137/0x0fe28573b94614c9d5ec0dd71db4f5569920e9d5/?showBuying=true&showMeta=true
The project is split into two main steps:
Data Mining: The on-chain data was obtained using Etherscan API and Python’s requests library. Besides, prices of ETH were also extracted using Bitstamp API.
Data Analysis: The analysis contains several key questions. In this entry, we’ll look into some of them.
The first move was to use the Etherscan explorer to extract the addresses that interacted with the protocol, through this link. The two main addresses are:
sETH2 token contract: 0xFe2e637202056d30016725477c5da089Ab0A043A
rETH2 token contract: 0x20bc832ca081b91433ff6c17f85701b6e92486c5
However, to get the addresses I only used the sETH2 contract address, because the rETH2 is generated from the first.
Saying that I obtained a list of 4744 addresses, which were used to extract important features through the Etherscan API and Python requests library.
import os
import pandas as pd
import requests
from dotenv import load_dotenv
load_dotenv("keys.env")
# etherscan API key
api_key = os.getenv('etherscan_api_key')
# contract wallets
sETH2_contract_wallet = "0xFe2e637202056d30016725477c5da089Ab0A043A"
rETH2_contract_wallet = "0x20BC832ca081b91433ff6c17f85701B6e92486c5"
# etherscan base URL
base_url = 'https://api.etherscan.io/api'
# addresses dataset
df = pd.read_csv('addresses.csv')
The Python script above shows the necessary imports to extract and filter the data. The API key, can be obtained on Etherescan, and the addresses.csv is the list extracted before.
The following script is responsible for extracting meaningful data from each address on the list, by using actions, which are in fact API endpoints.
import os
import pandas as pd
import requests
def get_data(address, action, contract_address=None):
params = {
'module': 'account',
'action': action,
'contractaddress': contract_address,
'address': address,
'tag': 'latest',
'apikey': api_key}
retries = 3
delay = 5
for _ in range(retries):
try:
response = requests.get(base_url, params=params).json()
print(address)
return response['result']
except requests.exceptions.RequestException as e:
print(f"Error: {e}")
print("Retrying after delay...")
time.sleep(delay)
print("Failed to retrieve data")
return None
To see the dictionary with the different actions visit the full report NFT. Data transformation was also applied in order to get extra features, such as:
Number of transactions
The total transacted
The average number of transactions
The average gas prices
The timestamp of the first transaction
As I mentioned at the beginning, the price of ETH was also extracted during certain periods in order to compare how it has an influence on customer behaviour.
import os
import pandas as pd
import requests
def get_data(address, action, contract_address=None):
params = {
'module': 'account',
'action': action,
'contractaddress': contract_address,
'address': address,
'tag': 'latest',
'apikey': api_key}
retries = 3
delay = 5
for _ in range(retries):
try:
response = requests.get(base_url, params=params).json()
print(address)
return response['result']
except requests.exceptions.RequestException as e:
print(f"Error: {e}")
print("Retrying after delay...")
time.sleep(delay)
print("Failed to retrieve data")
return None
The script above takes the Closing price from Bitstamp, but it also generates a data frame with this information. Note that the feature first_transact_hour is also created, this will be used to merge with the main data frame, that contains all the other features.
In case you want to explore the full dataset and jump all the data mining steps you can find it for free in the following data NFT.
nft://137/0xcf546aa4ca1831284d5dd9af05e4c2b6c93784c7/?showBuying=true&showMeta=true&size=small
The dataset contains all these features:
Addresses: ETH wallets that interacted with the protocol.
sETH2_balance: the balance of sETH2.
rETH2_balance: the balance of rETH2.
ETH_balance: the balance of ETH.
n_transactions: the number of transactions between the wallet and the protocol by wallet.
total_transactions: is the total transacted by wallet.
average_transactions: the average transacted by wallet.
average_gas_price: the average gas price resulting from transactions by wallet.
average_transactions: the average transacted by wallet.
first_transact_timestamp: the timestamp of the first transaction by wallet.
first_transact_value: the value of the first transaction.
first_transact_hour: the hour when the first transaction occurred.
Cohort_ID: An ID to group wallets by Month and Year.
Full_deposit: If ETH balance is null the full deposit is "yes", if not is "No"
close: the closing price of Ethereum at the hour of the first transaction.
The way StakeWise generates revenue is simple. The users stake their ETH and receive sETH2, with the same value as the ETH they have staked. The staked ETH goes to a pool, and once this pool reaches 32 ETH, another node is created. An Ethereum node generates around 5% APY. StakeWise rewards the users with the biggest cut of this percentage, and they take 10% of the profits.
Let’s start by analysing the 20 Cohort IDs with the biggest average revenue per user.

By looking at the chart above, there’s a considerable increase in the average revenue in August 2023. Nonetheless, there’s no actual correlation between the Ethereum price and the revenue generated. This seems a direct cause of people not unstaking their sETH2. The total balance of this token by cohort is much higher in August 2023 than in the other months. We can notice a slight decrease in the staking percentage, meaning that many people are staking more at the moment. This secures the platform but decreases the rewards for the users.
Let’s now see if users have been increasing their activity on the platform over time.

In the chart, when the price was higher, the transactions seemed to be fewer, as we have seen before the revenue wasn’t much during these periods. This could mean, that the users are more concerned with selling their ETH, instead of staking when the price is high. However, from February 2021 until August 2021, there is some correlation between the transactions and the price. We see for instance, that at the beginning of the bear market in June 2022, the transactions skyrocketed.
Let’s analyse one last chart, and see if the transactions tend to change in average value. For that, we take the average value transacted balance (in ETH) per user by cohort ID.

We see in the chart above that on August 2023 and August 2022, the average transaction values, were much higher than in the other cohorts. The average is quite high compared to the others, which might mean we are in the presence of strong outliers. Wallets that made huge transactions can unbalance the results.
First, it is worth mentioning that the results presented in this report, are very few compared to the ones in the full report. As for the data gathering, several strategies could have been taken, but using Etherscan seemed to be the easiest path to grab on-chain data of the StakeWise protocol. Then came the analysis of the data. Concerning the revenue, we have seen that it has been increasing over time. Users with more money or institutions seem to be entering the protocol as well, this can be caused by the release of version 3 soon. Also, it could mean that people are not exiting their staking, probably due to steady price fluctuations. However, this high staking demand, led to a decrease in the staking rewards, which could lead to a loss of users in the future, so the platform needs to introduce new strategies, such as adjustments in the protocols fee, the 10% cut can be decreased depending on the demand. Promoting decentralization and being more transparent regarding their nodes is also key to retain customers.
Briefly, the data shows a very good progression in this protocol. It is new and has space to grow, roughly they seem to be taking good strategies, and my remarks could only serve as a plus, to the already good job being done.
No activity yet