I previously wrote about how to use BigQuery to analyse Ethereum datasets. This is quick and easy for datasets that already have event tables thanks to the hard work of the Blockchain BigQuery team, but what if you want to analyse events that aren’t already broken out into their own tables? There is a global event table, bigquery-public-data.crypto_ethereum.logs, but event data is in its raw format there - not easy to query on.
Fortunately, thanks to BigQuery’s support for user-defined functions, it’s possible to generate your own event tables that are functionally identical to the ‘official’ ones. Here’s how.
The logs table is huge - over 2 billion rows and 1TB in size - and if you simply query it directly you’re going to end up paying a lot in BigQuery data charges, particularly if you’re extracting data on multiple events from a single contract, or testing things out. A sensible first step is to extract the event data you care about to a smaller working table. Run the following query, substituting the address of the contract you want to work with and a table name that works for you:
CREATE OR REPLACE TABLE `ens-manager.token.token_events` AS
SELECT
*
FROM
`bigquery-public-data.crypto_ethereum.logs`
WHERE
address = "0xc18360217d8f7ab5e7c516566761ea12ce7f9d72";
Next, we’re going to need some user-defined functions to handle the crucial event-processing steps:
CREATE OR REPLACE FUNCTION `ens-manager.token.get_topic_hash`(abispec STRING) RETURNS STRING LANGUAGE js
OPTIONS (library=["https://storage.googleapis.com/ens-manager.appspot.com/ethers-abi.js"]) AS R"""
return abi.Interface.getEventTopic(abi.EventFragment.fromString(abispec));
""";
CREATE OR REPLACE FUNCTION `ens-manager.token.decode_log`(abispec STRING, data STRING, topics ARRAY<STRING>) RETURNS ARRAY<STRING> LANGUAGE js
OPTIONS (library=["https://storage.googleapis.com/ens-manager.appspot.com/ethers-abi.js"]) AS R"""
var iface = new abi.Interface(["event " + abispec]);
return iface.decodeEventLog(iface.fragments[0], data, topics);
""";
Both of these functions rely on a version of ethers.js that has been built to run as a self-contained JavaScript dependency. The resource is public, so feel free to use it here - no need to host it yourself.
get_topic_hash takes an ABI specification such as Transfer(address indexed from, address indexed to, uint256 value) and calculates the 256-bit topic hash for it, which we can then use to filter the rows we care about out of our logs table.
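To make the hashing step concrete: the topic hash is the keccak256 of the event’s canonical signature - the event name followed by its comma-separated parameter types, with parameter names and the indexed keyword stripped. Here’s a minimal sketch of that canonicalisation step (canonicalSignature is a hypothetical helper for illustration, not part of the UDF; the keccak256 itself is what ethers.js computes inside get_topic_hash):

```javascript
// Derive the canonical signature that gets keccak256-hashed to produce
// the topic hash. Parameter names and `indexed` keywords are dropped;
// only the bare types remain.
function canonicalSignature(abispec) {
  const [, name, params] = abispec.match(/^(\w+)\((.*)\)$/);
  const types = params
    .split(',')
    .map((p) => p.trim().split(/\s+/)[0]); // keep only the type token
  return `${name}(${types.join(',')})`;
}

console.log(canonicalSignature(
  'Transfer(address indexed from, address indexed to, uint256 value)'
));
// -> Transfer(address,address,uint256)
```

For Transfer, the keccak256 of this canonical signature is the familiar topic hash beginning 0xddf252ad… that you’ll see in the topics[0] column of the logs table.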
decode_log takes an ABI specification in the same format, and uses it to decode an event log from data and topics fields, returning an array of decoded fields.
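To illustrate what decode_log is recovering: indexed parameters occupy topics[1] onwards as 32-byte words, while non-indexed parameters are ABI-encoded in the data field. For simple types this amounts to hex slicing, which ethers.js generalises to the full ABI. A hand-rolled sketch for the Transfer case (decodeTransferLog and the log values below are hypothetical, for illustration only):

```javascript
// Decode a Transfer(address,address,uint256) log by hand.
// topics[1] and topics[2] hold the indexed addresses as 32-byte words;
// `data` holds the single non-indexed uint256 value.
function decodeTransferLog(data, topics) {
  const from = '0x' + topics[1].slice(-40); // last 20 bytes of the word
  const to = '0x' + topics[2].slice(-40);
  const value = BigInt(data).toString();    // uint256 from the data word
  return [from, to, value];
}

// Hypothetical log values for illustration:
const topics = [
  '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef',
  '0x0000000000000000000000001111111111111111111111111111111111111111',
  '0x0000000000000000000000002222222222222222222222222222222222222222',
];
const data =
  '0x0000000000000000000000000000000000000000000000000de0b6b3a7640000';

console.log(decodeTransferLog(data, topics));
// -> [ '0x1111111111111111111111111111111111111111',
//      '0x2222222222222222222222222222222222222222',
//      '1000000000000000000' ]
```

This is also why every decoded field comes back as a string: representing them uniformly lets a single UDF return any event's fields in one array.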
Finally, we’ll write a table function that uses our other two functions to dynamically generate a table with the events we care about:
CREATE OR REPLACE TABLE FUNCTION `ens-manager.token.decoded_logs`(event STRING) AS (
SELECT
transaction_hash,
block_number,
block_timestamp,
block_hash,
log_index,
`ens-manager.token.decode_log`(event, data, topics) AS event
FROM
`ens-manager.token.token_events`
WHERE
topics[SAFE_OFFSET(0)] = `ens-manager.token.get_topic_hash`(event)
);
These three functions are persistent and independent of the logs you’re decoding - so you can define them once and reuse them everywhere.
Now that we have the prerequisites in place, we can finally build the table of events we want. We can simply query decoded_logs, and rename and cast the fields we care about. Here’s an example of decoding a Transfer event:
CREATE OR REPLACE TABLE `ens-manager.token.event_Transfer` AS
SELECT
transaction_hash,
block_number,
block_timestamp,
block_hash,
log_index,
event[OFFSET(0)] AS `from`,
event[OFFSET(1)] AS `to`,
CAST(event[OFFSET(2)] AS bignumeric) / 1e18 AS value
FROM
`ens-manager.token.decoded_logs`("Transfer(address indexed from, address indexed to, uint256 value)");
Note how decoded_logs outputs the event fields as an array in the order they’re specified in the ABI; they’re all output as strings, so we have to cast each element to its correct data type, and alias it as appropriate.
Assuming you want to keep these tables up to date, you can simply define the functions once, then create a script that executes the first extraction query followed by the decoded event table queries, and use BigQuery’s support for scheduled queries to run it on a regular basis, such as daily.