This is the start of a series of posts that examines how to analyze Solana data in SQL using different platforms. We start off the series with a couple of key queries that yield insights to both the data platforms and the data themselves.
The best free SQL based tools are Cryptohouse (powered by Goldksy), Dune and Flipside for accessing Solana data. This post evaluates the tradeoffs between using each of these platforms by using a test of useful queries. These queries look at the most fundamental table for Solana data, the transactions
table, which forms the basis of most fundamental research on Solana.
We'll examine three queries:
Transaction Sample: A basic query to quickly inspect the dataset.
Filter Array for Accounts: A more complex query that filters transactions based on a subset of accounts, an essential task given that many Solana transactions have columns stored as arrays.
Filter + GroupBy Aggregation: A sophisticated query that filters account arrays and performs a groupby aggregation, a common task in data analysis.
Transactions
DatasetThe transactions
table is a fundamental table for Solana blockchain data and understanding how to effectively navigate through the table is key to unlocking more on-chain insights. Every Solana transaction touches a set of accounts, which are stored in an array. There is also a pre-balance and post-balance column arrays which store the SOL amount before and after the transaction, which are important to derive SOL related volume for fees, compute costs, and validator revenue.
This table also has a status/success column that determines whether the transaction was successful or not. Continuously higher than normal transaction failures can indicate an interesting research direction.
Finally the transactions
table contains all of the instructions required to execute the transaction in a set of logical steps. For those familiar with EVM data, Solana instructions are similar to EVM log event data.
There are three primary criteria for evaluating these tools: performance, features, and data availability. The importance of each criterion depends on the specific research outcomes desired.
For instance, Cryptohouse tends to have faster query results, but it currently lacks features like dashboards and full data availability compared to Flipside and Dune. Dune has custom Trino-based DuneSQL, while Flipside uses a more standardized SQL dialect, making it easier to use with AI models like ChatGPT. Dune also offers intelligent caching, speeding up queries that are run multiple times.
This is a basic query that is commonly used to take a quick look at the dataset that you are working with. It is important to see that the dataset you choose has both the columns and accurate (e.g. non null) data in those columns. Although blockchain data is standardized at the node level, each data indexer chooses the most efficient way to store and format their data to serve to users.
Some notable comments about the transactions
dataset differences is namely what is available and what format it shows up in. If the columns you want do not exist in the dataset, then that automatically restricts what data platform suits your needs.
For example Cryptohouse does not have an instructions
column. While dune has instructions
, the output is not friendly for longer formats so it's not possible to inspect the output of larger sized columns. While Flipside has instructions
as well, the layout is the opposite of dune where each column is maximally sized. This is not the most ergonomical because the width of the data returned is quite large to scroll through. Another example is that both Cryptohouse and Dune have a tx_index
column whereas flipside doesn't.
Platform | Query Execution Time (seconds) | Rows Read | Bytes Read |
---|
Dune | 15 | 30,228,504 rows | 50.0 GB |
Crypto-house | 13.437 | 149,395,554 rows | 248.11 GiB |
Flipside | 8 | N/A | N/A |
SELECT
*
FROM
solana.transactions
WHERE block_timestamp >= now() - INTERVAL 12 HOUR
ORDER BY
block_slot,
index
LIMIT 1000
SELECT
*
FROM solana.transactions
WHERE block_time >= CURRENT_TIMESTAMP - INTERVAL '12' HOUR
ORDER BY
block_slot,
index
LIMIT 1000
SELECT
*
FROM
solana.core.fact_transactions
WHERE
block_timestamp >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP)
ORDER BY
block_id
LIMIT 1000;
This is a more complex query that filters transactions based on a subset of accounts. This is a very useful query because a lot of Solana transactions
columns are stored as arrays so it's important to know how to filter through arrays. The second is that most analysis will be geared towards specific accounts (protocol accounts, mev accounts, trading accounts, etc).
Notably Crytpohouse starts to shine with these kinds of queries and offers superior performance compared to Dune and Flipside. Cryptohouse also scanned nearly 4x as much data as the Dune. Flipside also performed fairly well with the filter task as well, but not as much data is available on the query statistics.
Platform | Query Execution Time (seconds) | Rows Read | Bytes Read |
---|
Dune | 115.874 | 30,428,286 rows | 50.5 GB |
Crypto-house | 16.269 | 149,123,945 rows | 198.12 GiB |
Flipside | 37 | N/A | N/A |
SELECT
*
FROM
solana.transactions
WHERE
arrayExists(
x -> x.1 IN (
'Fc8bpeCMifWYv97pQ3k5xDvd98nuVg6yAaZrwmy4RRp6',
-- zeta contract
'rec5EKMGg6MxZYaMdyBfgwp4d5rB9T1VQH5pJv5LtFJ' -- pyth contract
),
accounts
)
AND block_timestamp >= now() - INTERVAL 12 HOUR
ORDER BY
block_slot,
index
LIMIT 10000
SELECT
*
FROM
solana.transactions
WHERE
arrayExists(
x -> x.1 IN (
'Fc8bpeCMifWYv97pQ3k5xDvd98nuVg6yAaZrwmy4RRp6',
-- zeta contract
'rec5EKMGg6MxZYaMdyBfgwp4d5rB9T1VQH5pJv5LtFJ' -- pyth contract
),
accounts
)
AND block_timestamp >= now() - INTERVAL 12 HOUR
ORDER BY
block_slot,
index
LIMIT 10000
WITH filtered_accounts AS (
SELECT
*
FROM
solana.core.fact_transactions
WHERE
block_timestamp >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP)
)
SELECT
*
FROM
filtered_accounts f,
LATERAL FLATTEN(input => f.account_keys) AS a
WHERE
a.value:pubkey IN (
'Fc8bpeCMifWYv97pQ3k5xDvd98nuVg6yAaZrwmy4RRp6',
'rec5EKMGg6MxZYaMdyBfgwp4d5rB9T1VQH5pJv5LtFJ'
)
ORDER BY
block_id
LIMIT 10000;
This is a more complex query that filters the account arrays and then performs a groupby aggregation. Groupby aggregations are the bread and butter for most analytics questions and provides strong intuition on the overall shape of the data. In this particular query, we filter accounts for the pyth contract and groupby whether the transaction was a failure or success to analyze the failure rates specific to that contract. It turned out that the Pyth contract is experiencing 3x higher failure rates than the average Solana transaction (more on this soon).
The Pyth example is a query example that does not rely on a dashboard to gain insights into the data. These types of queries would be best suited for the fastest performing tool, which happens to be Cryptohouse. Here Cryptohouse really shines, reading more rows and processing more bytes than Dune while being multiples faster than both Dune and Flipside.
Platform | Query Execution Time (seconds) | Rows Read | Bytes Read |
---|
Dune | 54.143 | 150,493,063 rows | 50.3 GB |
Crypto-house | 13.247 | 704,857,665 rows | 225.74 GiB |
Flipside | 38 | N/A | N/A |
SELECT
status,
COUNT(*) AS status_count
FROM
solana.transactions
WHERE
arrayExists(
x -> x.1 IN [
'rec5EKMGg6MxZYaMdyBfgwp4d5rB9T1VQH5pJv5LtFJ'
],
accounts
)
AND block_slot > 282400000
AND block_slot < 282900000
GROUP BY
status
SELECT
success,
COUNT(*) AS transaction_count
FROM solana.transactions
WHERE
block_slot > 282400000
AND block_slot < 282900000
AND ANY_MATCH(account_keys, x -> x = 'rec5EKMGg6MxZYaMdyBfgwp4d5rB9T1VQH5pJv5LtFJ')
GROUP BY
success
WITH filtered_transactions AS (
SELECT
*
FROM
solana.core.fact_transactions
WHERE
block_id > 282400000
AND block_id < 282900000
)
SELECT
SUCCEEDED,
COUNT(*) AS status_count
FROM
filtered_transactions f,
LATERAL FLATTEN(input => f.account_keys) AS a
WHERE
a.value:pubkey IN (
'rec5EKMGg6MxZYaMdyBfgwp4d5rB9T1VQH5pJv5LtFJ'
)
GROUP BY
SUCCEEDED;
In conclusion, while all three platforms—Cryptohouse, Dune, and Flipside—offer valuable tools for analyzing Solana data, each has its strengths and weaknesses. Cryptohouse excels in performance, particularly with more complex queries, but lacks some features and full data availability. Dune offers flexibility with its custom SQL dialect and intelligent caching, while Flipside provides a more standardized experience with solid performance. The best choice of platform will depend on your specific needs, whether it be speed, ease of use, or the availability of specific features.