Advanced SQL Query Optimization for Superchain Analysis in TokenFlow

Introduction

Recently, I delved into the world of Optimism, focusing on the Superchain and exploring tools like TokenFlow that make it easy to visualize metrics and build complex analyses. Leveraging my SQL knowledge, I decided to create small widgets and graphs that simplify the visualization of the four pillars of Superchain analysis: transactions, active addresses, deployed contracts, and fees paid.

During this process, I compared one of my queries with others already in existence and found significant differences in performance. This led me to reflect on the importance of properly structuring a widget for TokenFlow from the start, ensuring optimization that better utilizes the platform’s resources. This would not only improve overall performance but also directly impact visualization times. Below, I describe the use cases, the results obtained, and the optimizations applied.


Explanation and Justification of the Queries

While analyzing and constructing SQL queries for the Superchain blockchains, I implemented several optimizations to improve performance without compromising accuracy. Here I detail the key optimizations:

  1. Selecting specific columns:

    • Instead of using SELECT *, I selected only the necessary columns for each pillar, such as block_date, gas_used, effective_gas_price, among others. This reduces the amount of data processed and lightens the system load.

  2. Optimized date filtering:

    • I applied the filter block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE()) directly in each subquery. This way, each dataset is filtered before performing joins or combinations, avoiding unnecessary processing.

  3. Using SUM with CASE:

    • To aggregate data by blockchain, I used SUM(CASE WHEN chain = 'Zora' THEN daily_transactions ELSE 0 END). This performs the per-chain aggregation in a single pass over the data, without resorting to operations like PIVOT, improving performance.

  4. Using UNION ALL instead of JOIN:

    • I used UNION ALL to append the per-chain result sets. Unlike UNION, it skips the deduplication step, and it keeps the query simpler than a JOIN. This is especially useful in environments with large data volumes, where JOIN operations can be costly.
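The SUM-with-CASE technique from point 3 can be sketched in a self-contained way. The example below uses an in-memory SQLite database with illustrative table names and sample data (not TokenFlow's actual schema) to show how conditional aggregation produces one column per chain without a PIVOT clause:

```python
import sqlite3

# Illustrative in-memory dataset: per-chain daily transaction counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE superchain_transactions (
    block_date TEXT,
    chain TEXT,
    daily_transactions INTEGER
);
INSERT INTO superchain_transactions VALUES
    ('2024-10-01', 'Zora', 120),
    ('2024-10-01', 'OP Mainnet', 950),
    ('2024-10-02', 'Zora', 130),
    ('2024-10-02', 'Mode', 40);
""")

# Conditional aggregation: one output column per chain, no PIVOT needed.
rows = conn.execute("""
SELECT block_date AS date,
       SUM(CASE WHEN chain = 'Zora'       THEN daily_transactions ELSE 0 END) AS zora,
       SUM(CASE WHEN chain = 'OP Mainnet' THEN daily_transactions ELSE 0 END) AS op_mainnet,
       SUM(CASE WHEN chain = 'Mode'       THEN daily_transactions ELSE 0 END) AS mode
FROM superchain_transactions
GROUP BY block_date
ORDER BY block_date
""").fetchall()

for r in rows:
    print(r)
# ('2024-10-01', 120, 950, 0)
# ('2024-10-02', 130, 0, 40)
```

Chains absent on a given date simply contribute 0 to their column, which is exactly the shape a stacked-bar widget expects.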


Query Optimization: Use Cases

Use Case 1: Daily Transactions

Unoptimized query: The query combined data from all blockchains into a single set before counting daily transactions, which resulted in less efficient performance.

WITH superchain_transactions AS (
    SELECT *, 'Zora' AS chain FROM zora.core.transactions
    UNION ALL
    SELECT *, 'OP Mainnet' AS chain FROM optimism.core.transactions
    UNION ALL 
    SELECT *, 'Mint' AS chain FROM mint.core.transactions
    UNION ALL
    SELECT *, 'Mode' AS chain FROM mode.core.transactions
)

SELECT * FROM (
    SELECT chain,
           block_date AS date,
           COUNT(*) AS "Daily Transactions"
    FROM superchain_transactions
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY chain, block_date
    ORDER BY block_date, chain
) 
PIVOT (
    MAX("Daily Transactions") FOR chain IN ('Zora', 'OP Mainnet', 'Mint', 'Mode')
) AS p(date, zora, "OP Mainnet", mint, mode)
ORDER BY date;

Result:

  • 94 rows / 2.39 KB in 5,879 ms

Optimized query: In this version, each set of transactions is aggregated separately and then combined, avoiding unnecessary processing.

WITH superchain_transactions AS (
    SELECT block_date, COUNT(*) AS daily_transactions, 'Zora' AS chain
    FROM zora.core.transactions
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(*) AS daily_transactions, 'OP Mainnet' AS chain
    FROM optimism.core.transactions
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(*) AS daily_transactions, 'Mint' AS chain
    FROM mint.core.transactions
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(*) AS daily_transactions, 'Mode' AS chain
    FROM mode.core.transactions
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date
)

SELECT 
    block_date AS date, 
    SUM(CASE WHEN chain = 'Zora' THEN daily_transactions ELSE 0 END) AS zora_transactions,
    SUM(CASE WHEN chain = 'OP Mainnet' THEN daily_transactions ELSE 0 END) AS op_mainnet_transactions,
    SUM(CASE WHEN chain = 'Mint' THEN daily_transactions ELSE 0 END) AS mint_transactions,
    SUM(CASE WHEN chain = 'Mode' THEN daily_transactions ELSE 0 END) AS mode_transactions
FROM 
    superchain_transactions
GROUP BY 
    block_date
ORDER BY 
    block_date;

Result:

  • 94 rows / 2.39 KB in 1,038 ms

Recommended Chart: Stacked bars

  • X-axis: date

  • Y-axis: daily_transactions

  • Differentiation: By blockchain (Zora, OP Mainnet, Mint, Mode)

Daily Transactions

https://app.tokenflow.live/studio/editor/67060f50ed220e424ffd20ea

Use Case 2: Active Addresses

Unoptimized query:

WITH superchain_calls AS (
    SELECT *, 'Zora' AS chain FROM zora.core.calls
    UNION ALL 
    SELECT *, 'OP Mainnet' AS chain FROM optimism.core.calls
    UNION ALL
    SELECT *, 'Mint' AS chain FROM mint.core.calls
    UNION ALL
    SELECT *, 'Mode' AS chain FROM mode.core.calls
)

SELECT * FROM (
    SELECT chain,
           block_date AS date,
           COUNT(DISTINCT from_address) AS "Active Addresses"
    FROM superchain_calls
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY chain, block_date
) 
PIVOT (
    MAX("Active Addresses") FOR chain IN ('Zora', 'OP Mainnet', 'Mint', 'Mode')
) AS p(date, zora, "OP Mainnet", mint, mode)
ORDER BY date;

Result:

  • 94 rows / 2.39 KB in 21,207 ms

Optimized query:

WITH superchain_calls AS (
    SELECT block_date, COUNT(DISTINCT from_address) AS active_addresses, 'Zora' AS chain
    FROM zora.core.calls
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(DISTINCT from_address) AS active_addresses, 'OP Mainnet' AS chain
    FROM optimism.core.calls
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(DISTINCT from_address) AS active_addresses, 'Mint' AS chain
    FROM mint.core.calls
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(DISTINCT from_address) AS active_addresses, 'Mode' AS chain
    FROM mode.core.calls
    WHERE block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date
)

SELECT 
    block_date AS date, 
    SUM(CASE WHEN chain = 'Zora' THEN active_addresses ELSE 0 END) AS zora_active_addresses,
    SUM(CASE WHEN chain = 'OP Mainnet' THEN active_addresses ELSE 0 END) AS op_mainnet_active_addresses,
    SUM(CASE WHEN chain = 'Mint' THEN active_addresses ELSE 0 END) AS mint_active_addresses,
    SUM(CASE WHEN chain = 'Mode' THEN active_addresses ELSE 0 END) AS mode_active_addresses
FROM 
    superchain_calls
GROUP BY 
    block_date
ORDER BY 
    block_date;

Result:

  • 94 rows / 8.63 KB in 11,231 ms

Recommended Chart: Stacked bars

  • X-axis: date

  • Y-axis: active_addresses

  • Differentiation: By blockchain (Zora, OP Mainnet, Mint, Mode)

Active addresses

https://app.tokenflow.live/studio/editor/670df0cb6ae7711da7ef68ef


Early Aggregation Approach

In the optimized queries, I implemented the early aggregation approach, performing aggregation operations such as counts and sums directly in the subqueries for each blockchain before combining the results. This contrasts with the unoptimized approach, where the data from all blockchains was first combined and then the aggregations were performed. This change offers several benefits:

  1. Lower processing load: By counting transactions or active addresses in each subquery, the size of the dataset processed afterward is reduced, minimizing memory and processing load.

  2. Efficient use of indexes: Filtering and aggregating at the source allows databases to make better use of their indexes, speeding up aggregation operations.

  3. Reduction in combined data size: Instead of combining large volumes of raw data, the optimized queries combine only the aggregated results, significantly reducing the size of the final dataset.
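The benefits above can be verified on a toy example. This minimal SQLite sketch (table names and data are illustrative) runs both strategies side by side: "late" aggregation unions the raw rows and then counts, while "early" aggregation counts inside each branch and unions only the small per-chain results. Both return the same answer; only the size of the intermediate data differs:

```python
import sqlite3

# Illustrative per-chain transaction tables; each row is one transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zora_transactions (block_date TEXT);
CREATE TABLE mode_transactions (block_date TEXT);
INSERT INTO zora_transactions VALUES ('2024-10-01'), ('2024-10-01'), ('2024-10-02');
INSERT INTO mode_transactions VALUES ('2024-10-01'), ('2024-10-02'), ('2024-10-02');
""")

# Late aggregation: union all raw rows first, then count.
late = conn.execute("""
SELECT block_date, COUNT(*) FROM (
    SELECT block_date FROM zora_transactions
    UNION ALL
    SELECT block_date FROM mode_transactions
)
GROUP BY block_date ORDER BY block_date
""").fetchall()

# Early aggregation: count inside each branch, union only the aggregates.
early = conn.execute("""
SELECT block_date, SUM(n) FROM (
    SELECT block_date, COUNT(*) AS n FROM zora_transactions GROUP BY block_date
    UNION ALL
    SELECT block_date, COUNT(*) AS n FROM mode_transactions GROUP BY block_date
)
GROUP BY block_date ORDER BY block_date
""").fetchall()

print(late)            # [('2024-10-01', 3), ('2024-10-02', 3)]
print(early == late)   # True
```

On these six rows the difference is invisible, but on millions of rows per chain, the early-aggregation branches hand the outer query at most one row per date per chain instead of the full raw dataset.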

General Benefits of the Applied Optimizations

The optimizations not only improved the performance of each query but also made the code cleaner, scalable, and easier to understand. Key benefits include:

  1. Scalability: The queries are designed to be scalable, grouping and filtering only the necessary data, which improves performance in large datasets.

  2. Clarity in visualization: The columns differentiated by blockchain make it easy to generate stacked charts and comparative visualizations of metrics such as transactions, active addresses, deployed contracts, and fees paid.

  3. Reduction in execution time: As observed in the use cases, execution time was drastically reduced in the optimized queries, enabling more complex analyses in less time and freeing up resources for other processes.


Use Case 3: Optimization Exception

In this case, I analyzed the deployed contracts in the various Superchain blockchains. When applying the same optimizations, the improvement in performance was not as significant, making it an interesting exception.

Unoptimized query:

WITH superchain_calls AS (
    SELECT *, 'Zora' AS chain FROM zora.core.calls WHERE call_type IN ('CREATE', 'CREATE2')
    UNION ALL
    SELECT *, 'OP Mainnet' AS chain FROM optimism.core.calls WHERE call_type IN ('CREATE', 'CREATE2')
    UNION ALL
    SELECT *, 'Mint' AS chain FROM mint.core.calls WHERE call_type IN ('CREATE', 'CREATE2')
    UNION ALL
    SELECT *, 'Mode' AS chain FROM mode.core.calls WHERE call_type IN ('CREATE', 'CREATE2')
)

SELECT * FROM (
    SELECT chain,
           block_date AS date,
           COUNT(*) AS "Contracts Deployed"
    FROM superchain_calls
    WHERE date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY chain, block_date
) 
PIVOT (
    MAX("Contracts Deployed") FOR chain IN ('Zora', 'OP Mainnet', 'Mint', 'Mode')
) AS p(date, zora, "OP Mainnet", mint, mode)
ORDER BY date;

Result:

  • 94 rows / 2.39 KB in 30,429 ms

Optimized query:

WITH superchain_calls AS (
    SELECT block_date, COUNT(*) AS contracts_deployed, 'Zora' AS chain
    FROM zora.core.calls
    WHERE call_type IN ('CREATE', 'CREATE2')
    AND block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(*) AS contracts_deployed, 'OP Mainnet' AS chain
    FROM optimism.core.calls
    WHERE call_type IN ('CREATE', 'CREATE2')
    AND block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(*) AS contracts_deployed, 'Mint' AS chain
    FROM mint.core.calls
    WHERE call_type IN ('CREATE', 'CREATE2')
    AND block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date

    UNION ALL

    SELECT block_date, COUNT(*) AS contracts_deployed, 'Mode' AS chain
    FROM mode.core.calls
    WHERE call_type IN ('CREATE', 'CREATE2')
    AND block_date >= DATEADD('month', -{{n_months}}, CURRENT_DATE())
    GROUP BY block_date
)

SELECT 
    block_date AS date, 
    SUM(CASE WHEN chain = 'Zora' THEN contracts_deployed ELSE 0 END) AS zora_contracts_deployed,
    SUM(CASE WHEN chain = 'OP Mainnet' THEN contracts_deployed ELSE 0 END) AS op_mainnet_contracts_deployed,
    SUM(CASE WHEN chain = 'Mint' THEN contracts_deployed ELSE 0 END) AS mint_contracts_deployed,
    SUM(CASE WHEN chain = 'Mode' THEN contracts_deployed ELSE 0 END) AS mode_contracts_deployed
FROM 
    superchain_calls
GROUP BY 
    block_date
ORDER BY 
    block_date;

Result:

  • 94 rows / 9.36 KB in 29,781 ms

Smart Contracts Deployed

https://app.tokenflow.live/studio/editor/670dec7c6ae7711da7ef68ee

Exception Analysis

Despite applying the same optimization techniques, the improvement was minimal (from 30,429 ms to 29,781 ms). Possible reasons:

  1. Data size: the filtered volume of data was already small, so reducing the intermediate results further did not yield a noticeable improvement in execution time.

  2. Calculation complexity: The counting of deployed contracts (COUNT(*)) was already handled efficiently, and avoiding the use of PIVOT did not provide significant improvement.

  3. Processing transaction types: The filter by call_type IN ('CREATE', 'CREATE2') is straightforward and doesn’t introduce much overhead, limiting the optimization potential.


Final Analysis Conclusion

In this article, I explored the optimizations applied to SQL queries for analyzing key metrics in Superchain blockchains, focusing on daily transactions, active addresses, and deployed contracts. In most cases, the early aggregation approach and other techniques significantly optimized performance.

However, the analysis of deployed contracts presented an exception, reminding us that optimizations won’t always have the same impact across all contexts and types of queries.

Summary of Pillars and Results

  • Daily Transactions: Execution time reduced from 5,879 ms to 1,038 ms (82% improvement).

  • Active Addresses: Execution time reduced from 21,207 ms to 11,231 ms (47% improvement).

  • Deployed Contracts (Exception): Minimal reduction from 30,429 ms to 29,781 ms (2% improvement).
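The improvement percentages above follow directly from the reported execution times:

```python
def improvement(before_ms: float, after_ms: float) -> int:
    """Percentage reduction in execution time, rounded to a whole percent."""
    return round((before_ms - after_ms) / before_ms * 100)

print(improvement(5879, 1038))    # Daily Transactions  -> 82
print(improvement(21207, 11231))  # Active Addresses    -> 47
print(improvement(30429, 29781))  # Deployed Contracts  -> 2
```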

Summary of Applied Optimizations

  1. Selecting specific columns.

  2. Optimized date filtering.

  3. Using SUM with CASE to avoid PIVOT.

  4. Efficient data combination with UNION ALL.

Overall Conclusion

The applied optimizations, such as early aggregation and careful column selection, proved effective in most cases. These improvements not only reduce execution time but also make the queries more readable and maintainable. While there will always be cases where the impact is smaller, this analysis shows how small adjustments can lead to significant performance improvements in environments with large datasets like blockchain analysis.