The Universal Challenge: The "New Protocol" Learning Curve
Building an LLM-First Pipeline for Move Package Analysis
Case Study: Bluefin Pro Perps Analysis
LLM Pipeline Workflow
Prompt Iteration Highlights
Takeaways & Results
Final Outputs
SQL views and daily metrics
Charts and dashboards
Appendix: Technical References
Condensed Prompt Chat (all with superwhisper)
Final LLM SQL Output at end of convo:
Move Package Catalogue Tasks
Events
Objects
Snowflake Table Data Catalogue
Bluefin Daily Market View Output:
The Universal Challenge: The "New Protocol" Learning Curve

A persistent challenge in crypto data analytics is the sheer diversity of smart contracts. Every protocol ships its own set of contracts that must be fully understood before you can build data pipelines and extract actionable insights.
On Sui alone, there are 95k+ Move packages emitting 350,000+ unique event types and nearly 900k unique object types, each with its own schema and structure. Even protocols in the same category, such as the CLMM-based DEXs Cetus and Bluefin, can have significant implementation differences. Even when they don't, the contracts must still be analyzed to confirm parity. This constant variability makes it challenging for analysts to keep pace with the rapid innovation happening on any chain.
Building an LLM-First Pipeline for Move Package Analysis

I prototyped an LLM-enabled data pipeline that accelerates package analysis by strictly managing a local, context-rich environment. The system centers on two curated data catalogues that give the model the critical context it needs. The Snowflake tasks and views are attached in the appendix for additional reference:
Snowflake Data Catalogue – A SQL view extracts metadata for all relevant Snowflake tables. The results are exported as CSV (see the export sketch after this list), and a Python script converts each row into a .sql file, with a strict 1:1 mapping between table and file. This setup gives the LLM complete context for 400+ Snowflake tables, enabling it to selectively load only the subset needed for a given task.
Move Package Catalogue – Snowflake tasks index and store all unique event and object data for a specific package. The output is stored in an LLM-friendly format, giving the model a complete on-chain view for that package. With this holistic view of objects and events, the LLM can detect patterns and relationships more effectively.
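One way to produce that CSV is a single COPY from the catalogue view; a minimal sketch, where the stage path and format options are illustrative:

-- Dump DATA_CATALOGUE_VIEW to one CSV file on the user stage for the Python splitter.
COPY INTO @~/catalogue/data_catalogue.csv
FROM (SELECT * FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.DATA_CATALOGUE_VIEW)
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = NONE)
HEADER = TRUE
SINGLE = TRUE
OVERWRITE = TRUE;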
This initial design keeps LLM access fully isolated: no direct database reads or writes. It also works well with cost-efficient models; for example, I primarily use Gemini 2.5 Flash (non-MAX mode) for most tasks.
These foundational tables and views provide exceptional context for the LLM. By comparison, Cursor rules have shown little additional performance benefit. The rules I currently use are very basic:
Show sample datasets for core Snowflake tables (events, objects, transactions) and some non-core DeFi tables (coin info, prices). This is now somewhat redundant with the Snowflake Data Catalogue and survives as a legacy Cursor rule.
Include simple operational guidelines such as: write queries efficiently, check Snowflake SQL syntax, and be concise.
Case Study: Bluefin Pro Perps Analysis

With this foundation in place, I tested the pipeline on a real-world case: analyzing Bluefin's Pro Perps package.

LLM Pipeline Workflow
Find relevant package data – Manually locate Bluefin Pro package objects and events in the Move Package Catalogue and export them for LLM use (see the sketch after this list).
Prompt the LLM with context – Load the package data into the LLM and ask it to identify relevant events and generate a query for daily trading volume by perpetual market.
Iterate on results – Review outputs, make targeted corrections, and request additional metrics (funding rate, daily price, daily active users, open interest).
Plot the results and check accuracy.
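Step 1 boils down to two lookups against the sample tables defined in the appendix; a minimal sketch (the package ID is the one carried by Bluefin Pro events; object types may be keyed to a different publish address):

-- Export all unique Bluefin Pro event types with their sample payloads.
SELECT event_type, module, sample_event_json
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB
WHERE package = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7';

-- Same lookup for unique object types.
SELECT object_type, module, type, sample_object_json
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.OBJECT_TYPE_SAMPLES_TB
WHERE package = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7';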
Prompt Iteration Highlights

First pass – The query worked, but BTC, ETH, and SOL market decimals were off by 1. Prompted the LLM to adjust.
Subsequent passes – Added average daily funding rate, daily price, DAUs, and open interest.
Challenge – Open interest calculations were consistently wrong; the LLM lacked the right DeFi context. Several prompts failed to fix it until I switched from "daily average" to "end-of-day snapshot" calculations.
Resolution – After I explicitly flagged the snapshot logic as incorrect, the LLM revised it to calculate open interest correctly (the winning pattern is sketched after this list).
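The end-of-day snapshot pattern that finally worked, condensed from the full view in the appendix (decimal scaling omitted here for brevity):

-- Sum open long positions at each FundingRateApplied batch, then keep
-- only the last batch of each day as the end-of-day open interest.
WITH snapshots AS (
    SELECT
        timestamp_ms,
        event_json:position:perpetual::string AS market,
        SUM(event_json:position:size::number) AS open_long_size_raw
    FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET
    WHERE event_type = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::FundingRateApplied'
      AND event_json:position:is_long::boolean = TRUE
    GROUP BY 1, 2
)
SELECT
    DATE(TO_TIMESTAMP(timestamp_ms / 1000)) AS oi_date,
    market,
    open_long_size_raw AS eod_open_interest_raw
FROM snapshots
QUALIFY ROW_NUMBER() OVER (PARTITION BY oi_date, market ORDER BY timestamp_ms DESC) = 1;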
Takeaways & Results

LLM finds relevant events autonomously – Without any manual intervention, it correctly identified the key events from the package-level context.
Complex metrics need iteration – Multiple passes are normal for nuanced DeFi calculations like open interest.
Flagging > explaining – Pointing out “this snapshot is wrong” was more effective than prescribing exact fixes.
Documentation is built-in – Successful conversations leave the LLM with the full context to auto-generate comments for tables, views, and columns (see the sketch after this list).
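A sketch of what that auto-generated documentation can look like as Snowflake DDL (comment text abbreviated here; the full comments appear in the appendix):

-- Attach an LLM-written description to the finished view...
COMMENT ON VIEW PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.BLUEFIN_PRO_DAILY_MARKET_METRICS_V IS
    'Provides daily aggregated metrics for Bluefin Pro perpetual markets...';

-- ...and to individual columns of the supporting tables.
ALTER TABLE PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB
    ALTER COLUMN SAMPLE_EVENT_JSON COMMENT 'A sample event JSON payload for this event type.';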
Final Outputs

The end result is an underlying table that makes it effortless to plot something like this:
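For example, the series behind a daily-volume-by-market chart comes straight off the temporary table created in the appendix (illustrative query):

-- Daily trading volume per perp market, ready for charting.
SELECT trade_date, market, daily_volume_usd
FROM TMP_BLUEFIN_PRO_METRICS
ORDER BY trade_date, market;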
Appendix: Technical References

Condensed Prompt Chat (all with superwhisper)

"Take a look at all of these Bluefin Pro package events. They have different event names, module names, as well as event JSON information. Now the file is a little messy, but can you infer which events I should focus on and create a SQL query so that I can get the trading volume broken down by perp market every day? @bluefin_pro_package_events.csv"
The prompt worked on the first try. On initial inspection I noticed that the BTC and ETH market decimals were off by 1, so I told it to fix it: "Okay, this looks almost correct. It looks like the BTC perp market and the ETH perp market are one decimal off. So like BTC says, for example, 93 million, but I think it's closer to 9.3 million. Same with ETH. Can you please fix that just for BTC and ETH? The rest of the decimals look good, though."
The fix was implemented. Then I realized that the Solana market decimals were off too. I also casually asked whether it could find the funding rate from the events and calculate the daily average funding rate; I didn't specify the event or its structure because the LLM already has the package context: "Ah, my bad. It looks like Solana, the SOL perp market, also has an extra decimal, so please move it. It says 32 million, but I think it should be closer to 3.2 million. Otherwise it looks good. Additionally, are you able to add a column to get the average daily funding rate for each market? @bluefin_pro_package_events.csv"
I also asked for a daily average price column, a daily active user count, and a daily average open interest. These were all added after a single request. Upon inspecting the query outputs, I noticed that the open interest was denominated in native token amounts, so I asked it to "add usd open interest column too".
The open interest calculation was incorrect. The LLM doesn't have the right DeFi context to properly calculate open interest, so I guided it a little. The re-prompt did not fix it properly: "Okay, well, so I like the columns, but I'm not sure that the output is correct. How exactly are you calculating the average? I think this is a little complex: you can't just take all of the orders and then take the average. That's going to overcount the numbers. You need to properly account for how many orders are open at every funding interval versus how many get closed, and then take the average of just the open orders at every funding interval."
I changed the prompt approach again. Still incorrect: "Actually, instead of doing a daily average, could we just take a daily snapshot instead? Take the snapshot at the end of every day for all the positions that are open."
Third time's the charm! Once the problem was isolated to the snapshot calculation and pointed out, the LLM was able to revise the snapshot logic and calculate open interest correctly: "No, this isn't correct still. The numbers for the open interest in USD amounts are still way too high. I don't know what you're doing here, and I don't know how to troubleshoot this either, but the numbers are still way too high. So I don't know if the snapshot is being calculated properly. I would guess you haven't properly accounted for positions that are closed, maybe?"
Move Package Catalogue Tasks: Events

create or replace task PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.POPULATE_EVENT_TYPE_SAMPLES_TASK
warehouse=DEFI_WH
schedule='USING CRON 0 * * * * America/New_York'
COMMENT='Populates and updates the EVENT_TYPE_SAMPLES_TB with new unique event types and their sample data, avoiding full scans by only processing recent events and checking for existing event types.'
as BEGIN
CREATE TABLE IF NOT EXISTS PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB (
EVENT_TYPE VARCHAR(16777216) PRIMARY KEY,
PACKAGE VARCHAR(16777216) COMMENT 'The package ID part of the event type.',
MODULE VARCHAR(16777216) COMMENT 'The module name part of the event type.',
FIRST_APPEARANCE_TRANSACTION_DIGEST VARCHAR(16777216),
SAMPLE_EVENT_JSON VARIANT,
FIRST_APPEARANCE_TIMESTAMP_MS NUMBER(38,0),
FIRST_APPEARANCE_UTC TIMESTAMP_LTZ(3),
LAST_UPDATED_AT TIMESTAMP_NTZ(9) DEFAULT CURRENT_TIMESTAMP()
)
COMMENT='Auxiliary table storing a single sample transaction and event JSON for each unique event type. Populated by POPULATE_EVENT_TYPE_SAMPLES_TASK.';

INSERT INTO PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB (
EVENT_TYPE,
PACKAGE,
MODULE,
FIRST_APPEARANCE_TRANSACTION_DIGEST,
SAMPLE_EVENT_JSON,
FIRST_APPEARANCE_TIMESTAMP_MS,
FIRST_APPEARANCE_UTC
)
SELECT
e.event_type,
SPLIT_PART(e.event_type, '::', 1) AS package,
SPLIT_PART(e.event_type, '::', 2) AS module,
ANY_VALUE(e.transaction_digest) AS first_appearance_transaction_digest,
ANY_VALUE(e.event_json) AS sample_event_json,
MIN(e.timestamp_ms) AS first_appearance_timestamp_ms,
TO_TIMESTAMP_LTZ(MIN(e.timestamp_ms), 3) AS first_appearance_utc
FROM
event_parquet e
WHERE
e.timestamp_ms > COALESCE(
(SELECT MAX(first_appearance_timestamp_ms) FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB),
0
)
AND NOT EXISTS (
SELECT 1
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB ets
WHERE ets.event_type = e.event_type
)
GROUP BY
e.event_type;
END;
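A typical lookup against the resulting table, e.g. to inspect one event's schema before writing a query (illustrative):

-- Inspect the stored sample payload for a single event type.
SELECT event_type, sample_event_json, first_appearance_utc
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_TYPE_SAMPLES_TB
WHERE event_type = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::TradeExecuted';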
Snowflake Table Data Catalogue

create or replace view PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.DATA_CATALOGUE_VIEW(
TABLE_SCHEMA,
TABLE_NAME,
TABLE_TYPE,
DESCRIPTION,
COLUMN_METADATA_JSON,
COLUMN_COMMENTS,
LAST_ALTERED
) COMMENT='Provides a comprehensive catalogue of tables and views within the PIPELINE_V2_GROOT_SCHEMA, including table-level descriptions and detailed metadata for each column (name, type, and comment).'
as
SELECT
t.TABLE_SCHEMA AS TABLE_SCHEMA, -- Schema name
t.TABLE_NAME AS TABLE_NAME, -- Table name
t.TABLE_TYPE AS TABLE_TYPE, -- Type: BASE TABLE, VIEW, etc.
t.COMMENT AS DESCRIPTION, -- Table-level description
OBJECT_AGG(c.COLUMN_NAME, c.DATA_TYPE::VARIANT) AS COLUMN_METADATA_JSON, -- Column names and types as a JSON object (cast DATA_TYPE to VARIANT)
ARRAY_AGG(c.COMMENT) WITHIN GROUP (ORDER BY c.ORDINAL_POSITION) AS COLUMN_COMMENTS, -- Column-level comments as an array
t.LAST_ALTERED AS LAST_ALTERED -- Last altered time
FROM INFORMATION_SCHEMA.TABLES t
LEFT JOIN INFORMATION_SCHEMA.COLUMNS c
ON t.TABLE_CATALOG = c.TABLE_CATALOG
AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
AND t.TABLE_NAME = c.TABLE_NAME
WHERE t.TABLE_SCHEMA = 'PIPELINE_V2_GROOT_SCHEMA'
GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME, t.TABLE_TYPE, t.COMMENT, t.LAST_ALTERED;
Final LLM SQL Output at end of convo:

CREATE OR REPLACE VIEW PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.BLUEFIN_PRO_DAILY_MARKET_METRICS_V
COMMENT = 'Provides daily aggregated metrics for Bluefin Pro perpetual markets. It includes daily trading volume (USD), volume-weighted average price (USD), daily active users (DAU), end-of-day (EOD) open interest in both native token and USD, and the average daily funding rate. Data is sourced from Bluefin Pro package events (0xe744...). NOTE: A manual decimal correction (+1) is applied to BTC-PERP, ETH-PERP, and SOL-PERP markets to correct an observed 10x scaling discrepancy in the raw on-chain data for accurate calculations.'
AS
WITH
-- ============================================================================
-- STEP 1: Get the latest on-chain metadata for each market (e.g., decimals)
-- ============================================================================
market_metadata AS (
SELECT
event_json:perpetual:symbol::string as market,
-- Derive decimal count from on-chain value (e.g., 8000000000 -> 8)
event_json:perpetual:base_asset_decimals::number / 1e9 as base_asset_decimals
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET
WHERE event_type IN (
'0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::PerpetualUpdate',
'0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::PerpetualSynced'
)
AND market IS NOT NULL
AND base_asset_decimals IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY market ORDER BY timestamp_ms DESC) = 1
),
-- ============================================================================
-- STEP 2: Calculate daily trading volume, average price, and active users
-- ============================================================================
trade_details AS (
SELECT
t.TIMESTAMP_MS,
t.EVENT_JSON:market::string as market,
t.EVENT_JSON:fill_price::number as fill_price,
t.EVENT_JSON:fill_quantity::number as fill_quantity,
t.EVENT_JSON:maker:address::string as maker_address,
t.EVENT_JSON:taker:address::string as taker_address,
-- MANUAL CORRECTION: Add 1 to decimals for specific markets due to
-- an observed 10x discrepancy in their `fill_quantity` scaling.
CASE
WHEN m.market IN ('BTC-PERP', 'ETH-PERP', 'SOL-PERP') THEN m.base_asset_decimals + 1
ELSE m.base_asset_decimals
END as base_asset_decimals
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET t
JOIN market_metadata m ON t.EVENT_JSON:market::string = m.market
WHERE t.EVENT_TYPE = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::TradeExecuted'
),
trade_participants AS (
SELECT TIMESTAMP_MS, market, maker_address as user_address FROM trade_details
UNION
SELECT TIMESTAMP_MS, market, taker_address as user_address FROM trade_details
),
daily_active_users AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms/1000)) as trade_date,
market,
COUNT(DISTINCT user_address) as dau
FROM trade_participants
GROUP BY 1,2
),
daily_trade_metrics AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms / 1000)) as trade_date,
market,
SUM((fill_price / 1e9) * (fill_quantity / POW(10, base_asset_decimals))) as daily_volume_usd,
SUM(fill_price * fill_quantity) / SUM(fill_quantity) / 1e9 as daily_average_price
FROM trade_details
GROUP BY 1, 2
),
-- ============================================================================
-- STEP 3: Calculate average daily funding rate from 'FundingRateUpdated' events
-- ============================================================================
daily_funding_rates AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms / 1000)) as rate_date,
event_json:market::string as market,
AVG((CASE WHEN event_json:rate:sign::boolean THEN 1 ELSE -1 END * event_json:rate:value::number) / 1e9) as avg_daily_funding_rate
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET
WHERE event_type = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::FundingRateUpdated'
GROUP BY 1, 2
),
-- ============================================================================
-- STEP 4: Get End-of-Day Open Interest Snapshot
-- ============================================================================
open_interest_snapshots AS (
SELECT
t.timestamp_ms,
t.event_json:position:perpetual::string as market,
-- MANUAL CORRECTION: Apply the same decimal adjustment here for `position:size`
SUM(t.event_json:position:size::number / POW(10,
CASE
WHEN m.market IN ('BTC-PERP', 'ETH-PERP', 'SOL-PERP') THEN m.base_asset_decimals + 1
ELSE m.base_asset_decimals
END
)) as oi_snapshot_native
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET t
JOIN market_metadata m ON t.event_json:position:perpetual::string = m.market
WHERE t.event_type = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::FundingRateApplied'
AND t.event_json:position:is_long::boolean = true
GROUP BY 1, 2
),
eod_open_interest AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms/1000)) as oi_date,
market,
oi_snapshot_native as eod_open_interest_native
FROM open_interest_snapshots
QUALIFY ROW_NUMBER() OVER (PARTITION BY oi_date, market ORDER BY timestamp_ms DESC) = 1
)
-- ============================================================================
-- FINAL STEP: Join all the daily metrics together
-- ============================================================================
SELECT
m.trade_date,
m.market,
m.daily_volume_usd,
m.daily_average_price,
d.dau AS daily_active_users,
oi.eod_open_interest_native,
oi.eod_open_interest_native * m.daily_average_price as eod_open_interest_usd,
f.avg_daily_funding_rate
FROM daily_trade_metrics m
LEFT JOIN daily_active_users d ON m.trade_date = d.trade_date AND m.market = d.market
LEFT JOIN eod_open_interest oi ON m.trade_date = oi.oi_date AND m.market = oi.market
LEFT JOIN daily_funding_rates f ON m.trade_date = f.rate_date AND m.market = f.market;
-- Create a temporary table from the view for immediate dashboarding use
CREATE OR REPLACE TEMPORARY TABLE TMP_BLUEFIN_PRO_METRICS AS
SELECT * FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.BLUEFIN_PRO_DAILY_MARKET_METRICS_V;
Move Package Catalogue Tasks: Objects

create or replace task PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.POPULATE_OBJECT_TYPE_SAMPLES_TASK
warehouse=DEFI_WH
schedule='USING CRON 0 * * * * America/New_York'
COMMENT='Populates and updates the OBJECT_TYPE_SAMPLES_TB with new unique object types and their sample data, avoiding full scans by only processing recently created objects and checking for existing object types.'
as BEGIN
-- Create the OBJECT_TYPE_SAMPLES_TB table if it does not already exist.
-- This table stores a single sample object JSON and metadata for each unique object type encountered on-chain.
CREATE TABLE IF NOT EXISTS PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.OBJECT_TYPE_SAMPLES_TB (
OBJECT_TYPE VARCHAR(16777216) PRIMARY KEY COMMENT 'The full type string of the object.',
PACKAGE VARCHAR(16777216) COMMENT 'The package ID part of the object type (e.g., "0x2").',
MODULE VARCHAR(16777216) COMMENT 'The module name part of the object type (e.g., "coin").',
TYPE VARCHAR(16777216) COMMENT 'The specific type name within the module (e.g., "CoinMetadata").',
FIRST_APPEARANCE_OBJECT_ID VARCHAR(16777216) COMMENT 'The object ID of the first created instance of this type.',
SAMPLE_OBJECT_JSON VARIANT COMMENT 'A sample JSON representation of the object data at its first appearance.',
FIRST_APPEARANCE_PREVIOUS_TRANSACTION VARCHAR(16777216) COMMENT 'The transaction digest of the transaction that created the first instance of this object type.',
FIRST_APPEARANCE_TIMESTAMP_MS NUMBER(38,0) COMMENT 'The timestamp in milliseconds of the first creation event for this object type.',
FIRST_APPEARANCE_UTC TIMESTAMP_LTZ(9) COMMENT 'The UTC timestamp of the first creation event for this object type, converted for easier readability.',
LAST_UPDATED_AT TIMESTAMP_NTZ(9) DEFAULT CURRENT_TIMESTAMP() COMMENT 'Timestamp when this record was last updated or inserted.'
)
COMMENT='Auxiliary table storing a single sample object JSON and metadata for each unique object type. Populated by POPULATE_OBJECT_TYPE_SAMPLES_TASK.';
-- Insert new unique object types into the samples table.
-- This query is designed to be incremental, only processing objects created since the last successful run
-- and adding only those object types that are not yet present in the `OBJECT_TYPE_SAMPLES_TB`.
INSERT INTO PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.OBJECT_TYPE_SAMPLES_TB (
object_type,
package,
module,
type,
first_appearance_object_id,
sample_object_json,
first_appearance_previous_transaction,
first_appearance_timestamp_ms,
first_appearance_utc
)
SELECT
o.type AS object_type,
-- Extract the package part from the object_type (e.g., '0x2::coin::CoinMetadata' -> '0x2')
SPLIT_PART(o.type, '::', 1) AS package,
-- Extract the module part from the object_type (e.g., '0x2::coin::CoinMetadata' -> 'coin')
SPLIT_PART(o.type, '::', 2) AS module,
-- Extract the specific type name part from the object_type (e.g., '0x2::coin::CoinMetadata' -> 'CoinMetadata')
REGEXP_SUBSTR(o.type, '^[^:]*::[^:]*::(.*)', 1, 1, 'e', 1) AS type,
-- Select any object_id for the first appearance of this type (since we group by type)
ANY_VALUE(o.object_id) AS first_appearance_object_id,
-- Select a sample JSON for the first appearance of this type
ANY_VALUE(o.object_json) AS sample_object_json,
-- Select the transaction digest of the first transaction that created this object type
ANY_VALUE(o.previous_transaction) AS first_appearance_previous_transaction,
-- Find the minimum timestamp to identify the very first creation event for each object type
MIN(o.timestamp_ms) AS first_appearance_timestamp_ms,
-- Convert the minimum timestamp to a readable UTC format
TO_TIMESTAMP_LTZ(MIN(o.timestamp_ms), 3) AS first_appearance_utc
FROM
PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.object_parquet2 o
WHERE
-- Filter for objects that were 'Created', as we are tracking their first appearance
o.object_status = 'Created'
-- Optimizes the scan by only looking at data newer than the latest entry in the samples table.
-- COALESCE with 0 ensures it processes all data from the beginning on the very first run.
AND o.timestamp_ms > COALESCE(
(SELECT MAX(first_appearance_timestamp_ms) FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.OBJECT_TYPE_SAMPLES_TB),
0
)
-- Ensures that only truly new object types (not already in the samples table) are inserted.
AND NOT EXISTS (
SELECT 1
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.OBJECT_TYPE_SAMPLES_TB existing_ots
WHERE existing_ots.object_type = o.type
)
GROUP BY
o.type; -- Group by object type to ensure one entry per unique type, capturing its first observed creation.
END;
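The analogous lookup against the object samples ('0x2', the Sui framework package from the column comments above, is used here purely as an illustration):

-- List every object type a package has created, with one sample each.
SELECT object_type, module, type, sample_object_json
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.OBJECT_TYPE_SAMPLES_TB
WHERE package = '0x2';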
Bluefin Daily Market View Output:

create or replace view PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.BLUEFIN_PRO_DAILY_MARKET_METRICS_V(
TRADE_DATE,
MARKET,
DAILY_VOLUME_USD,
DAILY_AVERAGE_PRICE,
DAILY_ACTIVE_USERS,
EOD_OPEN_INTEREST_NATIVE,
EOD_OPEN_INTEREST_USD,
AVG_DAILY_FUNDING_RATE
) COMMENT='Provides daily aggregated metrics for Bluefin Pro perpetual markets. It includes daily trading volume (USD), volume-weighted average price (USD), daily active users (DAU), end-of-day (EOD) open interest in both native token and USD, and the average daily funding rate. Data is sourced from Bluefin Pro package events (0xe744...). NOTE: A manual decimal correction (+1) is applied to BTC-PERP, ETH-PERP, and SOL-PERP markets to correct an observed 10x scaling discrepancy in the raw on-chain data for accurate calculations.'
as
WITH
-- ============================================================================
-- STEP 1: Get the latest on-chain metadata for each market (e.g., decimals)
-- ============================================================================
market_metadata AS (
SELECT
event_json:perpetual:symbol::string as market,
-- Derive decimal count from on-chain value (e.g., 8000000000 -> 8)
event_json:perpetual:base_asset_decimals::number / 1e9 as base_asset_decimals
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET
WHERE event_type IN (
'0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::PerpetualUpdate',
'0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::PerpetualSynced'
)
AND market IS NOT NULL
AND base_asset_decimals IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY market ORDER BY timestamp_ms DESC) = 1
),
-- ============================================================================
-- STEP 2: Calculate daily trading volume, average price, and active users
-- ============================================================================
trade_details AS (
SELECT
t.TIMESTAMP_MS,
t.EVENT_JSON:market::string as market,
t.EVENT_JSON:fill_price::number as fill_price,
t.EVENT_JSON:fill_quantity::number as fill_quantity,
t.EVENT_JSON:maker:address::string as maker_address,
t.EVENT_JSON:taker:address::string as taker_address,
-- MANUAL CORRECTION: Add 1 to decimals for specific markets due to
-- an observed 10x discrepancy in their `fill_quantity` scaling.
CASE
WHEN m.market IN ('BTC-PERP', 'ETH-PERP', 'SOL-PERP') THEN m.base_asset_decimals + 1
ELSE m.base_asset_decimals
END as base_asset_decimals
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET t
JOIN market_metadata m ON t.EVENT_JSON:market::string = m.market
WHERE t.EVENT_TYPE = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::TradeExecuted'
),
trade_participants AS (
SELECT TIMESTAMP_MS, market, maker_address as user_address FROM trade_details
UNION
SELECT TIMESTAMP_MS, market, taker_address as user_address FROM trade_details
),
daily_active_users AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms/1000)) as trade_date,
market,
COUNT(DISTINCT user_address) as dau
FROM trade_participants
GROUP BY 1,2
),
daily_trade_metrics AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms / 1000)) as trade_date,
market,
SUM((fill_price / 1e9) * (fill_quantity / POW(10, base_asset_decimals))) as daily_volume_usd,
SUM(fill_price * fill_quantity) / SUM(fill_quantity) / 1e9 as daily_average_price
FROM trade_details
GROUP BY 1, 2
),
-- ============================================================================
-- STEP 3: Calculate average daily funding rate from 'FundingRateUpdated' events
-- ============================================================================
daily_funding_rates AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms / 1000)) as rate_date,
event_json:market::string as market,
AVG((CASE WHEN event_json:rate:sign::boolean THEN 1 ELSE -1 END * event_json:rate:value::number) / 1e9) as avg_daily_funding_rate
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET
WHERE event_type = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::FundingRateUpdated'
GROUP BY 1, 2
),
-- ============================================================================
-- STEP 4: Get End-of-Day Open Interest Snapshot
-- ============================================================================
open_interest_snapshots AS (
SELECT
t.timestamp_ms,
t.event_json:position:perpetual::string as market,
-- MANUAL CORRECTION: Apply the same decimal adjustment here for `position:size`
SUM(t.event_json:position:size::number / POW(10,
CASE
WHEN m.market IN ('BTC-PERP', 'ETH-PERP', 'SOL-PERP') THEN m.base_asset_decimals + 1
ELSE m.base_asset_decimals
END
)) as oi_snapshot_native
FROM PIPELINE_V2_GROOT_DB.PIPELINE_V2_GROOT_SCHEMA.EVENT_PARQUET t
JOIN market_metadata m ON t.event_json:position:perpetual::string = m.market
WHERE t.event_type = '0xe74481697f432ddee8dd6f9bd13b9d0297a5b63d55f3db25c4d3b5d34dad85b7::events::FundingRateApplied'
AND t.event_json:position:is_long::boolean = true
GROUP BY 1, 2
),
eod_open_interest AS (
SELECT
DATE(TO_TIMESTAMP(timestamp_ms/1000)) as oi_date,
market,
oi_snapshot_native as eod_open_interest_native
FROM open_interest_snapshots
QUALIFY ROW_NUMBER() OVER (PARTITION BY oi_date, market ORDER BY timestamp_ms DESC) = 1
)
-- ============================================================================
-- FINAL STEP: Join all the daily metrics together
-- ============================================================================
SELECT
m.trade_date,
m.market,
m.daily_volume_usd,
m.daily_average_price,
d.dau AS daily_active_users,
oi.eod_open_interest_native,
oi.eod_open_interest_native * m.daily_average_price as eod_open_interest_usd,
f.avg_daily_funding_rate
FROM daily_trade_metrics m
LEFT JOIN daily_active_users d ON m.trade_date = d.trade_date AND m.market = d.market
LEFT JOIN eod_open_interest oi ON m.trade_date = oi.oi_date AND m.market = oi.market
LEFT JOIN daily_funding_rates f ON m.trade_date = f.rate_date AND m.market = f.market;