Cohort tables show us, for a given cohort, the month in which a user interacted with a protocol and the percentage of users who continued to interact in subsequent periods.
Here's an example:
The previous example shows an interesting fact: before the SCR airdrop event on this chain in October 2024, the retention rate was high, but after this event the share of users who came later dropped significantly, indicating 'farming' activity.
In this post, I’ll show you how to dissect a query to build these tables on Dune Analytics, which you can easily adapt to your own needs. The goal of this tutorial is to create a query that groups new users into monthly cohorts and measures how many return over time—also known as MoM retention—specifically for Scroll, using Dune.
If you just want to see the query go to the end of the article.
For this we need to create 4 Common Table Expressions:
user_cohorts
: Finds every unique address that interacted with the protocol and the first time they interacted.
following_months
: Finds the addresses from the original cohort that transacted in the months following.
cohort_size
: Counts the number of addresses in the original cohort
retention_table
: Counts the number of addresses from the original cohort in the months following.
Our starting point is to gather addresses that had sent a transaction on scroll, se we need to use the scroll.transactions table, where we use the columns "from" which indicates the address of the user sending a transaction and "block_date", the date of transactions.
WITH user_cohorts AS (
SELECT "from" as address,
date_format(min(date_trunc('month', block_date)), '%Y-%m') as cohort_month_text,
min(date_trunc('month', block_date)) as cohort_month_date
FROM apechain.transactions
WHERE success = true
GROUP BY 1
)
What this does:
Groups all transactions by sender address
Finds the earliest month each address transacted
Creates both a formatted text version (2024-08
) and date version for calculations (this is to use the pivot table element of dune at the end)
Now you have a long list of addresses and the first month they sent a transaction to Scroll.
Purpose: Identify which addresses from the original cohort are returning each month and calculate how many months have passed since their first transaction.
What you need to identify are the unique addresses from the original cohort that are returning each month. Therefore, join the user_cohorts
table (first CTE with the cohort addresses) with the transactions table on the address column. This join will enable you to find the number of addresses matching the unique addresses in the original cohort.
following_months AS (
SELECT tx."from" as address,
lpad(cast(date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date)) as varchar), 2, '0') as month_number
FROM scroll.transactions tx
LEFT JOIN user_cohorts uc ON tx."from" = uc.address
WHERE tx.success = true
GROUP BY 1, 2
)
Now you need to find how many months have passed since the original cohort. The original cohort month is represented as cohort_month_date
in the user_cohorts
table. Use the function date_diff()
to find the difference between two dates, this function needs to be passed three arguments labeled unit, startdate and enddate:
date_diff('unit', startdate, enddate)
These 3 mandatory arguments combine into the SELECT statement:
unit: 'month'
startdate: uc.cohort_month_date
- The date of the cohort
enddate: date_trunc('month', tx.block_date)
- Each month between the cohort date and the current transaction date
date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date))
What this does:
Joins every transaction back to the user cohort data
Calculates months elapsed since first transaction using date_diff()
Uses lpad()
to format month numbers with leading zeros (01, 02, ... 10, 11, 12) for proper pivot table sorting
Interpreting the results: Every address will show up at least once with 0
as month_number
(month 0 represents the month they first transacted). If the address has 1
in the month_number
column, then that address returned the next month. In the example above:
Address 0xffffff21a685e8cfdd04576ae2c018858edf0d7a
only transacted in their first month
Address 0xffffff044f0734354d308bb120b6916bf85a797a
returned in months 1, 2, 3, 4, and 7 after they first transacted
The next CTE - cohort_size
- is designed to count the number of addresses in the original cohort - so it’s extremely easy. The following query is counting every entry in user_cohorts
table by cohortMonth
.
cohort_size AS (
SELECT uc.cohort_month_text as cohort_month,
count(*) as new_users
FROM user_cohorts uc
GROUP BY 1
)
The fourth and final CTE - retention_table
- aims to count how many addresses from the initial cohort transact in the months to follow. This table will show the number of users from the original cohort x number of months after they first joined.
There are two tables that you need to query FROM
:
following_months
is needed to count the number of addresses from the original cohort that transact in the months following
user_cohorts
is needed to match addresses from the original cohort with ones in the months following
Now you can LEFT JOIN
the following_months
table with the user_cohorts
table by the addresses:
retention_table AS (
SELECT c.cohort_month_text as cohort_month,
fm.month_number,
count(*) as retained_users
FROM following_months fm
LEFT JOIN user_cohorts c ON fm.address = c.address
GROUP BY 1, 2
)
As you can see, the retention_table
returns the number of users from the original cohort x number of months after.
MoM retention doesn't require individual addresses so the FROM
statement shouldn't include user_cohorts
and following_months
- these tables were used in the other CTEs.
To calculate retention you need to query the user counts from the retention_table
and cohort_size
tables. These two tables can be joined on the cohort month:
SELECT *
FROM retention_table r
LEFT JOIN cohort_size s
ON r.cohort_month = s.cohort_month
Now you need to have four columns to create a retention table. The SELECT statement should show:
The initial cohort date - r.cohort_month
The initial cohort size - s.new_users
The number of months since the initial cohort month - r.month_number
The percentage of the original cohort that remains x months (r.month_number
) after the initial cohort (r.cohort_month
). This can be found by taking the number of users that remain (r.retained_users
) and dividing it by the size of the initial cohort (s.new_users
).
Retention rate = r.retained_users / s.new_users
The final SELECT statement looks like this:
SELECT r.cohort_month,
s.new_users,
r.month_number,
cast(r.retained_users as double) / cast(s.new_users as double) as retention_rate
FROM retention_table r
LEFT JOIN cohort_size s ON r.cohort_month = s.cohort_month
WHERE r.month_number != '00' -- Exclude month 0 (first transaction month)
ORDER BY r.cohort_month, r.month_number
Interpreting the results: From the October 2023 cohort of 268,396 new users:
54.49% returned in October 24 (month 12) (Airdrop month)
21.02% returned in December 24 (month 13)
14.23% returned in January 25 (month 14)
So the final query looks like:
WITH user_cohorts AS (
SELECT "from" as address,
date_format(min(date_trunc('month', block_date)), '%Y-%m') as cohort_month_text,
min(date_trunc('month', block_date)) as cohort_month_date
FROM scroll.transactions
WHERE success = true
GROUP BY 1
),
following_months AS (
SELECT tx."from" as address,
lpad(cast(date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date)) as varchar), 2, '0') as month_number
FROM scroll.transactions tx
LEFT JOIN user_cohorts uc ON tx."from" = uc.address
WHERE tx.success = true
GROUP BY 1, 2
),
cohort_size AS (
SELECT uc.cohort_month_text as cohort_month,
count(*) as new_users
FROM user_cohorts uc
GROUP BY 1
),
retention_table AS (
SELECT c.cohort_month_text as cohort_month,
fm.month_number,
count(*) as retained_users
FROM following_months fm
LEFT JOIN user_cohorts c ON fm.address = c.address
GROUP BY 1, 2
)
SELECT r.cohort_month,
s.new_users,
r.month_number,
cast(r.retained_users as double) / cast(s.new_users as double) as retention_rate
FROM retention_table r
LEFT JOIN cohort_size s ON r.cohort_month = s.cohort_month
WHERE r.month_number != '00'
ORDER BY r.cohort_month, r.month_number
Month Number Formatting: Use lpad()
with zero-padding to ensure proper sorting in pivot tables
Date Formatting: Use date_format()
to convert dates to strings for better pivot table compatibility
Casting: Always cast retention calculations as double
for decimal precision
Filtering: Exclude month 0 (WHERE month_number != '00'
) since 100% retention in first month is guaranteed
Rows: cohort_month
Columns: month_number
Values: retention_rate
Format: Use percentage format for retention rates (0.00%)
Share Dialog
Wild Tales
<100 subscribers