A Cohort Retention Query

Cohort tables show us, for a given cohort, the month in which a user interacted with a protocol and the percentage of users who continued to interact in subsequent periods.

Here's an example:

The previous example shows an interesting fact: before the SCR airdrop event on this chain in October 2024, the retention rate was high, but after this event the share of users who came later dropped significantly, indicating 'farming' activity.

In this post, I’ll show you how to dissect a query to build these tables on Dune Analytics, which you can easily adapt to your own needs. The goal of this tutorial is to create a query that groups new users into monthly cohorts and measures how many return over time—also known as MoM retention—specifically for Scroll, using Dune.

If you just want to see the query go to the end of the article.

For this we need to create 4 Common Table Expressions:

user_cohorts: Finds every unique address that interacted with the protocol and the first time they interacted.
following_months: Finds the addresses from the original cohort that transacted in the months following.
cohort_size: Counts the number of addresses in the original cohort
retention_table: Counts the number of addresses from the original cohort in the months following.

Step 1: User Cohorts

Our starting point is to gather addresses that had sent a transaction on scroll, se we need to use the scroll.transactions table, where we use the columns "from" which indicates the address of the user sending a transaction and "block_date", the date of transactions.

WITH user_cohorts AS (
    SELECT "from" as address,
           date_format(min(date_trunc('month', block_date)), '%Y-%m') as cohort_month_text,
           min(date_trunc('month', block_date)) as cohort_month_date
    FROM apechain.transactions
    WHERE success = true
    GROUP BY 1
)

What this does:

Groups all transactions by sender address
Finds the earliest month each address transacted
Creates both a formatted text version (2024-08) and date version for calculations (this is to use the pivot table element of dune at the end)

Now you have a long list of addresses and the first month they sent a transaction to Scroll.

Step 2: Following Months

Purpose: Identify which addresses from the original cohort are returning each month and calculate how many months have passed since their first transaction.

What you need to identify are the unique addresses from the original cohort that are returning each month. Therefore, join the user_cohorts table (first CTE with the cohort addresses) with the transactions table on the address column. This join will enable you to find the number of addresses matching the unique addresses in the original cohort.

following_months AS (
    SELECT tx."from" as address,
           lpad(cast(date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date)) as varchar), 2, '0') as month_number
    FROM scroll.transactions tx
    LEFT JOIN user_cohorts uc ON tx."from" = uc.address
    WHERE tx.success = true
    GROUP BY 1, 2
)

Now you need to find how many months have passed since the original cohort. The original cohort month is represented as cohort_month_date in the user_cohorts table. Use the function date_diff() to find the difference between two dates, this function needs to be passed three arguments labeled unit, startdate and enddate:

date_diff('unit', startdate, enddate)

These 3 mandatory arguments combine into the SELECT statement:

unit: 'month'
startdate: uc.cohort_month_date - The date of the cohort
enddate: date_trunc('month', tx.block_date) - Each month between the cohort date and the current transaction date

date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date))

What this does:

Joins every transaction back to the user cohort data
Calculates months elapsed since first transaction using date_diff()
Uses lpad() to format month numbers with leading zeros (01, 02, ... 10, 11, 12) for proper pivot table sorting

Interpreting the results: Every address will show up at least once with 0 as month_number (month 0 represents the month they first transacted). If the address has 1 in the month_number column, then that address returned the next month. In the example above:

Address 0xffffff21a685e8cfdd04576ae2c018858edf0d7a only transacted in their first month
Address 0xffffff044f0734354d308bb120b6916bf85a797a returned in months 1, 2, 3, 4, and 7 after they first transacted

Step 3: Cohort Size

The next CTE - cohort_size - is designed to count the number of addresses in the original cohort - so it’s extremely easy. The following query is counting every entry in user_cohorts table by cohortMonth.

cohort_size AS (
    SELECT uc.cohort_month_text as cohort_month,
           count(*) as new_users
    FROM user_cohorts uc
    GROUP BY 1
)

Step 4: Retention Table

The fourth and final CTE - retention_table - aims to count how many addresses from the initial cohort transact in the months to follow. This table will show the number of users from the original cohort x number of months after they first joined.

There are two tables that you need to query FROM:

following_months is needed to count the number of addresses from the original cohort that transact in the months following
user_cohorts is needed to match addresses from the original cohort with ones in the months following

Now you can LEFT JOIN the following_months table with the user_cohorts table by the addresses:

retention_table AS (
    SELECT c.cohort_month_text as cohort_month,
           fm.month_number,
           count(*) as retained_users
    FROM following_months fm
    LEFT JOIN user_cohorts c ON fm.address = c.address
    GROUP BY 1, 2
)

As you can see, the retention_table returns the number of users from the original cohort x number of months after.

Step 5: Retention Rate

MoM retention doesn't require individual addresses so the FROM statement shouldn't include user_cohorts and following_months - these tables were used in the other CTEs.

To calculate retention you need to query the user counts from the retention_table and cohort_size tables. These two tables can be joined on the cohort month:

SELECT *
FROM retention_table r
LEFT JOIN cohort_size s 
    ON r.cohort_month = s.cohort_month

Now you need to have four columns to create a retention table. The SELECT statement should show:

The initial cohort date - r.cohort_month
The initial cohort size - s.new_users
The number of months since the initial cohort month - r.month_number
The percentage of the original cohort that remains x months (r.month_number) after the initial cohort (r.cohort_month). This can be found by taking the number of users that remain (r.retained_users) and dividing it by the size of the initial cohort (s.new_users).

Retention rate = r.retained_users / s.new_users

The final SELECT statement looks like this:

SELECT r.cohort_month,
       s.new_users,
       r.month_number,
       cast(r.retained_users as double) / cast(s.new_users as double) as retention_rate
FROM retention_table r
LEFT JOIN cohort_size s ON r.cohort_month = s.cohort_month
WHERE r.month_number != '00'  -- Exclude month 0 (first transaction month)
ORDER BY r.cohort_month, r.month_number

Interpreting the results: From the October 2023 cohort of 268,396 new users:

54.49% returned in October 24 (month 12) (Airdrop month)
21.02% returned in December 24 (month 13)
14.23% returned in January 25 (month 14)

Final query:

So the final query looks like:

WITH user_cohorts AS (
    SELECT "from" as address,
           date_format(min(date_trunc('month', block_date)), '%Y-%m') as cohort_month_text,
           min(date_trunc('month', block_date)) as cohort_month_date
    FROM scroll.transactions
    WHERE success = true
    GROUP BY 1
),
following_months AS (
    SELECT tx."from" as address,
           lpad(cast(date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date)) as varchar), 2, '0') as month_number
    FROM scroll.transactions tx
    LEFT JOIN user_cohorts uc ON tx."from" = uc.address
    WHERE tx.success = true
    GROUP BY 1, 2
),
cohort_size AS (
    SELECT uc.cohort_month_text as cohort_month,
           count(*) as new_users
    FROM user_cohorts uc
    GROUP BY 1
),
retention_table AS (
    SELECT c.cohort_month_text as cohort_month,
           fm.month_number,
           count(*) as retained_users
    FROM following_months fm
    LEFT JOIN user_cohorts c ON fm.address = c.address
    GROUP BY 1, 2
)
SELECT r.cohort_month,
       s.new_users,
       r.month_number,
       cast(r.retained_users as double) / cast(s.new_users as double) as retention_rate
FROM retention_table r
LEFT JOIN cohort_size s ON r.cohort_month = s.cohort_month
WHERE r.month_number != '00'
ORDER BY r.cohort_month, r.month_number

Key Technical Notes for Dune

Month Number Formatting: Use lpad() with zero-padding to ensure proper sorting in pivot tables
Date Formatting: Use date_format() to convert dates to strings for better pivot table compatibility
Casting: Always cast retention calculations as double for decimal precision
Filtering: Exclude month 0 (WHERE month_number != '00') since 100% retention in first month is guaranteed

Visualization of the Pivot Table

Rows: cohort_month
Columns: month_number
Values: retention_rate
Format: Use percentage format for retention rates (0.00%)

Cohort tables show us, for a given cohort, the month in which a user interacted with a protocol and the percentage of users who continued to interact in subsequent periods.

Here's an example:

If you just want to see the query go to the end of the article.

For this we need to create 4 Common Table Expressions:

user_cohorts: Finds every unique address that interacted with the protocol and the first time they interacted.
following_months: Finds the addresses from the original cohort that transacted in the months following.
cohort_size: Counts the number of addresses in the original cohort
retention_table: Counts the number of addresses from the original cohort in the months following.

Step 1: User Cohorts

WITH user_cohorts AS (
    SELECT "from" as address,
           date_format(min(date_trunc('month', block_date)), '%Y-%m') as cohort_month_text,
           min(date_trunc('month', block_date)) as cohort_month_date
    FROM apechain.transactions
    WHERE success = true
    GROUP BY 1
)

What this does:

Groups all transactions by sender address
Finds the earliest month each address transacted
Creates both a formatted text version (2024-08) and date version for calculations (this is to use the pivot table element of dune at the end)

Now you have a long list of addresses and the first month they sent a transaction to Scroll.

Step 2: Following Months

Purpose: Identify which addresses from the original cohort are returning each month and calculate how many months have passed since their first transaction.

following_months AS (
    SELECT tx."from" as address,
           lpad(cast(date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date)) as varchar), 2, '0') as month_number
    FROM scroll.transactions tx
    LEFT JOIN user_cohorts uc ON tx."from" = uc.address
    WHERE tx.success = true
    GROUP BY 1, 2
)

date_diff('unit', startdate, enddate)

These 3 mandatory arguments combine into the SELECT statement:

unit: 'month'
startdate: uc.cohort_month_date - The date of the cohort
enddate: date_trunc('month', tx.block_date) - Each month between the cohort date and the current transaction date

date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date))

What this does:

Joins every transaction back to the user cohort data
Calculates months elapsed since first transaction using date_diff()
Uses lpad() to format month numbers with leading zeros (01, 02, ... 10, 11, 12) for proper pivot table sorting

Address 0xffffff21a685e8cfdd04576ae2c018858edf0d7a only transacted in their first month
Address 0xffffff044f0734354d308bb120b6916bf85a797a returned in months 1, 2, 3, 4, and 7 after they first transacted

Step 3: Cohort Size

cohort_size AS (
    SELECT uc.cohort_month_text as cohort_month,
           count(*) as new_users
    FROM user_cohorts uc
    GROUP BY 1
)

Step 4: Retention Table

There are two tables that you need to query FROM:

following_months is needed to count the number of addresses from the original cohort that transact in the months following
user_cohorts is needed to match addresses from the original cohort with ones in the months following

Now you can LEFT JOIN the following_months table with the user_cohorts table by the addresses:

retention_table AS (
    SELECT c.cohort_month_text as cohort_month,
           fm.month_number,
           count(*) as retained_users
    FROM following_months fm
    LEFT JOIN user_cohorts c ON fm.address = c.address
    GROUP BY 1, 2
)

As you can see, the retention_table returns the number of users from the original cohort x number of months after.

Step 5: Retention Rate

MoM retention doesn't require individual addresses so the FROM statement shouldn't include user_cohorts and following_months - these tables were used in the other CTEs.

To calculate retention you need to query the user counts from the retention_table and cohort_size tables. These two tables can be joined on the cohort month:

SELECT *
FROM retention_table r
LEFT JOIN cohort_size s 
    ON r.cohort_month = s.cohort_month

Now you need to have four columns to create a retention table. The SELECT statement should show:

The initial cohort date - r.cohort_month
The initial cohort size - s.new_users
The number of months since the initial cohort month - r.month_number
The percentage of the original cohort that remains x months (r.month_number) after the initial cohort (r.cohort_month). This can be found by taking the number of users that remain (r.retained_users) and dividing it by the size of the initial cohort (s.new_users).

Retention rate = r.retained_users / s.new_users

The final SELECT statement looks like this:

SELECT r.cohort_month,
       s.new_users,
       r.month_number,
       cast(r.retained_users as double) / cast(s.new_users as double) as retention_rate
FROM retention_table r
LEFT JOIN cohort_size s ON r.cohort_month = s.cohort_month
WHERE r.month_number != '00'  -- Exclude month 0 (first transaction month)
ORDER BY r.cohort_month, r.month_number

Interpreting the results: From the October 2023 cohort of 268,396 new users:

54.49% returned in October 24 (month 12) (Airdrop month)
21.02% returned in December 24 (month 13)
14.23% returned in January 25 (month 14)

Final query:

So the final query looks like:

WITH user_cohorts AS (
    SELECT "from" as address,
           date_format(min(date_trunc('month', block_date)), '%Y-%m') as cohort_month_text,
           min(date_trunc('month', block_date)) as cohort_month_date
    FROM scroll.transactions
    WHERE success = true
    GROUP BY 1
),
following_months AS (
    SELECT tx."from" as address,
           lpad(cast(date_diff('month', uc.cohort_month_date, date_trunc('month', tx.block_date)) as varchar), 2, '0') as month_number
    FROM scroll.transactions tx
    LEFT JOIN user_cohorts uc ON tx."from" = uc.address
    WHERE tx.success = true
    GROUP BY 1, 2
),
cohort_size AS (
    SELECT uc.cohort_month_text as cohort_month,
           count(*) as new_users
    FROM user_cohorts uc
    GROUP BY 1
),
retention_table AS (
    SELECT c.cohort_month_text as cohort_month,
           fm.month_number,
           count(*) as retained_users
    FROM following_months fm
    LEFT JOIN user_cohorts c ON fm.address = c.address
    GROUP BY 1, 2
)
SELECT r.cohort_month,
       s.new_users,
       r.month_number,
       cast(r.retained_users as double) / cast(s.new_users as double) as retention_rate
FROM retention_table r
LEFT JOIN cohort_size s ON r.cohort_month = s.cohort_month
WHERE r.month_number != '00'
ORDER BY r.cohort_month, r.month_number

Key Technical Notes for Dune

Month Number Formatting: Use lpad() with zero-padding to ensure proper sorting in pivot tables
Date Formatting: Use date_format() to convert dates to strings for better pivot table compatibility
Casting: Always cast retention calculations as double for decimal precision
Filtering: Exclude month 0 (WHERE month_number != '00') since 100% retention in first month is guaranteed

Visualization of the Pivot Table

Rows: cohort_month
Columns: month_number
Values: retention_rate
Format: Use percentage format for retention rates (0.00%)

A Cohort Retention Query

Mastering MoM Retention: Crafting a cohort query on Dune Analytics

Step 1: User Cohorts

Step 2: Following Months

Step 3: Cohort Size

Step 4: Retention Table

Step 5: Retention Rate

Final query:

Key Technical Notes for Dune

Visualization of the Pivot Table

A Cohort Retention Query

Wild Tales

A Cohort Retention Query

A Cohort Retention Query

Mastering MoM Retention: Crafting a cohort query on Dune Analytics

Step 1: User Cohorts

Step 2: Following Months

Step 3: Cohort Size

Step 4: Retention Table

Step 5: Retention Rate

Final query:

Key Technical Notes for Dune

Visualization of the Pivot Table

A Cohort Retention Query

Wild Tales

A Cohort Retention Query