Cryptoracle Data Analysis Team
This tutorial aims to help you quickly understand and effectively apply our community metrics to support project analysis, market monitoring, public opinion tracking, and the development of quantitative strategies in the crypto asset space.
Through this tutorial, you will learn:
The definitions, significance, and construction methods of each core metric
The data update frequency and coverage scope
How to monitor token popularity and identify trends based on community data
Data Time Range: From 2025-01-01 to present
Data Frequency: 1D, 4H, 1H, 15M
Data Source: Social platforms across the web (official groups, channels, forums, etc.)
Number of Monitored Tokens: 200 (top 200 by Binance market capitalization)
Number of Indicators: 6 (all of which are activity feature indicators)
Figure 1: all_community_volume Curve

Figure 2: Proportion of Tokens with Non-Zero and Non-NaN Values Over Time

a. January 1 – May 1, 2025: Stable curves with consistent data standards
During this period, both curves remain relatively stable, indicating that:
The data primarily originates from a consistent batch of historical sources, with unified scraping rules and data structures;
Data integrity is high, with no large-scale gaps or anomalies;
There were minimal changes in community activity levels and token coverage.
b. From May 1, 2025, onward: Upward trend aligns with the project's launch timeline
May 1 marks the official launch of the community data collection initiative. From this point forward, new monitoring targets, platforms, and tokens have been added regularly, expanding the scope of community data sources.
As a result, we observe:
A noticeable increase in total community_volume, reflecting broader market coverage;
A rising proportion of valid tokens, indicating richer datasets and more comprehensive monitoring;
Occasional fluctuations, likely caused by synchronization delays or crawler/platform instability.
c. Comparison before and after project launch: Illustrates the shift from “historical backfill” to “real-time collection”
Prior to May 1, data collection was focused on backfilling:
Daily data was not collected in real time, but gradually sourced from existing archives;
While coverage was broad, the data could not reflect real-time market dynamics.
After May 1, the system transitioned to real-time crawling:
New community interactions are captured hourly or daily, offering a more accurate picture of current market sentiment and activity;
The volume and structure of data evolve as monitoring targets are adjusted, contributing to the overall upward trend.
The data from January 1 to May 1, 2025, exhibits strong consistency and completeness, making it well-suited for use as a baseline period in modeling.
From May 1, 2025, onward, the data enters a real-time crawling phase, becoming more dynamic and reflective of live market conditions. However, this transition also introduces structural changes and increased volatility:
For future use in strategy factor construction, it is advisable to model the pre- and post-May 1 periods separately to mitigate risks associated with abrupt changes in data structure.
In addition, to better monitor and respond to structural shifts in the data before and after May 1, it is recommended to develop auxiliary indicators that track the real-time quality, coverage, and source stability of community data—such as the daily count of included communities or channels.
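The auxiliary quality indicators recommended above can be sketched as a simple daily aggregation. This is a minimal illustration, assuming a long-format table with `timestamp`, `token`, and `community_volume` columns; the actual schema may differ:

```python
import pandas as pd

def daily_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Per-day data-quality indicators: how many tokens report any data,
    and what share of them have a non-zero, non-NaN community_volume."""
    return df.groupby(df["timestamp"].dt.date).agg(
        tokens_reporting=("token", "nunique"),
        valid_ratio=("community_volume", lambda s: (s.fillna(0) > 0).mean()),
    )

# Toy example: three tokens over two days
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-05-01"] * 3 + ["2025-05-02"] * 3),
    "token": ["BTC", "ETH", "SOL"] * 2,
    "community_volume": [120, 0, None, 95, 40, 12],
})
print(daily_coverage(df))
```

A sudden drop in `valid_ratio` or `tokens_reporting` after May 1 would flag exactly the kind of structural shift (crawler instability, source changes) this section warns about.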
Data Type: Token Activity Feature Indicators
Indicator Construction Logic: These indicators measure the information liquidity and user engagement of a cryptocurrency across social platforms and core communities. They reflect the market's attention level and emotional activity toward the asset.
Data Sources: Primarily sourced from Telegram groups and Discord channels. Message content is identified, categorized, and analyzed using keyword dictionaries (including token names, symbols, nicknames, etc.) to generate structured statistics.

Sorted by data richness, “community_volume” offers the broadest coverage and the largest dataset. It is constructed by counting any message posted within a cryptocurrency-related community as part of the volume. In contrast, “mention” relies on keyword matching and only includes messages that explicitly reference the token.
As a result, in real-world applications, “community_volume” provides the most comprehensive and stable indicator of overall community activity, making it ideal for tracking long-term trends. “Mention,” on the other hand, captures fewer unique users but is more effective at identifying spikes in engagement or sudden bursts of hype.
“Group_num” measures buzz from a different angle—it focuses on the “breadth of spread” rather than just the “depth of engagement.” High values in traditional volume metrics like “community_volume” or “interaction” don’t always indicate widespread interest. They could simply reflect:
heavy spam or repeated posting in a large single group;
intensive discussion within a KOL-led or echo chamber-style community.
In contrast, a high “group_num” suggests:
the topic is being discussed across multiple independent communities, signaling stronger organic spread;
it may indicate the content is breaking out of niche circles—so-called “circle-breaking” or cross-community traction;
it helps identify whether the buzz is systemic (broad-based) or just a localized flashpoint.
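The difference between the three counting rules can be shown on a toy message log. The keyword dictionary, message fields, and naive substring matching below are illustrative assumptions, not the production pipeline:

```python
import pandas as pd

# Hypothetical keyword dictionary for one token (name, symbol, nicknames)
KEYWORDS = {"bitcoin", "btc"}

messages = pd.DataFrame({
    "group_id": ["g1", "g1", "g1", "g2", "g3"],
    "text": [
        "btc looks strong today",
        "gm everyone",
        "anyone holding bitcoin?",
        "BTC breakout incoming",
        "what a market",
    ],
})

# community_volume: every message in a crypto-related community counts
community_volume = len(messages)

# mention: only messages whose text matches a keyword
# (substring matching is a simplification; a real pipeline would tokenize)
mention_mask = messages["text"].str.lower().apply(
    lambda t: any(k in t for k in KEYWORDS))
mention = int(mention_mask.sum())

# group_num: number of distinct groups where the token was mentioned
group_num = messages.loc[mention_mask, "group_id"].nunique()

print(community_volume, mention, group_num)  # 5 3 2
```

Note how `group_num` stays at 2 even though `g1` contributed two mentions: repeated posting in one group inflates `community_volume` and `mention`, but not the breadth-of-spread metric.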


Data Richness (Descending): COMMUNITY_VOLUME > MENTION > SPEAKER_NUM > INTERACTION
Recommended Use Cases:
COMMUNITY_VOLUME: The most stable and comprehensive indicator. Ideal for long-term trend tracking or serving as a baseline metric for community activity.
MENTION: Relatively sparse, focusing on explicit mentions. Suitable for modeling token-specific visibility or “mention-level” hype.
SPEAKER_NUM: Even sparser, but useful for capturing burst participation patterns or identifying retail investor behavior.
GROUP_NUM: Unlike traditional “intensity-based” metrics, this emphasizes breadth of dissemination. Best used to distinguish between isolated hype spikes and cross-community (viral) spread, helping to identify whether the attention is localized or breaking out of echo chambers.

2.1.1 Basic Information

2.1.2 Missing Values and Zero Values Check
Missing values: recorded as nulls
Zero values: 0
Data Notes:
Null values are effectively zeros: During data processing, script-generated zero values were recorded as nulls. This is a technical handling method, not an indication of missing data.
Instances where COMMUNITY_VOLUME > 0 but INTERACTION = 0 occur because the INTERACTION metric has narrower criteria, capturing only messages that qualify as active engagement.
HEAT is a normalized ratio. When a token’s relative contribution is extremely small (<0.0001), it is rounded down to 0. This does not imply the absence of attention, but rather reflects an extremely low share of overall activity.
2.1.3 Data Processing Recommendation
a. all_community_volume
The maximum community_volume (or heat) value across all tokens within a given time interval, used as the normalization baseline when computing the heat metric.

b. Handling Cases Where Heat = 0
When heat equals 0 (often due to low-activity tokens or precision loss from decimal rounding), approximate the heat value by calculating: community_volume / all_community_volume.

c. Fill All Null Values with 0
Since null values effectively represent zeros, they should be consistently filled with 0 across the entire dataset.
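Recommendations b and c above can be combined into one cleaning step. This is a sketch assuming illustrative column names (`community_volume`, `heat`, `all_community_volume`), with `all_community_volume` taken as a precomputed per-interval baseline:

```python
import numpy as np
import pandas as pd

def clean_community_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the processing recommendations:
    c. nulls are effectively zeros, so fill them with 0;
    b. where heat == 0 (rounding/precision loss), approximate it as
       community_volume / all_community_volume."""
    out = df.copy()
    out[["community_volume", "heat"]] = out[["community_volume", "heat"]].fillna(0)
    approx = out["community_volume"] / out["all_community_volume"]
    out["heat"] = np.where(out["heat"] == 0, approx, out["heat"])
    return out

df = pd.DataFrame({
    "community_volume": [500.0, 3.0, None],
    "all_community_volume": [10000.0, 10000.0, 10000.0],
    "heat": [0.05, 0.0, None],  # the 0.0 is 3/10000 rounded down
})
print(clean_community_data(df))
```

Filling nulls before recomputing heat matters: a null heat alongside a null community_volume resolves cleanly to 0 instead of propagating NaN.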
