Cryptoracle Data Analysis Team
This tutorial aims to help you quickly understand and effectively apply our community metrics to support project analysis, market monitoring, public opinion tracking, and the development of quantitative strategies in the crypto asset space.
Through this tutorial, you will learn:
The definitions, significance, and construction methods of each core metric
The data update frequency and coverage scope
How to monitor token popularity and identify trends based on community data
Data Time Range: From 2025-01-01 to present
Data Frequency: 1D, 4H, 1H, 15M
Data Source: Social platforms across the web (official groups, channels, forums, etc.)
Number of Monitored Tokens: 200 (top 200 by Binance market capitalization)
Number of Indicators: 6 (all of which are activity feature indicators)
Figure 1: all_community_volume Curve

Figure 2: Proportion of Tokens with Non-Zero and Non-NaN Values Over Time

a. January 1 – May 1, 2025: Stable curves with consistent data standards
During this period, both curves remain relatively stable, indicating that:
The data primarily originates from a consistent batch of historical sources, with unified scraping rules and data structures;
Data integrity is high, with no large-scale gaps or anomalies;
There were minimal changes in community activity levels and token coverage.
b. From May 1, 2025, onward: Upward trend aligns with the project's launch timeline
May 1 marks the official launch of the community data collection initiative. From this point forward, new monitoring targets, platforms, and tokens have been added regularly, expanding the scope of community data sources.
As a result, we observe:
A noticeable increase in total community_volume, reflecting broader market coverage;
A rising proportion of valid tokens, indicating richer datasets and more comprehensive monitoring;
Occasional fluctuations, likely caused by synchronization delays or crawler/platform instability.
c. Comparison before and after project launch: Illustrates the shift from “historical backfill” to “real-time collection”
Prior to May 1, data collection was focused on backfilling:
Daily data was not collected in real time, but gradually sourced from existing archives;
While coverage was broad, the data could not reflect real-time market dynamics.
After May 1, the system transitioned to real-time crawling:
New community interactions are captured hourly or daily, offering a more accurate picture of current market sentiment and activity;
The volume and structure of data evolve as monitoring targets are adjusted, contributing to the overall upward trend.
The data from January 1 to May 1, 2025, exhibits strong consistency and completeness, making it well-suited for use as a baseline period in modeling.
From May 1, 2025, onward, the data enters a real-time crawling phase, becoming more dynamic and reflective of live market conditions. However, this transition also introduces structural changes and increased volatility:
For future use in strategy factor construction, it is advisable to model the pre- and post-May 1 periods separately to mitigate risks associated with abrupt changes in data structure.
In addition, to better monitor and respond to structural shifts in the data before and after May 1, it is recommended to develop auxiliary indicators that track the real-time quality, coverage, and source stability of community data—such as the daily count of included communities or channels.
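The auxiliary quality indicators recommended above can be sketched as a simple daily aggregation. This is a minimal illustration, assuming a long-format table with `timestamp`, `token`, and `community_volume` columns; the actual schema may differ:

```python
import pandas as pd

def daily_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Per-day data-quality indicators: how many tokens report any data,
    and what share of them have a non-zero, non-NaN community_volume."""
    return df.groupby(df["timestamp"].dt.date).agg(
        tokens_reporting=("token", "nunique"),
        valid_ratio=("community_volume", lambda s: (s.fillna(0) > 0).mean()),
    )

# Toy example: three tokens over two days
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-05-01"] * 3 + ["2025-05-02"] * 3),
    "token": ["BTC", "ETH", "SOL"] * 2,
    "community_volume": [120, 0, None, 95, 40, 12],
})
print(daily_coverage(df))
```

A sudden drop in `valid_ratio` or `tokens_reporting` after May 1 would flag exactly the kind of structural shift (crawler instability, source changes) this section warns about.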
Data Type: Token Activity Feature Indicators
Indicator Construction Logic: These indicators measure the information liquidity and user engagement of a cryptocurrency across social platforms and core communities. They reflect the market's attention level and emotional activity toward the asset.
Data Sources: Primarily sourced from Telegram groups and Discord channels. Message content is identified, categorized, and analyzed using keyword dictionaries (including token names, symbols, nicknames, etc.) to generate structured statistics.

Sorted by data richness, “community_volume” offers the broadest coverage and the largest dataset. It is constructed by counting any message posted within a cryptocurrency-related community as part of the volume. In contrast, “mention” relies on keyword matching and only includes messages that explicitly reference the token.
As a result, in real-world applications, “community_volume” provides the most comprehensive and stable indicator of overall community activity, making it ideal for tracking long-term trends. “Mention,” on the other hand, captures fewer unique users but is more effective at identifying spikes in engagement or sudden bursts of hype.
“Group_num” measures buzz from a different angle—it focuses on the “breadth of spread” rather than just the “depth of engagement.” High values in traditional volume metrics like “community_volume” or “interaction” don’t always indicate widespread interest. They could simply reflect:
heavy spam or repeated posting in a large single group;
intensive discussion within a KOL-led or echo chamber-style community.
In contrast, a high “group_num” suggests:
the topic is being discussed across multiple independent communities, signaling stronger organic spread;
it may indicate the content is breaking out of niche circles—so-called “circle-breaking” or cross-community traction;
it helps identify whether the buzz is systemic (broad-based) or just a localized flashpoint.
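The difference between the three counting rules can be shown on a toy message log. The keyword dictionary, message fields, and naive substring matching below are illustrative assumptions, not the production pipeline:

```python
import pandas as pd

# Hypothetical keyword dictionary for one token (name, symbol, nicknames)
KEYWORDS = {"bitcoin", "btc"}

messages = pd.DataFrame({
    "group_id": ["g1", "g1", "g1", "g2", "g3"],
    "text": [
        "btc looks strong today",
        "gm everyone",
        "anyone holding bitcoin?",
        "BTC breakout incoming",
        "what a market",
    ],
})

# community_volume: every message in a crypto-related community counts
community_volume = len(messages)

# mention: only messages whose text matches a keyword
# (substring matching is a simplification; a real pipeline would tokenize)
mention_mask = messages["text"].str.lower().apply(
    lambda t: any(k in t for k in KEYWORDS))
mention = int(mention_mask.sum())

# group_num: number of distinct groups where the token was mentioned
group_num = messages.loc[mention_mask, "group_id"].nunique()

print(community_volume, mention, group_num)  # 5 3 2
```

Note how `group_num` stays at 2 even though `g1` contributed two mentions: repeated posting in one group inflates `community_volume` and `mention`, but not the breadth-of-spread metric.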


Data Richness (Descending): COMMUNITY_VOLUME > MENTION > SPEAKER_NUM > INTERACTION
Recommended Use Cases:
COMMUNITY_VOLUME: The most stable and comprehensive indicator. Ideal for long-term trend tracking or serving as a baseline metric for community activity.
MENTION: Relatively sparse, focusing on explicit mentions. Suitable for modeling token-specific visibility or “mention-level” hype.
SPEAKER_NUM: Even sparser, but useful for capturing burst participation patterns or identifying retail investor behavior.
GROUP_NUM: Unlike traditional “intensity-based” metrics, this emphasizes breadth of dissemination. Best used to distinguish between isolated hype spikes and cross-community (viral) spread, helping to identify whether the attention is localized or breaking out of echo chambers.

2.1.1 Basic Information

2.1.2 Missing Values and Zero Values Check
Missing values: recorded as nulls
Zero values: 0
Data Notes:
Null values are effectively zeros: During data processing, script-generated zero values were recorded as nulls. This is a technical handling method, not an indication of missing data.
Instances where COMMUNITY_VOLUME > 0 but INTERACTION = 0 occur because the INTERACTION metric has narrower criteria, capturing only messages that qualify as active engagement.
HEAT is a normalized ratio. When a token’s relative contribution is extremely small (<0.0001), it is rounded down to 0. This does not imply the absence of attention, but rather reflects an extremely low share of overall activity.
2.1.3 Data Processing Recommendation
a. all_community_volume
The maximum community_volume (or heat) value across all tokens within a given time interval, used as the normalization baseline when computing the heat metric.

b. Handling Cases Where Heat = 0
When heat equals 0 (often due to low-activity tokens or precision loss from decimal rounding), approximate the heat value by calculating: community_volume / all_community_volume.

c. Fill All Null Values with 0
Since null values effectively represent zeros, they should be consistently filled with 0 across the entire dataset.
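Recommendations b and c above can be combined into one cleaning step. This is a sketch assuming illustrative column names (`community_volume`, `heat`, `all_community_volume`), with `all_community_volume` taken as a precomputed per-interval baseline:

```python
import numpy as np
import pandas as pd

def clean_community_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the processing recommendations:
    c. nulls are effectively zeros, so fill them with 0;
    b. where heat == 0 (rounding/precision loss), approximate it as
       community_volume / all_community_volume."""
    out = df.copy()
    out[["community_volume", "heat"]] = out[["community_volume", "heat"]].fillna(0)
    approx = out["community_volume"] / out["all_community_volume"]
    out["heat"] = np.where(out["heat"] == 0, approx, out["heat"])
    return out

df = pd.DataFrame({
    "community_volume": [500.0, 3.0, None],
    "all_community_volume": [10000.0, 10000.0, 10000.0],
    "heat": [0.05, 0.0, None],  # the 0.0 is 3/10000 rounded down
})
print(clean_community_data(df))
```

Filling nulls before recomputing heat matters: a null heat alongside a null community_volume resolves cleanly to 0 instead of propagating NaN.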
