Some study on BitTensor ($TAO)

As a Ph.D. candidate in artificial intelligence, I am relatively new to blockchain technology. Recently, I noticed BitTensor ($TAO), which is known as a prominent player in the Blockchain+AI field, and which, surprisingly, has a market capitalization of around 10 billion dollars. Such a valuation far exceeds that of many AI unicorns. This raised my curiosity about BitTensor. What does it do? How can it foster AI applications on chain?

To satisfy my curiosity, I started studying this project, mainly from an AI perspective.

Disclaimer

This article is just a simple study by the author; its correctness is not guaranteed.

It is not investment advice.

Materials

Here are some of the materials I have read, which are crucial for understanding this article and the project as a whole.

The Incentive Mechanism

I first read the whitepaper in order to get a clear idea of the project. However, the whitepaper is not well written and can hardly be called an academic paper. There are many contradictory notations and undefined symbols, making it difficult for the reader to understand the core idea.

After carefully reading the whitepaper along with the source code, I finally understand (perhaps only partly) the incentive mechanism.

First, we focus on a ‘subnet’ of BitTensor. The goal of a subnet is to learn a function y = f(x), where we have some training data X and Y. Clearly, this is the basic definition of a machine learning task.

In the subnet, the miners learn f while the validators compute W(f, X, Y), the performance term. We say f has better performance when f fits X, Y better, i.e., f(x) is closer to y for x ∈ X, y ∈ Y. A typical example: f is a neural network (e.g., ResNet) for image classification, X is the set of training images, Y is the labels, and W is the accuracy.

The incentive is rather simple: r_i = w_i · s_i

Here, r_i is the reward of the i-th miner, w_i is its performance (e.g., classification accuracy), and s_i is its staked token amount.

Therefore, it can be concluded that a miner can get more rewards in the following two ways:

  • Getting a better performance, e.g., classifying the image more accurately

  • Staking more tokens.
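The reward rule above can be sketched in a few lines of Python. This is an illustrative model only, not BitTensor's actual code; the function name, the normalization to a fixed per-epoch emission, and the toy numbers are all my assumptions.

```python
# Illustrative sketch of r_i = w_i * s_i (NOT BitTensor's actual implementation).
# Rewards are normalized so one epoch's emission is split among miners.

def epoch_rewards(performance, stake, emission=1.0):
    """performance[i] = w_i, stake[i] = s_i; both are hypothetical inputs."""
    raw = [w * s for w, s in zip(performance, stake)]
    total = sum(raw)
    if total == 0:
        return [0.0] * len(raw)
    # Each miner's share is proportional to performance times stake.
    return [emission * r / total for r in raw]

# Two miners with equal stake: the better-performing one earns more.
rewards = epoch_rewards([0.9, 0.5], [100.0, 100.0])
```

Note how both levers from the list above show up: raising w_i or raising s_i both increase a miner's share.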

Take the ‘prompting’ subnet as an example. Its purpose is to train a question-answering model.

In prompting/rewards/, there are several functions that score the predicted answer against the reference answer, e.g., by comparing the cosine similarity of their embeddings.
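As a rough sketch of how such a scoring function might look (the actual code in prompting/rewards/ differs; the vectors below are made up, and in practice the embeddings would come from a sentence encoder):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical embeddings of the predicted and reference answers.
pred_emb = [0.1, 0.8, 0.3]
ref_emb = [0.2, 0.7, 0.4]
score = cosine_similarity(pred_emb, ref_emb)  # in [-1, 1]; higher is better
```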

The performance score is then assigned to the miner (with some small modifications, such as a moving average) and committed to the blockchain.

See /prompting/forward.py, prompting/base/validator.py.
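The “moving average” modification mentioned above could take the form of an exponential moving average. This is a hedged sketch, not the validator's actual code, and the smoothing factor is a made-up value:

```python
def update_score(old_score, new_score, alpha=0.1):
    """Exponential moving average of a miner's score.

    alpha is a hypothetical smoothing factor, not a value from the codebase.
    Smoothing keeps one lucky (or unlucky) epoch from swinging the score.
    """
    return alpha * new_score + (1 - alpha) * old_score
```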

Once the blockchain receives those scores (confusingly, the variable name is ‘weight’ instead of ‘reward’), it computes the total incentive by simply multiplying them by each node's staked value. One line of code does it:

let preranks: Vec<I32F32> = matmul(&weights, &active_stake);

(The matmul is needed because there are multiple validators, each submitting its own weight vector.)

See subtensor/src/epoch.rs.
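The matmul can be unpacked as follows: weights is a (validators × miners) matrix and active_stake a per-validator stake vector, so each miner's prerank is every validator's opinion of it, weighted by that validator's stake. A toy Python sketch under these assumptions (the numbers are made up; the real code uses fixed-point I32F32 arithmetic):

```python
# Sketch of preranks = matmul(weights, active_stake): for each miner j,
# prerank[j] = sum over validators v of stake[v] * weights[v][j].
def preranks(weights, active_stake):
    n_miners = len(weights[0])
    out = [0.0] * n_miners
    for v_idx, stake in enumerate(active_stake):
        for m_idx in range(n_miners):
            out[m_idx] += stake * weights[v_idx][m_idx]
    return out

# Two validators score two miners; validator 0 has twice the stake,
# so its opinion counts twice as much.
w = [[0.7, 0.3],
     [0.5, 0.5]]
s = [2.0, 1.0]
ranks = preranks(w, s)
```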


After all this, we can see that while the whitepaper is full of complicated formulas (some of which may be irrelevant), the incentive mechanism is quite simple,

i.e., reward = performance × staked value.

My Concerns about Practical Application

  • Data privacy issue. The data used to train practical AI models is usually private. Large companies have no intention (and probably no legal right) to share it. However, the miners and validators both need data to train and validate the AI model. Of course, there are myriad open-source datasets; however, those are mostly used for research rather than for practical applications that could generate economic returns.

  • Winner-takes-all. Nowadays, AI models are much larger and can be extremely hard to train, which widens the gap between AI companies. For example, OpenAI's GPT models beat other language models by an overwhelming margin. However, BitTensor rewards not only the top-1 miner but all the other miners as well, which is not attractive to the leading players: they can simply sell their services by themselves.

  • Huge resource consumption. In BitTensor, multiple miners learn the same task, resulting in a tremendous amount of repeated work. Especially in machine learning tasks, it is likely that everyone uses the same or a similar model.